Position title
SRE Observability Engineer – Remote (Splunk/Datadog)
Description

Job Summary

IT sight Technologies is seeking a highly skilled and proactive SRE Observability Engineer to strengthen our reliability engineering and monitoring capabilities. This fully remote role focuses on designing, implementing, and optimizing observability solutions using Splunk and Datadog across complex distributed systems. The ideal candidate will combine strong Site Reliability Engineering (SRE) practices with deep monitoring expertise to ensure high availability, performance, and rapid incident response across our production environments.

You will work closely with DevOps, Platform Engineering, and Application teams to build scalable telemetry pipelines, actionable dashboards, and automated alerting strategies that improve system resilience and customer experience.

Key Responsibilities

  • Design, implement, and maintain enterprise-grade observability solutions using Splunk and Datadog.
  • Develop and optimize the collection of metrics, logs, and traces across cloud-native and hybrid environments.
  • Build meaningful dashboards, SLOs/SLIs, and alerting strategies to improve system visibility and reliability.
  • Partner with engineering teams to embed observability best practices into CI/CD pipelines and application architecture.
  • Lead incident detection, triage support, and post-incident analysis to drive continuous improvement.
  • Automate monitoring workflows and telemetry pipelines using scripting and Infrastructure as Code (IaC).
  • Perform performance analysis and capacity planning to prevent service degradation.
  • Establish and enforce observability standards, runbooks, and documentation.
  • Support on-call reliability initiatives and improve MTTR through better tooling and insights.

Required Skills and Qualifications

  • Strong hands-on experience with Splunk and/or Datadog in production environments.
  • Solid understanding of SRE principles, SLIs, SLOs, and error budgets.
  • Experience with distributed systems monitoring and microservices architectures.
  • Proficiency in at least one scripting/programming language (Python, Bash, or Go preferred).
  • Experience with cloud platforms such as AWS, Azure, or GCP.
  • Familiarity with containerization and orchestration (Docker, Kubernetes).
  • Knowledge of logging frameworks, metrics systems, and APM tools.
  • Strong analytical and problem-solving abilities.
  • Excellent written and verbal communication skills in English.

Experience

  • 4+ years in Site Reliability Engineering, Observability Engineering, or DevOps roles.
  • Proven experience implementing observability platforms at scale.
  • Experience supporting high-availability, customer-facing systems.
  • Prior remote or distributed team experience is a plus.

Working Hours

  • Fully remote role.
  • Flexible working hours; partial overlap with global engineering teams is required.
  • Participation in an on-call rotation may be required.

Knowledge, Skills, and Abilities

  • Deep understanding of modern monitoring and telemetry ecosystems.
  • Ability to translate system behavior into actionable insights.
  • Strong troubleshooting mindset under high-pressure incidents.
  • Experience with Infrastructure as Code tools (Terraform, CloudFormation, etc.).
  • Knowledge of CI/CD pipelines and DevOps workflows.
  • Ability to balance reliability, performance, and cost optimization.
  • Strong collaboration skills in cross-functional environments.

Benefits

  • Competitive salary package.
  • 100% remote work flexibility.
  • Health and wellness benefits.
  • Paid time off and company holidays.
  • Professional development and certification support.
  • Opportunity to work with modern cloud-native technologies.
  • Collaborative and innovation-driven culture.

Why Join

At IT sight Technologies, you will play a critical role in shaping the reliability and visibility of mission-critical platforms used globally. We foster a culture of ownership, continuous learning, and engineering excellence. This is an excellent opportunity to work on large-scale distributed systems while advancing your expertise in observability and SRE practices within a forward-thinking technology organization.

How to Apply

Interested candidates should submit their updated resume along with a brief cover letter highlighting relevant observability and SRE experience. Qualified applicants will be contacted by our talent acquisition team for the next steps in the selection process.

Employment Type
Full-time
Job Location
Sydney, New South Wales, AU
Remote work from: AU
Base Salary
$10-$20 per hour
Date posted
March 7, 2026
Valid through
April 6, 2026