Job Summary
InfoTech Solutions is seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our Remote Global Operations team. In this role, you will be responsible for ensuring the reliability, scalability, performance, and security of our mission-critical systems and platforms. You will work closely with software engineers, DevOps teams, and product stakeholders to design resilient architectures, automate operations, and proactively resolve system issues.
As an SRE at InfoTech Solutions, you will play a strategic role in balancing system stability with rapid innovation, enabling the organization to deliver world-class digital services to customers across multiple regions and time zones.
Key Responsibilities
- Design, implement, and maintain highly available and scalable infrastructure across cloud and on-prem environments.
- Monitor system performance, availability, and reliability using industry-standard monitoring and observability tools.
- Automate operational processes, including deployments, scaling, backups, and incident response.
- Lead incident management, root cause analysis (RCA), and post-incident reviews to prevent future issues.
- Collaborate with development teams to improve system design, performance, and fault tolerance.
- Implement and maintain CI/CD pipelines to ensure smooth and reliable software releases.
- Develop and maintain system documentation, runbooks, and operational guidelines.
- Ensure security best practices, compliance standards, and disaster recovery plans are followed.
- Participate in on-call rotations to support global systems and customers.
Required Skills and Qualifications
- Strong experience with Linux/Unix system administration.
- Proficiency in cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP).
- Hands-on experience with containerization and orchestration tools (Docker, Kubernetes).
- Strong scripting and automation skills using Python, Bash, or similar languages.
- Experience with monitoring and observability tools such as Prometheus, Grafana, ELK, Datadog, or New Relic.
- Knowledge of Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Ansible.
- Solid understanding of networking concepts, DNS, load balancing, and security principles.
- Excellent problem-solving skills and ability to work under pressure.
- Strong communication and collaboration skills in a remote environment.
Experience
- 3–7 years of experience in Site Reliability Engineering, DevOps, Systems Engineering, or a related role.
- Proven experience managing large-scale distributed systems.
- Prior experience working in a 24/7 production environment is highly preferred.
- Experience in global operations or supporting international teams is an advantage.
Working Hours
- Fully remote role with flexible working hours.
- Must be available for on-call rotations and for occasional overlapping hours with teams in other time zones.
- Standard work schedule: 40 hours per week, with flexibility based on operational needs.
Knowledge, Skills, and Abilities
- Deep understanding of system reliability, availability, and performance engineering.
- Ability to analyze complex technical problems and provide effective solutions.
- Strong attention to detail and commitment to high-quality standards.
- Ability to prioritize tasks and manage multiple projects simultaneously.
- Passion for automation, continuous improvement, and operational excellence.
- Strong documentation and knowledge-sharing mindset.
Benefits
- Competitive salary with performance-based incentives.
- Fully remote work with global exposure.
- Health insurance and wellness programs.
- Paid time off, holidays, and flexible leave policies.
- Learning and development opportunities, including certifications and training.
- Access to the latest tools, technologies, and cloud platforms.
- Career growth opportunities within a fast-growing global organization.
Why Join InfoTech Solutions?
At InfoTech Solutions, we believe reliability is the foundation of innovation. You will join a diverse and talented team of engineers working on cutting-edge technologies that impact customers worldwide. We foster a culture of collaboration, learning, and ownership, where your ideas are valued and your expertise makes a real difference.
This is an opportunity to grow your career in a global environment, work on large-scale systems, and contribute to building highly reliable digital platforms that shape the future of technology.
How to Apply
Interested candidates are encouraged to submit an updated resume along with a brief cover letter highlighting relevant experience and technical expertise. Shortlisted candidates will be contacted for technical interviews and remote assessments.