09 Oct
EPAM Systems India Private
Secunderabad
Job Title: Senior Site Reliability Engineer
Skills: Site Reliability Engineering, Datadog, Dynatrace, Splunk, Grafana, Jenkins, Kubernetes, Amazon Web Services, Python, Linux
Location: Hyderabad, Bangalore, Pune, Gurgaon, Chennai, Mumbai
We are seeking a talented and motivated Senior Site Reliability Engineer (SRE) to join our organization.
The Senior SRE will play a crucial role in ensuring the reliability, scalability, and performance of our infrastructure and applications, and in planning their capacity. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.
Responsibilities:
- Design, build, and maintain scalable, reliable, and efficient cloud infrastructure and services on platforms such as AWS, Azure, or Google Cloud
- Automate manual work using scripting/programming languages (Python/Bash/PowerShell, etc.) within cloud environments
- Implement and manage automation tools (Jenkins, GitLab, Ansible/Chef) and processes for streamlined deployment, monitoring, and management of systems and applications in the cloud
- Monitor system performance, troubleshoot issues proactively, and ensure high availability and performance
- Utilize observability tools (Prometheus, Grafana, ELK stack, Splunk, Dynatrace, Datadog) for monitoring, alerting, and logging to identify and address potential issues
- Participate in capacity planning and scalability assessments to support business growth and cloud resource optimization
- Manage containerization and orchestration technologies such as Docker and Kubernetes, particularly in cloud-native environments
- Ensure compliance with security best practices and standards in the cloud
- Evaluate and recommend new technologies and practices to improve system reliability, performance, and efficiency
- Document processes, procedures, and configurations for knowledge sharing and system integrity
Requirements:
- 5-8 years of experience in a similar role
- Strong background in software engineering and system administration
- Proficiency with cloud platforms such as AWS, Azure, or Google Cloud
- Experience with scripting/programming languages (Python/Bash/PowerShell)
- Experience with automation tools (Jenkins, GitLab, Ansible/Chef)
- Excellent communication and collaboration skills
- Hands-on experience with containerization and orchestration technologies (Docker, Kubernetes)
- Knowledge of security practices and standards in the cloud
- Familiarity with SLI, SLO, SLA, and error budget concepts
- Strong problem-solving skills and experience with Agile methodologies and DevOps practices
Nice to have:
- Certifications in cloud technologies (AWS, Azure, Google Cloud)
- Contributions to open-source projects
- Prior experience in a leadership role in an SRE team