Lead Site Reliability Engineer Azure, GCP, Terraforms

17 Oct

unitedhealth group information services

Secunderabad

Job Description

Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by diversity and inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health equity on a global scale. Join us to start Caring. Connecting. Growing together.

## Primary Responsibilities:

- Collaborate with development and operations teams to design, implement, and maintain observability frameworks that provide deep insights into system performance, particularly for data and ML pipelines
- Lead the establishment of Service Level Objectives (SLOs) and Service Level Indicators (SLIs), ensuring they align with business goals and drive continuous performance improvements
- Partner with stakeholders to understand system performance requirements and translate them into actionable performance engineering strategies
- Proactively identify performance bottlenecks and collaborate with teams to implement solutions that enhance system scalability and reliability
- Design and execute performance regression test suites, focusing on data-intensive, AI/ML, and generative AI workloads, to ensure continuous performance optimization
- Own the reliability and performance metrics of our systems, driving a culture of performance excellence and proactive issue resolution
- Collaborate with subject matter experts to gain a deep understanding of domain-specific performance challenges, particularly in data and voice bot generative AI pipelines
- Utilize tools like Rally and GitHub to monitor system performance, manage projects, and track issues, with a solid emphasis on performance-related metrics
- Define and monitor success metrics, ensuring our systems consistently meet or exceed performance and reliability targets
- Actively contribute to the continuous improvement of performance engineering practices across the team, fostering a culture of excellence in observability and system performance
- Provide Level 1 and Level 2 support to application owners by addressing questions about cloud security and application alerts and advising on remediation strategies
- Analyze and compare cloud alerting data from various sources to identify gaps, discrepancies, and root causes
- Assist in creating and maintaining documentation related to cloud scanning and vulnerability management processes, including configuration guides and standard operating procedures
- Collaborate with application development teams to address security issues early in the development process as part of our Shift Left initiative
- Promote cloud security awareness and best practices among application teams
- Stay current with the latest cloud security features, vulnerabilities, and best practices, and provide recommendations for improvement
- Update existing documentation to reflect lessons learned and feedback obtained from stakeholders
- Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies regarding flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so.

## Required Qualifications:

- Bachelor of Engineering degree
- 8 years of experience as a DevOps/SRE lead
- Expertise in Cloud Solutions (Azure, GCP), Azure AD, Azure WVD, Cloud Run, Cloud IAM, Kubernetes, Containers, Terraform, Azure DevOps, Python, React
- Ability to lead client infrastructure and cloud migration engagements, including cloud migration strategy, application cloud suitability/readiness assessment, and migration roadmap planning
- Experience with IaC and infrastructure deployment and configuration using automated tools such as Terraform, Ansible, or CloudFormation
- Expertise in optimizing Cloud Resources to ensure cost-effective deployments without compromising performance and reliability
- Good knowledge of ML infrastructure and services, including LLMs, generative AI, and transformer-based offerings such as OpenAI, ChatGPT, and Dialogflow
- Undergraduate degree or equivalent experience
- Solid work ethic

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes, an enterprise priority reflected in our mission.

**Locations - Primary Location**: Hyderabad, Telangana, IN (Remote considered)
