12 Oct
Cyber Sphere
Navi Mumbai
SALARY : 40 LPA - 60 LPA
We are seeking a talented and experienced Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our Azure AI Services platform. You will work closely with cross-functional teams to design, implement, and maintain robust infrastructure and automation solutions.
Qualifications :
- 6+ years of Software Engineering experience
- BS Engineering/Computer Science or equivalent experience required
Responsibilities :
- Design, deploy, and maintain highly available and scalable data infrastructure on Azure OpenAI, databases, and event-driven services.
- Monitor and optimize the performance of AI workloads.
- Collaborate with cross-functional teams, including data engineers, data scientists, and developers, to provide technical guidance and support in implementing best practices.
- Ensure data governance policies and practices are followed to maintain data integrity, security, and compliance.
- Troubleshoot and resolve issues related to data infrastructure, working closely with operations and development teams.
- Implement automation and monitoring tools to streamline operations and improve system reliability.
- Plan and execute disaster recovery procedures and backup strategies for data platforms.
- Stay up to date with industry trends and emerging technologies related to data management, analytics, and cloud computing.
Requirements :
- Proven experience as an SRE or in a similar role, with a focus on data infrastructure and analytics.
- Strong expertise in managing and optimizing Azure OpenAI or event-driven applications on Azure.
- In-depth knowledge of data governance principles, data security, and compliance requirements.
- Experience with performance optimization techniques for large-scale data processing and analytics workloads.
- Experience managing Azure cloud services, including compute, storage, networking, and security.
- Familiarity with AI services, particularly OpenAI, for implementing machine learning and natural language processing solutions.
- Proficiency in Terraform for infrastructure as code management and automation.
- Knowledge of databases, including SQL and NoSQL, for data storage and management is required.
- Proficiency in scripting and automation using languages such as Python, PowerShell, or Bash.
- Familiarity with cloud platforms, preferably Microsoft Azure, and related services (Azure Data Factory, Azure Data Lake Analytics, etc.).
- Solid understanding of containerization technologies, such as Docker and Kubernetes.
- Strong problem-solving skills and the ability to troubleshoot complex issues in a distributed data environment.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
Desirable Skills :
- AWS / Azure / Kubernetes certifications (ref:hirist.tech)
Site Reliability Engineer - Azure/OpenAI