▷ (Apply in 3 Minutes) Senior DevOps Engineer Kubernetes MLOps LLMOps

30 Oct | Infraveo | Bhavnagar

This is a remote position. We are seeking a highly skilled Senior DevOps Engineer with deep expertise in Kubernetes, complemented by significant experience in MLOps (Machine Learning Operations) and LLMOps (Large Language Model Operations). This role is ideal for someone who has a strong background in managing and architecting SaaS applications in Kubernetes and is passionate about building and optimizing infrastructure to support machine learning and AI-driven applications.

Responsibilities:
- Play a critical role in ensuring that our systems are highly available, reliable, and scalable.
- Architect, build, and monitor cloud-native architectures with Kubernetes and related technologies, particularly in the context of machine learning and AI workloads.
- Apply a deep understanding of the Software Development Life Cycle, including Continuous Integration and Continuous Deployment (CI/CD) pipeline architecture, particularly as it relates to deploying ML models and AI services in Kubernetes environments.
- Assist in the design and operation of critical cloud infrastructure on AWS, with a focus on supporting the unique requirements of machine learning and AI-driven applications, such as model training, deployment, and scaling, all leveraging AWS SageMaker.
- Collaborate closely with data scientists and ML engineers to create a streamlined, automated build and deployment process for ML models and LLMs in Kubernetes.
- Implement and manage the infrastructure necessary for the continuous integration, delivery, and monitoring of ML models and AI services, ensuring they are seamlessly integrated into our SaaS applications.
- Ensure the availability and performance of production systems that run ML-driven services, proactively identifying and resolving issues that may impact model performance or availability.
- Optimize infrastructure for the efficient training, deployment, and scaling of ML models and LLMs, leveraging Kubernetes, GPU clusters, and cloud-native tools including AWS SageMaker.
- Develop and maintain monitoring and alerting solutions tailored to ML and AI workloads, ensuring that both the infrastructure and deployed models are performing as expected.
- Troubleshoot and resolve production incidents, ensuring minimal downtime and quick recovery.
- Participate in on-call rotation as necessary.
- Ensure the security and compliance of our production systems and data, with a particular focus on protecting sensitive AI and ML data.
- Mentor and coach junior DevOps engineers.

Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- A minimum of 7 years of experience in maintaining optimal performance of online production environments, utilizing bare metal, cloud, and container technologies.
- At least 4 years of experience managing production Kubernetes infrastructure, with exposure to cloud vendor Kubernetes solutions such as EKS, AKS, and GKE.
- Strong experience with Docker for containerization, including creating and managing Docker images and containers.
- Strong experience in architecting and managing SaaS applications in Kubernetes, with specific experience in MLOps and LLMOps.
- Deep understanding of the machine learning lifecycle, including model training, deployment, monitoring, and scaling, particularly using AWS SageMaker.
- Experience with MLOps tools and frameworks, such as Kubeflow, MLflow, or similar, and their integration into Kubernetes environments.
- Familiarity with LLMOps, including the deployment and management of LLMs in production environments.
- Solid experience in scripting languages such as Python.
- Experience with infrastructure deployment and automation tools such as Terraform and CloudFormation.
- Working knowledge of industry-standard build tooling and CI/CD using GitHub and GitHub Actions.
- Expertise in monitoring and logging solutions such as Prometheus and Grafana.
- Good understanding of networking and security concepts.
- Strong knowledge of Linux systems and shell scripting.
- Strong communication and collaboration skills, with experience working closely with data scientists and ML engineers.
- Experience working in an agile environment and understanding of agile methodologies.
- Certifications such as CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) are a plus.

Nice to Haves:
- Experience with workflow orchestration tools like Apache Airflow, particularly for managing complex data pipelines and ML workflows.
- Experience with GitOps tools such as ArgoCD for managing Kubernetes deployments through version-controlled repositories.
- Familiarity with GPU acceleration technologies and their integration with Kubernetes for optimizing ML model training and inference.
- Knowledge of data versioning tools and frameworks like DVC (Data Version Control) in the context of MLOps.
- Experience with cloud cost optimization strategies, particularly in environments running intensive ML and AI workloads.

Technologies we use:
- Numerous AWS services, with expansion into Azure under way.
- AWS SageMaker, which is central to our machine learning model training, deployment, and management processes.
- Terraform, CloudFormation, Ansible, and Kubernetes for infrastructure deployment and automation.
- Industry-standard build tooling and CI/CD using GitHub and ArgoCD.
- A mix of open-source and proprietary technologies tailored to the problems at hand.

Benefits:
- Work from home.
- 5-day work week.

Education:
- Bachelor's degree in Computer Science or a related field is preferred.
