Job Description
- Provide expert technical guidance on ML Ops best practices, including model deployment, scalability, monitoring, and automation.
- Design and implement robust machine learning pipelines to ensure seamless model integration into production environments.
- Develop systems to monitor, maintain, and optimize ML models, ensuring high availability, accuracy, and reliability over time.
- Collaborate with cross-functional teams, including data scientists, engineers, and business stakeholders, to align ML Ops strategies with organizational goals.
- Apply deep domain expertise across multiple functions to deliver tailored ML solutions for specific business needs.
- Build scalable infrastructure for deploying machine learning models, leveraging containerization (e.g., Docker) and orchestration (e.g., Kubernetes) technologies.
- Lead and mentor a team of 8–10 individuals, fostering a culture of collaboration, innovation, and continuous improvement.
- Drive the adoption of advanced ML Ops tools and frameworks, such as MLflow, Kubeflow, and TensorFlow Extended (TFX), to streamline processes.
- Implement CI/CD pipelines for ML model deployment and manage infrastructure as code using tools like Terraform or CloudFormation.
- Ensure compliance with data privacy and security standards in all ML Ops implementations.
- Continuously evaluate emerging ML Ops technologies and methodologies to improve operational efficiency.
Requirements
- 6+ years of experience in a Senior ML Ops role or a similar position, with a proven track record of deploying ML solutions at scale.
- Advanced expertise in machine learning model deployment, monitoring, and lifecycle management.
- Proficiency in programming languages such as Python, Java, or Scala, with strong scripting skills.
- Hands-on experience with cloud platforms (e.g., AWS, Azure, Google Cloud) for managing and deploying ML workflows.
- Deep understanding of containerization and orchestration tools (e.g., Docker, Kubernetes) and their application in ML Ops.
- Experience with data engineering tools, including Apache Spark and Hadoop for data processing and Airflow for workflow orchestration.
- Strong knowledge of ML Ops frameworks like MLflow, Kubeflow, or TFX, and familiarity with monitoring tools like Prometheus or Grafana.
- Proven ability to lead and manage teams, with at least 2 years of experience in a leadership role.
- Excellent problem-solving skills and the ability to communicate complex technical concepts to non-technical stakeholders.
- Entrepreneurial mindset with the ability to innovate and adapt to evolving business needs.
Preferred Skills
- Knowledge of compliance and regulatory standards related to data privacy and AI ethics.