Staff Engineer – ML Platform

January 8, 2026
$8000 - $10000 / month

Job Description

As an ML Platform Engineer, you will design, build, and maintain the infrastructure and tools that enable the rapid development, deployment, and monitoring of machine learning and generative AI models at scale. Working closely with data scientists, ML engineers, and product teams, you will create seamless workflows from experimentation to production, ensuring reliability, efficiency, and operational excellence across all ML and AI systems. This role directly supports the company’s mission to deliver intelligent, data-driven experiences to millions of customers, partners, and riders in the region.

Key Responsibilities

  • Design, build, and maintain scalable ML platforms and tooling that support the full ML lifecycle, from data ingestion to model deployment and monitoring, for both traditional and generative AI models
  • Develop standardized ML workflows, templates, and experiment tracking to enable rapid development and reproducible results
  • Implement CI/CD pipelines, containerization, and model registries to ensure governance, scalability, and reliability
  • Collaborate with genAI specialists to integrate transformers, embeddings, vector databases, and real-time retrieval-augmented generation (RAG) systems
  • Automate model training, inference, deployment, and versioning workflows to maintain consistency and operational efficiency
  • Ensure reliability, observability, and scalability of production ML workloads through monitoring, alerting, and performance evaluation
  • Build and integrate production serving infrastructure using tools such as TensorFlow Serving, NVIDIA Triton, Seldon, Kubernetes, and cloud platforms (AWS/GCP)
  • Optimize ML and genAI infrastructure for efficient inference, fine-tuning, prompt management, and large-scale model updates
  • Partner with data engineering, product, infrastructure, and genAI teams to align platform initiatives with business goals and innovation priorities
  • Contribute to internal documentation, training, and onboarding to promote platform adoption and continuous improvement

Skills & Experience

  • Strong software engineering background with experience building distributed systems and ML/AI platforms
  • Expertise in Python and familiarity with ML frameworks (TensorFlow, PyTorch), infrastructure tooling (MLflow, Kubeflow, Ray), and genAI libraries and APIs (Hugging Face, OpenAI, LangChain)
  • Hands-on experience with MLOps practices: CI/CD, Docker, Kubernetes, model registries, and IaC tools (Terraform, Helm)
  • Proven experience with cloud infrastructure (AWS or GCP), Kubernetes clusters (EKS/GKE), serverless architectures, and managed ML services (Vertex AI, SageMaker)
  • Deep knowledge of generative AI: transformers, embeddings, prompt engineering, fine-tuning, vector databases, and retrieval-augmented generation (RAG)
  • Experience designing real-time inference pipelines and integrating feature stores, streaming data platforms (Kafka, Kinesis), and observability tools
  • Strong SQL and data warehouse skills for complex queries, joins, aggregations, and transformations
  • Solid understanding of ML monitoring: model drift, decay, latency optimization, cost management, and scaling genAI applications efficiently
  • Bachelor’s degree in Computer Science, Engineering, or related field; advanced degree is a plus
  • 5+ years in ML platform engineering or infrastructure roles, with 2+ years in a tech lead or senior role
  • Proven track record of building and operating ML infrastructure at scale, supporting generative AI use cases
  • Strategic thinker with strong problem-solving skills, effective technical decision-making, and ownership mindset
  • Excellent communication and collaboration skills across diverse, cross-functional teams

Location