Sumo Logic

Sumo Logic’s vision is to make the world's digital experiences reliable and secure.

Senior Machine Learning Engineer – MLOps, LLMOps

Machine Learning Engineer · Full Time · Remote · Team 501-1,000 · Since 2010 · H1B Sponsor

Location

California

Posted

9 days ago

Salary

Not specified

Bachelor Degree · 4 yrs exp · English · Airflow · AWS · Azure · Cloud · Docker · Google Cloud Platform · Grafana · gRPC · Java · Jenkins · Kubernetes · Microservices · Prometheus · Python · Rust · TensorFlow · Terraform · Go

Job Description

  • Design and implement scalable MLOps/LLMOps platforms supporting the full ML lifecycle: data versioning, model training, evaluation, deployment, and monitoring
  • Build and maintain CI/CD pipelines for ML models and LLM applications with automated testing, validation, and rollback capabilities
  • Develop infrastructure-as-code (IaC) for reproducible, version-controlled ML environments
  • Architect model serving infrastructure with auto-scaling, A/B testing, and canary deployment capabilities
  • Build platforms for LLM fine-tuning, prompt management, and experimentation at scale
  • Implement evaluation frameworks for LLM performance, quality, safety, and cost optimization
  • Design and deploy enterprise-grade AI agents and copilots with robust monitoring and guardrails
  • Establish LLM observability: token usage tracking, latency monitoring, prompt/response logging, and cost attribution
  • Own uptime, reliability, and performance of ML/LLM services (SLIs/SLOs)
  • Implement comprehensive monitoring, alerting, and incident response for ML systems
  • Participate in on-call rotations and drive post-incident reviews to improve system resilience
  • Build automation and tooling to reduce toil and accelerate ML development velocity
  • Partner with ML Engineers and Data Scientists to translate research into production-ready systems
  • Collaborate with platform and infrastructure teams on cloud architecture and resource optimization
  • Mentor team members on MLOps best practices, production ML patterns, and operational excellence
  • Drive technical decisions on tooling, frameworks, and architectural patterns

Job Requirements

  • Education: B.S./M.S./Ph.D. in Computer Science, Engineering, or related technical field
  • Experience: 4+ years of software engineering experience with 2+ years focused on MLOps/LLMOps
  • MLOps Expertise:
      • Production experience with ML model serving frameworks (e.g., TensorFlow Serving, TorchServe, Triton)
      • Hands-on experience with ML experiment tracking and model registry tools (e.g., MLflow, Weights & Biases, Kubeflow)
      • Proficiency in workflow orchestration (e.g., Airflow, Prefect, Kubeflow Pipelines, Metaflow)
  • LLMOps Expertise:
      • Experience with LLM deployment, fine-tuning, and evaluation frameworks (e.g., vLLM, LangChain, LlamaIndex)
      • Knowledge of prompt engineering, RAG architectures, and LLM application patterns
      • Familiarity with LLM observability tools (e.g., LangSmith, Arize, WhyLabs)
  • Cloud & Infrastructure:
      • Strong experience with major cloud providers (AWS, GCP, or Azure) and ML-specific services (SageMaker, Vertex AI, Azure ML, Bedrock)
      • Proficiency in containerization (Docker, Kubernetes) and infrastructure-as-code (Terraform, CloudFormation, Pulumi)
      • Experience with microservices architecture and API development (REST, gRPC)
  • Software Engineering:
      • Strong programming skills in Python, Terraform, and Helm; familiarity with Go, Java, or Rust is a plus
      • Deep understanding of CI/CD practices and tools (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
      • Experience with monitoring and observability stacks (Prometheus, Grafana, DataDog, ELK)
  • Operational Excellence:
      • Track record of managing production systems with defined SLIs/SLOs
      • Experience with on-call rotations, incident management, and reliability engineering practices

Benefits

  • Compensation varies based on factors that include (but aren’t limited to) role level, skills and competencies, qualifications, knowledge, location, and experience.
  • In addition to base pay, certain roles are eligible to participate in our bonus or commission plans, as well as our benefits offerings and equity awards.
