My client is a flexible, AI-forward facilities and HVAC/R management platform built for multi-site retail, grocery, restaurant, and consumer brands. We help operators control vendor chaos, reduce cost leakage, improve SLA performance, and gain portfolio-wide visibility across hundreds or thousands of locations.
Staff ML Engineer
Location
United States
Posted
3 days ago
Salary
Not specified
Seniority
Lead
Job Description
Role Description
We are seeking a Staff Applied Backend / ML Engineer to lead the design and implementation of production-grade, AI-driven systems that power intelligent products at scale. This is a deeply hands-on senior technical role responsible for:
- Architecting distributed backend systems
- Designing and deploying machine learning pipelines
- Translating ambiguous business problems into robust, measurable solutions
- Shipping hardened production systems — not just prototypes
You will design high-performance APIs and data workflows, productionize ML models (training, evaluation, inference, monitoring), and ensure reliability, security, and scalability across cloud environments. This role is ideal for an engineer who thrives at the intersection of systems architecture, applied ML, and product impact, and can move from whiteboard design to production deployment without handoffs.
What You Will Do
Backend & Distributed Systems
- Architect and implement scalable backend services (REST / gRPC, async systems, event-driven pipelines)
- Design data-intensive workflows and ETL / ELT systems
- Build high-throughput, low-latency APIs
- Own service reliability, observability, and performance tuning
- Define API contracts and integration patterns across services
Applied Machine Learning
- Design ML pipelines end-to-end (data ingestion → feature engineering → training → evaluation → deployment)
- Productionize models (batch and real-time inference)
- Build embedding pipelines, ranking systems, and classification / extraction models
- Implement experimentation frameworks and offline/online evaluation loops
- Establish model monitoring (drift detection, performance tracking, alerts)
Cloud & Infrastructure
- Deploy services across cloud environments (GCP, Azure, AWS)
- Design CI/CD pipelines for ML and backend systems
- Implement infrastructure-as-code practices
- Ensure system security, compliance, and scalability
Technical Leadership
- Set engineering standards and architectural direction
- Mentor engineers across backend and ML domains
- Collaborate with product, data, and leadership teams
- Break down ambiguous problems into measurable execution plans
- Drive long-term AI and platform strategy
Qualifications
- 6+ years of backend engineering experience
- 2+ years of applied ML in production environments
- Strong Python or Java expertise
- Experience designing distributed systems at scale
- Hands-on experience deploying ML models in production
- Experience with both SQL and NoSQL databases
- Familiarity with event-driven architectures (Kafka, Pub/Sub, streaming systems)
Requirements
- Feature engineering and data preprocessing
- Embeddings, ranking, or similarity systems
- Model evaluation frameworks
- Model serving (batch and real-time)
- Experimentation / A/B testing systems
- Production monitoring and observability
- Cloud-native architecture (GCP / Azure / AWS)
- Containerization (Docker) and orchestration (Kubernetes)
- CI/CD pipelines
- API performance optimization
- Security best practices for distributed systems
Nice to Have
- Experience with large language models (LLMs) in production
- Experience building data platforms or workflow orchestration systems
- Experience with vector databases
- Experience in B2B, industrial, or commerce platforms
- Familiarity with infrastructure-as-code (Terraform, Helm)
What Success Looks Like
Within 6 months, you will have:
- Designed and shipped a production ML-backed service
- Established reliability and observability standards
- Improved performance and cost-efficiency of core systems
- Mentored engineers and elevated technical quality across the team

