Senior Machine Learning Engineer
Location: United States
Posted: 2 days ago
Salary: $150K - $161K / year
Seniority: Senior
Job Description
- Deep expertise in building and operating production-grade ML and data platforms using Spark (PySpark + SQL), Databricks (Azure), and Delta Lake, with strong hands-on experience in MLOps practices including MLflow model lifecycle management, feature store architecture (offline + online), CI/CD for ML workflows, and scalable model deployment. Proven ability to design reliable, cost-efficient distributed data systems, optimize Spark workloads, and implement robust governance, observability, and access controls across ML data pipelines. Strong cloud engineering fundamentals in Azure, including orchestration, infrastructure reliability, and integration with services such as CosmosDB and downstream analytics systems.
- The Data Engineering and Machine Learning teams at Zip exist to make data and ML production-ready, trusted, and scalable across the business. Our mission is to elevate the quality, reliability, and accessibility of data assets while enabling innovative AI-driven applications that create measurable customer and commercial impact.
- We operate with an ownership mindset — engineers here don’t just build pipelines, they own platforms end-to-end. Great talent on this team thrives in ambiguity, designs with scale and reliability in mind, and proactively improves standards rather than maintaining the status quo. We work collaboratively across Data Science, Analytics, and Engineering, balancing speed with engineering discipline. We value pragmatic problem-solvers who think in systems, prioritize observability and maintainability, and are motivated by building infrastructure that empowers others to move faster and smarter.
Start your adventure with Zip
We’re hiring a Senior Machine Learning Platform Engineer to build and operate the infrastructure that powers production-grade machine learning at Zip. In this role, you’ll own the ML lifecycle end-to-end — from feature pipelines and model registry standards to CI/CD and scalable model serving on Databricks (Azure). You’ll ensure our ML systems are reliable, observable, and built to scale as we expand AI-driven capabilities across the business.
Our goals include enhancing the discipline within our data engineering practices, strengthening our collaboration with the Data Analytics and Data Science teams, and elevating the quality of our data assets. These changes are designed to better position us to leverage the full potential of our data, allowing us to explore new and innovative applications, including the use of AI.
Interesting problems you’ll get to solve
Own the ML Lifecycle (MLOps)
- Build and maintain feature pipelines (batch + streaming)
- Manage offline and online feature store patterns (CosmosDB-backed online lookup)
- Administer and enforce standards around MLflow model registry, versioning, and promotion workflows
- Deploy and operate model serving endpoints
- Implement CI/CD for ML pipelines and model deployment
- Participate in on-call rotation for platform-owned systems
Build Production-Grade Spark Systems
- Develop pipelines using PySpark and Spark SQL
- Optimize joins, partitioning, and shuffle-heavy workloads
- Improve reliability and cost-efficiency of Spark jobs
- Ensure pipelines are modular, testable, and production-ready
- Support streaming workloads using Delta Live Tables
Operate and Improve the Platform
- Administer Databricks clusters, jobs, policies, and permissions
- Improve observability, alerting, and operational standards
- Contribute to Lakehouse Federation initiatives (Databricks ↔ Snowflake via Iceberg)
- Integrate ML services into downstream architecture
- Implement governance, access controls (RBAC), and data quality/observability standards across ML data pipelines
What you’ll bring to the team
Experience
- 8+ years of experience in Machine Learning with a strong focus on production-grade ML and distributed data systems
- Demonstrated experience owning and operating ML systems end-to-end in production environments
Strong Spark Capability (Core Requirement)
- Advanced experience with PySpark and Spark SQL
- Strong understanding of Spark execution (joins, shuffles, partitioning)
- Experience building and optimizing reliable, scalable data pipelines
- Strong data engineering fundamentals including medallion architecture design, incremental/idempotent ETL patterns, and Delta Lake optimization (partitioning)
MLOps & ML Systems
- Experience operating ML systems in production
- Hands-on experience with MLflow (tracking + model registry)
- Experience managing feature stores (offline + online)
- Experience deploying and monitoring model serving endpoints
- Experience implementing CI/CD for ML workflows
Cloud & Platform Experience
- Experience working in Azure
- Production experience with Databricks and Delta Lake
- Experience integrating with CosmosDB or similar NoSQL key-value stores
- Experience designing orchestrated, production-grade data workflows (Databricks Workflows, Airflow, or ADF) with dependency management, backfills, and failure recovery
Nice to Have
- Delta Live Tables and streaming pipelines
- Iceberg or Lakehouse Federation experience
- Snowflake experience
- Vector databases or LLM infrastructure
- Infrastructure-as-code experience
What you’ll get in return
Zip is a place where you’ll get out what you put in. The newness of our sector means we need to move at pace and embrace change, and our promise to you when you join the team is that you’ll feel empowered and trusted to make big things happen quickly.
We want you to feel welcome, supported to be yourself, and able to care for yourself at work. It’s important to us that you make the most of the opportunities you’ll get to grow your skills and your career, and that you’re surrounded by smart, friendly people and leaders who have your back.
We think these are just some of the best things about being a Zipster. We will also offer you:
- Flexible working culture
- Incentive programs
- Unlimited PTO
- Generous paid parental leave
- Leading family support policies
- Company-sponsored 401k match
- Learning and wellness subscription stipend
- Beautiful Union Square office with a casual dress code
- Industry-leading, employer-sponsored insurance for you and your dependents, with several 100% Zip-covered choices available
Zip is committed to a straightforward and transparent pay structure. The actual base salary will be determined by various individualized factors, including job-related knowledge, skills, experience, location, internal equity, as well as other objective business considerations.
The annual base pay range for this position is $150,000 - $161,000. This range reflects our US national compensation band (USN). Additional premium percentages may apply based on our tiered premium strategy.
Subject to those same considerations, the total compensation package for this position may also include other elements, including a bonus and/or commission awards, in addition to a full range of medical, financial, and/or other benefits.
If hired, employees will be in an 'at-will position' and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation or benefit program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.
Be a part of a team that reflects the diversity of our customers
We pride ourselves on being a workplace that provides equal opportunities to people of all ages, cultural backgrounds, sexual orientations, gender identities, abilities, veteran status, and everything else that makes you unique.
Equally, we’re committed to ensuring our recruitment processes are accessible and inclusive. Please let us know if any adjustments need to be made to ensure you have a fair and equitable experience.
And finally…get to know us
Zip Co Limited (ASX: ZIP) is a digital financial services company, offering innovative, people-centered products.
Operating in two core markets, Australia and New Zealand (ANZ) and the US, Zip offers access to point-of-sale credit and digital payment services, connecting millions of customers with its global network of tens of thousands of merchants. We’re proud to be a values-led business, and our values (Customer First, Own it, Stronger Together and Change the Game) guide us in everything we do.
By clicking "Submit Application", I acknowledge that the information provided is true and correct. I also understand that any willful dishonesty may be grounds for refusal of this application or immediate termination of employment. By providing your information, you acknowledge that you have read our Zip Applicant and Candidate Privacy Notice and authorize Zip to process your data subject to those terms. Zip participates in the federal government’s E-Verify program.
Before you apply, give Zip a try -> rebrand.ly/check-zip-out


