Runway

Business financials got stuck in the 15th century so we're showing them today’s computers 🖥

Member of Technical Staff, Inference

Machine Learning Engineer, Full Time, Remote, Team 11-50, H1B Sponsor

Location

United States + 180 more
All locations: United States, Canada, Brazil, Colombia, Argentina, Chile, Venezuela, Bolivarian Republic Of, Bolivia, Plurinational State Of, Ecuador, French Guiana, Guyana, Paraguay, Peru, Suriname, Uruguay, Mexico, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Dominican Republic, Puerto Rico, Bahamas, Guadeloupe, Haiti, Jamaica, Martinique, Montserrat, United Kingdom, Germany, France, Estonia, Portugal, Hungary, Poland, Ukraine, Romania, Bulgaria, Czech Republic, Slovakia, Belarus, Moldova, Republic Of, Sweden, Greece, Belgium, Italy, Ireland, Switzerland, Netherlands, Finland, Malta, Denmark, Lithuania, Croatia, Spain, Austria, Bosnia And Herzegovina, Iceland, Luxembourg, Macedonia, The Former Yugoslav Republic Of, Montenegro, Norway, Serbia, Slovenia, Albania, Cyprus, Latvia, Monaco, South Africa, Egypt, Algeria, Angola, Benin, Botswana, Burkina Faso, Burundi, Cameroon, Cape Verde, Central African Republic, Chad, Congo, Côte D'ivoire, Congo, The Democratic Republic Of The, Equatorial Guinea, Eritrea, Ethiopia, Gabon, Gambia, Ghana, Guinea, Guinea-bissau, Kenya, Lesotho, Liberia, Libyan Arab Jamahiriya, Madagascar, Malawi, Mali, Mauritania, Mauritius, Mayotte, Morocco, Mozambique, Namibia, Niger, Nigeria, Réunion, Rwanda, Senegal, Seychelles, Sierra Leone, Somalia, Sudan, Swaziland, Tanzania, United Republic Of, Togo, Tunisia, Uganda, Zambia, Zimbabwe, Georgia, Turkey, Israel, United Arab Emirates, Armenia, Azerbaijan, Bahrain, Iraq, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, Palestinian Territory, Occupied, Yemen, India, Japan, Philippines, Pakistan, Thailand, Singapore, Viet Nam, Taiwan, Province Of China, Indonesia, Cambodia, Lao People's Democratic Republic, Malaysia, Myanmar, Korea, Republic Of, China, Afghanistan, Bangladesh, Bhutan, Kazakhstan, Kyrgyzstan, Maldives, Mongolia, Nepal, Sri Lanka, Tajikistan, Turkmenistan, Uzbekistan, Australia, Papua New Guinea, Kiribati, Palau, French Polynesia, Tuvalu, New Zealand

Posted

58 days ago

Salary

$240K - $290K / year

Job Description

This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

We're looking for an ML infrastructure engineer to bridge the gap between research and production at Runway. You'll work directly with our research teams to productionize cutting-edge generative models—taking checkpoints from training to staging to production, ensuring reliability at scale, and building the infrastructure that enables fast iteration.

You'll be embedded within research teams, providing platform support throughout the entire model development lifecycle. Your work will directly impact how quickly we can ship new models and features to millions of users.

A peek at our technical stack

  • API endpoints for real-time collaboration and media asset management written in TypeScript, running in ECS containers on AWS Fargate.
  • Leverage multiple AWS-native components, such as S3, CloudFront, Lambda, Kinesis, and SQS.
  • Inference backend written in Python (PyTorch, TorchScript), deployed across multiple clusters/cloud providers.
  • Use Kubernetes for container orchestration, with k8s-native components such as Flyte, Kueue, and Kyverno for efficient job orchestration.
  • Invest in Prometheus and Grafana for monitoring, and Terraform to manage infrastructure.

Qualifications

  • 4+ years of experience running ML model inference at scale in production environments.
  • Strong experience with PyTorch and multi-GPU inference for large models.
  • Experience with Kubernetes for ML workloads—deploying, scaling, and debugging GPU-based services.
  • Comfortable working across multiple cloud providers and managing GPU driver compatibility.
  • Experience with monitoring and observability for ML systems (errors, throughput, GPU utilization).
  • Self-starter who can work embedded with research teams and move fast.
  • Strong systems thinking and pragmatic approach to production reliability.
  • Humility and open-mindedness; at Runway we love to learn from one another.

Nice to Have

  • Experience building custom inference frameworks or serving systems.
  • Deep understanding of distributed training and inference patterns (FSDP, data parallelism, tensor parallelism).
  • Ability to debug low-level issues: NCCL networking problems, CUDA errors, memory leaks, performance bottlenecks.
  • Experience with diffusion models or video generation systems.
  • Knowledge of real-time or latency-sensitive ML applications.

Benefits

  • Salary range: $240,000 - $290,000.
  • Commitment to creating a space where employees can bring their full selves to work and have equal opportunity to succeed.

Company Description

Runway strives to recruit and retain exceptional talent from diverse backgrounds while ensuring pay equity for our team. Our salary ranges are based on competitive market rates for our size, stage, and industry, and salary is just one part of the overall compensation package we provide.

Many factors go into salary determinations, including relevant experience, skill level and qualifications assessed during the interview process, and maintaining internal equity with peers on the team. The range shared in this posting is a general expectation for the function as posted, but we are also open to considering candidates who may be more or less experienced than outlined in the job description. In that case, we will communicate any updates to the expected salary range.

Lastly, the provided range is the expected salary for candidates in the U.S. Outside the U.S., the range may differ, which, again, will be communicated to candidates.

We're excited to be recognized as a best place to work by Crain's, InHerSight, BuiltIn NYC, and INC.
