The data-centric AI platform powered by programmatic labeling and foundation models

Applied Research Engineer

Research Engineer, Full Time, Remote, Team 51-200, Since 2019, H1B Sponsor

Location: United States

Posted: 4 days ago

Salary: $150K - $180K / year

GPU Clusters, AWS, Kubernetes, Slurm, Python, Distributed Training, ML Experiment Tracking, Data Versioning, Model Versioning, Cloud Infrastructure, Job Orchestration, Cluster Management, Network Configuration, Cost Management, Fault Tolerance, Auto Recovery, ML Training Frameworks, Software Engineering, Version Control, Modular Design, Automation

Job Description

Role Description

As an Applied Research Engineer at Snorkel AI, you will own the infrastructure that powers our model training and evaluation work. This is a hands-on role where you will build and operate GPU cluster infrastructure, training pipelines, and the tooling that allows our research and engineering teams to run experiments reliably and at scale. You will work closely with research scientists and engineers, translating training requirements into robust, reproducible systems—and proactively removing infrastructure blockers before they slow down the work that matters most.

Snorkel AI operates in a fast-paced, high-impact environment. We are looking for someone who takes pride in operational excellence, loves solving complex distributed systems problems, and thrives when given real ownership.

Location: Redwood City or San Francisco, or remote

Main Responsibilities

Set up and manage GPU cluster infrastructure on major cloud providers (e.g., AWS HyperPod) for distributed model training, including networking, provisioning, and cost tracking.
Build and operate job orchestration and scheduling systems (e.g., Kubernetes, Slurm, or cloud-native equivalents) to reliably launch and manage training, rollout, and evaluation jobs across multi-node clusters.
Integrate and maintain ML training frameworks and post-training pipelines, ensuring they run stably and reproducibly at scale.
Set up and maintain experiment tracking, dataset versioning, and model artifact management to support fast iteration.
Monitor and optimize cluster health, inter-node communication, and resource utilization; implement fault tolerance and auto-recovery so long-running jobs survive node failures.
Work closely with research scientists and ML engineers to understand requirements, unblock experiments, and evolve infrastructure as the needs of our training workloads change.

Qualifications

Hands-on experience managing GPU clusters on major cloud providers, including provisioning, network configuration, and cost management.
Experience with distributed compute orchestration tools such as Kubernetes, Slurm, or equivalent cluster management systems.
Working knowledge of distributed training concepts: parallelism strategies, memory optimization techniques, and inter-node communication.
Experience with setting up, managing, and integrating ML experiment tracking and data/model versioning tools.
Strong Python proficiency and solid software engineering fundamentals such as version control, modular design, and automation.
Ability to work in a fast-moving, iterative environment and take end-to-end ownership of ambiguous infrastructure problems.
Hands-on experience with post-training workflows such as supervised fine-tuning (SFT) or reinforcement learning (RLHF, GRPO, or similar) is a strong plus, but not required.

Salary Range

$150,000.00 – $180,000.00

Benefits

Joining Snorkel AI means becoming part of a company that has market-proven solutions, robust funding, and is scaling rapidly, offering a unique combination of stability and the excitement of high growth. As a member of our team, you’ll have meaningful opportunities to shape priorities and initiatives, influence key strategic decisions, and directly impact our ongoing success. Whether you’re looking to deepen your technical expertise, explore leadership opportunities, or learn new skills across multiple functions, you’re fully supported in building your career in an environment designed for growth, learning, and shared success.

Related Categories

Research Engineer

More Research Engineer Jobs

Web-Scraping & bot-defense researcher

MWDN

MWDN connects exceptional tech talent with leading companies across Israel, the USA, Great Britain, and Western Europe. We aim to ensure our employees enjoy a rewarding and secure experience while collaborating with prestigious international clients. MWDN is ranked among the top 5 IT employers in our region by DOU, and we pride ourselves on our transparency and commitment to our team.

Research Engineer, 4 days ago

Full Time, Remote

Our client is an innovative product company that transforms how businesses access and utilize data. Using over 1 million global IPs and a database of more than 650 million verified profiles, they empower companies to make data-driven decisions with precision and ease. They are lo...

United States

Speech Research Intern -3

Centific

Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with more than 4,000 AI practitioners and engineers. We harness the power of an integrated solution ecosystem—comprising industry-leading partnerships and 1.8 million vertical domain experts in more than 230 markets. Our zero-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster. Our mission is to bridge the gap between AI creators and industry leaders by bringing best practices in GenAI to unicorn innovators and enterprise customers.

Research Engineer, 5 days ago

Full Time, Remote, Team 5,001-10,000

The intern will be responsible for designing and evaluating speech-first models, focusing on Spoken Language Models that interact conversationally over audio, moving concepts from prototype to practical demonstrations. Key tasks include developing end-to-end speech dialogue systems and aligning speech encoders with text backbones using lightweight adapters.

United States

R&D Associate

Engineered Advisory

Research Engineer, 6 days ago

Full Time, Remote, Team 51-200

This is a remote position. Basic Function: Work as part of a multi-disciplinary team to support Research and Development (R&D) tax credit consulting services. The Associate works closely with Project Managers, Directors, and Clients to ensure accurate information is documented an...

United States

Research Engineer

Rwazi, Inc.

Research Engineer, 6 days ago

Full Time, Remote

The Research Engineer is responsible for building research prototypes, developing evaluation tooling for decision systems, and implementing experimental system architectures to enable fast iteration between theory and validation. This role also involves maintaining the technical research infrastructure necessary for applied research initiatives.

Python, Software Engineering, Experimental Design, Simulation, Benchmarking, System Architecture, Prototyping, AI Pipelines

United States
