Cantina Labs is a social AI company, developing a suite of advanced real-time models that push the boundaries of expression, personality, and realism. We bring characters to life, transforming how people tell stories, connect, and create. We build and power ecosystems. Cantina, our flagship social AI platform, is just the beginning. If you're excited about the potential AI has to shape human creativity and social interactions, join us in building the future!

AI Research Engineer

AI Engineer, Machine Learning Engineer, Full Time, Remote, Mid Level

Location

United States + 1 more

Posted

1 day ago

Salary

$170K - $210K / year

Seniority

Mid Level

PyTorch, DeepSpeed, FSDP, DDP, AWS GroundTruth, torch.compile, Triton, TensorRT, ONNX, diffusion models, transformers, VAEs, flow-based models, Python

Job Description

Role Description

We are looking for a talented AI Research Engineer to join our computer vision research team. In this role, you will work closely with our research team, implementing, training, and evaluating state-of-the-art image and video generation models. You will own the engineering execution that turns research ideas into working systems:

Building robust data pipelines
Running and stabilizing large-scale training
Implementing models from papers
Optimizing for speed/efficiency
Running rigorous evaluations

This is a high-impact implementation and execution role, ideal for engineers who enjoy building reliable ML systems and scaling research ideas into production-quality training pipelines. The ideal candidate gets deep satisfaction from:

Making complex systems work
Translating research ideas into reliable, scalable code
Debugging training instabilities
Delivering measurable improvements in training stability, model quality, and inference efficiency

This is an excellent opportunity to work closely with experienced researchers and gain deep hands-on exposure to cutting-edge model training techniques, the latest research methods in diffusion- and transformer-based generation, large-scale experimentation, and efficiency innovations, all while contributing directly to production-grade models.

Qualifications

2–5 years of hands-on experience building and training ML systems, with strong ownership of results
Fluency in PyTorch: comfortable reading, writing, and debugging both training and inference code
Experience training or fine-tuning generative models (diffusion models, transformers, VAEs, or similar) from scratch or near-scratch
Solid understanding of distributed training workflows and practical debugging of large training runs
Demonstrated ability to read and implement AI research papers in computer vision
Familiarity with cutting-edge computer vision models and research literature in the image and video domain
Experience building data pipelines for large-scale image or video datasets
Strong debugging skills: comfortable diagnosing both engineering bugs and training failures
Strong engineering mindset: writing clean, reliable, debuggable code; using profiling tools; handling numerical issues at scale

Responsibilities

Build and maintain end-to-end data pipelines for large-scale image and video datasets: collection, filtering, augmentation, conditioning alignment, and efficient storage/sampling
Implement model architectures (diffusion, autoregressive, flow-based, diffusion transformers, etc.) and maintain high-throughput PyTorch training loops for large-scale image and video diffusion models
Run and manage large-scale training experiments on multi-GPU and multi-node setups (DDP, FSDP, DeepSpeed)
Debug training instabilities, loss spikes, and convergence issues
Apply quantization, pruning, and knowledge distillation techniques to compress models without sacrificing quality
Collaborate with researchers and translate state-of-the-art research papers into working implementations in our internal codebase (e.g., new attention mechanisms, sampling schedules, or conditioning methods)
Build and maintain evaluation pipelines for image quality, video consistency, and perceptual metrics
Set up and maintain human annotation and evaluation pipelines using services like AWS GroundTruth
Profile and optimize training speed, GPU memory utilization, and iteration time
Implement inference optimizations to reduce latency and compute cost
Work with acceleration toolchains such as torch.compile, Triton, TensorRT, or ONNX where appropriate

Benefits

Competitive salary and generous company equity
Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
42 days of paid time off, including:

15 PTO days
10 sick days
15 company holidays
2 floating holidays

Generous parental leave & fertility support
401(k) retirement savings plan
Lifestyle spending account – $500/month to use however you’d like
Complimentary lunch and snacks for in-office employees
One Medical membership, and more!
