World's most capable AI for software development
Member of Engineering – Pre-training, Synthetic Data
Location
United States
Posted
45 days ago
Salary
Not specified
Job Description
Job Requirements
- Strong machine learning and engineering background
- Experience with large language models (LLMs)
- Understanding of how LLMs learn
- Data ablations and scaling laws
- Post-training techniques
- Training reasoning and agentic models
- Experience implementing cost-efficient, complex pipelines that generate synthetic datasets at scale, optimizing for data quality, correctness, diversity, etc.
- Experience with evals that track model capabilities (general knowledge, reasoning, math, coding, long-context, etc.)
- Experience in building trillion-scale pretraining datasets, and familiarity with concepts like data curation, deduplication, data mixing, tokenization, curriculum, impact of data repetition, etc.
- Excellent programming skills in Python
- Strong prompt engineering skills
- Experience working with large-scale GPU clusters and distributed data pipelines
- Strong obsession with data quality
- Research experience (nice to have): authorship of scientific papers on topics such as applied deep learning, LLMs, or source code generation
- Can freely discuss the latest papers and go into fine detail
- Is reasonably opinionated
Benefits
- Fully remote work & flexible hours
- 37 days/year of vacation & holidays
- Health insurance allowance for you and dependents
- Company-provided equipment
- Wellbeing, always-be-learning and home office allowances
- Frequent team get-togethers
- Great, diverse & inclusive people-first culture