World's most capable AI for software development
Member of Engineering – Pre-training, Data Engineering
Location
United States
Posted
43 days ago
Salary
Not specified
Job Description
Job Requirements
- Strong background in building production-grade, distributed data systems for machine learning, with experience in:
  - Orchestration: Slurm, Airflow, or Dagster
  - Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
  - Infra: Git, Docker, Kubernetes (k8s), cloud managed services
  - Batched inference (e.g., vLLM)
- Performance obsession, especially with large-scale GPU clusters and distributed pipelines
- Expert-level Python knowledge and the ability to write clean, maintainable code
- Strong algorithmic foundations
- Proficiency with libraries like Polars, Dask, or PySpark
- Nice to have:
  - Experience in building trillion-scale SOTA pretraining datasets
  - Experience translating research to production at scale
  - Experience with OCR, web crawling, or evals
  - Prior experience pre-training LLMs
Benefits
- Fully remote work & flexible hours
- 37 days/year of vacation & holidays
- Health insurance allowance for you and dependents
- Company-provided equipment
- Wellbeing, always-be-learning and home office allowances
- Frequent team get-togethers
- Diverse, inclusive, people-first culture