poolside

World's most capable AI for software development

Member of Engineering – Pre-training, Data Engineering

Data Engineer | Full Time | Remote | Team 51-200 | Since 2023 | H1B No Sponsor

Location

United States

Posted

43 days ago

Salary

Not specified

English | Airflow | Cloud | Docker | Grafana | Kubernetes | Prometheus | PySpark | Python

Job Description

  • Build and maintain high-performance pipelines for trillions of tokens.
  • Deliver diverse, high-quality datasets for pre-training foundation models.
  • Work closely with other teams, such as Pre-training, Post-training, Evals and Product, to ensure alignment on the quality of the models delivered.

Job Requirements

  • Strong background in building production-grade, distributed data systems for machine learning, with experience in:
      • Orchestration: Slurm, Airflow, or Dagster
      • Observability & Reliability: CI/CD, Grafana, Prometheus, etc.
      • Infra: Git, Docker, k8s, cloud managed services
      • Batched inference (e.g., vLLM)
  • Performance obsession, especially with large-scale GPU clusters and distributed pipelines
  • Expert-level Python knowledge and ability to write clean and maintainable code
  • Strong algorithmic foundations
  • Proficiency with libraries like Polars, Dask, or PySpark
  • Nice to have:
      • Experience in building trillion-scale SOTA pre-training datasets
      • Experience translating research to production at scale
      • Experience with OCR, web crawling, or evals
      • Prior experience pre-training LLMs

Benefits

  • Fully remote work & flexible hours
  • 37 days/year of vacation & holidays
  • Health insurance allowance for you and dependents
  • Company-provided equipment
  • Wellbeing, always-be-learning and home office allowances
  • Frequent team get-togethers
  • Great, diverse & inclusive people-first culture
