You.com

Search less, find more.

Senior AI Scientist

AI Research Scientist | Machine Learning Engineer | Full Time | Remote | Team 11-50 | Since 2021 | H1B No Sponsor

Location

California

Posted

35 days ago

Salary

$160K - $200K / year

Bachelor Degree | 1 yr exp | English | Bootstrap | Python

Job Description

  • Define and own what “good” means for search-augmented and agentic AI systems by designing evaluation frameworks that measure real-world quality, reliability, and user-relevant behavior beyond standard benchmarks.
  • Invent and validate novel evaluation methodologies for non-deterministic systems (LLMs, agents, RAG), including behavioral evals, long-tail and adversarial test sets, and task-specific metrics.
  • Develop rigorous statistical frameworks for model comparison, regression detection, and uncertainty estimation, ensuring evaluation results are defensible and decision-ready.
  • Build and maintain scalable evaluation systems (datasets, gold standards, eval harnesses, scoring pipelines, and analysis tooling) that can be reused across products and customers.
  • Lead customer-facing evaluation research, working directly with enterprise customers to translate domain-specific quality requirements into credible, actionable evals that support product decisions and sales outcomes.
  • Drive competitive evaluations and internal quality reviews, surfacing meaningful performance differences, trade-offs, and failure modes to inform product strategy and prioritization.
  • Partner with engineering and product teams to integrate evals into development loops, release gating, and ongoing quality monitoring.
  • Mentor and set standards for evaluation practice, reviewing eval designs, guiding other scientists, and shaping the long-term evals roadmap as systems become more agentic and complex.
  • End-to-End Project Leadership: Lead the development of new AI-driven projects, encompassing ideation, prototyping, research, infrastructure design, scalability, monitoring, and evaluation.
  • Rapid Iteration: Adapt quickly to user feedback and evolving requirements, ensuring continuous improvement in a fast-paced environment.

Job Requirements

  • Strong grounding in applied ML and statistics, with experience evaluating non-deterministic AI systems (LLMs, agents, RAG, search).
  • Deep experience with AI evaluation, including metric design, gold dataset creation, head-to-head comparisons, slicing, and error analysis.
  • Statistical rigor in model comparison, using methods such as paired tests, bootstrap confidence intervals, and robustness analyses.
  • Proficiency in Python for evaluation and analysis, including building eval harnesses, data pipelines, scoring logic, and reproducible analysis workflows.
  • Ability to translate vague product or customer goals into measurable evaluation criteria, and to challenge metrics or conclusions that don’t reflect real quality.
  • Comfort engaging directly with customers and cross-functional stakeholders, explaining evaluation results, trade-offs, and limitations clearly.
  • Strong written and verbal communication, including documenting methodologies and contributing to external publications or talks.

Benefits

  • Hubs in San Francisco and New York City offering regular in-person gatherings and co-working sessions
  • Flexible PTO with U.S. holidays observed and a week-long shutdown in December to rest and recharge*
  • A competitive health insurance plan covering 100% for the policyholder and 75% for dependents*
  • 12 weeks of paid parental leave in the US*
  • 401k program, 3% match - vested immediately!*
  • $500 work-from-home stipend to be used within a year of your start date*
  • $1,200 per year Health & Wellness Allowance to support your personal goals*
  • The chance to collaborate with a team at the forefront of AI research
