Weekday (YC W21)

We are a Y-Combinator-backed startup building your AI-powered Recruiter Agent

Engineering Expert (PhD) - AI Systems Evaluation

Research Scientist · Part Time · Remote · Team 11-50 · Since 2021 · H1B No Sponsor

Location

United States

Posted

19 days ago

Salary

Not specified

English

Job Description

This role is for one of our clients

Compensation: $73.29 per hour

PhD-level engineers are sought to support high-impact collaborations with advanced AI research teams. This role focuses on improving the accuracy, rigor, and reliability of general-purpose conversational AI systems, particularly in engineering-related contexts.

AI systems used in professional engineering scenarios must demonstrate strong applied reasoning, quantitative accuracy, and alignment with real-world systems. This project centers on evaluating and enhancing how models interpret, reason about, and explain engineering concepts across multiple disciplines.

Job Requirements

Key Responsibilities

  • Develop and refine prompts to guide AI behavior in engineering-specific scenarios
  • Evaluate model-generated responses for technical correctness, applied reasoning, completeness, and practical relevance
  • Fact-check technical claims using authoritative public sources and domain expertise
  • Annotate outputs by identifying conceptual gaps, flawed assumptions, and factual inaccuracies
  • Assess clarity, structure, and appropriateness of explanations for various audiences
  • Ensure responses align with expected conversational standards and system-level guidelines
  • Apply structured evaluation frameworks, taxonomies, and benchmarking standards consistently

Required Qualifications

  • PhD in Engineering or a closely related field
  • Deep expertise in one or more of the following domains:
      • Mechanical & Physical Systems Engineering
      • Electrical, Electronic & Computer Engineering
      • Chemical, Materials & Process Engineering
      • Civil, Environmental & Infrastructure Engineering
  • Strong familiarity with large language models (LLMs) and their practical applications
  • Excellent written communication skills with the ability to clearly explain complex technical concepts
  • High attention to detail and ability to detect subtle technical inaccuracies
  • Experience reviewing, editing, or critiquing technical or academic writing

Preferred Experience

  • Applied research, industry engineering workflows, or systems design
  • Experience with reinforcement learning from human feedback (RLHF), model evaluation, or structured data annotation
  • Teaching, mentoring, or explaining engineering concepts to non-expert audiences
  • Familiarity with structured evaluation rubrics, benchmarks, or quality assurance frameworks

What Success Looks Like

  • You consistently identify technical inaccuracies, incomplete reasoning, or flawed assumptions in engineering-related AI outputs
  • Your structured feedback measurably improves the rigor, clarity, and correctness of model responses
  • You produce consistent, reproducible evaluation artifacts that strengthen model performance over time
  • Engineering-focused AI systems demonstrate greater reliability and trustworthiness as a result of your evaluations

Contract & Payment Terms

  • Engagement will be structured as an independent contractor agreement
  • Fully remote with flexible scheduling
  • Projects may be extended, shortened, or concluded early based on performance and evolving needs
  • Assignments will not require access to confidential or proprietary information from any employer, client, or institution
  • Payments are processed weekly via Stripe or Wise based on services rendered
  • Visa sponsorship is not available; H1-B and STEM OPT candidates cannot be supported at this time
