Engineering Expert (PhD) - AI Systems Evaluation
Location: United States
Posted: 19 days ago
Salary: Not specified
Job Description
This role is for one of our clients.
Compensation: $73.29 per hour
PhD-level engineers are sought to support high-impact collaborations with advanced AI research teams. This role focuses on improving the accuracy, rigor, and reliability of general-purpose conversational AI systems, particularly in engineering-related contexts.
AI systems used in professional engineering scenarios must demonstrate strong applied reasoning, quantitative accuracy, and alignment with real-world systems. This project centers on evaluating and enhancing how models interpret, reason about, and explain engineering concepts across multiple disciplines.
Job Requirements
Key Responsibilities
- Develop and refine prompts to guide AI behavior in engineering-specific scenarios
- Evaluate model-generated responses for technical correctness, applied reasoning, completeness, and practical relevance
- Fact-check technical claims using authoritative public sources and domain expertise
- Annotate outputs by identifying conceptual gaps, flawed assumptions, and factual inaccuracies
- Assess clarity, structure, and appropriateness of explanations for various audiences
- Ensure responses align with expected conversational standards and system-level guidelines
- Apply structured evaluation frameworks, taxonomies, and benchmarking standards consistently
Required Qualifications
- PhD in Engineering or a closely related field
- Deep expertise in one or more of the following domains:
  - Mechanical & Physical Systems Engineering
  - Electrical, Electronic & Computer Engineering
  - Chemical, Materials & Process Engineering
  - Civil, Environmental & Infrastructure Engineering
- Strong familiarity with large language models (LLMs) and their practical applications
- Excellent written communication skills with the ability to clearly explain complex technical concepts
- High attention to detail and ability to detect subtle technical inaccuracies
- Experience reviewing, editing, or critiquing technical or academic writing
Preferred Experience
- Experience in applied research, industry engineering workflows, or systems design
- Experience with reinforcement learning from human feedback (RLHF), model evaluation, or structured data annotation
- Teaching, mentoring, or explaining engineering concepts to non-expert audiences
- Familiarity with structured evaluation rubrics, benchmarks, or quality assurance frameworks
What Success Looks Like
- You consistently identify technical inaccuracies, incomplete reasoning, or flawed assumptions in engineering-related AI outputs
- Your structured feedback measurably improves the rigor, clarity, and correctness of model responses
- You produce consistent, reproducible evaluation artifacts that strengthen model performance over time
- Engineering-focused AI systems demonstrate greater reliability and trustworthiness as a result of your evaluations
Contract & Payment Terms
- Engagement will be structured as an independent contractor agreement
- Fully remote with flexible scheduling
- Projects may be extended, shortened, or concluded early based on performance and evolving needs
- Assignments will not require access to confidential or proprietary information from any employer, client, or institution
- Payments are processed weekly via Stripe or Wise based on services rendered
- Visa sponsorship is not available; H-1B and STEM OPT candidates cannot be supported at this time