This opportunity is available through a leading AI-driven work platform.
PhD Rater
Location: United States
Posted: 8 days ago
Salary: Not specified
Job Description
Role Description
This role supports a frontier-model evaluation initiative focused on advanced STEM reasoning and agentic workflows. Responsibilities include:
- Design challenging real-world STEM problems for model evaluation
- Implement benchmark tasks inside agentic development environments using Python
- Create reproducible tasks with executable tests and clearly defined specifications
- Analyse model and agent outputs to identify reasoning gaps and failure modes
- Evaluate how AI systems perform on complex data science, machine learning, finance, and coding tasks
- Document benchmark tasks, environments, and evaluation outcomes
Qualifications
- Active or recently completed PhD from a top-tier U.S.-based university
- Deep expertise in data science, machine learning, finance, and/or Python-based programming
- Strong research background in advanced STEM domains
- Experience designing complex technical problems or research benchmarks
- Ability to analyse model reasoning traces and diagnose deeper system behaviour issues
- Strong analytical and research documentation skills
Requirements
- PhD in Computer Science, Data Science, Machine Learning, Finance, or related STEM fields
Nice to Have
- Experience working with agentic frameworks or LLM tooling ecosystems
- Familiarity with frameworks such as LangChain, AutoGen, MetaGPT, CrewAI, LlamaIndex, BabyAGI, or related systems
- Contributions to open-source software or research projects
- Experience analysing complex model behaviour or agent workflows
Benefits
- Independent contractor role
- Fully remote with flexible scheduling
- Part-time research engagement with expected availability of 30+ hours per week
- Competitive rates of $50–$100/hour, depending on expertise
- Weekly payments via Stripe or Wise
- Projects may extend or adjust depending on scope and performance
- No access to confidential or proprietary information from employers or institutions