24-MAG

This opportunity is available through a leading AI-driven work platform.

PhD Rater

Research Engineer | Contract | Remote

Location

United States

Posted

8 days ago

Salary

Not specified

Python, Machine Learning, Data Science, Mathematics, Statistics, Finance, Benchmarking

Job Description

This description is a summary of our understanding of the original job posting.

Role Description

This role involves supporting a frontier-model evaluation initiative focused on advanced STEM reasoning and agentic workflows.

  • Design challenging real-world STEM problems for model evaluation
  • Implement benchmark tasks inside agentic development environments using Python
  • Create reproducible tasks with executable tests and clearly defined specifications
  • Analyse model and agent outputs to identify reasoning gaps and failure modes
  • Evaluate how AI systems perform on complex data science, machine learning, finance, and coding tasks
  • Document benchmark tasks, environments, and evaluation outcomes

Qualifications

  • Active or recently completed PhD from a top-tier U.S.-based university
  • Deep expertise in data science, machine learning, finance, and/or Python-based programming
  • Strong research background in advanced STEM domains
  • Experience designing complex technical problems or research benchmarks
  • Ability to analyse model reasoning traces and diagnose underlying issues in system behaviour
  • Strong analytical and research documentation skills

Requirements

  • PhD in Computer Science, Data Science, Machine Learning, Finance, or related STEM fields

Nice to Have

  • Experience working with agentic frameworks or LLM tooling ecosystems
  • Familiarity with frameworks such as LangChain, AutoGen, MetaGPT, CrewAI, LlamaIndex, BabyAGI, or related systems
  • Contributions to open-source software or research projects
  • Experience analysing complex model behaviour or agent workflows

Benefits

  • Independent contractor role
  • Fully remote with flexible scheduling
  • Part-time research engagement with expected availability of 30+ hours per week
  • Competitive hourly rates of $50–$100, depending on expertise
  • Weekly payments via Stripe or Wise
  • Projects may be extended or adjusted based on scope and performance
  • The work does not require access to confidential or proprietary information from employers or institutions

Company Description

This opportunity is available through a leading AI-driven work platform.
