This opportunity is available through a leading AI-driven work platform.

PhD Rater

Research Engineer, Contract, Remote

Location

United States

Posted

8 days ago

Salary

Not specified

Python, Machine Learning, Data Science, Mathematics, Statistics, Finance, Benchmarking

Job Description

This description is a summary of our understanding of the original job posting.

Role Description

This role involves supporting a frontier-model evaluation initiative focused on advanced STEM reasoning and agentic workflows.

Design challenging real-world STEM problems for model evaluation
Implement benchmark tasks inside agentic development environments using Python
Create reproducible tasks with executable tests and clearly defined specifications
Analyse model and agent outputs to identify reasoning gaps and failure modes
Evaluate how AI systems perform on complex data science, machine learning, finance, and coding tasks
Document benchmark tasks, environments, and evaluation outcomes

Qualifications

Active or recently completed PhD from a top-tier U.S.-based university
Deep expertise in data science, machine learning, finance, and/or Python-based programming
Strong research background in advanced STEM domains
Experience designing complex technical problems or research benchmarks
Ability to analyse model reasoning traces and diagnose deeper system behaviour issues
Strong analytical and research documentation skills

Requirements

PhD in Computer Science, Data Science, Machine Learning, Finance, or related STEM fields

Nice to Have

Experience working with agentic frameworks or LLM tooling ecosystems
Familiarity with frameworks such as LangChain, AutoGen, MetaGPT, CrewAI, LlamaIndex, BabyAGI, or related systems
Contributions to open-source software or research projects
Experience analysing complex model behaviour or agent workflows

Benefits

Independent contractor role
Fully remote with flexible scheduling
Part-time research engagement with expected availability of 30+ hours per week
Competitive rates between $50–$100/hour depending on expertise
Weekly payments via Stripe or Wise
Projects may extend or adjust depending on scope and performance
No access to confidential or proprietary information from employers or institutions

Company Description

This opportunity is available through a leading AI-driven work platform.
