RYZ Labs is a startup studio built in 2021 by two lifelong entrepreneurs. The founders of RYZ have worked at some of the world's largest tech companies and some of the most iconic consumer brands. They have lived and worked in Argentina for many years and have decades of experience in Latam.

Passion for the early phases of company creation
Attracting the brightest talents to build industry-defining companies in a post-pandemic world
Remote and distributed teams throughout the US and Latam
Use of cutting-edge technologies in cloud computing
Aim to provide diverse product solutions for different industries
Plans to build a large number of startups in the upcoming years

Our Values and What to Expect

Customer First Mentality - every decision we make should be made through the lens of the customer.
Bias for Action - urgency is critical; expect accelerated timelines for getting things done.
Ownership - step up if you see an opportunity to help, even if it is not your core responsibility.
Humility and Respect - be willing to learn, be vulnerable, and treat everyone who interacts with RYZ with respect.
Frugality - being frugal and cost-conscious helps us do more with less.
Deliver Impact - get things done as efficiently as possible.
Raise our Standards - always look to improve our processes, our team, and our expectations. The status quo is not good enough and never should be.

AI Evaluation Engineer

AI Engineer | Machine Learning Engineer | Full Time | Remote | Mid Level | Team 51-200

Location: United States + 24 more
Posted: 3 days ago
Salary: Not specified
Seniority: Mid Level

Python, LLM, RAG, Prompt Engineering, Statistical Metrics, Precision, Recall, F1 Score, Adversarial Testing, Model Evaluation, LangSmith, Weights & Biases, Arize, DeepEval, Promptfoo, RAGAS

Job Description

Role Description

RYZ Labs is looking for an experienced AI Evaluation Engineer to join one of our clients’ teams.

Design and implement evaluation pipelines to measure the performance and reliability of AI models.
Develop automated testing frameworks to assess model outputs at scale.
Analyze model performance using both traditional statistical metrics and AI-specific evaluation methods.
Evaluate AI systems built on modern architectures such as LLM-based applications and Retrieval-Augmented Generation (RAG).
Identify potential issues related to accuracy, hallucinations, bias, safety, and model drift.
Conduct adversarial testing to uncover vulnerabilities and ensure safe model behavior.
Collaborate with engineering and AI teams to improve prompt design, model outputs, and system performance.
Monitor model performance in production and help define best practices for AI evaluation and observability.

Qualifications

Proficiency in Python and experience building scripts or pipelines to evaluate model outputs.
Experience working with AI/ML systems, particularly large language models (LLMs) or generative AI applications.
Familiarity with concepts such as prompt engineering, prompt optimization, and LLM evaluation.
Understanding of evaluation metrics such as precision, recall, F1-score, and AI-specific metrics related to model quality and safety.
Experience evaluating RAG systems or knowledge retrieval pipelines is a plus.
Experience with modern AI evaluation or observability tools is a plus (e.g., DeepEval, Promptfoo, RAGAS, LangSmith, Arize, Weights & Biases).
Strong analytical mindset with the ability to interpret model behavior and propose improvements.

Requirements

Experience performing adversarial testing or red-teaming of AI systems.
Familiarity with AI safety, bias detection, and model alignment practices.
Experience working in production environments deploying or monitoring AI systems.

Company Description

RYZ Labs is a startup studio founded in 2021 by two lifelong entrepreneurs. The founders of RYZ have worked at some of the world's largest tech companies and some of the most iconic consumer brands. They have lived and worked in Argentina for many years and have decades of experience in Latam. What brought them together was their passion for the early phases of company creation and the idea of attracting the brightest talents in order to build industry-defining companies in a post-pandemic world.

Our teams are remote and distributed throughout the US and Latam.
They use cutting-edge cloud computing technologies to create scalable and resilient applications.
We aim to provide diverse product solutions for different industries and plan to build a large number of startups in the upcoming years.
At RYZ, you will find yourself working with autonomy and efficiency, owning every step of your development.
We provide an environment of opportunities, learning, growth, expansion, and challenging projects.
You will deepen your experience while sharing and learning from a team of great professionals and specialists.
