RYZ Labs

RYZ Labs is a startup studio built in 2021 by three lifelong entrepreneurs. The founders of RYZ have worked at some of the world's largest tech companies and some of the most iconic consumer brands. They have lived and worked in Argentina for many years and have decades of experience in Latam.

  • Passion for the early phases of company creation
  • Attracting the brightest talents to build industry-defining companies in a post-pandemic world
  • Remote and distributed teams throughout the US and Latam
  • Use of cutting-edge technologies in cloud computing
  • Aim to provide diverse product solutions for different industries
  • Plans to build a large number of startups in the upcoming years

Our Values and What to Expect

  • Customer First Mentality - every decision we make should be made through the lens of the customer.
  • Bias for Action - urgency is critical, expect that the timeline to get something done is accelerated.
  • Ownership - step up if you see an opportunity to help, even if not your core responsibility.
  • Humility and Respect - be willing to learn, be vulnerable, and treat everyone who interacts with RYZ with respect.
  • Frugality - being frugal and cost-conscious helps us do more with less.
  • Deliver Impact - get things done most efficiently.
  • Raise our Standards - always be looking to improve our processes, our team, and our expectations. The status quo is not good enough and never should be.

AI Evaluation Engineer

AI Engineer, Machine Learning Engineer, Full Time, Remote, Mid Level, Team 51-200

Location

United States + 24 more

All locations: United States, Brazil, Colombia, Argentina, Chile, Venezuela (Bolivarian Republic of), Bolivia (Plurinational State of), Ecuador, French Guiana, Guyana, Paraguay, Peru, Suriname, Uruguay, Mexico, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Dominican Republic, Puerto Rico

Posted

3 days ago

Salary

Not specified

Seniority

Mid Level

Python, LLM, RAG, Prompt Engineering, Statistical Metrics, Precision, Recall, F1 Score, Adversarial Testing, Model Evaluation, LangSmith, Weights & Biases, Arize, DeepEval, Promptfoo, RAGAS

Job Description

Role Description

RYZ Labs is looking for an experienced AI Evaluation Engineer to join one of our clients’ teams.

  • Design and implement evaluation pipelines to measure the performance and reliability of AI models.
  • Develop automated testing frameworks to assess model outputs at scale.
  • Analyze model performance using both traditional statistical metrics and AI-specific evaluation methods.
  • Evaluate AI systems built on modern architectures such as LLM-based applications and Retrieval-Augmented Generation (RAG).
  • Identify potential issues related to accuracy, hallucinations, bias, safety, and model drift.
  • Conduct adversarial testing to uncover vulnerabilities and ensure safe model behavior.
  • Collaborate with engineering and AI teams to improve prompt design, model outputs, and system performance.
  • Monitor model performance in production and help define best practices for AI evaluation and observability.

Qualifications

  • Proficiency in Python and experience building scripts or pipelines to evaluate model outputs.
  • Experience working with AI/ML systems, particularly large language models (LLMs) or generative AI applications.
  • Familiarity with concepts such as prompt engineering, prompt optimization, and LLM evaluation.
  • Understanding of evaluation metrics such as precision, recall, F1-score, and AI-specific metrics related to model quality and safety.
  • Experience evaluating RAG systems or knowledge retrieval pipelines is a plus.
  • Experience with modern AI evaluation or observability tools is a plus (e.g., DeepEval, Promptfoo, RAGAS, LangSmith, Arize, Weights & Biases).
  • Strong analytical mindset with the ability to interpret model behavior and propose improvements.

Requirements

  • Experience performing adversarial testing or red-teaming of AI systems.
  • Familiarity with AI safety, bias detection, and model alignment practices.
  • Experience working in production environments deploying or monitoring AI systems.

Company Description

RYZ Labs is a startup studio founded in 2021 by two lifelong entrepreneurs. The founders of RYZ have worked at some of the world's largest tech companies and some of the most iconic consumer brands. They have lived and worked in Argentina for many years and have decades of experience in Latam. What brought them together was their passion for the early phases of company creation and the idea of attracting the brightest talents in order to build industry-defining companies in a post-pandemic world.

  • Our teams are remote and distributed throughout the US and Latam.
  • We use cutting-edge cloud computing technologies to create scalable and resilient applications.
  • We aim to provide diverse product solutions for different industries and plan to build a large number of startups in the upcoming years.
  • At RYZ, you will find yourself working with autonomy and efficiency, owning every step of your development.
  • We provide an environment of opportunities, learning, growth, expansion, and challenging projects.
  • You will deepen your experience while sharing and learning from a team of great professionals and specialists.
