Innodata Inc

Innodata (NASDAQ: INOD) is a leading data engineering company. With more than 2,000 customers and operations in 13 cities around the world, we are an AI technology solutions provider-of-choice for 4 out of 5 of the world’s biggest technology companies, as well as leading companies across financial services, insurance, technology, law, and medicine. By combining advanced machine learning and artificial intelligence (ML/AI) technologies, a global workforce of subject matter experts, and a high-security infrastructure, we’re helping usher in the promise of AI. Our global workforce includes over 7,000 employees in the United States, Canada, United Kingdom, the Philippines, India, Sri Lanka, Israel and Germany. We’re poised for a period of explosive growth over the next few years.

Language Data Scientist

Data Scientist | Full Time | Remote | Team size: 5,001-10,000

Location: United States

Posted: 7 days ago

Salary: $85K - $95K / year


Job Description

Job Title: Language Data Scientist

Location: Fully Remote within the U.S. (excluding California, Washington, Alaska, Colorado, Montana, New York, Puerto Rico, Nevada, Nebraska)

Employment Type: Full-Time (40 hours per week) Fixed-Term

Who we are: 

Innodata offers a powerful combination of digital data solutions and easy-to-use, high-quality platforms.

About the Role:

Innodata is building a team of Language Data Scientists and GenAI experts to help our customers advance their GenAI applications. You will work hands-on with multi-modal and multi-lingual datasets and collaborate with cross-functional partners. You will use your experience with human and synthetic data workflows to drive innovation and continuous improvement. The ideal candidate has the right mix of skills in (computational) linguistics and human evaluation tasks, data science, and data engineering.

Key Responsibilities: 

  • Design and improve workflows that create data for AI/ML training and evaluation, including human annotation and data collection workflows as well as synthetic data workflows.

  • Dive deep into existing workflows and processes to gather data and insights, make recommendations, and drive improvement through innovation and cross-functional collaboration with customers.

  • Critically assess annotation tooling and workflows.

  • Quantitatively analyze large datasets, perform statistical analyses, calculate metrics, and make recommendations to improve accuracy and performance.

  • Work closely with client stakeholders to understand their goals, gather requirements, and propose and execute solutions.



Qualifications: 

  • Knowledge of how the components of GenAI products and services work together

  • Experience collaborating with cross-functional teams to define AI project requirements and objectives, ensuring alignment with overall business goals

  • MA in (computational) linguistics, data science, computer science (AI / ML / NLU), quantitative social sciences, or a related scientific or quantitative field; PhD strongly preferred

  • Language and language data expertise: Extensive experience working with human language data and designing human evaluation tasks, including multi-phase and complex workflows. 

    • Deep understanding of language and its relationship with culture  

    • Ability to identify ambiguity and subjectivity in language  

    • Ability to work with multi-lingual and multi-modal projects  

  • Quantitative Analysis Skills: Advanced knowledge of statistics, metrics (e.g., F1 score, inter-rater reliability metrics), and data analysis methods such as sampling.

  • Technical skills: 

    • Experience with Natural Language Processing (NLP) techniques and tools, such as spaCy, NLTK, or Hugging Face.

    • Proficiency in Python to:

      • handle and transform large datasets (e.g., pre- and post-processing data with pandas)

      • perform quantitative analyses

      • visualize data (e.g., with matplotlib or seaborn)

  • Data processing: 

    • Deep understanding of data pipelines that support ML and NLP workflows

    • Knowledge of efficient data collection, transformation, and storage 

    • Knowledge of data structures, algorithms, and data engineering principles 

  • Excellent interpersonal skills for effective cross-functional stakeholder engagement 

  • Excellent problem-solving skills, with the ability to think critically and creatively to develop innovative AI solutions 

  • Ability to work independently and collaborate as part of a team 

  • Adaptable to changing technologies and methodologies 

  • Ability to apply experience and research and development findings to understand client products and services.
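Two of the quantitative skills listed above, computing an F1 score and an inter-rater reliability metric for annotation data, can be illustrated with a minimal pure-Python sketch. It assumes two annotators (or a gold set and a model) labeled the same items with categorical labels, and it uses Cohen's kappa as one example of an inter-rater reliability metric; the posting does not name a specific metric, and the function names here are illustrative rather than part of any Innodata tool.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def f1_score(gold: Sequence[Hashable], pred: Sequence[Hashable], positive: Hashable) -> float:
    """Binary F1 for one positive label: the harmonic mean of precision and recall."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return math.nan  # both raters used one identical label; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels for six items from two annotators.
    rater_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
    rater_b = ["pos", "neg", "neg", "neg", "pos", "pos"]
    print(round(f1_score(rater_a, rater_b, positive="pos"), 3))  # 0.667
    print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.333
```

In this example the raters agree on 4 of 6 items (observed agreement 0.667), but because each rater used "pos" and "neg" equally often, chance agreement is 0.5, so kappa is only 0.333. This is why raw percent agreement alone can overstate annotation quality.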

Preferred Qualifications:

  • Conducting research to stay up-to-date with the latest advancements in generative AI, machine learning, and deep learning techniques   

  • Knowledge of optimizing existing generative AI models for improved performance, scalability, and efficiency 

  • Experience developing and maintaining ML/AI pipelines, including data preprocessing, feature extraction, model training, and evaluation

  • Model Fine-Tuning: Knowledge of fine-tuning pre-trained models to adapt them to specific tasks and datasets, improving their performance and relevance

  • Developing clear and concise documentation, including technical specifications, user guides, and presentations, to communicate complex AI concepts to both technical and nontechnical stakeholders 

  • Contributing to establishing best practices and standards for generative AI development with customers and within the organization 

  • Providing technical mentorship and guidance to junior team members   

  • Understanding of generative modeling approaches such as GPT-style transformers, VAEs, and GANs

Salary Range: Up to $95k USD

Rates at Innodata vary depending on a wide array of factors, which may include but are not limited to the role, skill set, educational background and geographic location. 
