AI Video Research Engineer Intern

Computer Vision Engineer | Machine Learning Engineer | Full Time | Remote | Team 201-500

Location

All locations: United States, Canada, Brazil, Colombia, Argentina, Chile, Venezuela (Bolivarian Republic of), Bolivia (Plurinational State of), Ecuador, French Guiana, Guyana, Paraguay, Peru, Suriname, Uruguay, Mexico, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Dominican Republic, Puerto Rico, Bahamas, Guadeloupe, Haiti, Jamaica, Martinique, Montserrat, United Kingdom, Germany, France, Estonia, Portugal, Hungary, Poland, Ukraine, Romania, Bulgaria, Czech Republic, Slovakia, Belarus, Moldova (Republic of), Sweden, Greece, Belgium, Italy, Ireland, Switzerland, Netherlands, Finland, Malta, Denmark, Lithuania, Croatia, Spain, Austria, Bosnia and Herzegovina, Iceland, Luxembourg, Macedonia (the former Yugoslav Republic of), Montenegro, Norway, Serbia, Slovenia, Albania, Cyprus, Latvia, Monaco, South Africa, Egypt, Algeria, Angola, Benin, Botswana, Burkina Faso, Burundi, Cameroon, Cape Verde, Central African Republic, Chad, Congo, Côte d'Ivoire, Congo (the Democratic Republic of the), Equatorial Guinea, Eritrea, Ethiopia, Gabon, Gambia, Ghana, Guinea, Guinea-Bissau, Kenya, Lesotho, Liberia, Libyan Arab Jamahiriya, Madagascar, Malawi, Mali, Mauritania, Mauritius, Mayotte, Morocco, Mozambique, Namibia, Niger, Nigeria, Réunion, Rwanda, Senegal, Seychelles, Sierra Leone, Somalia, Sudan, Swaziland, Tanzania (United Republic of), Togo, Tunisia, Uganda, Zambia, Zimbabwe, Georgia, Turkey, Israel, United Arab Emirates, Armenia, Azerbaijan, Bahrain, Iraq, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, Palestinian Territory (Occupied), Yemen

Posted

4 days ago

Salary

Not specified

PyTorch, Deep Learning, Computer Vision, Machine Learning, Video Processing, Python, Distributed Training, Multimodal Learning, Generative Modeling, Large Scale Datasets

Job Description

Role Description

We are seeking highly motivated MSc or PhD interns to work on video generation and multimodal video foundation models. Interns will focus on one or more components of the foundation model lifecycle and are encouraged to propose creative, research-driven ideas that advance the state of the art.

  • Contribute to the development and improvement of open-source video foundation models.
  • Analyze limitations and design scalable solutions.
  • This is a research-focused internship with opportunities to publish at top-tier computer vision and machine learning conferences.
  • Work with petabyte-scale video datasets and large distributed GPU clusters with thousands of GPUs.

Responsibilities

  • Research and improve open-source video and multimodal video generation foundation models.
  • Focus on one or more areas such as:
    • Pre-training
    • Supervised fine-tuning
    • Post-training
    • Inference
    • Architecture design
    • Evaluation
  • Benchmark models against the current state of the art, identify bottlenecks, and propose novel improvements.
  • Work with large-scale video datasets and distributed training systems.
  • Collaborate with researchers and engineers on projects with clear research and publication potential.

Qualifications

  • MSc or PhD candidate in Computer Science, Machine Learning, Computer Vision, or a related technical field.
  • Research focus or prior experience in image generation, video generation, or multimodal learning.
  • Awareness of open-source video foundation models and their current limitations.
  • Proficiency with PyTorch and modern deep learning workflows.
  • Strong analytical thinking, creativity, and collaboration skills.
  • Prior first-author publications on related topics at CVPR, ICCV, ECCV, NeurIPS, or ICLR.

Preferred Qualifications

  • Demonstrated related work, such as a research codebase or benchmarks released on GitHub or a similar platform.
  • Experience with large-scale or distributed training.
  • Hands-on experience with diffusion-based, transformer-based, or hybrid video generation models.

Important Information for Candidates

  • Apply only through our official channels.
  • We do not use third-party platforms or agencies for recruitment unless clearly stated.
  • Verify the recruiter’s identity through verified LinkedIn profiles.
  • Be cautious of unusual communication methods; we do not conduct interviews over WhatsApp, Telegram, or SMS.
  • Double-check email addresses; all communication will come from emails ending in @tether.to or @tether.io.
  • We will never request payment or financial details.

Team

AI Research
