NVIDIA

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Senior System Software Engineer, NCCL – Partner Enablement

Full-stack Engineer | Software Engineer | Full Time | Remote | Team 10,001+ | Since 1993 | H1B Sponsor

Location

California, Texas

Posted

55 days ago

Salary

$152K - $218.5K / year

Bachelor's Degree | 5 yrs exp | English | Ansible | AWS | Azure | Cloud | Docker | Google Cloud Platform | Kubernetes | Linux | Node.js | Python

Job Description

  • Engage with our partners and customers to root-cause functional and performance issues reported with NCCL
  • Conduct performance characterization and analysis of NCCL and deep learning (DL) applications on groundbreaking GPU clusters
  • Develop tools and automation to isolate issues on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
  • Guide our customers and support teams on HPC knowledge and best practices for running applications on multi-node clusters
  • Write documentation for NCCL and deliver trainings and webinars on it
  • Engage with internal teams across different time zones on networking, GPUs, storage, infrastructure, and support

Job Requirements

  • B.S. or M.S. degree in Computer Science or Computer Engineering (or equivalent experience), with 5+ years of relevant experience
  • Experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
  • Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
  • Experience working with the engineering or academic research communities supporting HPC or AI
  • Practical experience with high-performance networking: InfiniBand/RoCE/Ethernet networks, RDMA, topologies, congestion control
  • Expertise in Linux fundamentals and in a scripting language, preferably Python
  • Familiarity with containers, cloud provisioning, and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
  • Adaptability and passion to learn new areas and tools
  • Flexibility to work and communicate effectively across different teams and time zones

Benefits

  • Equity
  • Benefits