Cornelis Networks

The Future of High Performance Fabrics

Principal Software Engineer – AI/HPC Middleware

Full Time | Remote | Team 51-200 | H1B Sponsor

Location

Texas

Posted

7 days ago

Salary

Not specified

12 yrs exp | English | Linux

Job Description

  • Lead design and implementation enabling and optimizing HPC middleware (MPI and SHMEM) and AI middleware CCL stacks (e.g., NCCL/RCCL and related collective communication libraries)
  • Deliver performance-critical communication paths including low-latency small and medium message transfers, bulk SDMA data movement, GPU-Direct and IPC communication, and collective acceleration
  • Design and tune collective communication algorithms (latency-optimized and bandwidth-optimized), including GPU-aware collectives
  • Integrate middleware with underlying transports and provider layers such as libfabric/OFI, UCX, and verbs-style interfaces to achieve performance, portability, and maintainability
  • Implement and optimize memory registration strategies, progress and execution models, completion semantics, multi-rail communication behavior, and GPU memory handling
  • Drive upstream contributions across MPI/SHMEM projects, CCL ecosystems, and related components with a focus on upstreamable design and long-term maintainability
  • Represent Cornelis Networks in open-source communities through technical reviews, design discussions, and sustained technical leadership
  • Implement and prototype Ultra Ethernet capabilities supporting MPI/SHMEM and AI collective communication use cases
  • Collaborate with ecosystem partners to validate deployment models and performance scaling on customer-relevant configurations
  • Work closely with kernel, driver, and switch teams to deliver end-to-end performance aligned with the Cornelis product roadmap
  • Participate in architecture reviews, performance tuning, scaling validation, and multi-layer root-cause investigations
  • Analyze performance traces and triage advanced customer issues, translating findings into robust fixes and upstream improvements
  • Publish internal and external best practices, including tuning guidance, reference configurations, and debugging methodologies
  • Mentor senior engineers and promote best practices for design, testing, documentation, and code quality
  • Help define the long-term middleware technical roadmap aligned with product evolution and customer needs

Job Requirements

  • 12+ years of experience in high-performance systems programming in C/C++ on Linux
  • Hands-on experience with MPI internals (Open MPI, MPICH, MVAPICH) and/or SHMEM implementations
  • Experience implementing or optimizing collective communications for HPC and/or AI workloads, including NCCL/RCCL (CUDA/ROCm) or related CCL stacks
  • Demonstrated ability to design low-latency/high-throughput communication paths and diagnose performance issues using profiling and tracing tools
  • Working knowledge of transport and integration layers such as OFI/libfabric, UCX, and verbs-style networking concepts
  • Strong understanding of RDMA and performance tuning
  • Proven open-source contribution track record
  • Demonstrated technical leadership in complex HPC or AI system software

Benefits

  • Health and retirement benefits
  • Generous paid holidays
  • 401(k) with company match
  • Open Time Off (OTO) for regular full-time exempt employees
  • Paid time off benefits including sick time, bonding leave, and pregnancy disability leave
