Lavendo
Sales Recruiting for Startups
HPC Solutions Architect
Location: California
Posted: 28 days ago
Salary: $225K - $315K / year
Bachelor Degree, 3 yrs exp, English, Ansible, Cloud, Docker, Kubernetes, Linux, NFS, Python, Terraform
Job Description
• Architect and implement HPC clusters for AI, simulation, and distributed training using Kubernetes and schedulers like Slurm.
• Integrate NVIDIA Hopper and Blackwell‑class GPUs with NVLink/NVSwitch and InfiniBand/RoCE.
• Deploy and manage GPU Operator and Network Operator for large fleets.
• Design and validate cloud‑native HPC environments with low latency and high bandwidth.
• Define and document reference architectures for AI model training and MLOps.
• Collaborate with NVIDIA and other partners to evaluate new GPU generations and software stacks.
• Benchmark performance, track down bottlenecks, and recommend concrete changes.
• Lead design sessions and architecture reviews with customers focused on performance and reliability.
Job Requirements
- A Bachelor’s or Master’s in Computer Science, Engineering, or a related field (PhD is a plus).
- 3+ years of hands-on experience building or running HPC or large GPU clusters, whether on‑prem, cloud, or hybrid.
- Strong Linux background, plus Kubernetes and container runtimes (containerd, CRI‑O, Docker) in real environments, with CI/CD in the loop.
- A solid handle on HPC networking and RDMA: InfiniBand, RoCE, NVLink/NVSwitch.
- Experience with storage and I/O for big workloads: Ceph, Lustre, NFS at scale, GPUDirect Storage, or similar systems.
- Comfort with Terraform, Ansible, Helm, and GitOps‑style workflows.
- Good scripting skills in Python or Bash.
- You write and speak clearly, can lead a design review without losing the room, and can keep both engineers and non‑technical stakeholders on the same page.
- Legal authorization to work in the U.S. on a full-time basis without visa sponsorship.
Benefits
- 100% employer‑paid medical, dental, and vision for you and your family
- 4% 401(k) match with immediate vesting
- Company‑paid short‑ and long‑term disability and life insurance
- 20 weeks paid parental leave for primary caregivers, 12 weeks for secondary
- Support for your home office (mobile + internet stipend)