Building foundational AI for speech transcription and understanding.
Site Reliability Engineer – AI & ML Infrastructure, Kubernetes, AWS, Terraform
Location: United States
Posted: 6 days ago
Salary: $150K - $220K / year
Job Description
Job Requirements
- 5+ years of experience in Platform Engineering, DevOps, or Site Reliability Engineering (SRE)
- Proven, hands-on experience building and managing production infrastructure with Terraform
- Expert-level knowledge of Kubernetes architecture and operations in a large-scale environment
- Experience with high-performance computing (HPC) job schedulers, specifically Slurm, for managing GPU-intensive AI workloads
- Experience managing bare-metal infrastructure, including server provisioning (e.g., PXE boot, MAAS), configuration, and lifecycle management
- Strong scripting and automation skills (e.g., Python, Go, Bash)
Benefits
- Medical, dental, vision benefits
- Annual wellness stipend
- Mental health support
- Life, short-term disability (STD), and long-term disability (LTD) insurance plans
- Unlimited PTO
- Generous paid parental leave
- Flexible schedule
- 12 paid US company holidays
- Quarterly personal productivity stipend
- One-time stipend for home office upgrades
- 401(k) plan with company match
- Tax Savings Programs
- Learning / Education stipend
- Participation in talks and conferences
- Employee Resource Groups
- AI enablement workshops / sessions