Secure your enterprise with the autonomous cybersecurity platform. Endpoint. Cloud. Identity. XDR. Now.

Staff AI Infrastructure Engineer

Full Time | Remote | Team 1,001-5,000 | Since 2013 | H1B Sponsor

Location

United States

Posted

7 days ago

Salary

$170.2K - $234.6K / year

Bachelor Degree | 7 yrs exp | English | AWS | Azure | Cloud | Cyber Security | Google Cloud Platform | Grafana | Jenkins | Kubernetes | Prometheus | Python | Terraform

Job Description

• Architect, build, and maintain scalable infrastructure to host and serve AI products and models reliably.
• Automate infrastructure deployment and management using Helm, ArgoCD and Terraform.
• Manage and optimize Kubernetes clusters to support high-performance AI workloads.
• Implement and manage CI/CD pipelines utilizing GitHub Actions and Jenkins.
• Ensure infrastructure compliance with security standards including FedRAMP and related guidelines.
• Collaborate closely with AI engineering, product teams, and DevOps to meet infrastructure requirements.
• Monitor infrastructure health and performance, implementing optimizations proactively.
• Drive infrastructure best practices and mentor team members to foster technical excellence.

Job Requirements

A degree in Computer Science, Information Technology, or related field, or equivalent practical experience.
7+ years of experience managing scalable, secure, and resilient infrastructure for AI and machine learning applications.
Deep proficiency with infrastructure-as-code tools like Helm, Terraform and ArgoCD.
Extensive hands-on experience with Kubernetes for deploying containerized workloads.
Demonstrated experience with major cloud platforms (AWS, GCP, Azure), specifically with services related to AI model hosting (e.g., Azure OpenAI).
Experience implementing and managing CI/CD pipelines (GitHub Actions, Jenkins).
Familiarity with compliance frameworks, particularly FedRAMP, and security best practices.
Strong scripting and automation skills using Python, Bash, or similar languages.
Excellent problem-solving skills, creativity, and self-driven motivation.
Previous experience as a Site Reliability Engineer (SRE), particularly in AI or ML contexts.
Experience with monitoring and logging tools (Prometheus, Grafana, Datadog, Jaeger).
Understanding of networking concepts and security best practices within cloud infrastructure.
Professional certifications in Kubernetes or cloud platforms (AWS, Azure, GCP).

Benefits

Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
Unlimited PTO
Industry-leading gender-neutral parental leave
Paid Company Holidays
Paid Sick Time
Employee stock purchase program
Disability and life insurance
Employee assistance program
Gym membership reimbursement
Cell phone reimbursement
Numerous company-sponsored events, including regular happy hours and team-building events
