SentinelOne

Secure your enterprise with the autonomous cybersecurity platform. Endpoint. Cloud. Identity. XDR. Now.

Staff AI Infrastructure Engineer

Full Time; Remote; Team 1,001-5,000; Since 2013; H1B Sponsor

Location

United States

Posted

7 days ago

Salary

$170.2K - $234.6K / year

Bachelor Degree, 7 yrs exp, English, AWS, Azure, Cloud, Cyber Security, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform

Job Description

  • Architect, build, and maintain scalable infrastructure to host and serve AI products and models reliably.
  • Automate infrastructure deployment and management using Helm, ArgoCD, and Terraform.
  • Manage and optimize Kubernetes clusters to support high-performance AI workloads.
  • Implement and manage CI/CD pipelines using GitHub Actions and Jenkins.
  • Ensure infrastructure compliance with security standards, including FedRAMP and related guidelines.
  • Collaborate closely with AI engineering, product teams, and DevOps to meet infrastructure requirements.
  • Monitor infrastructure health and performance, implementing optimizations proactively.
  • Drive infrastructure best practices and mentor team members to foster technical excellence.

Job Requirements

  • A degree in Computer Science, Information Technology, or related field, or equivalent practical experience.
  • 7+ years of experience managing scalable, secure, and resilient infrastructure for AI and machine learning applications.
  • Deep proficiency with infrastructure-as-code and GitOps deployment tools such as Terraform, Helm, and ArgoCD.
  • Extensive hands-on experience with Kubernetes for deploying containerized workloads.
  • Demonstrated experience with major cloud platforms (AWS, GCP, Azure), specifically with services related to AI model hosting (e.g., Azure OpenAI).
  • Experience implementing and managing CI/CD pipelines (GitHub Actions, Jenkins).
  • Familiarity with compliance frameworks, particularly FedRAMP, and security best practices.
  • Strong scripting and automation skills using Python, Bash, or similar languages.
  • Excellent problem-solving skills, creativity, and self-driven motivation.
  • Previous experience as a Site Reliability Engineer (SRE), particularly in AI or ML contexts.
  • Experience with monitoring and logging tools (Prometheus, Grafana, Datadog, Jaeger).
  • Understanding of networking concepts and security best practices within cloud infrastructure.
  • Professional certifications in Kubernetes or cloud platforms (AWS, Azure, GCP).

Benefits

  • Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
  • Unlimited PTO
  • Industry-leading gender-neutral parental leave
  • Paid Company Holidays
  • Paid Sick Time
  • Employee stock purchase program
  • Disability and life insurance
  • Employee assistance program
  • Gym membership reimbursement
  • Cell phone reimbursement
  • Numerous company-sponsored events, including regular happy hours and team-building events
