SentinelOne
Secure your enterprise with the autonomous cybersecurity platform. Endpoint. Cloud. Identity. XDR. Now.
Staff AI Infrastructure Engineer
Location: United States
Posted: 7 days ago
Salary: $170.2K - $234.6K / year
Bachelor Degree, 7 yrs exp, English, AWS, Azure, Cloud, Cyber Security, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform
Job Description
• Architect, build, and maintain scalable infrastructure to host and serve AI products and models reliably.
• Automate infrastructure deployment and management using Helm, ArgoCD, and Terraform.
• Manage and optimize Kubernetes clusters to support high-performance AI workloads.
• Implement and manage CI/CD pipelines utilizing GitHub Actions and Jenkins.
• Ensure infrastructure compliance with security standards including FedRAMP and related guidelines.
• Collaborate closely with AI engineering, product teams, and DevOps to meet infrastructure requirements.
• Monitor infrastructure health and performance, implementing optimizations proactively.
• Drive infrastructure best practices and mentor team members to foster technical excellence.
Job Requirements
- A degree in Computer Science, Information Technology, or related field, or equivalent practical experience.
- 7+ years of experience managing scalable, secure, and resilient infrastructure for AI and machine learning applications.
- Deep proficiency with infrastructure-as-code tools such as Helm, Terraform, and ArgoCD.
- Extensive hands-on experience with Kubernetes for deploying containerized workloads.
- Demonstrated experience with major cloud platforms (AWS, GCP, Azure), specifically with services related to AI model hosting (e.g., Azure OpenAI).
- Experience implementing and managing CI/CD pipelines (GitHub Actions, Jenkins).
- Familiarity with compliance frameworks, particularly FedRAMP, and security best practices.
- Strong scripting and automation skills using Python, Bash, or similar languages.
- Excellent problem-solving skills, creativity, and self-driven motivation.
- Previous experience as a Site Reliability Engineer (SRE), particularly in AI or ML contexts.
- Experience with monitoring and logging tools (Prometheus, Grafana, Datadog, Jaeger).
- Understanding of networking concepts and security best practices within cloud infrastructure.
- Professional certifications in Kubernetes or cloud platforms (AWS, Azure, GCP).
Benefits
- Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
- Unlimited PTO
- Industry-leading gender-neutral parental leave
- Paid Company Holidays
- Paid Sick Time
- Employee stock purchase program
- Disability and life insurance
- Employee assistance program
- Gym membership reimbursement
- Cell phone reimbursement
- Numerous company-sponsored events, including regular happy hours and team-building events