Cyberhaven

We protect important data other tools can’t see, from threats they can’t detect, across technologies they can’t control.

Senior Director, SRE – Cloud Infrastructure

DevOps Engineer, Full Time, Remote, Team 51-200, H1B No Sponsor

Location

United States

Posted

28 days ago

Salary

$250K - $300K / year

English, AWS, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Terraform

Job Description

  • Lead, grow, and mentor high-performing globally distributed SRE and Infrastructure teams, including managers and senior ICs
  • Own the reliability, availability, scalability, and performance of our production and developer platforms
  • Define and execute the SRE and infrastructure strategy, including cloud architecture, Kubernetes platforms, CI/CD, and automation
  • Drive horizontal scaling and enable teams to operate independently, through decoupling and modularization of both architecture and processes
  • Drive infrastructure cost (COGS) optimization, capacity planning, and cloud financial management in close partnership with Finance and Engineering leadership
  • Establish and evolve SLOs, SLIs, error budgets, and operational best practices across the organization
  • Oversee incident management, postmortems, and continuous improvement, ensuring a strong culture of learning and ownership
  • Collaborate closely with security to ensure our infrastructure is secure, compliant, and resilient by design
  • Contribute to and uphold strong documentation, operational standards, and knowledge sharing across teams

Job Requirements

  • Led SRE and Infrastructure organizations at high-growth SaaS, platform, or security companies
  • Strong technical leader with deep experience in cloud-native systems and a strong SRE mindset
  • Strong background in Kubernetes, cloud platforms (GCP and/or AWS), and infrastructure as code (Terraform or equivalent)
  • Designed or operated large-scale distributed systems, real-time data pipelines, or high-throughput platforms
  • Experience owning COGS, cloud spend, and efficiency metrics, communicating tradeoffs to executives
  • Comfortable operating at multiple levels: strategic planning, architectural reviews, and deep technical problem solving
  • Use data and metrics to drive reliability, performance, cost optimization, and team productivity
  • Proven track record of scaling teams and systems while maintaining high reliability and velocity
  • Empathetic leader fostering inclusion, ownership, accountability, and psychological safety
  • Thrives in fast-moving environments, comfortable navigating ambiguity and change

Benefits

  • Offers Equity
  • Offers Bonus

More DevOps Engineer Jobs

Senior DevOps Engineer

NovoPayment

Next Level Digital Financial Services Simplified.

DevOps Engineer, 29 days ago
Contract, Remote, Team 201-500, Since 2007, H1B No Sponsor

Senior DevOps Engineer designing and maintaining cloud infrastructure and automation frameworks

AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Linux, Microservices, Prometheus, Python, Terraform, Go
United States
DevOps Engineer, 29 days ago
Full Time, Remote, Team 380, Since 2015

The Staff Site Reliability Engineer will design and manage AWS infrastructure, optimize Kubernetes operations, automate workflows, and troubleshoot systems for improved reliability and performance.

AWS, CI/CD, Datadog, Docker, EKS, GitHub Actions, Go, Kafka, Kubernetes, Nginx, PrivateLink, Python, Terraform, Transit Gateway, VPC
United States
$149.2K - $222.0K / year
Full Time, Remote, Team 10,001+, Since 1916, H1B Sponsor

Software Engineer maintaining CI/CD environment for software systems at Boeing

AWS, Azure, Cloud, Java, Linux, Python
United States
$104.6K - $197.8K / year

Lead Site Reliability Engineer

Intellum

We help large brands and fast-moving companies increase revenue and decrease support costs through education.

DevOps Engineer, 30 days ago
Full Time, Remote, Team 51-200, Since 2016, H1B Sponsor

The Lead Software Engineer will lead the SRE team, focusing on reliability, performance optimization, security, and mentoring developers, while improving overall platform resilience.

ActiveJob, Ansible, AWS, AWS CloudWatch, EC2, ECS, Elasticsearch, Git, GCP, Google Cloud Stackdriver, Jenkins, JIRA, Kubernetes, Memcached, MongoDB, New Relic, Node.js, PostgreSQL, Redis, Ruby on Rails, Sidekiq, Spinnaker, Terraform, Terragrunt
United States