Software Engineer, DevOps
Location: United States
Posted: 3 days ago
Salary: Not specified
Job Description
Role Description
This position is on the DevOps team, supporting the MNTN platform and its engineers. The right person will not only have deep knowledge of system administration and GCP, but will also be able to work with a variety of developers. You will work closely with our Engineering team and help meet the service's requirements.
- Design, provision, and manage infrastructure primarily on Google Cloud (GKE) and other cloud environments, with Kubernetes as the core platform abstraction.
- Work directly with engineering counterparts on the design and implementation of cloud-based infrastructure solutions.
- Build platform capabilities, internal developer portal (IDP) components, and self-service tooling that reduces friction for engineers and standardizes how services run in production.
- Design and build automations and infrastructure tooling with GitOps as a fundamental principle, integrating with APIs, cloud SDKs, and Kubernetes components.
- Create and manage CI/CD pipelines (e.g., ArgoCD, Argo Rollouts, Argo Workflows, GitHub Actions), ensuring safe, observable, and repeatable deployments across environments.
- Partner with engineering teams to define scalable infrastructure patterns, reusable modules, and cost-efficient solutions based on usage and growth requirements.
- Monitor infrastructure spend and contribute to cost analysis, optimization, and FinOps practices to improve cloud efficiency.
- Improve observability, alerting, SLOs, and runbooks to strengthen reliability and incident response.
- Implement cloud security and compliance best practices, including IAM design, workload identity, secrets management, and container security standards.
Qualifications
- 5+ years in DevOps, SRE, or Platform Engineering, with strong production experience in GCP and Kubernetes (GKE).
- Experience with additional cloud providers is a plus.
- Strong automation skills, with programming and scripting experience in Python, including working with APIs, SDKs, and CLIs.
- Proven experience operating Kubernetes in production, including Helm, ArgoCD, operators, Kustomize, networking, autoscaling, and RBAC.
- Deep expertise in Terraform / OpenTofu and infrastructure-as-code best practices; experience designing reusable modules and managing multi-environment deployments. Terragrunt is a plus.
- Hands-on experience supporting hybrid or multi-cloud deployments where appropriate.
- Solid understanding of cost modeling, tagging strategies, and cloud cost optimization techniques (FinOps awareness a plus).
- Familiarity with microservices architectures, service discovery, and containerization workflows.
- A platform mindset: you’ve built tools, abstractions, or paved roads that improve developer velocity and system reliability.
- Strong communication skills and a habit of clear, useful documentation.
Benefits
- 100% remote within the US
- Flexible vacation policy
- Annual vacation allowance for travel-related expenses
- Three-day weekend every month of the year
- Competitive compensation
- 100% healthcare coverage
- 401k plan
- Flexible Spending Account (FSA) for dependent, medical, and dental care
- Access to coaching, therapy, and professional development