WorkOS
Your app, Enterprise Ready.
Site Reliability Engineer
Location
United States
Posted
15 days ago
Salary
$175K - $275K / year
Bachelor Degree, English, AWS, Cloud, Grafana, Kubernetes, Prometheus, TypeScript
Job Description
• Design and evolve the systems, tooling, and processes that improve the reliability and performance of WorkOS
• Collaborate with product and infrastructure teams to ensure services are production-ready, observable, and resilient to failure
• Define and measure SLIs/SLOs to guide reliability improvements
• Write and optimize backend systems (in TypeScript) with a focus on performance, maintainability, and graceful degradation
• Improve our incident response process, lead postmortems, and drive follow-through on reliability risks
• Develop internal tools and automations that make it easier to operate and scale our systems
• Participate in our on-call rotation—responding to, resolving, and learning from production incidents
• Contribute to design and architecture discussions with a focus on operability and long-term sustainability
• Document systems, share learnings, and help grow a reliability-minded engineering culture
Job Requirements
- Experience operating and scaling production systems in cloud environments (we use AWS)
- Familiarity with service reliability concepts—monitoring, alerting, incident response, and root cause analysis
- Comfort working across infrastructure layers (e.g. compute, networking, storage, observability tooling)
- Strong debugging and systems thinking skills—you can follow problems across services and layers
- Ability to work independently, take ownership, and drive projects from problem discovery through resolution
Nice to have
- Familiarity with Kubernetes or similar orchestration systems
- Exposure to observability stacks (e.g. Prometheus, Grafana, Datadog, OpenTelemetry)
- Exposure to TypeScript or interest in working in a TypeScript-based codebase
Benefits
- Competitive pay
- Substantial equity grants
- Healthcare insurance (Medical, Dental and Vision) for you and your family
- 401k matching
- Wellness and fitness monthly allowances
- PTO + paid holidays + unlimited sick leave
- Autonomy and flexibility with remote work