Akuity

Remove complexity, add velocity.

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Senior | Team 11-50 | Since 2021 | H1B No Sponsor

Location

United States

Posted

2 days ago

Salary

Not specified

Seniority

Senior

Bachelor Degree | 5 yrs exp | English | AWS | EC2 | Grafana | Kubernetes | Prometheus | Python | Go

Job Description

  • Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them
  • Design, instrument, and maintain observability systems (metrics, logs, traces) across multi-region AWS infrastructure
  • Identify reliability gaps, lead blameless post-mortems, and close the loop with permanent fixes
  • Partner with engineering teams to build reliability into new features before they ship to production
  • Participate in an on-call rotation and act as incident commander for high-severity production events
  • Build and maintain runbooks, escalation paths, and incident playbooks that keep mean time to resolution low
  • Drive improvements to alerting fidelity; reduce noise, increase signal, eliminate toil
  • Lead post-incident reviews with clear timelines, root cause analysis, and follow-through on action items

Job Requirements

  • 5+ years of SRE, platform engineering, or production operations experience in a SaaS environment
  • Deep hands-on Kubernetes expertise; you understand the scheduler, networking, storage, and autoscaling at a level where you can debug anything
  • Strong AWS fundamentals across compute (EC2, EKS), networking (VPC, NLB, Route53), storage (S3, RDS), and IAM
  • Experience defining and operating against SLOs in production; you've written error budgets, not just read about them
  • Proficiency with observability tooling (Prometheus, Grafana, OpenTelemetry, Datadog, or equivalent)
  • Solid scripting and automation skills in Go, Python, Bash, or similar; you automate what you touch
  • Strong written communication: clear runbooks, sharp incident reports, thoughtful post-mortems
  • Live within US time zones (Pacific through Eastern); this includes Canada and other regions in those time zones

Benefits

  • Health insurance, dental, and vision coverage
  • Equity participation in a well-funded, growing company
  • Home office stipend and equipment budget
  • Flexible time off and a culture that respects it
  • Work directly with the engineers who built Argo CD and Kargo; you'll learn a lot here
