Site Reliability Engineer

DevOps Engineer, Full Time, Remote, Team 11-50

Location

United States

Posted

3 days ago

Salary

$130K - $150K / year

Linux, Shell Scripting, AWS, Azure, Kubernetes, Docker, Python, Java, CI/CD, Terraform, Terragrunt, Ansible, Datadog, Microservices, Distributed Systems, Networking, Security, PostgreSQL

Job Description

Role Description

We are seeking a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, security, and performance of our production systems and services. The SRE will bridge the gap between software development and operations, implementing automation, monitoring, and best practices to enable rapid, reliable delivery of applications. You will report directly to the Senior Director of Engineering.

What you’ll do:

  • Reliability & Performance
    • Ensure high availability, scalability, and performance of production systems.
    • Implement and maintain SLIs, SLOs, and SLAs for critical services.
    • Conduct capacity planning and performance tuning.
  • Automation & Tooling
    • Automate infrastructure provisioning using IaC tools such as Terraform, Terragrunt, and Ansible.
    • Develop automation to minimize manual operations and improve deployment workflows.
    • Build CI/CD pipelines to support rapid and reliable deployments.
  • Monitoring & Incident Response
    • Design and maintain monitoring, logging, and alerting systems (Datadog).
    • Participate in on-call rotations and lead incident response efforts.
    • Perform root-cause analysis and develop postmortems to prevent recurring issues.
  • Systems Engineering
    • Manage cloud infrastructure (AWS, Azure) and container orchestration platforms (Kubernetes, ECS).
    • Optimize system architecture for reliability and fault tolerance.
    • Implement best practices for security, networking, and service resilience.
  • Collaboration & Leadership
    • Work closely with development teams to design reliable microservices and distributed systems.
    • Advocate for SRE principles and drive operational excellence across engineering teams.
    • Mentor engineers on reliability practices, tooling, and automation strategies.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • 3–7 years of experience in SRE, DevOps, or Systems Engineering roles.
  • Strong proficiency with Linux systems and shell scripting.
  • Experience with cloud platforms (AWS, Azure).
  • Hands-on experience with Kubernetes/ECS and container technologies (Docker).
  • Proficiency in at least one programming language: Python or Java.
  • Experience with CI/CD pipelines and DevOps tooling.
  • Strong understanding of distributed systems, networking, and security fundamentals.

Preferred Qualifications

  • Experience with observability stacks (OpenTelemetry).
  • Knowledge of database management (PostgreSQL).
  • Experience with configuration management tools (Ansible, Chef, Puppet).
  • Familiarity with zero-downtime deployments and chaos engineering practices.

Soft Skills

  • Strong analytical and problem-solving skills.
  • Excellent communication and cross-team collaboration.
  • Ability to thrive in fast-paced, high-stakes environments.
  • A mindset focused on continuous improvement and operational excellence.

Work Location

  • Remote: Colorado, Delaware, Florida, New Hampshire, New Jersey, New York, Pennsylvania, Texas.

Additional Information

  • Full-time base salary range of $130,000 to $150,000, plus medical, dental, and vision benefits and a matching 401(k).

