Nonprofit. Free, 24/7, confidential mental health support in English and Spanish. Text SHARE or APOYO to 741741🇺🇲

Senior Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Team 201-500 | Since 2013 | H1B Sponsor

Location

United States

Posted

5 days ago

Salary

Not specified

Job Description

This description is a summary of our understanding of the job description. Click the 'Apply' button to find out more.

Role Description

In this role, you'll strengthen and scale the infrastructure behind our crisis care platform. You'll ensure our systems are reliable, resilient, and observable, ready for every moment someone reaches out. You'll bridge development and operations, champion automation, and drive a culture of reliability across engineering.

If you want to build systems that deliver help when it matters most, this is your opportunity.

Responsibilities

Automation & Infrastructure as Code
- Develop and maintain automation tools and frameworks to reduce manual operations.
- Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or similar.
- Build and manage CI/CD pipelines to support rapid, reliable deployments.
- Create self-service tooling and platforms that empower development teams.
System Reliability & Performance
- Design, implement, and maintain scalable, reliable, and secure infrastructure to support business-critical applications.
- Lead incident response, conduct root-cause analysis, and implement long-term preventive fixes.
- Optimize system performance through capacity planning and resource utilization improvements.
Monitoring & Observability
- Design and implement robust monitoring, logging, and alerting systems.
- Build dashboards and metrics that provide visibility into system and service health.
- Establish observability best practices across microservices and distributed systems.
- Reduce alert fatigue through intelligent alerting, automation, and clear runbooks.

Qualifications

Required
- 6–8+ years in Infrastructure, SRE, Platform, or DevOps engineering with strong Python and Linux/Unix fundamentals.
- Advanced AWS expertise (EC2, EKS/ECS, S3, IAM, VPC) and hands-on experience running Kubernetes and Docker in production.
- Proficiency with Terraform and Infrastructure as Code best practices.
- Experience owning CI/CD pipelines, deployment automation, and promotion workflows.
- Observability and reliability experience (Datadog/Prometheus/Grafana, incident response, RCA).
- Security-minded engineering approach, ideally with exposure to regulated or healthcare environments.
Preferred
- Strong architectural thinking and ability to modernize legacy systems.
- Experience with trunk-based development, GitOps practices, or developer tooling.
- Demonstrated mentorship, technical guidance, or influence on engineering best practices.
- Effective collaborator with high ownership, able to operate independently and support developers.
- AWS cost-optimization experience or familiarity with Aurora/database performance.
- Experience working on mission-critical, distributed, or global-scale platforms.

Benefits

20 paid holidays, including:
- Federal holidays like Juneteenth and Labor Day
- Election Day
- Holiday break from December 24 through January 1
- 2 renewal days
- 2 floating holidays
Flexible paid time off, including:
- 15 vacation days
- 3 personal days
- 7 sick days
Medical, dental, and vision benefits for the staff member and family at no cost to the employee
403(b) retirement plan (the nonprofit equivalent of a 401(k)): 3% contribution by Crisis Text Line to support building financial wellness, regardless of personal contribution
12 weeks paid parental leave (after 6 months of employment)
Student loan repayment (after 2 years of continuous full-time service)
Family support through a virtual childcare platform
Stipends/Allowances:
- Mental health (Monthly)
- Internet Service (Monthly)
- Professional Development (Annual)
- Wellness (Annual)
- Home office setup (One-time/First year)
