SafeRide Health
Transforming the way patients get where they need to go
Site Reliability Engineer
Location
United States
Posted
140 days ago
Salary
Not specified
5 yrs exp, English, AWS
Job Description
• Keeping systems and services running smoothly with minimal downtime by focusing on availability, reliability, and scalability
• Developing and maintaining tools and scripts to automate repetitive tasks such as deployments, configuration management, and monitoring
• Implementing and managing monitoring and alerting systems to provide visibility into system performance and quickly detect potential issues
• Responding to, diagnosing, and resolving system incidents, including conducting post-mortems to prevent future occurrences
• Monitoring system resource usage to forecast future needs and scale systems accordingly to handle increasing user load
• Collaborating with stakeholders to identify operational risks and implementing strategies to reduce their likelihood and impact
• Analyzing metrics from operating systems and applications to identify areas for performance improvement
Job Requirements
- Minimum of 5 years of progressive experience in an IT, Software Engineering, Technology Operations, or Business Continuity role
- Minimum of 2 years of hands-on experience in a Site Reliability, DevOps, or IT Observability role
- Demonstrated proficiency with production monitoring and alerting tools (Datadog is a major plus!)
- Basic proficiency working in a containerized AWS environment managed with infrastructure as code
Benefits
- Competitive compensation and performance-based bonus potential
- Full medical, dental, and vision coverage
- Generous PTO and paid company holidays
- 401(k) with employer match
- Paid parental leave and family support benefits