CentralReach
Elevating Autism & IDD Care through Technology
Senior Site Reliability Engineer, Security
Location: United States
Posted: 17 days ago
Salary: $160K - $180K / year
English, Ansible, AWS, Chef, Cloud, Docker, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Splunk, Terraform, Go
Job Description
• Own availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning; set and maintain SLOs, SLIs, and error budgets; and create dashboards.
• Analyze, troubleshoot, and resolve operational issues that affect defined SLOs.
• Manage site stability, performance, and reliability, and maintain uptime for production environments.
• Develop a fully automated, multi-environment observability stack based on the existing system, and extend it to predict capacity needs from usage patterns.
• Automate wherever possible to reduce toil and increase development velocity.
• Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
• Use a data-driven approach to identify product architecture changes that improve reliability, performance, and availability.
• Document resolution runbooks and standard operating procedures.
• Actively look for opportunities to improve system availability and performance by applying insights from monitoring and observability.
• Collaborate with software development teams on release management, help shape the future roadmap, and establish strong operational readiness across teams.
• Implement reliability and observability tools (e.g., New Relic, Prometheus, Grafana).
• Collaborate with the Security team and other platform engineering teams to build reliable, maintainable, and scalable solutions that improve our security posture.
Job Requirements
- Strong background as an SRE supporting a 24x7, highly available production environment for a SaaS or cloud service provider.
- Solid experience with monitoring, APM, and observability tools (e.g., Splunk, New Relic).
- Experience implementing observability plans around logs, metrics, and traces.
- Experience developing software on an agile team.
- Experience with cloud infrastructure environments (preferably AWS) and infrastructure as code (e.g., Terraform, CloudFormation).
- Extensive experience with Docker, Kubernetes, Helm, CI/CD, and configuration management tools such as Ansible and Chef.
- Strong experience with containerization technology and/or Kubernetes.
- Experience with release automation, system administration, and configuration management.
- Experience with programming languages (Java, Python, Go, etc.).
- Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.
- Strong interpersonal and teamwork skills, including the ability to set and enforce processes and to influence engineers who are not direct reports.
- Strong analytical and programming skills (e.g., Python, Go, Java).
- Deep understanding of modern cloud security best practices.
- Proven experience building observability for security concerns, such as privilege escalations and bot detection.
Benefits
- Comprehensive health benefits
- Generous PTO
- 401(k) matching
- Paid parental leave
- Hybrid work schedules
- Career development support
- Wellness programs
- Opportunities to give back through CR Cares™