Gov Services Hub
"Empowering Prime Contractors, Simplifying Services"
Senior Site Reliability Engineer (SRE)
Location: New York
Posted: 121 days ago
Salary: Not specified
5 yrs exp, English, AWS, Cloud, EC2, Prometheus, Python, Terraform
Job Description
• Lead incident response and develop sustainable on-call practices, including runbooks, blameless postmortems, and continuous improvement to reduce MTTR
• Build and maintain self-service observability tools (Datadog, Prometheus, ELK) for proactive monitoring and troubleshooting
• Create and maintain Infrastructure as Code (IaC) using Terraform or CloudFormation for consistent, secure AWS environments
• Partner with development teams to architect resilient, scalable infrastructure for critical components like databases, networking, async workflows, and data pipelines
• Design and implement robust CI/CD pipelines (GitHub Actions) with advanced deployment strategies (blue/green, canary)
• Drive best practices in reliability and performance early in the design phase to future-proof January’s systems
Job Requirements
- Proven experience leading incident response and postmortem processes for high-availability production systems
- Deep expertise in designing highly available architectures (EC2, Fargate, auto-scaling, health checks, graceful degradation)
- Strong experience with AWS cloud infrastructure and IaC tools (Terraform, CloudFormation)
- Hands-on experience with CI/CD automation using GitHub Actions or equivalent tools
- Proficiency in observability and monitoring stacks (Datadog, Prometheus, ELK)
- Solid scripting/programming skills in Python (for automation, tooling, and debugging)
- Excellent communication and documentation skills, with the ability to collaborate across engineering and platform teams
Benefits
- Remote role (NYC-based preferred for hybrid collaboration)
- Opportunity to build and own the entire SRE practice for a growing FinTech startup
- Fast-paced, innovative environment working on AI-forward consumer finance products