Gov Services Hub

"Empowering Prime Contractors, Simplifying Services"

Senior Site Reliability Engineer, SRE

DevOps Engineer, Contract, Remote, Team 51-200, Since 2015, H1B No Sponsor

Location

New York

Posted

121 days ago

Salary

Not specified

5 yrs exp, English, AWS, Cloud, EC2, Prometheus, Python, Terraform

Job Description

  • Lead incident response and develop sustainable on-call practices, including runbooks, blameless postmortems, and continuous improvement to reduce MTTR
  • Build and maintain self-service observability tools (Datadog, Prometheus, ELK) for proactive monitoring and troubleshooting
  • Create and maintain Infrastructure as Code (IaC) using Terraform or CloudFormation for consistent, secure AWS environments
  • Partner with development teams to architect resilient, scalable infrastructure for critical components like databases, networking, async workflows, and data pipelines
  • Design and implement robust CI/CD pipelines (GitHub Actions) with advanced deployment strategies (blue/green, canary)
  • Drive best practices in reliability and performance early in the design phase to future-proof January’s systems

Job Requirements

  • Proven experience leading incident response and postmortem processes for high-availability production systems
  • Deep expertise in designing highly available architectures (EC2, Fargate, auto-scaling, health checks, graceful degradation)
  • Strong experience with AWS cloud infrastructure and IaC tools (Terraform, CloudFormation)
  • Hands-on experience with CI/CD automation using GitHub Actions or equivalent tools
  • Proficiency in observability and monitoring stacks (Datadog, Prometheus, ELK)
  • Solid scripting/programming skills in Python (for automation, tooling, and debugging)
  • Excellent communication and documentation skills, with the ability to collaborate across engineering and platform teams

Benefits

  • Remote role (NYC-based preferred for hybrid collaboration)
  • Opportunity to build and own the entire SRE practice for a growing FinTech startup
  • Fast-paced, innovative environment working on AI-forward consumer finance products

More DevOps Engineer Jobs

Staff Site Reliability Engineer

FloSports

Live Events. Exclusive Content. Be there.

DevOps Engineer, 122 days ago
Full Time, Remote, Team 201-500, Since 2006, H1B No Sponsor

Staff SRE at FloSports improving developer enablement and infrastructure.

AWS, Google Cloud Platform, JavaScript, Kubernetes, Node.js, Terraform, Go
United States

DevOps Engineer – GitHub Migration Projects

Atmosera

Solution Enablement, Solution Management, Solution Training - Atmosera is the Apps, Data, and Azure Expert

DevOps Engineer, 122 days ago
Contract, Remote, Team 51-200, H1B No Sponsor

DevOps Engineer supporting GitHub Enterprise Cloud migration projects

Cloud, Python
United States

Site Reliability Engineer

Aalyria

Connectivity Everywhere

DevOps Engineer, 124 days ago
Full Time, Remote, Team 51-200, H1B No Sponsor

SRE role developing a production-grade observability stack for satellite communications systems

AWS, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform, Go
United States
$115K - $135K / year
DevOps Engineer, 124 days ago
Full Time, Remote, Team 51-200, H1B No Sponsor

Building observability stack for satellite and aerospace networking platforms

AWS, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Java, Kubernetes, Prometheus, Python, Terraform, Go
United States
$160K - $200K / year