MLabs

We are a Haskell, Rust, Blockchain and AI consultancy.

Senior Site Reliability Engineer

DevOps Engineer | Remote | Senior | Team 51-200 | H1B No Sponsor

Location

United States

Posted

1 day ago

Salary

Not specified

Seniority

Senior

English, Terraform, Kubernetes, Helm, GitOps, ArgoCD, CI/CD, AWS, GKE, EKS, AKS, Infrastructure as Code, Site Reliability Engineering, Distributed Systems, Multi-region Architecture, Observability, SLO, SLA, MTTR, Blue/Green Deployment, Canary Deployment

Job Description

Senior Site Reliability Engineer (Enterprise Platform)

Location: Remote (US); open to Europe-based candidates willing to overlap with EST working hours

Compensation: Competitive

We are hiring on behalf of our client, a high-growth software company supporting the development of a premier open-source, EVM-compatible public ledger built for global enterprise and Web3 use cases. They are currently hiring a Senior Site Reliability Engineer for their "greenfield" enterprise-focused team. This team is building a private and consortium distributed ledger platform designed specifically for sectors with high security and privacy requirements, such as financial services, healthcare, and supply chain.

This is a hands-on, high-impact role where you will own the design, deployment, and reliability of mission-critical, multi-region infrastructure. This is not a traditional support role; they are looking for an engineer who has operated real systems at scale and is eager to take end-to-end ownership of architecture and operational standards from the ground up.

Key Responsibilities:

  • Systems Architecture: Design and operate highly available, multi-region distributed systems with rigorous recovery strategies (RTO/RPO).
  • Infrastructure as Code: Own large-scale IaC using Terraform, developing reusable modules and multi-account patterns with policy guardrails.
  • Kubernetes Orchestration: Scale production environments (EKS, GKE, or AKS) utilizing GitOps (ArgoCD), Helm, and strict network policies.
  • CI/CD Leadership: Build secure pipelines supporting blue/green and canary deployments, artifact signing, SBOM generation, and automated rollback strategies.
  • SRE Advocacy: Define and improve SLOs, error budgets, and observability metrics to drive measurable reductions in MTTR.
  • Collaboration: Partner with the Head of SRE and VP of Engineering to translate complex business requirements into reliable, secure platform services.

Job Requirements

  • 7+ years of experience in SRE, Platform Engineering, or Infrastructure Engineering operating production distributed systems.
  • Multi-Cloud Mastery: Deep expertise in AWS or GCP, with experience running multi-region production environments and disaster recovery testing.
  • Containerization: Hands-on experience with Kubernetes at scale, including GitOps workflows and production-grade security controls.
  • Security Mindset: Strong background in Zero Trust principles, secrets management (Vault), and compliance frameworks (SOC 2, HIPAA, or NIST).
  • Tooling: Extensive experience with Terraform-first infrastructure in large-scale, real-world environments.

Nice to Have:

  • Experience with distributed ledger technology (DLT) or blockchain systems, particularly private/consortium deployments.
  • Familiarity with EVM-based systems and smart contract tooling (Solidity, Hardhat).
  • Experience operating active-active, globally distributed architectures.
  • Background in supporting financial services or other highly regulated industries.

Benefits

  • Incentive Package: Competitive base salary with performance bonuses.
  • Ownership: Equity and Token participation.
  • Future-Proofing: 401k and comprehensive health insurance (for US-based employees).
  • Innovation: The opportunity to build a "greenfield" platform from scratch within a stable, venture-backed organization.
  • Impact: Work on infrastructure that powers the world’s leading organizations across multiple sectors.

Due to the high volume of applications we anticipate, we regret that we are unable to provide individual feedback to all candidates. If you do not hear back from us within 4 weeks of your application, please assume that you have not been successful on this occasion. We genuinely appreciate your interest and wish you the best in your job search.

Commitment to Equality and Accessibility:

At MLabs, we are committed to offering equal opportunities to all candidates. We ensure that there is no discrimination, that our job adverts are accessible, and that information is provided in accessible formats. Our goal is to foster a diverse, inclusive workplace with equal opportunities for all. If you need any reasonable adjustments during any part of the hiring process, or you would like to see the job advert in an accessible format, please let us know at the earliest opportunity by emailing human-resources@mlabs.city.

MLabs Ltd collects and processes the personal information you provide, such as your contact details, work history, resume, and other relevant data, for recruitment purposes only. This information is managed securely in accordance with MLabs Ltd’s Privacy Policy and Information Security Policy, and in compliance with applicable data protection laws. Your data may be shared only with clients and trusted partners where necessary for recruitment purposes. You may request the deletion of your data or withdraw your consent at any time by contacting legal@mlabs.city.
