Aalyria

Connectivity Everywhere

Site Reliability Engineer

DevOps Engineer, Full Time, Remote, Team 51-200, H1B No Sponsor

Location

United States

Posted

123 days ago

Salary

$115K - $135K / year

Bachelor Degree, 4 yrs exp, English, AWS, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform, Go

Job Description

  • Help design and build Aalyria's centralized observability platform, integrating and scaling tools for metrics (e.g., Prometheus), logging (e.g., Loki), and distributed tracing (e.g., Tempo/OpenTelemetry).
  • Define, implement, and manage a robust framework of Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for our core products, ensuring we are launch-ready.
  • Partner with SWEs to implement observability best practices, develop standard templates and documentation, and configure tooling (e.g., OpenTelemetry libraries).
  • Automate the deployment, scaling, and management of the entire observability stack using Infrastructure as Code (e.g., Terraform) and GitOps principles (e.g., ArgoCD).
  • Partner closely with the core infrastructure team to ensure deep visibility into our Kubernetes clusters and underlying GCP and AWS environments.
  • Develop and lead the company's monitoring, alerting, and incident response strategy, driving a culture of proactive reliability and blameless post-mortems.

Job Requirements

  • 4+ years of experience in an SRE or platform engineering role, with a focus on observability for large-scale, distributed compute or network systems.
  • Deep, hands-on expertise building, scaling, and managing observability platforms (e.g., Prometheus, Grafana, Loki/ELK, OpenTelemetry, Tempo/Jaeger, Honeycomb, etc.).
  • Proven experience using these tools to support performance analysis and debugging of complex distributed systems.
  • Strong production-level experience with Google Cloud Platform (GCP) and Kubernetes.
  • Experience using Infrastructure as Code (IaC) and GitOps principles (e.g., ArgoCD).
  • Proficiency in a systems programming language, with a strong preference for Go and Python for debugging and writing tooling.
  • Demonstrable experience defining, implementing, and managing SLOs, SLIs, and error budgets for production services in high-availability distributed systems.

Benefits

  • Innovative Environment: Work at a cutting-edge company shaping the future of aerospace communications.
  • Impactful Work: Directly contribute to critical national security programs and initiatives.
  • Growth Opportunities: Expand your career with opportunities for professional development and advancement.
  • Inclusive Culture: Be part of a collaborative, supportive, and inclusive workplace where your contributions matter.
  • Flexibility: Flexible working arrangements including hybrid remote/in-office schedules.
  • Competitive salary, comprehensive benefits (401(k), dental, vision, health, life insurance), paid time off, and equity options.
