Moonlite

Senior Site Reliability Engineer – SRE

DevOps Engineer · Full Time · Remote · Senior · Team 1-10

Location

Illinois

Posted

70 days ago

Salary

$165K - $225K / year

Seniority

Senior

Bachelor's Degree · 5 yrs exp · English · Ansible · DNS · Grafana · Kubernetes · Linux · Prometheus · Python · Terraform · Go

Job Description

• Design, build, and operate production Kubernetes clusters on bare-metal infrastructure.
• Implement and operate custom Kubernetes networking solutions.
• Develop and maintain custom Kubernetes operators and controllers.
• Deploy and optimize NVIDIA GPU operators and custom scheduling logic for GPU workloads.
• Build deep integrations between Kubernetes and the underlying infrastructure.
• Design and implement automation using Terraform, Ansible, Helm, and custom operators.
• Manage production bare-metal infrastructure across multiple regions, ensuring high availability, fault tolerance, and graceful degradation.
• Build comprehensive monitoring, logging, and alerting using Prometheus, Grafana, and the ELK stack.
• Identify and resolve performance bottlenecks across infrastructure domains.

Job Requirements

5+ years in SRE, DevOps, or infrastructure engineering roles with proven experience operating production infrastructure at scale.
Deep hands-on experience building and operating production Kubernetes clusters on bare-metal infrastructure.
Strong understanding of Kubernetes internals including custom resource definitions (CRDs), operators, controllers, admission webhooks, and scheduling.
Strong fundamentals in Linux systems administration, performance tuning, troubleshooting, and automation in production environments.
Proficiency with infrastructure-as-code tools (Terraform, Ansible, Helm) and building automation to reduce operational overhead.
Solid understanding of networking concepts, including IPAM, DNS, DHCP, VLAN/VXLAN, routing, and load balancing, plus experience troubleshooting network issues in production.
Experience building and maintaining comprehensive monitoring solutions using tools like Prometheus, Grafana, and centralized logging systems.
Understanding of SRE principles including SLIs/SLOs/SLAs, error budgets, incident management, and blameless postmortems.
Strong scripting skills in Go, Python, or Bash for automation, tooling development, and operational efficiency.
Demonstrated ability to troubleshoot complex issues under pressure, manage incidents effectively, and communicate clearly during outages.
Excellent communication skills and ability to work across teams including systems engineers, network engineers, and software developers.

Benefits

6% 401(k) match
Fully covered health insurance premiums
Other comprehensive offerings to support your well-being and success as we grow together.

More DevOps Engineer Jobs

DevOps Engineer

AAPC

Advancing the Business of Healthcare

DevOps Engineer · 70 days ago

Full Time · Remote · Team 51-200 · Since 1988 · H1B No Sponsor

DevOps Engineer with expertise in Azure, AWS, and Terraform

AWS · Azure · Cloud · Docker · Python · Terraform

United States

Deployment Engineer

Cyngn

Autonomous Vehicle solutions and retrofits for industrial use cases across logistics, material handling, and mining.

DevOps Engineer · 70 days ago

Full Time · Remote · Team 51-200 · H1B Sponsor

Deployment Engineer optimizing autonomy for Cyngn's autonomous vehicles in customer facilities

Grafana · Linux

United States

$90K - $112K / year

Senior DevOps Engineer, AWS Cloud

H1 is the connecting force for global HCP, clinical, scientific and research information.

DevOps Engineer · 70 days ago

Full Time · Remote · Team 201-500 · H1B Sponsor

Senior DevOps Engineer scaling AWS cloud infrastructure for a healthcare company

AWS · Azure · Cloud · Google Cloud Platform · Grafana · Kubernetes · Prometheus · Python · Terraform

New York

$120K - $145K / year

Senior Site Reliability Engineer

CaptivateIQ

The agile commission solution. We're hiring!

DevOps Engineer · 71 days ago

Full Time · Remote · Team 201-500 · Since 2017 · H1B No Sponsor

The Site Reliability Engineering team in CaptivateIQ operates across the engineering organization, supporting our development teams by providing them with the tools and processes they need to get their job done well. We ensure that the service provided by our product is great for...

Terraform · AWS · ECS · Bash · Python · Go · Datadog · Infrastructure as Code · Containers · Container Orchestration · Observability · Reliability Engineering

United States

$195.7K - $225K / year
