Top-rated business phone solution and personalized service to help your business thrive.

Site Reliability Engineer

DevOps Engineer | Full Time | Remote | Team 201-500 | Since 2004 | H1B No Sponsor

Location

United States

Posted

30 days ago

Salary

$118K - $158K / year

Bachelor Degree | 5 yrs exp | English | Ansible | Flux | Grafana | Kafka | Kubernetes | Linux | Microservices | Prometheus

Job Description

• Monitor and troubleshoot system performance, reliability, and availability issues using modern observability tools and techniques, with a strong emphasis on diagnosing and resolving issues in operating systems and bare metal environments.
• Design, implement, and maintain scalable and reliable infrastructure using containers, Kubernetes, and microservices architecture.
• Manage CI/CD pipelines to facilitate efficient software development and deployment processes.
• Implement GitOps workflows using ArgoCD or Flux, and manage Helm charts and Kustomize configurations for declarative application deployment and version control.
• Oversee configuration management to ensure consistent and reliable software releases across environments, using Ansible for consistent system configuration, patch management, and provisioning across datacenter infrastructure.
• Design and operate high-throughput Kafka clusters for event streaming, managing topics, partitions, replication, consumer lag monitoring, and disaster recovery strategies across datacenter infrastructure.
• Collaborate with development teams to influence system design choices and operational policies.
• Provide expert guidance on managing large data centers, including hundreds of bare metal servers and virtual machines (VMs), ensuring optimal configuration and performance.
• Implement name services and server management practices to support our infrastructure needs.
• Continuously evaluate and integrate new technologies to enhance operational efficiency and reliability.
• Participate in on-call rotations to provide support for production systems as necessary, conduct blameless post-mortems with root cause analysis, and maintain incident response runbooks and procedures.
• Create comprehensive technical documentation, runbooks, architectural diagrams, and network topology maps, and maintain knowledge bases for operational procedures and best practices.

Job Requirements

Bachelor’s degree in Computer Science, Engineering, or a related field; advanced degree preferred.
5+ years of experience as an SRE or in a related role, with a strong focus on production systems, containers, microservices, and service delivery.
Extensive experience managing and maintaining CI/CD pipelines and the tooling that supports them (GitOps workflows, ArgoCD, Helm charts).
Comprehensive knowledge of observability tools such as Prometheus, the ELK Stack, log collectors, and Grafana for visualization.
Extensive on-premises experience managing large data centers with hundreds of bare metal servers and VMs.
Deep knowledge of Linux operating systems, their configuration, performance tuning, and troubleshooting.
Experience with configuration management tools.
Familiarity with networking concepts and protocols as they apply to Linux operating systems.
Proven ability to analyze complex systems, identify bottlenecks, and implement solutions with strong troubleshooting skills.
Excellent communication skills, with the ability to collaborate effectively with cross-functional teams.
Experience with containers and orchestration technologies, particularly Kubernetes, is a plus.

Benefits

Comprehensive Medical/Dental/Vision insurance for you and eligible dependents
Employer Paid Income Protection Benefits (Basic Life and AD&D, Short- and Long-term disability)
FSA Healthcare & Dependent Care
Commuter Benefits
Voluntary Accident, Critical Illness, Hospital Indemnity and Legal
401(k), including employer match, and Roth
Employee Stock Purchase Plan (ESPP)
Paid time off, sick time, and observed corporate holidays
Employee Assistance Program
Life Balance benefits with Travel Assistance Services and Identity Theft
Additional Benefits include a Discount Program, Credit Union, Medicare Assistance, etc.
