Vultr

Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible.

Senior Site Reliability Engineer, Core Cloud Engineering

DevOps Engineer, Full Time, Remote, Team 201-500, Since 2014

Location

United States

Posted

16 days ago

Salary

$120K - $130K / year

English, Distributed Systems, Grafana, Linux, MySQL, PHP, Puppet

Job Description

  • Operate and scale Vultr’s control plane, ensuring availability, correctness, and performance across global datacenters.
  • Design, implement, and maintain automation to manage hypervisor fleets (KVM, QEMU, libvirt) and supporting infrastructure at scale.
  • Develop tooling and automation for Open vSwitch (OVS), BGP routing, and other networking components to ensure resilient and self-healing network operations.
  • Continuously analyze and improve system performance across compute, storage, and network layers, with an emphasis on reducing toil and eliminating single points of failure.
  • Implement advanced monitoring, logging, and tracing solutions (Grafana, Sentry, SumoLogic) while leading incident response to minimize impact and drive postmortem culture.
  • Maintain and evolve infrastructure pipelines (GitLab CI/CD, Puppet) to enable safe, fast, and reliable changes to both control plane and hypervisor infrastructure.
  • Work closely with Software Engineers, Network Engineers, and Product teams to align platform reliability with business and user needs.
  • Produce clear technical documentation for runbooks, operational procedures, and automation frameworks to improve team efficiency and reliability standards.
  • Coach and mentor team members in best practices for site reliability, incident handling, automation, and low-level Linux systems debugging.

Job Requirements

  • Proficiency in PHP with strong scripting and automation skills.
  • Experience running large-scale distributed systems and control plane infrastructure in production.
  • Strong background in hypervisor technologies (libvirt, QEMU, KVM) and Linux systems administration.
  • Expertise in networking protocols and tools, particularly BGP and Open vSwitch (OVS), with automation experience.
  • Deep knowledge of observability and monitoring frameworks (Grafana, Sentry, SumoLogic) and incident management.
  • Advanced troubleshooting skills across compute, networking, and storage subsystems.
  • Experience building and maintaining CI/CD pipelines (GitLab) and configuration management (Puppet).
  • Familiarity with MySQL or similar databases, with an understanding of operational considerations for reliability and scale.
  • Strong problem-solving abilities and the drive to tackle complex, low-level reliability challenges.
  • Effective cross-team communication and collaboration skills.
  • A commitment to continuous improvement and fostering a culture of operational excellence.

Benefits

  • Excellent medical benefits with 100% company-paid premiums for the employee-only plan + 100% company-paid dental and vision premiums
  • 401(k) plan that matches 100% up to 4% with immediate vesting
  • Professional Development Reimbursement of $2,500 each year
  • 11 Holidays + Paid Time Off Accrual + Rollover Plan
  • Increased PTO at 3-year and 10-year anniversaries + 1-month paid sabbatical every 5 years + anniversary bonus each year
  • $500 first year remote office setup + $400 each following year for new equipment
  • Internet reimbursement up to $75 per month
  • Gym membership reimbursement up to $50 per month
  • Company paid Wellable subscription
