Vultr
Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible.
Senior Site Reliability Engineer, Core Cloud Engineering
Location
United States
Posted
16 days ago
Salary
$120K - $130K / year
English, Distributed Systems, Grafana, Linux, MySQL, PHP, Puppet
Job Description
• Operate and scale Vultr’s control plane, ensuring availability, correctness, and performance across global datacenters.
• Design, implement, and maintain automation to manage hypervisor fleets (KVM, QEMU, libvirt) and supporting infrastructure at scale.
• Develop tooling and automation for Open vSwitch (OVS), BGP routing, and other networking components to ensure resilient and self-healing network operations.
• Continuously analyze and improve system performance across compute, storage, and network layers, with an emphasis on reducing toil and eliminating single points of failure.
• Implement advanced monitoring, logging, and tracing solutions (Grafana, Sentry, SumoLogic) while leading incident response to minimize impact and drive postmortem culture.
• Maintain and evolve infrastructure pipelines (GitLab CI/CD, Puppet) to enable safe, fast, and reliable changes to both control plane and hypervisor infrastructure.
• Work closely with Software Engineers, Network Engineers, and Product teams to align platform reliability with business and user needs.
• Produce clear technical documentation for runbooks, operational procedures, and automation frameworks to improve team efficiency and reliability standards.
• Coach and mentor team members in best practices for site reliability, incident handling, automation, and low-level Linux systems debugging.
Job Requirements
- Proficiency in PHP with strong scripting and automation skills.
- Experience running large-scale distributed systems and control plane infrastructure in production.
- Strong background in hypervisor technologies (libvirt, QEMU, KVM) and Linux systems administration.
- Expertise in networking protocols and tools, particularly BGP and Open vSwitch (OVS), with automation experience.
- Deep knowledge of observability and monitoring frameworks (Grafana, Sentry, SumoLogic) and incident management.
- Advanced troubleshooting skills across compute, networking, and storage subsystems.
- Experience building and maintaining CI/CD pipelines (GitLab) and configuration management (Puppet).
- Familiarity with MySQL or similar databases, with an understanding of operational considerations for reliability and scale.
- Strong problem-solving abilities and the drive to tackle complex, low-level reliability challenges.
- Effective cross-team communication and collaboration skills.
- A commitment to continuous improvement and fostering a culture of operational excellence.
Benefits
- Excellent medical benefits with 100% company-paid premiums for the employee-only plan + 100% company-paid dental & vision premiums
- 401(k) plan that matches 100% up to 4% with immediate vesting
- Professional Development Reimbursement of $2,500 each year
- 11 Holidays + Paid Time Off Accrual + Rollover Plan
- Increased PTO at 3-year & 10-year anniversaries + 1 month paid sabbatical every 5 years + Anniversary Bonus each year
- $500 first-year remote office setup + $400 each following year for new equipment
- Internet reimbursement up to $75 per month
- Gym membership reimbursement up to $50 per month
- Company-paid Wellable subscription