Senior Incident Manager

Manager · Full Time · Remote · Team 1,001-5,000 · Since 2013 · H1B Sponsor

Location

California

Posted

125 days ago

Salary

$143.3K - $200.6K / year

Bachelor Degree · 5 yrs exp · English · AWS · Azure · Cloud · Distributed Systems · Elastic Search · Google Cloud Platform · Grafana · Prometheus · Python · Splunk · Go

Job Description

  • Lead critical incidents: coordinate multi-disciplinary response efforts across Databricks’ cloud-based services to rapidly mitigate impact and restore operations.
  • Drive technical root cause analysis and reliability improvements:
    • Collaborate with engineering teams to trace and document underlying causes across distributed systems, services, and data stores.
    • Summarize key learnings, clearly communicate action items, and ensure that technical and procedural improvements are followed through.
  • Own communications during incidents: deliver frequent, high-quality updates to internal stakeholders (executives, engineering leadership, support), and compose and publish customer-facing notifications that are accurate, timely, and empathetic.
  • Mentor and train peers in both incident communication and technical response disciplines to raise the overall quality of Databricks’ incident response.

Job Requirements

  • 5+ years of experience in incident management, site reliability engineering, or production operations supporting large-scale, cloud-native systems.
  • Proven ability to lead and coordinate high-severity incidents, including identifying impact, isolating fault domains, and managing multi-team response efforts.
  • Strong understanding of cloud infrastructure (AWS, Azure, or GCP) — including compute, networking, storage, and observability components.
  • Deep expertise in log analysis and debugging:
    • Familiarity with log aggregation and search tools (e.g., Datadog, Elasticsearch, Splunk, Cloud Logging, or OpenTelemetry).
  • Hands-on experience with observability systems — metrics, logging, and tracing frameworks (Prometheus, Grafana, OpenTelemetry, etc.).
  • Proficiency in at least one major programming or scripting language (Python, Go, or Bash) for automating diagnostics, data collection, or analysis.
  • Experience developing and maintaining incident playbooks and communication templates to ensure consistent, timely updates.
  • Excellent contextual interpretation and writing skills, plus the ability to summarize and communicate effectively to both technical and business audiences.
  • BS, Master's, or other advanced degree in Computer Science, Computer Engineering, or a related engineering field.

Benefits

  • At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees.
