Senior Incident Manager
Location
California
Posted
125 days ago
Salary
$143.3K - $200.6K / year
Bachelor Degree, 5 yrs exp, English, AWS, Azure, Cloud, Distributed Systems, Elastic Search, Google Cloud Platform, Grafana, Prometheus, Python, Splunk, Go
Job Description
• Lead critical incidents — coordinate multi-disciplinary response efforts across Databricks’ cloud-based services to rapidly mitigate impact and restore operations.
• Drive technical root cause analysis and reliability improvements: collaborate with engineering teams to trace and document underlying causes across distributed systems, services, and data stores.
• Summarize key learnings, clearly communicate action items, and ensure that technical and procedural improvements are followed through.
• Own communications during incidents — deliver frequent, high-quality updates to internal stakeholders (executives, engineering leadership, support) and compose and publish customer-facing notifications that are accurate, timely, and empathetic.
• Mentor and train peers in both incident communication and technical response disciplines to raise the overall quality of Databricks’ incident response.
Job Requirements
- 5+ years of experience in incident management, site reliability engineering, or production operations supporting large-scale, cloud-native systems.
- Proven ability to lead and coordinate high-severity incidents, including identifying impact, isolating fault domains, and managing multi-team response efforts.
- Strong understanding of cloud infrastructure (AWS, Azure, or GCP) — including compute, networking, storage, and observability components.
- Deep expertise in log analysis and debugging, including familiarity with log aggregation and search tools (e.g., Datadog, Elasticsearch, Splunk, Cloud Logging, or OpenTelemetry).
- Hands-on experience with observability systems — metrics, logging, and tracing frameworks (Prometheus, Grafana, OpenTelemetry, etc.).
- Proficiency in at least one major programming or scripting language (Python, Go, or Bash) for automating diagnostics, data collection, or analysis.
- Experience developing and maintaining incident playbooks and communication templates to ensure consistent, timely updates.
- Excellent contextual interpretation and writing skills, plus the ability to summarize and communicate effectively to both technical and business audiences.
- BS, Master's, or other advanced degree in Computer Science, Computer Engineering, or a related engineering field.
Benefits
- At Databricks, we strive to provide comprehensive benefits and perks that meet the needs of all of our employees.