Principal Engineer, Operational Excellence – Resilience

Full-stack Engineer · Software Engineer · Full Time · Remote · Team 5,001-10,000 · Since 2011 · H1B Sponsor

Location

United States

Posted

151 days ago

Salary

$145K - $220K / year

Bachelor Degree · 10 yrs exp · English · AWS · Azure · Cloud · Google Cloud Platform

Job Description

  • Facilitate coordination between stakeholders across IT, Product, Engineering, and business units, serving as the central point for technology resilience initiatives and ensuring alignment with business objectives
  • Own and maintain enterprise-wide technology resilience standards, ensuring consistent implementation and reducing organizational drift from established frameworks across infrastructure, application, and product domains
  • Drive comprehensive technical resilience architecture including infrastructure redundancy and fault tolerance, application resilience and graceful degradation strategies, and chaos engineering frameworks for continuous resilience validation
  • Lead enterprise technical recovery strategy development and implementation, including backup and redundancy systems, recovery time/point objectives (RTO/RPO) for technical systems, and data recovery/restoration procedures
  • Partner to define and implement resilience standards, including feature flagging, release, testing, multi-tenancy frameworks, and scalability frameworks to manage growth
  • Provide technical oversight and aggregation of technology resilience risks across the enterprise, establishing and monitoring key performance indicators including system uptime
  • Drive chaos engineering and resilience testing programs, establishing enterprise-wide practices for proactive resilience validation and continuous improvement
  • Own shared resilience tooling strategy, evaluation, and implementation to support enterprise-wide capabilities including monitoring, testing, and recovery automation
  • Build and maintain formal networks with key constituents across business units, engineering teams, and external partners
  • Serve as senior technical advisor during major incident response, providing expertise on technical recovery strategies and coordinating cross-functional recovery efforts
  • Drive innovation in resilience practices, identifying emerging technologies and methodologies to advance CrowdStrike's competitive resilience advantage
  • Provide strategic guidance and expertise to junior team members and cross-functional partners on resilience engineering best practices

Job Requirements

  • 10+ years of direct experience in technology resilience, disaster recovery, site reliability engineering, or related technical disciplines, with demonstrated expertise in enterprise-scale cloud-native environments
  • Deep understanding of infrastructure redundancy patterns, application resilience design, chaos engineering principles, and enterprise disaster recovery strategies across hybrid cloud architectures
  • Proven experience with feature management systems, progressive deployment strategies, multi-tenant architecture resilience, and scalability engineering practices
  • Proven ability to drive strategic initiatives across large technology organizations, with experience influencing senior stakeholders and leading complex, cross-functional resilience programs
  • Experience establishing and monitoring resilience KPIs, including system uptime, MTTR, RTO/RPO objectives, and deployment success metrics
  • Advanced certifications in disaster recovery, cloud architecture, or site reliability disciplines (e.g., DRCS, CISSP, AWS/Azure/GCP architecture certifications)
  • Exceptional written and oral communication skills, including experience developing and delivering strategic briefings to executive leadership and technical teams
  • Advanced analytical and conceptual thinking abilities, with a proven track record of solving complex, ambiguous resilience challenges with enterprise-wide impact
  • Demonstrated ability to build formal networks and influence stakeholders across engineering, product, and business organizations
  • Bachelor's degree in Computer Science, Information Systems, Engineering, Risk/Resilience, or equivalent practical experience

Benefits

  • Remote-friendly and flexible work culture
  • Market leader in compensation and equity awards
  • Comprehensive physical and mental wellness programs
  • Competitive vacation and holidays to recharge
  • Paid parental and adoption leave
  • Professional development opportunities for all employees regardless of level or role
  • Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
  • Vibrant office culture with world class amenities
  • Great Place to Work Certified™ across the globe
