Principal Engineer, Operational Excellence – Resilience
Full-stack Engineer | Software Engineer | Full Time | Remote | Team 5,001-10,000 | Since 2011 | H1B Sponsor
Location
United States
Posted
151 days ago
Salary
$145K - $220K / year
Bachelor Degree | 10 yrs exp | English | AWS | Azure | Cloud | Google Cloud Platform
Job Description
• Facilitate coordination between stakeholders across IT, Product, Engineering, and business units, serving as the central point for technology resilience initiatives and ensuring alignment with business objectives
• Own and maintain enterprise-wide technology resilience standards, ensuring consistent implementation and reducing organizational drift from established frameworks across infrastructure, application, and product domains
• Drive comprehensive technical resilience architecture including infrastructure redundancy and fault tolerance, application resilience and graceful degradation strategies, and chaos engineering frameworks for continuous resilience validation
• Lead enterprise technical recovery strategy development and implementation, including backup and redundancy systems, recovery time/point objectives (RTO/RPO) for technical systems, and data recovery/restoration procedures
• Partner with engineering and product teams to define and implement resilience standards, including feature flagging, release management, testing, multi-tenancy, and scalability frameworks to support growth
• Aggregate and provide technical oversight of technology resilience risks across the enterprise, and establish and monitor key performance indicators such as system uptime
• Drive chaos engineering and resilience testing programs, establishing enterprise-wide practices for proactive resilience validation and continuous improvement
• Own shared resilience tooling strategy, evaluation, and implementation to support enterprise-wide capabilities including monitoring, testing, and recovery automation
• Build and maintain formal networks with key constituents across business units, engineering teams, and external partners
• Serve as senior technical advisor during major incident response, providing expertise on technical recovery strategies and coordinating cross-functional recovery efforts
• Drive innovation in resilience practices, identifying emerging technologies and methodologies that strengthen CrowdStrike's competitive advantage in resilience
• Provide strategic guidance and expertise to junior team members and cross-functional partners on resilience engineering best practices
Job Requirements
- 10+ years of direct experience in technology resilience, disaster recovery, site reliability engineering, or related technical disciplines, with demonstrated expertise in enterprise-scale cloud-native environments
- Deep understanding of infrastructure redundancy patterns, application resilience design, chaos engineering principles, and enterprise disaster recovery strategies across hybrid cloud architectures
- Proven experience with feature management systems, progressive deployment strategies, multi-tenant architecture resilience, and scalability engineering practices
- Proven ability to drive strategic initiatives across large technology organizations, with experience influencing senior stakeholders and leading complex, cross-functional resilience programs
- Experience establishing and monitoring resilience KPIs, including system uptime, MTTR, RTO/RPO targets, and deployment success metrics
- Advanced certifications in disaster recovery, cloud architecture, or site reliability disciplines (e.g., DRCS, CISSP, AWS/Azure/GCP architecture certifications)
- Exceptional written and oral communication skills, including experience developing and delivering strategic briefings to executive leadership and technical teams
- Advanced analytical and conceptual thinking abilities, with a proven track record of solving complex, ambiguous resilience challenges with enterprise-wide impact
- Demonstrated ability to build formal networks and influence stakeholders across engineering, product, and business organizations
- Bachelor's degree in Computer Science, Information Systems, Engineering, Risk/Resilience, or equivalent practical experience
Benefits
- Remote-friendly and flexible work culture
- Market leader in compensation and equity awards
- Comprehensive physical and mental wellness programs
- Competitive vacation and holidays for recharge
- Paid parental and adoption leaves
- Professional development opportunities for all employees regardless of level or role
- Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
- Vibrant office culture with world-class amenities
- Great Place to Work Certified™ across the globe