The College Board

Clearing a path for all students to own their future

Lead Engineer, Enterprise Incident & Change Management

Full-stack Engineer | Software Engineer | Full Time | Remote | Team 1,001-5,000 | Since 1900 | H1B No Sponsor

Location

United States

Posted

19 days ago

Salary

$168K - $183K / year

Bachelor Degree | 7 yrs exp | English | Ansible | AWS | Cloud | DynamoDB | EC2 | Grafana | JavaScript | Jenkins | Microservices | Node.js | Prometheus | Python | React | Terraform | TypeScript

Job Description

Design and Implementation (60%)

  • Evaluate incident and change management frameworks using data-driven insights to identify opportunities for improvement that will provide value to the EIM team and engineering teams.
  • Design and implement automation solutions for incident response and management, change management, and observability, leveraging input and feedback from domain SMEs and end users.
  • Develop and maintain scripts, tools, and integrations to reduce manual processes and operational overhead.
  • Define key performance indicators (KPIs) and metrics to measure the success of automation and improvement efforts, and develop and enhance dashboards and reporting mechanisms to measure KPIs as well as incident and change management performance.
  • Ensure compliance with governance, risk, and change control policies while promoting agility and innovation.
  • Lead cross-functional initiatives and partner with domain SMEs to analyze, design, and deliver powerful features, capabilities, and automation strategies that align with engineering best practices.
  • Serve as a subject matter expert (SME) for cloud operations, infrastructure automation, and CI/CD pipelines.

Strategy, Operations Support, and Communication (25%)

  • Collaborate with the EIM team’s director and other technology leaders to understand business objectives and team goals and to align solutions and process improvement efforts with those goals.
  • Contribute to the long-term technology strategy by researching emerging trends, evaluating new tools (especially AI-driven tools that support observability), and recommending technologies or automations that improve cost-effectiveness, metrics delivery to evaluate performance, and system and process efficiency.
  • Participate in weekly on-call and incident response rotations responsible for monitoring alerts to identify potential issues, ensuring timely triage and escalation of incidents, collaborating with impacted teams, and supporting assessment, response, and communication to bring the incident to resolution.
  • Play an active role in agile scrum ceremonies while contributing to high-quality team deliverables.

Team Coordination (15%)

  • Provide technical direction and guidance to team members, ensuring alignment with architectural standards, best practices, and organizational objectives.
  • Review designs, automation scripts, and implementation plans, offering constructive feedback to improve quality, efficiency, and maintainability.
  • Foster a culture of continuous learning and collaboration by mentoring engineers in modern automation, cloud infrastructure, and operational excellence.

Job Requirements

  • 7+ years of software development experience with Infrastructure as Code (IaC), CI/CD frameworks, immutable infrastructure, automation, orchestration, and other modern DevOps patterns.
  • Strong proficiency in IaC tools (e.g., Terraform, CloudFormation, Ansible); experience with CI/CD pipeline design and automation using platforms such as Jenkins, GitLab CI, or GitHub Actions is a plus.
  • Strong knowledge and experience with distributed cloud infrastructure, including AWS resources such as Lambda, SNS, SQS, S3, Step Functions, EC2, ECS, VPC, IAM, CloudWatch, and DynamoDB.
  • Experience building event-driven cloud-based serverless applications, with technical knowledge of cloud computing, DevOps, and microservices.
  • Strong coding/scripting experience for automation and integration tasks using tools (e.g., JavaScript, TypeScript, React.js, and Node.js) and proficiency in scripting languages (Python, Bash, PowerShell, etc.).
  • Familiarity with AI tools used for observability (e.g., AWS Resilience Hub).
  • Familiarity with incident and change management systems (e.g., Jira Service Management).
  • Deep understanding of ITIL frameworks, especially incident, change, and problem management.
  • Experience integrating monitoring and alerting tools (e.g., Datadog, Prometheus, CloudWatch, Grafana).
  • Strong troubleshooting, analytical, and problem-solving skills.
  • Proven ability to lead technical initiatives, influence cross-functional teams, and prioritize and execute tasks in a high-pressure environment.
  • Excellent communication skills, with the ability to translate technical details into business outcomes.
  • Ability to take a week-long on-call shift every month and a half.
  • Authorization to work in the U.S.

Benefits

  • Annual bonuses and opportunities for merit-based raises and promotions
  • A mission-driven workplace where your impact matters
  • A team that invests in your development and success
