Leidos

Leidos is an innovation company rapidly addressing the world’s most vexing challenges in national security and health.

Site Reliability Engineer, Artificial Intelligence Engineer

DevOps Engineer | Full Time | Remote | Team 10,001+ | Since 1969 | H1B Sponsor

Location

California, Hawaii, Virginia, Washington

Posted

24 days ago

Salary

$131.3K - $237.4K / year

Bachelor Degree | 5 yrs exp | English | Distributed Systems

Job Description

  • Design, develop, and maintain AI/ML models for anomaly detection, trend analysis, and signal correlation across metrics, logs, traces, and events.
  • Reduce alert noise through intelligent alert grouping, suppression, and prioritization.
  • Enhance observability platforms with AI-generated insights supporting SLO and error-budget management.
  • Implement AI-driven incident classification, enrichment, and summarization.
  • Provide probable root-cause analysis recommendations based on historical and real-time telemetry.
  • Support on-call and incident response teams with AI-guided remediation suggestions.
  • Contribute AI insights to post-incident reviews and reliability improvement plans.
  • Apply AI techniques to identify repetitive operational tasks and automation opportunities.
  • Assist in generating, validating, and optimizing automation playbooks and workflows.
  • Analyze automation execution data to improve success rates, resiliency, and reuse.
  • Build and maintain AI-searchable knowledge repositories containing runbooks, SOPs, lessons learned, and historical incident data.
  • Enable natural-language access to operational knowledge for SREs and operations staff.
  • Develop predictive models for capacity planning, failure forecasting, configuration risk, and reliability debt identification.
  • Support proactive remediation strategies to prevent incidents before customer impact.
  • Assist SRE leadership in data-driven prioritization of reliability investments.
  • Ensure AI solutions adhere to organizational security, compliance, and data-handling policies.
  • Establish guardrails for AI recommendations and automation execution.
  • Promote transparency, explainability, and auditability of AI-driven operational decisions.

Job Requirements

  • Bachelor’s degree in Computer Science, Engineering, Information Systems, Data Science, or a related discipline
  • 5+ years in Site Reliability Engineering, DevOps, IT Operations, or Systems Engineering
  • 2+ years applying AI/ML techniques in operational, analytics, or automation contexts
  • Demonstrated experience supporting production systems in high-availability environments
  • Must have an active Secret clearance to be considered for the position
  • Proficiency in data analysis tooling
  • Experience with machine learning fundamentals (anomaly detection, clustering, time-series analysis, NLP)
  • Familiarity with observability platforms (metrics, logs, traces, events)
  • Experience with automation frameworks and infrastructure-as-code concepts
  • Strong understanding of distributed systems and operational telemetry

Benefits

  • Competitive compensation
  • Health and Wellness programs
  • Income Protection
  • Paid Leave
  • Retirement
