Leidos is an innovation company rapidly addressing the world’s most vexing challenges in national security and health.

Site Reliability Engineer, Artificial Intelligence Engineer

DevOps Engineer | Full Time | Remote | Team 10,001+ | Since 1969 | H1B Sponsor

Location

California + 3 more

Posted

24 days ago

Salary

$131.3K - $237.4K / year

Bachelor's Degree | 5 yrs exp | English | Distributed Systems

Job Description

• Design, develop, and maintain AI/ML models for anomaly detection, trend analysis, and signal correlation across metrics, logs, traces, and events.
• Reduce alert noise through intelligent alert grouping, suppression, and prioritization.
• Enhance observability platforms with AI-generated insights supporting SLO and error-budget management.
• Implement AI-driven incident classification, enrichment, and summarization.
• Provide probable root-cause analysis recommendations based on historical and real-time telemetry.
• Support on-call and incident response teams with AI-guided remediation suggestions.
• Contribute AI insights to post-incident reviews and reliability improvement plans.
• Apply AI techniques to identify repetitive operational tasks and automation opportunities.
• Assist in generating, validating, and optimizing automation playbooks and workflows.
• Analyze automation execution data to improve success rates, resiliency, and reuse.
• Build and maintain AI-searchable knowledge repositories containing runbooks, SOPs, lessons learned, and historical incident data.
• Enable natural-language access to operational knowledge for SREs and operations staff.
• Develop predictive models for capacity planning, failure forecasting, configuration risk, and reliability debt identification.
• Support proactive remediation strategies to prevent incidents before customer impact.
• Assist SRE leadership in data-driven prioritization of reliability investments.
• Ensure AI solutions adhere to organizational security, compliance, and data-handling policies.
• Establish guardrails for AI recommendations and automation execution.
• Promote transparency, explainability, and auditability of AI-driven operational decisions.

Job Requirements

Bachelor’s degree in Computer Science, Engineering, Information Systems, Data Science, or a related discipline
5+ years in Site Reliability Engineering, DevOps, IT Operations, or Systems Engineering
2+ years applying AI/ML techniques in operational, analytics, or automation contexts
Demonstrated experience supporting production systems in high-availability environments
Must have an active Secret Clearance in order to be considered for the position
Proficiency in data analysis tooling
Experience with machine learning fundamentals (anomaly detection, clustering, time-series analysis, NLP)
Familiarity with observability platforms (metrics, logs, traces, events)
Experience with automation frameworks and infrastructure-as-code concepts
Strong understanding of distributed systems and operational telemetry

Benefits

Competitive compensation
Health and Wellness programs
Income Protection
Paid Leave
Retirement

More DevOps Engineer Jobs

Software Engineer II – SRE/DevOps

Flowhub

The cannabis retail platform for modern dispensaries. Making safe cannabis products accessible to every adult on Earth.

DevOps Engineer | 24 days ago

Full Time | Remote | Team 51-200 | Since 2015 | H1B No Sponsor

SRE/DevOps Engineer focusing on operational stability for a cannabis retail platform

Cloud | Google Cloud Platform | Grafana | Kubernetes | Prometheus | Terraform

United States

$115K - $145K / year

DevOps Engineer

Booz Allen Hamilton

DevOps Engineer | 24 days ago

Full Time | Remote | Team 10,001 | Since 1914

DevOps Engineer The Opportunity: Everyone is trying to “harness the cloud,” but not everyone knows how. As a DevOps engineer, you’re eager to develop, manage, and secure a container platform that meets your client’s needs and takes advantage of cloud capabilities. We need you to ...

California

$61.9K - $141K / year

Site Reliability Engineer (SRE)

Baseten

DevOps Engineer | 24 days ago

Full Time | Remote | Team 59

As a Site Reliability Engineer, you'll build and maintain infrastructure for ML models, automate processes, and collaborate cross-functionally.

CircleCI | CloudFormation | ELK Stack | GitHub Actions | GitLab CI | Grafana | Jenkins | Kubernetes | OpenTelemetry | Prometheus | Pulumi | Terraform

New York + 1 more

$150K - $250K / year

Multigres Deployment Engineer

Supabase

Build in a weekend. Scale to millions.

DevOps Engineer | 24 days ago

Full Time | Remote | Team 51-200 | Since 2020 | H1B No Sponsor

The Multigres Deployment Engineer will manage deployment infrastructure, build tools for Kubernetes-based Postgres systems, and ensure operational excellence.

AKS | Cilium | CSI Drivers | EKS | GKE | Go | Istio | Kubernetes | Pulumi | Terraform

United States
