Innovaccer

Two years in a row: Innovaccer Awarded Best in KLAS Data & Analytics Platforms Category.

Site Reliability Engineer II

DevOps Engineer | Full Time | Remote | Team 1,001-5,000 | Since 2014 | H1B Sponsor

Location

United States

Posted

8 days ago

Salary

Not specified

4 yrs exp | English | AWS | Azure | Cloud | Elasticsearch | Google Cloud Platform | Jenkins | Kafka | Kubernetes | Linux | MongoDB | PostgreSQL | Prometheus | Python

Job Description

  • Take ownership of SRE pillars: Deployment, Reliability, Scalability, Service Availability (SLA/SLO/SLI), Performance, and Cost.
  • Lead production rollouts of new releases and emergency patches using CI/CD pipelines while continuously improving deployment processes.
  • Establish robust production promotion and change management processes with quality gates across Dev/QA teams.
  • Roll out a complete observability stack across systems to proactively detect and resolve outages or degradations.
  • Analyze production system metrics, optimize system utilization, and drive cost efficiency.
  • Manage autoscaling of the platform during peak usage scenarios.
  • Perform triage and RCA by leveraging observability toolchains across the platform architecture.
  • Reduce escalations to higher-level teams through proactive reliability improvements.
  • Participate in the 24x7 OnCall Production Support team.
  • Lead monthly operational reviews with executives covering KPIs such as uptime, RCA, CAP (Corrective Action Plan), PAP (Preventive Action Plan), and security/audit reports.
  • Operate and manage production and staging cloud platforms, ensuring uptime and SLA adherence.
  • Collaborate with Dev, QA, DevOps, and Customer Success teams to drive RCA and product improvements.
  • Implement security guidelines (e.g., DDoS protection, vulnerability management, patch management, security agents).
  • Manage least-privilege RBAC for production services and toolchains.
  • Build and execute Disaster Recovery plans and actively participate in Incident Response.

Job Requirements

  • 4–7 years in production engineering, site reliability, or related roles.
  • Solid hands-on experience with at least one cloud provider (AWS, Azure, GCP) with automation focus (certifications preferred).
  • Strong expertise in Kubernetes and Linux.
  • Proficiency in scripting/programming (Python required).
  • Observability is critical at the scale of our systems, both for gaining insight into system behavior and for detecting problems and failures. We are looking for leads to drive this charter across logs, metrics, mesh, tracing, etc.
  • Knowledge of CI/CD pipelines and toolchains (Jenkins, ArgoCD, GitOps).
  • Familiarity with persistence stores (Postgres, MongoDB), data warehousing (Snowflake, Databricks), and messaging (Kafka).
  • Exposure to monitoring/observability tools such as Elasticsearch, Prometheus, Jaeger, New Relic, etc.
  • Proven experience in production reliability, scalability, and performance systems.
  • Experience in 24x7 production environments with process focus.
  • Familiarity with ticketing and incident management systems.
  • Security-first mindset with knowledge of vulnerability management and compliance.
  • Excellent judgment, analytical thinking, and problem-solving skills.
  • Strong sense of personal responsibility and accountability for delivering high-quality work.

Benefits

  • Generous Paid Time Off: Recharge and relax with 22 days of fixed time off per year, in addition to company holidays—because we believe work-life balance fuels performance.
  • Best-in-Class Parental Leave: Spend quality time with your growing family. We offer one of the industry’s most generous parental leave policies to support you during life’s most important moments.
  • Recognition & Rewards: We celebrate wins—big and small. Get rewarded with monetary incentives and company-wide recognition for your impact and dedication. Your hard work won’t go unnoticed.
  • Comprehensive Insurance Coverage: Stay covered with medical, dental, and vision insurance, plus 100% company-paid short- and long-term disability and basic life insurance. Optional perks include discounted legal aid and pet insurance.
