We help large brands and fast-moving companies increase revenue and decrease support costs through education.

Lead Software Engineer

Full Time, Remote, Team 51-200, Since 2016, H1B Sponsor

Location

United States

Posted

45 days ago

Salary

Not specified

Bachelor Degree, 10 yrs exp, English, Ansible, AWS, Cloud, Google Cloud Platform, Kubernetes, Postgres, Prometheus, Ruby, Ruby on Rails, SQL, Terraform

Job Description

• SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
• Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks to translate telemetry data into actionable roadmaps that reduce toil and enhance resilience.
• Core Engineering & Performance: Take ownership of critical code components (i.e., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
• Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
• Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
• Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it".

Job Requirements

10+ years of engineering experience, with 5+ years specifically developing Ruby on Rails applications.
Expertise in Cloud Computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.
Deep Observability: Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the "Golden Signals".
SLO Governance: Demonstrated ability to define SLIs/SLOs from scratch, negotiate Error Budgets, and use data to balance feature velocity with reliability.
Security Focus: Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
Incident Management: Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).
Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.
Experience documenting solutions and training operational teams to support and maintain systems effectively.
Proactive Problem-Solving: Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks, leading them to completion.

Benefits

Medical - 100% of employee premiums for selected individual plans
Dental - 100% of employee premiums covered
Vision - 100% of employee premiums covered
LinkedIn Learning
401(k) plus matching (US Based Only)
Unlimited PTO
Calm subscription
Annual Company Retreat
