Intellum

We help large brands and fast-moving companies increase revenue and decrease support costs through education.

Lead Software Engineer

Full Time, Remote, Team 51-200, Since 2016, H1B Sponsor

Location

United States

Posted

45 days ago

Salary

Not specified

Bachelor Degree, 10 yrs exp, English, Ansible, AWS, Cloud, Google Cloud Platform, Kubernetes, Postgres, Prometheus, Ruby, Ruby on Rails, SQL, Terraform

Job Description

  • SRE Leadership & Strategy: Set clear goals for the SRE team and partner with Engineering leadership to align platform initiatives with business objectives.
  • Reliability & Observability (SLA/SLO): Lead the definition and enforcement of SLAs, SLIs, and SLOs. Architect observability frameworks to translate telemetry data into actionable roadmaps that reduce toil and enhance resilience.
  • Core Engineering & Performance: Take ownership of critical code components (i.e., Queues, Enrollments) and lead efforts to identify bottlenecks, optimize performance, and improve code quality across the engineering department.
  • Security by Design: Champion infrastructure security. Partner with InfoSec to define hardening standards, manage perimeter defense (WAF/DDoS), and automate vulnerability remediation within the CI/CD pipeline.
  • Incident Command: Participate in the 24x7 on-call rotation and lead post-incident reviews (RCAs), ensuring action items are implemented to improve MTTR and prevent recurrence.
  • Mentorship: Empower developers with better tooling and guidance on performant coding practices, fostering a culture of collaboration, reliability, and "you build it, you run it".

Job Requirements

  • 10+ years of engineering experience, with 5+ years specifically developing Ruby on Rails applications.
  • Expertise in Cloud Computing (AWS/GCP) and Infrastructure as Code (Terraform/Ansible).
  • Strong proficiency with SQL databases (PostgreSQL) and the ability to quickly navigate and optimize complex, unfamiliar codebases.
  • Deep Observability: Proven experience designing monitoring solutions (Datadog, New Relic, Prometheus) based on the "Golden Signals".
  • SLO Governance: Demonstrated ability to define SLIs/SLOs from scratch, negotiate Error Budgets, and use data to balance feature velocity with reliability.
  • Security Focus: Experience securing cloud environments and container platforms (Kubernetes), including hands-on management of WAF rules and edge security.
  • Incident Management: Experience leading post-incident reviews (RCAs) and implementing action items that directly improve MTTR (Mean Time to Recovery) and MTTD (Mean Time to Detection).
  • Proven experience leading technical teams, mentoring engineers, and working in a team-oriented, collaborative environment with strong communication skills.
  • Experience documenting solutions and training operational teams on how to effectively support and maintain systems.
  • Proactive Problem-Solving: Demonstrated ability to communicate clearly, seek help proactively, and take ownership of tasks, leading them to completion.

Benefits

  • Medical - 100% of employee premiums covered for selected individual plans
  • Dental - 100% of employee premiums covered
  • Vision - 100% of employee premiums covered
  • LinkedIn Learning
  • 401(k) plus matching (US-based only)
  • Unlimited PTO
  • Calm subscription
  • Annual Company Retreat
