DigitalOcean
The cloud ☁️ of choice for developers, startups, and growing digital businesses around the world.
Staff Software Engineer
Location
Massachusetts
Posted
2 days ago
Salary
$191K - $239K / year
15 yrs exp, English, Ansible, Cloud, Distributed Systems, Grafana, gRPC, Kafka, Microservices, NoSQL, Prometheus, Redis, SQL, Terraform, Go
Job Description
• Architect, design, develop, and maintain scalable backend services and systems.
• Drive technical initiatives and large cross-team projects from concept to production.
• Collaborate with product managers, UX designers, and engineers across distributed teams to deliver end-to-end solutions.
• Develop deep expertise in observability tools and technologies such as Prometheus, Grafana, time-series databases, and distributed tracing.
• Build and maintain high-performance APIs and microservices using Go (Golang) and gRPC, integrating with systems like Kafka, Redis, and NoSQL databases.
• Work with Terraform and Ansible to automate infrastructure deployment and configuration management.
• Use SQL for data analysis, service integration, and operational insights.
• Lead efforts in debugging, troubleshooting, and performance tuning of complex distributed systems.
• Champion operational excellence by improving reliability, monitoring, and alerting practices.
• Provide technical leadership, mentorship, and guidance to other engineers.
Job Requirements
- 15+ years of relevant industry experience building and operating large-scale cloud services or distributed systems in a fast-paced, high-growth environment.
- Strong programming experience in Go (Golang) and deep understanding of distributed systems fundamentals.
- Solid understanding of observability, monitoring, and alerting systems (e.g., Prometheus, Grafana).
- Experience working with the OpenTelemetry (OTel) Collector, including instrumentation, data pipelines, and telemetry ingestion for metrics, logs, and traces.
- Proven experience designing and implementing scalable event-driven architectures using Kafka or similar technologies.
- Experience with gRPC, Terraform, and Ansible for service communication and infrastructure automation.
- Working knowledge of SQL, Redis, and NoSQL databases.
- Demonstrated ability to drive operational excellence and improve system reliability.
- Experience making pragmatic technical trade-offs while balancing short-term needs and long-term goals.
- Excellent communication and collaboration skills, especially with geographically distributed teams.
- Strong ownership mindset and the ability to independently deliver high-impact projects.
Benefits
- Competitive salary
- Paid time off
- Professional development opportunities
- Flexible work hours
- Employee Assistance Program
- Local Employee Meetups
- Reimbursement for relevant conferences and training
- Access to LinkedIn Learning courses
- Bonus eligibility based on performance
- Equity compensation for eligible employees