SRE Senior Engineer/Specialist
Location
New York
Posted
22 days ago
Salary
Not specified
Bachelor Degree, 3 yrs exp, English, Cloud, Google Cloud Platform, Kubernetes, Python, Terraform
Job Description
• Write, configure, and deploy code that improves service reliability for existing or new systems; set the standard for others with respect to code quality.
• Provide helpful and actionable feedback and review for code or production changes.
• Drive repair and optimization of complex systems, taking a wide range of contributing factors into account.
• Lead debugging, troubleshooting, and analysis of service architecture and design.
• Participate in on-call rotation.
• Write documentation: designs, system analyses, runbooks, and playbooks. Provide design feedback and help others improve their design skills.
• Implement and manage SRE monitoring applications using AI, Python, and observability data.
• Develop tooling using Terraform and other IaC tools to ensure visibility and proactive issue detection across our platforms.
• Work within GCP infrastructure, optimizing performance and cost and scaling resources to meet demand.
• Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks.
• Develop and maintain AI-enhanced automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery.
• Troubleshoot and resolve issues in our dev, test, and production environments.
• Participate in postmortem analysis and create preventative measures for future incidents.
• Implement and maintain security best practices across our infrastructure, ensuring compliance with industry standards and internal policies. Participate in security audits and vulnerability assessments.
• Participate in capacity planning and forecasting efforts to ensure our systems can handle future growth and demand. Analyze trends and make recommendations for resource allocation.
• Identify and address performance bottlenecks through code profiling, system analysis, and configuration tuning. Implement and monitor performance metrics to proactively identify and resolve issues.
• Develop, maintain, and test disaster recovery plans and procedures to ensure business continuity in the event of a major outage or disaster. Participate in regular disaster recovery exercises.
• Contribute to internal knowledge bases and documentation.
Job Requirements
- Bachelor’s degree in Computer Science, Engineering, Mathematics or equivalent work experience.
- 3+ years of experience as an SRE, DevOps Engineer, Software Engineer or similar role.
- Agentic AI and MCP development experience preferred.
- Strong experience with Python development; familiarity with Terraform provider development desired.
- Experience with Dynatrace SaaS preferred.
- Proficient with monitoring and observability tools.
- Proficient with cloud services, with a strong preference for Kubernetes and Google Cloud Platform (GCP) experience.
- Solid programming skills in Python, with a good understanding of software development best practices.
- Experience with relational and document databases.
- Ability to debug, optimize code, and automate routine tasks.
- Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
- Excellent verbal and written communication skills.
Benefits
- Immediate medical, dental, and prescription drug coverage
- Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
- Vehicle discount program for employees and family members, and management leases
- Tuition assistance
- Established and active employee resource groups
- Paid time off for individual and team community service
- A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
- Paid time off and the option to purchase additional vacation time