GROWE


System Reliability Engineer/DevOps

DevOps Engineer, Full Time, Remote, Mid Level

Location

United States + 180 more
All locations: United States, Canada, Brazil, Colombia, Argentina, Chile, Venezuela (Bolivarian Republic of), Bolivia (Plurinational State of), Ecuador, French Guiana, Guyana, Paraguay, Peru, Suriname, Uruguay, Mexico, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Dominican Republic, Puerto Rico, Bahamas, Guadeloupe, Haiti, Jamaica, Martinique, Montserrat, United Kingdom, Germany, France, Estonia, Portugal, Hungary, Poland, Ukraine, Romania, Bulgaria, Czech Republic, Slovakia, Belarus, Moldova (Republic of), Sweden, Greece, Belgium, Italy, Ireland, Switzerland, Netherlands, Finland, Malta, Denmark, Lithuania, Croatia, Spain, Austria, Bosnia and Herzegovina, Iceland, Luxembourg, Macedonia (the former Yugoslav Republic of), Montenegro, Norway, Serbia, Slovenia, Albania, Cyprus, Latvia, Monaco, South Africa, Egypt, Algeria, Angola, Benin, Botswana, Burkina Faso, Burundi, Cameroon, Cape Verde, Central African Republic, Chad, Congo, Côte d'Ivoire, Congo (the Democratic Republic of the), Equatorial Guinea, Eritrea, Ethiopia, Gabon, Gambia, Ghana, Guinea, Guinea-Bissau, Kenya, Lesotho, Liberia, Libyan Arab Jamahiriya, Madagascar, Malawi, Mali, Mauritania, Mauritius, Mayotte, Morocco, Mozambique, Namibia, Niger, Nigeria, Réunion, Rwanda, Senegal, Seychelles, Sierra Leone, Somalia, Sudan, Swaziland, Tanzania (United Republic of), Togo, Tunisia, Uganda, Zambia, Zimbabwe, Georgia, Turkey, Israel, United Arab Emirates, Armenia, Azerbaijan, Bahrain, Iraq, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, Palestinian Territory (Occupied), Yemen, India, Japan, Philippines, Pakistan, Thailand, Singapore, Viet Nam, Taiwan (Province of China), Indonesia, Cambodia, Lao People's Democratic Republic, Malaysia, Myanmar, Korea (Republic of), China, Afghanistan, Bangladesh, Bhutan, Kazakhstan, Kyrgyzstan, Maldives, Mongolia, Nepal, Sri Lanka, Tajikistan, Turkmenistan, Uzbekistan, Australia, Papua New Guinea, Kiribati, Palau, French Polynesia, Tuvalu, New Zealand

Posted

1 day ago

Salary

Not specified

Seniority

Mid Level

AWS, Terraform, Terragrunt, Kubernetes, Docker, Helm, GitLab CI, Prometheus, Grafana, Loki, VictoriaMetrics, CloudWatch, Ansible, Python, Bash, ECS, EKS, RDS, S3, VPC, Route53, KMS, ACM, FluxCD, Argo Rollouts, KEDA, VPA, Karpenter, ingress-nginx, Pingdom, PagerDuty, Alertmanager, OpenSearch, Vector Agent, Network Firewall, Transit Gateway, Site-to-Site VPN, Vault, SOPS, Cloudflare, AWS Cost Explorer, KubeCost

Job Description

Role Description

Growe welcomes those who are excited to:

  • Ensure availability, performance, and scalability of infrastructure and services through monitoring, automation, and operational best practices;
  • Lead incident response, perform root cause analysis, and implement recovery and long-term fixes;
  • Manage infrastructure using Terraform, Terragrunt, and automation tools for consistency and repeatability;
  • Implement and maintain metrics, logs, and tracing solutions (Prometheus, Grafana, Loki, VictoriaMetrics, CloudWatch) to ensure system visibility;
  • Identify bottlenecks, tune systems, and improve infrastructure performance;
  • Monitor resources, forecast growth, and implement scaling strategies;
  • Integrate security best practices into IaC, CI/CD pipelines, and deployments;
  • Support vulnerability management;
  • Participate in 24/7 rotations (once a week) for timely resolution of critical incidents;
  • Work with DevOps, PRE, development, and security teams to improve reliability and design resilient systems;
  • Maintain operational runbooks, incident reports, and system documentation.

Qualifications

  • 3+ years in a DevOps, SRE, or related role;
  • Strong hands-on experience with AWS services including EC2, ECS, EKS, RDS, DocumentDB, ElastiCache, Keyspaces, S3, EBS, VPC, Route53, KMS, ACM, and CloudWatch;
  • Proficiency with Terraform, Terragrunt, and Atlantis for reproducible and version-controlled infrastructure;
  • Experience with GitLab CI, FluxCD, Argo Rollouts, and automation tools (Ansible, Python, Bash);
  • Solid experience with Docker, Kubernetes (AWS EKS), and Helm (including custom templates, ChartMuseum);
  • Familiarity with cluster add-ons such as KEDA, VPA, Karpenter, External-DNS, ingress-nginx, aws-alb-controller, and ebs-csi-driver;
  • Hands-on experience with Grafana, VictoriaMetrics stack, Tempo, metrics exporters, Pingdom, AWS CloudWatch, and alerting systems like PagerDuty, VMAlert, and Alertmanager;
  • Proficiency with Grafana Loki, OpenSearch, and Vector Agent for centralized logging;
  • Strong understanding of networking concepts, AWS networking (VPC, Network Firewall, Transit Gateway, Site-to-Site VPN), identity and access management, certificate management (ACM, Vault, SOPS), and application security best practices;
  • Familiarity with Cloudflare services, including caching, DNS, and Workers;
  • Exposure to AWS Cost Explorer, KubeCost, and custom cost export tools;
  • Certifications in AWS, Terraform, Kubernetes, or Helm are a plus.

Requirements

  • Problem-Solving Mindset: Approaches complex issues methodically and finds practical solutions under pressure;
  • Analytical Thinking: Able to interpret metrics, logs, and system behavior to make informed decisions;
  • Attention to Detail: Ensures accuracy in infrastructure changes, configurations, and deployment processes;
  • Adaptability: Comfortable learning new tools and technologies and adjusting to changing environments;
  • Collaboration & Teamwork: Works effectively with cross-functional teams and communicates clearly;
  • Ownership & Responsibility: Takes accountability for tasks, incidents, and service reliability;
  • Continuous Learning: Keeps up to date with DevOps, SRE, cloud, and security best practices;
  • Effective Communication: Can explain technical concepts clearly to both technical and non-technical stakeholders.

Company Description

We are seeking those who align with our core values:

  • GROWE TOGETHER: Our team is our main asset. We work together and support each other to achieve our common goals;
  • DRIVE RESULT OVER PROCESS: We set ambitious, clear, measurable goals that align with our strategy and drive Growe to success;
  • BE READY FOR CHANGE: We see challenges as opportunities to grow and evolve. We adapt today to win tomorrow.

