System Reliability Engineer/DevOps

DevOps Engineer, Full Time, Remote, Mid Level

Location

United States + 180 more

Posted

1 day ago

Salary

Not specified

Seniority

Mid Level

AWS, Terraform, Terragrunt, Kubernetes, Docker, Helm, GitLab CI, Prometheus, Grafana, Loki, VictoriaMetrics, CloudWatch, Ansible, Python, Bash, ECS, EKS, RDS, S3, VPC, Route53, KMS, ACM, FluxCD, Argo Rollouts, KEDA, VPA, Karpenter, ingress-nginx, Pingdom, PagerDuty, Alertmanager, OpenSearch, Vector Agent, Network Firewall, Transit Gateway, Site-to-Site VPN, Vault, SOPS, Cloudflare, AWS Cost Explorer, KubeCost

Job Description

Role Description

Growe welcomes those who are excited to:

Ensure availability, performance, and scalability of infrastructure and services through monitoring, automation, and operational best practices;
Lead incident response, perform root cause analysis, and implement recovery and long-term fixes;
Manage infrastructure using Terraform, Terragrunt, and automation tools for consistency and repeatability;
Implement and maintain metrics, logs, and tracing solutions (Prometheus, Grafana, Loki, VictoriaMetrics, CloudWatch) to ensure system visibility;
Identify bottlenecks, tune systems, and improve infrastructure performance;
Monitor resources, forecast growth, and implement scaling strategies;
Integrate security best practices into IaC, CI/CD pipelines, and deployments;
Support vulnerability management;
Participate in 24/7 rotations (once a week) for timely resolution of critical incidents;
Work with DevOps, PRE, development, and security teams to improve reliability and design resilient systems;
Maintain operational runbooks, incident reports, and system documentation.

Qualifications

3+ years in a DevOps, SRE, or related role;
Strong hands-on experience with AWS services including EC2, ECS, EKS, RDS, DocumentDB, ElastiCache, Keyspaces, S3, EBS, VPC, Route53, KMS, ACM, and CloudWatch;
Proficiency with Terraform, Terragrunt, and Atlantis for reproducible and version-controlled infrastructure;
Experience with GitLab CI, FluxCD, Argo Rollouts, and automation tools (Ansible, Python, Bash);
Solid experience with Docker, Kubernetes (AWS EKS), and Helm (including custom templates, ChartMuseum);
Familiarity with cluster add-ons such as KEDA, VPA, Karpenter, External-DNS, ingress-nginx, aws-alb-controller, and ebs-csi-driver;
Hands-on experience with Grafana, VictoriaMetrics stack, Tempo, metrics exporters, Pingdom, AWS CloudWatch, and alerting systems like PagerDuty, VMAlert, and Alertmanager;
Proficiency with Grafana Loki, OpenSearch, and Vector Agent for centralized logging;
Strong understanding of networking concepts, AWS networking (VPC, Network Firewall, Transit Gateway, Site-to-Site VPN), identity and access management, certificate management (ACM, Vault, SOPS), and application security best practices;
Familiarity with Cloudflare services, including caching, DNS, and Workers;
Exposure to AWS Cost Explorer, KubeCost, and custom cost export tools;
Certifications in AWS, Terraform, Kubernetes, or Helm are a plus.

Requirements

Problem-Solving Mindset: Approaches complex issues methodically and finds practical solutions under pressure;
Analytical Thinking: Able to interpret metrics, logs, and system behavior to make informed decisions;
Attention to Detail: Ensures accuracy in infrastructure changes, configurations, and deployment processes;
Adaptability: Comfortable learning new tools and technologies and adjusting to changing environments;
Collaboration & Teamwork: Works effectively with cross-functional teams and communicates clearly;
Ownership & Responsibility: Takes accountability for tasks, incidents, and service reliability;
Continuous Learning: Keeps up to date with DevOps, SRE, cloud, and security best practices;
Effective Communication: Can explain technical concepts clearly to both technical and non-technical stakeholders.

Company Description

We are seeking those who align with our core values:

GROWE TOGETHER: Our team is our main asset. We work together and support each other to achieve our common goals;
DRIVE RESULT OVER PROCESS: We set ambitious, clear, measurable goals in line with our strategy and drive Growe to success;
BE READY FOR CHANGE: We see challenges as opportunities to grow and evolve. We adapt today to win tomorrow.
