Newfire Global Partners

Software Development, Staff Augmentation, and Advisory Services company operating in 8 countries across 4 continents.

Staff Engineer

Full Time · Remote · Team 501-1,000 · Since 2016 · H1B: No Sponsorship

Location

Massachusetts

Posted

8 days ago

Salary

$163.5K - $208.3K / year

Bachelor Degree · English · Airflow · Apache · AWS · Azure · Cloud · Docker · Google Cloud Platform · Kubernetes · Python · Spark · Terraform

Job Description

  • Design, build, and maintain scalable, reliable data pipelines using Python and Apache Spark to support data science and ML workflows (see the sketch after this list).
  • Architect and own the data platform infrastructure, including data lakes, data warehouses, and feature stores, ensuring performance, quality, and governance at scale.
  • Partner closely with data scientists and ML engineers to build and maintain the data foundations required for model training, validation, and deployment.
  • Define and implement data engineering best practices, including pipeline orchestration, data quality frameworks, lineage tracking, and observability.
  • Lead the design of reusable data assets (feature engineering pipelines, curated datasets, and domain-specific data models) that accelerate ML experimentation and production readiness.
  • Collaborate with platform and DevOps teams to operationalize data infrastructure through CI/CD pipelines, infrastructure as code, and automated testing.
  • Evaluate and introduce modern data tooling and frameworks, driving continuous improvement in the data engineering ecosystem.
  • Establish and enforce data governance, security, and compliance standards aligned with HIPAA and healthcare data requirements.
  • Conduct design reviews and provide technical mentorship for senior and mid-level data engineers across the organization.
  • Serve as a cross-functional technical authority, aligning data engineering direction with product, clinical, and analytics stakeholders.
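
For illustration only (not part of the original posting): a minimal PySpark sketch of the kind of pipeline described in the first responsibility, reading raw events, applying a basic quality filter, and writing a partitioned curated dataset. All paths and column names are hypothetical.

```python
# Hypothetical sketch only: paths and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curate_events").getOrCreate()

# Read raw records from a hypothetical landing zone.
raw = spark.read.parquet("s3://example-bucket/landing/events/")

# Deduplicate, drop records missing a timestamp, and derive a partition column.
curated = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Write a partitioned, curated dataset for downstream data science and ML use.
curated.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)
```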

Job Requirements

  • Deep expertise in Python — including data engineering libraries, pipeline development, testing, and production-grade code quality.
  • Strong hands-on experience with Apache Spark for large-scale distributed data processing, optimization, and performance tuning.
  • Proven experience designing and maintaining data platforms including data lakes, lakehouses, or data warehouse architectures (e.g., Delta Lake, Iceberg, Hudi).
  • Experience building and orchestrating data pipelines using tools such as Apache Airflow, Prefect, Dagster, or equivalent (see the sketch after this list).
  • Solid understanding of ML platform concepts — feature stores, training data pipelines, model registries, and experiment tracking (e.g., MLflow, Feast).
  • Proficiency with cloud data platforms, preferably Azure (Azure Data Factory, Azure Databricks, Azure Synapse, ADLS) or equivalent AWS/GCP services.
  • Strong knowledge of data modeling, schema design, and data warehousing principles for both analytical and ML workloads.
  • Experience with data quality frameworks and observability tooling (e.g., Great Expectations, Monte Carlo, dbt tests).
  • Familiarity with infrastructure as code and DevOps practices — Terraform, Docker, Kubernetes, or equivalent.
  • Solid understanding of data security, access controls, and compliance requirements in regulated industries.
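
For the orchestration requirement above, a minimal sketch assuming Apache Airflow 2.4+ and purely hypothetical task logic; a real pipeline would replace the placeholder extract and load steps.

```python
# Hypothetical sketch only: a minimal daily Airflow DAG (assumes Airflow 2.4+).
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_ingest_pipeline():
    @task
    def extract() -> list:
        # Placeholder for pulling raw records from a source system.
        return ["record-1", "record-2"]

    @task
    def load(records: list) -> None:
        # Placeholder for writing validated records to the warehouse.
        print(f"loaded {len(records)} records")

    # extract() feeds load(), defining the task dependency.
    load(extract())


example_ingest_pipeline()
```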

Benefits

  • Medical, dental & vision coverage
  • Health spending accounts
  • Voluntary benefits
  • Leave of absence policies
  • Employee Assistance Program
  • 401(k) program with employer contribution
  • Flexible work schedules
  • Time-off policy
  • Company equipment for all new full-time US-based remote employees
