Newfire Global Partners

Software Development, Staff Augmentation, and Advisory Services company operating in 8 countries across 4 continents.

Staff Engineer

Full Time · Remote · Team 501-1,000 · Since 2016 · H1B: No Sponsorship

Location

Massachusetts

Posted

8 days ago

Salary

$163.5K - $208.3K / year

Bachelor Degree · English · Airflow · Apache · AWS · Azure · Cloud · Docker · Google Cloud Platform · Kubernetes · Python · Spark · Terraform

Job Description

  • Design, build, and maintain scalable, reliable data pipelines using Python and Apache Spark to support data science and ML workflows (see the sketch after this list).
  • Architect and own the data platform infrastructure, including data lakes, data warehouses, and feature stores, ensuring performance, quality, and governance at scale.
  • Partner closely with data scientists and ML engineers to build and maintain the data foundations required for model training, validation, and deployment.
  • Define and implement data engineering best practices, including pipeline orchestration, data quality frameworks, lineage tracking, and observability.
  • Lead the design of reusable data assets (feature engineering pipelines, curated datasets, and domain-specific data models) that accelerate ML experimentation and production readiness.
  • Collaborate with platform and DevOps teams to operationalize data infrastructure through CI/CD pipelines, infrastructure as code, and automated testing.
  • Evaluate and introduce modern data tooling and frameworks, driving continuous improvement in the data engineering ecosystem.
  • Establish and enforce data governance, security, and compliance standards aligned with HIPAA and healthcare data requirements.
  • Conduct design reviews and provide technical mentorship for senior and mid-level data engineers across the organization.
  • Serve as a cross-functional technical authority, aligning data engineering direction with product, clinical, and analytics stakeholders.
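
For illustration only (not part of the original posting): a minimal PySpark sketch of the kind of pipeline described in the first responsibility, reading raw events, applying a basic quality filter, and writing a partitioned curated dataset. All paths and column names are hypothetical.

```python
# Hypothetical sketch only: paths and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curate_events").getOrCreate()

# Read raw records from a hypothetical landing zone.
raw = spark.read.parquet("s3://example-bucket/landing/events/")

# Deduplicate, drop records missing a timestamp, and derive a partition column.
curated = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Write a partitioned, curated dataset for downstream data science and ML use.
curated.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)
```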

Job Requirements

  • Deep expertise in Python — including data engineering libraries, pipeline development, testing, and production-grade code quality.
  • Strong hands-on experience with Apache Spark for large-scale distributed data processing, optimization, and performance tuning.
  • Proven experience designing and maintaining data platforms including data lakes, lakehouses, or data warehouse architectures (e.g., Delta Lake, Iceberg, Hudi).
  • Experience building and orchestrating data pipelines using tools such as Apache Airflow, Prefect, Dagster, or equivalent (see the sketch after this list).
  • Solid understanding of ML platform concepts — feature stores, training data pipelines, model registries, and experiment tracking (e.g., MLflow, Feast).
  • Proficiency with cloud data platforms, preferably Azure (Azure Data Factory, Azure Databricks, Azure Synapse, ADLS) or equivalent AWS/GCP services.
  • Strong knowledge of data modeling, schema design, and data warehousing principles for both analytical and ML workloads.
  • Experience with data quality frameworks and observability tooling (e.g., Great Expectations, Monte Carlo, dbt tests).
  • Familiarity with infrastructure as code and DevOps practices — Terraform, Docker, Kubernetes, or equivalent.
  • Solid understanding of data security, access controls, and compliance requirements in regulated industries.
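
For the orchestration requirement above, a minimal sketch assuming Apache Airflow 2.4+ and purely hypothetical task logic; a real pipeline would replace the placeholder extract and load steps.

```python
# Hypothetical sketch only: a minimal daily Airflow DAG (assumes Airflow 2.4+).
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_ingest_pipeline():
    @task
    def extract() -> list:
        # Placeholder for pulling raw records from a source system.
        return ["record-1", "record-2"]

    @task
    def load(records: list) -> None:
        # Placeholder for writing validated records to the warehouse.
        print(f"loaded {len(records)} records")

    # extract() feeds load(), defining the task dependency.
    load(extract())


example_ingest_pipeline()
```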

Benefits

  • Medical, dental & vision coverage
  • Health spending accounts
  • Voluntary benefits
  • Leave of absence policies
  • Employee Assistance Program
  • 401(k) program with employer contribution
  • Flexible work schedules
  • Time-off policy
  • Company equipment for all new full-time US-based remote employees
