Machinify, Inc.

Bending the healthcare cost curve with AI.

Senior Data Engineer

Data Engineer · Full Time · Remote · Team 51-200 · H1B No Sponsor

Location

United States

Posted

15 days ago

Salary

$180K - $220K / year

Bachelor Degree · 6 yrs exp · English · Airflow · AWS · Cloud · Kafka · Python · Spark · SQL

Job Description

  • Design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON).
  • Lead efforts to canonicalize raw healthcare data (837 claims, EHR, partner data, flat files) into internal models.
  • Own the full lifecycle of core pipelines, from file ingestion to validated, queryable datasets, ensuring high reliability and performance.
  • Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with SMEs, Account Managers, and Product to ensure successful implementation and troubleshooting.
  • Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
  • Refactor and scale existing pipelines to meet growing data and business needs.
  • Tune Spark jobs and optimize distributed processing performance.
  • Implement schema enforcement and versioning aligned with internal data standards.
  • Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
  • Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
  • Contribute to the evolution of our data platform, driving toward mature patterns in observability, testing, and automation.
  • Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
  • Help develop and champion internal best practices around pipeline development and data modeling.

Job Requirements

  • 6+ years of experience as a Data Engineer (or equivalent), building production-grade pipelines.
  • Strong expertise in Python, Spark SQL, and Airflow.
  • Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments.
  • Experience mapping and standardizing raw external data into canonical models.
  • Familiarity with AWS (or any cloud), including file storage and distributed compute concepts.
  • Experience onboarding new customers and integrating external customer data with non-standard formats.
  • Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.
  • Strong written and verbal communication skills — able to explain technical concepts to non-engineering partners.
  • Comfortable designing pipelines from scratch and improving existing pipelines.
  • Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.).
  • Experience building streaming pipelines with tools such as Kafka or SQS, or willingness to learn.
  • Bonus: Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).

Benefits

  • Work from anywhere in the US! Machinify is digital-first.
  • Full Medical/Dental/Vision for employees & their families
  • Flexible and trusting environment where you’ll feel empowered to do your best work
  • Unlimited FTO
  • Competitive salary, equity, 401(k) including employer match
