BLACKBIRD.AI

Deception Detection for the Information Age.

Staff Data Engineer

Data Engineer, Full Time, Remote, Team 11-50, H1B No Sponsor

Location

New York, Texas, Washington

Posted

38 days ago

Salary

$160K - $190K / year

8 yrs exp, English, Apache, AWS, Azure, Cloud, Elasticsearch, Python, Spark, SQL

Job Description

  • Design and implement scalable data platform architecture on Databricks, supporting both batch and streaming ingestion
  • Build robust, fault-tolerant data ingestion pipelines that integrate with multiple third-party APIs and data providers
  • Design and implement AI-powered enrichment stages within pipelines, applying ML clustering, generative AI summarization, classification, and entity extraction to transform raw data into actionable intelligence
  • Build analytical systems with full-text search capabilities using Elasticsearch for rapid querying and analysis of enriched data
  • Work with AI/ML researchers to implement, integrate, and scale AI processing
  • Expose data platform capabilities as APIs and other interfaces for downstream consumption by applications and services
  • Optimize data lake and lakehouse architecture for performance, cost-efficiency, and scalability
  • Design and implement data quality frameworks, monitoring, and alerting systems
  • Design efficient architectures for calling external AI APIs and managing rate limits, costs, and reliability
  • Architect solutions with cost-efficiency as a first-class concern, implementing monitoring and optimization strategies for compute and storage
  • Make critical build-vs-buy decisions and establish architectural standards for the data organization
  • Mentor engineers and elevate the team's technical capabilities through code reviews, design discussions, and knowledge sharing

Job Requirements

  • 8+ years of software engineering experience with 5+ years focused on data platforms or data engineering
  • Deep expertise with Databricks, Apache Spark, and data lakehouse architectures
  • Strong experience building and operating data pipelines at scale (handling TBs+ of data)
  • Experience integrating AI/ML capabilities into data pipelines (clustering, LLM APIs, classification, summarization)
  • Proficiency in Python, DBT, and SQL for data processing and pipeline development
  • Experience with both batch and streaming large scale data processing patterns
  • Strong understanding of cloud platforms (AWS, Azure)
  • Excellent communication skills and ability to mentor engineers

Preferred Qualifications

  • Experience designing both batch and streaming/near real-time data architectures
  • Proficiency with Elasticsearch for building analytical systems with full-text search capabilities
  • Hands-on experience with LLM APIs and understanding of rate limiting and cost optimization
  • Experience with Agentic AI, context engineering, and evaluation
  • Background in trust & safety, security, or content moderation domains
  • Experience with data observability tools and building comprehensive monitoring systems
  • Prior experience at a startup or fast-paced environment
  • Experience applying agentic coding tools to day-to-day development
  • Familiarity with Databricks' Lakeflow, Agent Bricks, and vector databases

Benefits

  • Competitive compensation package, 401(k), and equity - everyone has a stake in our growth!
  • Comprehensive health benefits for you and your loved ones, including wellness days and monthly wellness reimbursements - an apple a day doesn't always keep the doctor away!
  • Generous vacation policy, encouraging you to take the time you need - we trust you to strike the right work/life balance!
  • A flexible work environment with opportunities to collaborate with your team in person - you can have it all!
  • Inclusion and Impact - soar to new heights!
  • Professional development stipend - never stop learning!
