Leidos

Leidos is an innovation company rapidly addressing the world’s most vexing challenges in national security and health.

Unstructured Data Engineer

Data Engineer, Full Time, Remote, Team 10,001+, Since 1969, H1B Sponsor

Location

United States

Posted

14 days ago

Salary

$107.9K - $195.1K / year

Bachelor Degree, 8 yrs exp, English, AWS, Azure, Cloud, Google Cloud Platform, Microservices, Python

Job Description

  • Design, build, and manage end-to-end RAG pipelines for enterprise AI applications.
  • Lead preprocessing of unstructured data, including discovery, classification, cleansing, redaction, and metadata enrichment.
  • Develop and optimize document chunking, embedding, and vectorization strategies for structured and unstructured datasets.
  • Coordinate ingestion of curated datasets into vector databases and AI platforms.
  • Package curated unstructured datasets as governed, reusable data products for enterprise consumption.
  • Define and implement metadata tagging strategies to align with Collibra governance standards.
  • Partner with Data Governance and Data Quality teams to ensure AI-ready data meets enterprise standards for lineage, classification, and compliance.
  • Evaluate and optimize embedding models, retrieval strategies, and indexing performance.
  • Monitor and tune RAG pipeline performance, including latency, retrieval accuracy, and cost efficiency.
  • Implement automation for document ingestion, transformation, and publishing workflows.
  • Support integration with enterprise AI platforms (e.g., ChatGPT Enterprise, AskSage, Moveworks).
  • Conduct cost analysis and capacity planning for vector storage and processing workloads.
  • Provide technical guidance on AI data readiness and unstructured data lifecycle management.
  • Design, implement, and optimize enterprise-grade RAG and prompt engineering frameworks, including context engineering strategies (chunking, metadata enrichment, semantic filtering, dynamic context management) to improve retrieval accuracy, grounding, and response quality.
  • Develop and maintain scalable multi-modal data pipelines that ingest, preprocess, embed, and integrate text, documents, images, audio, and structured data into governed vectorized data products consumable by enterprise AI platforms.

Job Requirements

  • Bachelor’s degree in Computer Science, Data Engineering, AI/ML, or related field and 8+ years of relevant experience.
  • Hands-on experience designing and implementing RAG architectures in production environments.
  • Experience working with unstructured data (PDFs, documents, email, transcripts, images with OCR, etc.).
  • Strong proficiency in Python and experience with NLP/LLM frameworks (e.g., LangChain, LlamaIndex, Hugging Face, OpenAI APIs).
  • Experience with vector databases (e.g., Pinecone, Weaviate, FAISS, OpenSearch, Azure AI Search).
  • Experience implementing document chunking, embedding generation, and similarity search.
  • Understanding of metadata modeling and governance principles.
  • Experience building scalable data pipelines in cloud environments (AWS, Azure, or GCP).
  • Hands-on experience with prompt engineering, evaluation metrics, and context window optimization.
  • Strong understanding of multi-modal data processing and pipeline engineering.
  • Strong knowledge of API integration and microservices architecture.
  • US Citizenship is required.

Benefits

  • Competitive compensation
  • Health and Wellness programs
  • Income Protection
  • Paid Leave
  • Retirement
