References: Semantic Layers for Data Lakes

  1. Semantic Layer - Wikipedia - Defines the semantic layer concept, its role in translating physical data into business terms, and its position between raw storage and end-user tools, which is this chapter's central topic.

  2. Data Governance - Wikipedia - Covers policies, standards, and stewardship practices that underpin consistent metric definitions, naming standards, and business glossaries described throughout this chapter.

  3. Data Catalog - Wikipedia - Explains data catalog systems that enable table discovery, business glossary management, and source system mapping — all core semantic layer components covered in this chapter.

  4. Designing Data-Intensive Applications - Martin Kleppmann - O'Reilly Media - Chapters 2–3 cover data models and storage engines; Chapter 10 covers batch processing pipelines. Together they provide foundational context for the data lake and lakehouse trade-offs discussed here.

  5. Fundamentals of Data Engineering - Joe Reis, Matt Housley - O'Reilly Media - Chapters 6–9 cover storage, ingestion, transformation, and data serving (including semantic and metrics layers), directly supporting this chapter's treatment of the data lake-to-semantic-layer stack.

  6. Extract, Transform, Load - Wikipedia - Describes ETL and ELT patterns that feed data lakes and lakehouses, providing background for this chapter's discussion of schema-on-write vs. schema-on-read architectures.

  7. Data Lineage - Wikipedia - Explains data lineage tracking that connects business metrics back to source columns — directly relevant to source system mapping and the semantic layer's role in grounding LLM context.

  8. OpenLineage Open Standard - OpenLineage Project - Defines the open standard for capturing and sharing data lineage metadata across pipelines, supporting this chapter's treatment of source system mappings and semantic consistency.

  9. DataHub Open Source Data Catalog - DataHub Project - Documents an open-source metadata platform for table discovery, business glossary management, and schema registry functions described in this chapter's discovery and naming sections.

  10. Master Data Management - Wikipedia - Covers MDM practices for maintaining authoritative entity definitions across systems, supporting this chapter's sections on vocabulary alignment and semantic consistency across source systems.