Interactive lineage mapping

Data Lineage Diagram for Every Dependency

Your pipeline passed. The dashboard broke anyway. DataHub gives platform engineers a complete data lineage diagram, column by column, source to consumer.

Trace column-level lineage through every transformation automatically
See downstream impact before you change a single schema
Connect Snowflake, dbt, Airflow, and Looker without pipeline rebuilds

Request a Demo

See your full lineage in one view

Request a demo scoped to your stack, not a generic walkthrough.

Trusted by modern data teams

The real cost

What breaks when your data lineage diagram is missing?

Lineage gaps don't announce themselves. They surface in standups, audits, and incidents, after the damage is done.

Incidents you can't trace

A dashboard breaks. You spend hours tracing which upstream table changed. No lineage means no fast answer.

Downstream impact is invisible

Schema changes ripple silently. Without downstream impact visibility, consumers see the breakage before engineers know it happened.

Audits without documentation

Compliance teams ask where data came from. Without data lineage documentation, the answer takes days to reconstruct.

Column changes break reports

A renamed column. A dropped field. Column-level lineage gaps mean you find out when a report goes blank.

How DataHub helps

A better way to visualize and trust your data flows

DataHub gives platform engineers an interactive data lineage visualization: automated, column-deep, and connected to every tool in your stack.

Column precision

Column-level lineage, traced automatically

DataHub extracts column-to-column lineage from SQL queries across 20+ dialects, with no manual mapping. Transformation logic surfaces directly in the lineage graph so you know exactly what changed and why.

Automatic column mapping from SQL parsing and OpenLineage
Transformation operations visible inside each lineage node
Python SDK and GraphQL API for custom column-level ingestion
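Column-level lineage links columns, and each column has its own identifier. Below is a minimal sketch of that addressing. It assumes DataHub's URN formats for datasets (`urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`) and schema fields (`urn:li:schemaField:(<dataset urn>,<field path>)`). The helper names, the `ColumnEdge` type, and the table names are hypothetical. The official Python SDK has its own builders, for example `make_dataset_urn` in `datahub.emitter.mce_builder`. These stdlib versions only make the format visible.

```python
from dataclasses import dataclass


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Dataset URN: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def schema_field_urn(dataset: str, field_path: str) -> str:
    """Column URN: urn:li:schemaField:(<dataset URN>,<field path>)."""
    return f"urn:li:schemaField:({dataset},{field_path})"


@dataclass(frozen=True)
class ColumnEdge:  # hypothetical type, for illustration only
    upstream: str    # schemaField URN of the source column
    downstream: str  # schemaField URN of the derived column


def column_edges(upstream_ds: str, downstream_ds: str,
                 mapping: dict[str, list[str]]) -> list[ColumnEdge]:
    """Expand {downstream_column: [upstream_columns]} into one edge per column pair."""
    return [
        ColumnEdge(schema_field_urn(upstream_ds, up), schema_field_urn(downstream_ds, down))
        for down, ups in mapping.items()
        for up in ups
    ]


# Hypothetical tables: order_total is derived from amount and tax.
raw = dataset_urn("snowflake", "analytics.raw.orders")
mart = dataset_urn("snowflake", "analytics.marts.fct_orders")
edges = column_edges(raw, mart, {"order_total": ["amount", "tax"]})
for e in edges:
    print(e.upstream, "->", e.downstream)
```

A derived column with two inputs becomes two edges. That is what lets impact analysis follow a single column instead of a whole table.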

Impact analysis

Know your downstream impact before you ship

Before you rename a column or deprecate a table, DataHub shows every downstream asset that depends on it, including dashboards, models, reports, and ML features, filtered by degree of separation.

Search downstream and upstream assets across the full graph
Filter by entity type: datasets, charts, dashboards, ML models
Degree-of-separation filtering to scope blast radius
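The same impact search can be scripted. Below is a sketch that builds the request body for DataHub's `searchAcrossLineage` GraphQL query. It asks for downstream assets within a chosen degree of separation and, optionally, only certain entity types.

It assumes the following:
- A `degree` facet with values `"1"`, `"2"` and `"3+"`, matching the lineage UI filter.
- `EntityType` values such as `DASHBOARD` and `CHART`.

Verify both against the GraphQL schema of your DataHub version. The function name and the example URN are hypothetical.

```python
import json
from typing import Optional

# GraphQL document for searchAcrossLineage, trimmed to the fields used here.
IMPACT_QUERY = """
query impact($input: SearchAcrossLineageInput!) {
  searchAcrossLineage(input: $input) {
    total
    searchResults {
      degree
      entity { urn type }
    }
  }
}
"""

# Degree facet values (assumed to match the UI filter); "3+" covers everything further out.
DEGREES = {"1": ["1"], "2": ["1", "2"], "3+": ["1", "2", "3+"]}


def impact_request(urn: str, max_degree: str = "1",
                   types: Optional[list[str]] = None, count: int = 100) -> str:
    """Build the JSON body for a downstream impact search starting at `urn`."""
    search_input = {
        "urn": urn,
        "direction": "DOWNSTREAM",
        "query": "*",
        "start": 0,
        "count": count,
        # Keep only results within the requested degrees of separation.
        "orFilters": [{"and": [{"field": "degree", "values": DEGREES[max_degree]}]}],
    }
    if types:
        search_input["types"] = types  # e.g. ["DASHBOARD", "CHART"]
    return json.dumps({"query": IMPACT_QUERY, "variables": {"input": search_input}})


body = impact_request(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.raw.orders,PROD)",
    max_degree="2",
    types=["DASHBOARD", "CHART"],
)
```

To run it, POST the body to your deployment's GraphQL endpoint with an access token. The exact URL depends on whether you call the metadata service or the frontend proxy.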

Documentation

Data lineage documentation teams can use

Export lineage diagrams as PNG screenshots for incident reports, architecture reviews, and compliance documentation. Manual lineage editing lets you fill gaps, with a full audit trail of who changed what and when.

PNG export with entity name as filename, ready for docs
Add or remove lineage edges with edit privilege controls
Every manual change logged with user and timestamp
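Manual edges can also be added through the API instead of the UI. Below is a sketch of a request body for an `updateLineage` GraphQL mutation. It assumes the mutation takes `edgesToAdd` and `edgesToRemove` lists of `{upstreamUrn, downstreamUrn}` pairs and returns a Boolean; confirm this against your version's schema. The helper name and the URNs are hypothetical.

```python
import json

# Assumed mutation shape; check the GraphQL schema of your DataHub version.
UPDATE_LINEAGE = """
mutation updateLineage($input: UpdateLineageInput!) {
  updateLineage(input: $input)
}
"""


def lineage_edit(add=(), remove=()) -> str:
    """Build a request body; `add` and `remove` hold (upstream_urn, downstream_urn) pairs."""
    def edges(pairs):
        return [{"upstreamUrn": up, "downstreamUrn": down} for up, down in pairs]

    return json.dumps({
        "query": UPDATE_LINEAGE,
        "variables": {"input": {"edgesToAdd": edges(add), "edgesToRemove": edges(remove)}},
    })


# Hypothetical gap: a legacy Postgres table feeds a Snowflake mart outside any parsed SQL.
legacy = "urn:li:dataset:(urn:li:dataPlatform:postgres,billing.public.invoices,PROD)"
mart = "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.marts.fct_orders,PROD)"
body = lineage_edit(add=[(legacy, mart)])
```

Additions and removals travel in one request, so a correction that replaces one edge with another is submitted together.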

Interactive graph

Lineage visualization built for exploration

The DataHub lineage graph is interactive and filterable. Expand nodes, collapse branches, and navigate from source to consumer without losing context. Every asset links directly to its metadata profile.

Expand and collapse nodes to control graph complexity
Filter by platform, owner, domain, or data product
Click any node to open its full metadata profile inline

Three steps

How it works

Connect your existing stack, build lineage visualizations automatically, and put them to work across your team.

Connect your sources

Ingest from 80+ sources, including Snowflake, dbt, Airflow, and Looker

No pipeline rebuilds required to start collecting lineage

OpenLineage and custom SDK ingestion supported from day one
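For pipelines that already speak OpenLineage, column lineage is carried in the standard event format. Below is a sketch of a `COMPLETE` run event whose output dataset carries a `columnLineage` facet, built with the stdlib only.

The following are all hypothetical placeholders:
- the producer URI
- the job and dataset namespaces
- the table names
- the schema-URL versions

Match them to the OpenLineage client and naming conventions you actually use. Then send the event to the OpenLineage-compatible endpoint your DataHub deployment exposes; check the docs for your version for the exact path.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical producer URI identifying the emitting pipeline.
PRODUCER = "https://example.com/orders-pipeline"
# Schema versions below are assumptions; align them with your OpenLineage client.
RUN_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"
COLUMN_FACET_SCHEMA = "https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json"


def run_event(job: str, inputs: list[str], output: str,
              column_map: dict[str, list[tuple[str, str]]],
              namespace: str = "snowflake://my-account") -> dict:
    """Build a COMPLETE RunEvent whose output carries a columnLineage facet.

    column_map maps each output column to [(input_dataset, input_column), ...].
    """
    fields = {
        out_col: {"inputFields": [{"namespace": namespace, "name": ds, "field": col}
                                  for ds, col in sources]}
        for out_col, sources in column_map.items()
    }
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": "orders-scheduler", "name": job},  # hypothetical job namespace
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{
            "namespace": namespace,
            "name": output,
            "facets": {"columnLineage": {
                "_producer": PRODUCER,
                "_schemaURL": COLUMN_FACET_SCHEMA,
                "fields": fields,
            }},
        }],
        "producer": PRODUCER,
        "schemaURL": RUN_EVENT_SCHEMA,
    }


event = run_event(
    "build_fct_orders",
    ["analytics.raw.orders"],
    "analytics.marts.fct_orders",
    {"order_total": [("analytics.raw.orders", "amount"), ("analytics.raw.orders", "tax")]},
)
body = json.dumps(event)
```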

Build your lineage visualization

Column-level lineage extracted automatically from SQL query logs

Graph updates incrementally as your pipelines run and evolve

Manual edges fill gaps where automated parsing cannot reach

Activate across your team

Run downstream impact analysis before any schema change ships

Export lineage diagrams for compliance and architecture reviews

Query lineage programmatically via GraphQL and REST APIs

Deployment and integrations

Open source data lineage built for enterprise scale

DataHub is Apache 2.0 licensed. Deploy on your infrastructure, extend via API, and integrate with the tools your team already runs.

Deployment options

Self-hosted on Kubernetes, Docker, or bare metal

DataHub Cloud for managed deployment with SLA support

Role-based access control and SSO integration included

Integrations

Snowflake, BigQuery, Databricks, Redshift, and Spark

dbt, Airflow, Prefect, and Dagster for pipeline lineage

Looker, Tableau, Power BI, and Superset for BI lineage

Security and compliance

Fine-grained metadata policies at the platform level

Audit logs for every lineage edit and metadata change

Data stays in your environment, with no third-party data transfer

Peer review

Trusted by modern data teams

Gartner Peer Insights

Verified Review

Outcome

Full column-level lineage across the stack

"DataHub gave our platform team the lineage graph we had been trying to build manually for two years. Column-level tracing across dbt and Snowflake changed how we handle incident response."

Senior Data Engineer

Financial Services, Enterprise

FAQ

Frequently asked questions about data lineage diagrams

How does DataHub extract column-level lineage automatically?

DataHub parses SQL query logs from your warehouse to extract column-to-column mappings across 20+ SQL dialects. For transformation and orchestration tools such as dbt and Airflow, it reads model and pipeline definitions directly. Where automated parsing cannot reach, the Python SDK and GraphQL API let you push custom lineage edges programmatically. The result is a graph that reflects your actual data flows, not a manually maintained diagram.

Does DataHub support lineage across multiple platforms in one graph?

Yes. DataHub builds a unified lineage graph across all connected sources. A single lineage path can span a Kafka topic, a Spark job, a Snowflake table, a dbt model, and a Looker dashboard, all in one view. Cross-platform lineage is one of the core reasons teams adopt DataHub over lineage tools that cover only one layer of the stack.

How complete will our lineage graph be on day one?

It depends on your stack and how much query history is available. For well-instrumented environments with dbt and a modern warehouse, coverage is high from the first ingestion run. For custom pipelines or legacy systems, gaps are expected. DataHub is designed for this: manual lineage editing lets engineers fill gaps with full audit trails, and the graph improves incrementally as more sources are connected and more queries are processed.

Can we use DataHub lineage in our own tooling via API?

Yes. The full lineage graph is queryable via GraphQL and REST APIs. Teams use this to power custom impact analysis scripts, feed lineage data into internal portals, trigger alerts when critical assets change, and integrate lineage context into CI/CD pipelines. The API is the same one the DataHub UI uses, so anything visible in the interface is accessible programmatically.
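As one example of the CI/CD use case, a pipeline step could run a downstream `searchAcrossLineage` query for a changed table and fail the build if protected assets depend on it. Below is a sketch of the decision logic only. The response shape (`searchResults` entries carrying `degree` and `entity { urn type }`) is assumed from the GraphQL schema. The function names, the `protected` default and the sample data are hypothetical and illustrative, not real output.

```python
from collections import Counter


def impact_summary(response: dict) -> tuple[Counter, list[str]]:
    """Count downstream assets by entity type and list direct (degree 1) dependents."""
    results = response["data"]["searchAcrossLineage"]["searchResults"]
    by_type = Counter(r["entity"]["type"] for r in results)
    direct = [r["entity"]["urn"] for r in results if r["degree"] == 1]
    return by_type, direct


def ci_exit_code(response: dict, protected: tuple[str, ...] = ("DASHBOARD",)) -> int:
    """Return 1 (fail the build) if any protected entity type is downstream, else 0."""
    by_type, _ = impact_summary(response)
    return 1 if any(by_type[t] for t in protected) else 0


# Illustrative, hand-written response for a change to a raw orders table.
sample = {"data": {"searchAcrossLineage": {"total": 2, "searchResults": [
    {"degree": 1, "entity": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:dbt,analytics.marts.fct_orders,PROD)",
        "type": "DATASET"}},
    {"degree": 2, "entity": {
        "urn": "urn:li:dashboard:(looker,dashboards.42)",
        "type": "DASHBOARD"}},
]}}}
```

In a CI job you would pass the parsed HTTP response to `ci_exit_code` and hand its result to `sys.exit`. Keeping the policy in plain Python makes it easy to unit-test without a running DataHub instance.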

What does deployment look like for a self-hosted installation?

DataHub runs on Kubernetes via Helm chart, or with Docker Compose for smaller environments. The core components are:
- the metadata service (GMS)
- a relational database for primary storage (MySQL or PostgreSQL)
- a search backend (Elasticsearch or OpenSearch)
- a graph index (Elasticsearch by default, or Neo4j)
- Kafka as the message queue

Most platform teams complete an initial deployment in a day. Production hardening, including SSO, RBAC, and high availability, takes longer and depends on your infrastructure standards. DataHub Cloud is available if you prefer a managed option.

Get started

Ready to map every data dependency?

DataHub gives your team a complete data lineage diagram, column by column, across every source in your stack. You will speak with a DataHub engineer about your specific environment, not a generic walkthrough.

Request a Demo

Explore the product

Apache 2.0 open source

Deploy on your infrastructure

80+ pre-built connectors