Open Source Data Lineage Tools
Your pipelines pass. Your dashboards break anyway. DataHub automatically maps column-level lineage across 60+ platforms so your team finds problems before the standup does.
- Automated lineage across Snowflake, BigQuery, dbt, Airflow, and 57 more
- Column-level precision: trace any field from source to BI dashboard
- Downstream impact analysis before a schema change breaks production
See DataHub lineage in your stack
A DataHub engineer will walk through your specific environment.
What breaks when lineage is manual?
Manual lineage documentation falls behind the moment a pipeline changes. The gap between what your docs say and what your data does is where incidents live.
Pipeline blame, no root cause
A dashboard breaks. You spend hours tracing upstream dependencies by hand, across systems that do not talk to each other.
Audit gaps you cannot close
Compliance asks where a field came from. Your lineage docs are six months stale. You have no defensible answer.
Schema changes hit production
A column gets renamed upstream. Three downstream reports break before anyone knows the change shipped.
Engineer hours lost to context
Your best engineers spend their Fridays explaining data flows they did not build, to stakeholders who needed the answer Tuesday.
A better way to map and trust your data
DataHub replaces manual lineage documentation with automated extraction, column-level precision, and interactive data lineage visualization across your full stack.
Lineage across 60+ platforms
DataHub extracts lineage automatically from query logs, pipeline metadata, and transformation definitions. No manual mapping. Connectors cover Snowflake, BigQuery, Redshift, dbt, Airflow, Looker, Tableau, and 53 more.
- Pre-built connectors for warehouses, lakes, and BI tools
- SQL query parsing extracts lineage from query logs automatically
- OpenLineage support for Spark and streaming pipelines
Column-level lineage precision
Column-level lineage extracted via SQL parsing shows exactly which upstream fields feed each downstream column. When a field changes, you know every report, model, and dataset it touches before the change ships.
- SQL-parsed column paths across joins, CTEs, and transformations
- Toggle column view directly in the lineage graph UI
- Identify field-level dependencies across dbt models and warehouses
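To make the idea of column-level lineage concrete, here is a minimal sketch of tracing one dashboard field back to its source columns. It is a toy in-memory model, not DataHub's implementation or API: the `UPSTREAMS` mapping, the column names, and the `trace_sources` helper are all hypothetical.

```python
from collections import deque

# Hypothetical column-level edges: downstream column -> upstream columns feeding it.
# All names are illustrative; none come from a real DataHub instance.
UPSTREAMS = {
    "dashboard.revenue.total": ["mart.orders.net_amount"],
    "mart.orders.net_amount": ["staging.orders.amount", "staging.orders.discount"],
    "staging.orders.amount": ["raw.orders.amount"],
    "staging.orders.discount": ["raw.orders.discount"],
}


def trace_sources(column, upstreams):
    """Return the root source columns (those with no recorded upstream) feeding `column`."""
    seen = {column}
    queue = deque([column])
    sources = set()
    while queue:
        current = queue.popleft()
        parents = upstreams.get(current, [])
        # A column with no parents is a root source, unless it is the start column itself.
        if not parents and current != column:
            sources.add(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(sources)


print(trace_sources("dashboard.revenue.total", UPSTREAMS))
# prints ['raw.orders.amount', 'raw.orders.discount']
```

In a real deployment the edges would come from SQL parsing rather than a hand-written dictionary; the traversal itself stays the same.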
Interactive lineage visualization
The DataHub lineage explorer renders your data graph as an interactive lineage diagram. Zoom, pan, expand nodes, and filter by time window to see how your data moves.
- Expand and collapse nodes to control graph depth
- Filter lineage by time window to audit historical state
- Share lineage views with a direct link to any asset
Downstream impact before incidents
Before renaming a column or deprecating a table, DataHub shows every downstream asset affected. Review impact across reports, models, and pipelines without running a query.
- Downstream impact summary for any schema change
- Filter affected assets by type, owner, or platform
- Tag assets for review before a breaking change ships
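Downstream impact analysis comes down to a forward walk over the lineage graph, followed by a filter on asset metadata. The sketch below illustrates that under stated assumptions: the `DOWNSTREAMS` graph, the `ASSET_INFO` metadata, and the `downstream_impact` function are hypothetical and stand in for whatever DataHub computes internally.

```python
from collections import deque

# Hypothetical asset graph: upstream asset -> assets that read from it.
DOWNSTREAMS = {
    "snowflake.raw.orders": ["dbt.stg_orders"],
    "dbt.stg_orders": ["dbt.fct_orders"],
    "dbt.fct_orders": ["looker.revenue_dashboard", "tableau.ops_report"],
}

# Hypothetical metadata used to filter the impact list by type or owner.
ASSET_INFO = {
    "dbt.stg_orders": {"type": "model", "owner": "data-eng"},
    "dbt.fct_orders": {"type": "model", "owner": "analytics"},
    "looker.revenue_dashboard": {"type": "report", "owner": "finance"},
    "tableau.ops_report": {"type": "report", "owner": "ops"},
}


def downstream_impact(asset, downstreams):
    """Breadth-first walk returning (asset, hops) for every asset downstream of `asset`."""
    seen = {asset}
    queue = deque([(asset, 0)])
    impacted = []
    while queue:
        current, hops = queue.popleft()
        for child in downstreams.get(current, []):
            if child not in seen:
                seen.add(child)
                impacted.append((child, hops + 1))
                queue.append((child, hops + 1))
    return impacted


impact = downstream_impact("snowflake.raw.orders", DOWNSTREAMS)
# Keep only the reports, mirroring a "filter by type" step.
reports = [name for name, _ in impact if ASSET_INFO[name]["type"] == "report"]
print(reports)
# prints ['looker.revenue_dashboard', 'tableau.ops_report']
```

Recording the hop count alongside each asset makes it easy to review the nearest dependencies first.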
Connect, contextualize, activate
Three steps to automated data lineage documentation across your existing stack. No pipeline rewrites required.
Connect your data stack
Deploy DataHub and connect your warehouses, pipelines, and BI tools using pre-built connectors. Lineage extraction begins without changes to your existing workflows.
- Connect Snowflake, dbt, Airflow, and 57 more sources
- No pipeline rewrites or instrumentation required
- OpenLineage events ingest natively from Spark and Airflow
Contextualize with metadata
DataHub enriches each asset with ownership, classification, and schema history. Column-level lineage is parsed automatically from SQL query logs and transformation definitions.
- SQL parsing extracts column paths across joins and CTEs
- Ownership and classification applied at the asset level
- Schema history tracked automatically across every change
Activate lineage across your team
Engineers, analysts, and data owners query lineage through the UI or API. Impact analysis, audit trails, and dependency graphs are available to every stakeholder.
- GraphQL and REST API for programmatic lineage access
- Downstream impact visible before any change ships
- Audit trails available for compliance and governance reviews
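As a rough illustration of programmatic lineage access, the sketch below builds (but does not send) a GraphQL request body for a downstream lineage lookup. The query is modeled on DataHub's `searchAcrossLineage` query; treat the field names, the input type name, and the example URN as assumptions to check against the GraphQL schema of your deployed version. `build_lineage_request` is a hypothetical helper.

```python
import json

# Query shape modeled on DataHub's `searchAcrossLineage`; field and type names
# are assumptions here and should be verified against your instance's schema.
LINEAGE_QUERY = """
query lineage($input: SearchAcrossLineageInput!) {
  searchAcrossLineage(input: $input) {
    searchResults {
      degree
      entity { urn type }
    }
  }
}
"""


def build_lineage_request(urn, direction="DOWNSTREAM", count=10):
    """Build a GraphQL request body asking for lineage around `urn` (hypothetical helper)."""
    if direction not in ("UPSTREAM", "DOWNSTREAM"):
        raise ValueError("direction must be 'UPSTREAM' or 'DOWNSTREAM'")
    return {
        "query": LINEAGE_QUERY,
        "variables": {
            "input": {
                "urn": urn,
                "direction": direction,
                "query": "*",  # match every asset in the lineage result
                "start": 0,
                "count": count,
            }
        },
    }


# Illustrative dataset URN; the table name is made up.
example_urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)"
request_body = build_lineage_request(example_urn)
print(json.dumps(request_body["variables"], indent=2))
```

The resulting body would typically be POSTed as JSON to the instance's GraphQL endpoint with an access token. The sketch leaves that step out so it runs without a server.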
Built for enterprise-grade scale and security
DataHub supports self-hosted and managed deployment, role-based access control, SSO, and audit logging. It runs in the environments your security team already approves.
Supported integrations
- Snowflake
- BigQuery
- Redshift
- dbt
- Airflow
- Spark
- Looker
- Tableau
- Power BI
- Kafka
- Glue
- Hive
Deployment options
- Self-hosted on Kubernetes
- Managed cloud via Acryl Data
- REST and GraphQL API access
Security and compliance
- Role-based access control and SSO
- Full audit logging for compliance reviews
- Runs in your approved infrastructure environment
Trusted by modern data teams
"DataHub gave us visibility into our data pipelines that we did not have before. Column-level lineage changed how our team handles schema changes."



