Apache 2.0 Licensed

Open Source Data Lineage Tools

Your pipelines pass. Your dashboards break anyway. DataHub automatically maps column-level lineage across 60+ platforms so your team finds problems before the standup does.

  • Automated lineage across Snowflake, BigQuery, dbt, Airflow, and 56 more

  • Column-level precision: trace any field from source to BI dashboard

  • Downstream impact analysis before a schema change breaks production

See DataHub lineage in your stack

A DataHub engineer will walk through your specific environment.

Trusted by modern data teams

The real cost

What breaks when lineage is manual?

Manual lineage documentation falls behind the moment a pipeline changes. The gap between what your docs say and what your data does is where incidents live.

Pipeline blame, no root cause

A dashboard breaks. You spend hours tracing upstream dependencies by hand, across systems that do not talk to each other.

Audit gaps you cannot close

Compliance asks where a field came from. Your lineage docs are six months stale. You have no defensible answer.

Schema changes hit production

A column gets renamed upstream. Three downstream reports break before anyone knows the change shipped.

Engineer hours lost to context

Your best engineers spend their Fridays explaining data flows they did not build, to stakeholders who needed the answer Tuesday.

Automated data lineage tools

A better way to map and trust your data

DataHub replaces manual lineage documentation with automated extraction, column-level precision, and interactive data lineage visualization across your full stack.

Lineage across 60+ platforms

DataHub extracts lineage automatically from query logs, pipeline metadata, and transformation definitions. No manual mapping. Connectors cover Snowflake, BigQuery, Redshift, dbt, Airflow, Looker, Tableau, and 53 more.

  • Pre-built connectors for warehouses, lakes, and BI tools
  • SQL query parsing extracts lineage from query logs automatically
  • OpenLineage support for Spark and streaming pipelines
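
For pipelines that emit OpenLineage events, the payload is plain JSON. Below is a minimal sketch of a run event built with the Python standard library. The job and dataset names, the namespace, and the producer URI are hypothetical, and the spec version in `schemaURL` is an assumption, so use the version your OpenLineage client targets. Where the event is sent depends on your deployment and is left out here.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumed spec version; replace with the one your OpenLineage client uses.
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"


def build_run_event(job_name, inputs, outputs, namespace="example-namespace"):
    """Build a minimal OpenLineage-style COMPLETE run event as a dict."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/hypothetical-producer",  # placeholder URI
        "schemaURL": SCHEMA_URL,
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        # Each input and output dataset is identified by namespace + name.
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
    }


event = build_run_event("nightly_orders", ["raw.orders"], ["analytics.orders_daily"])
print(json.dumps(event, indent=2))
```

The inputs and outputs lists are what a lineage backend turns into edges: every input dataset becomes an upstream of every output dataset for that job.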

Column-level lineage precision

Column-level lineage extracted via SQL parsing shows exactly which upstream fields feed each downstream column. When a field changes, you know every report, model, and dataset it touches before the change ships.

  • SQL-parsed column paths across joins, CTEs, and transformations
  • Toggle column view directly in the lineage graph UI
  • Identify field-level dependencies across dbt models and warehouses
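
To make column-level lineage concrete, here is a toy sketch of the data it produces: a map from each downstream column to the upstream columns that feed it, plus a helper that walks the map back to raw sources. The table and column names are hypothetical. This illustrates the idea only and is not DataHub's SQL parser.

```python
# Hypothetical column-level lineage: downstream column -> upstream columns.
COLUMN_LINEAGE = {
    "reports.revenue.total_usd": ["marts.orders.amount_usd"],
    "marts.orders.amount_usd": ["raw.orders.amount", "raw.fx_rates.usd_rate"],
}


def trace_to_sources(column, lineage, _seen=None):
    """Return the root columns (those with no recorded upstream) feeding `column`."""
    seen = _seen if _seen is not None else set()
    if column in seen:  # already visited: guards against cycles and repeat work
        return set()
    seen.add(column)
    upstream = lineage.get(column, [])
    if not upstream:
        return {column}
    sources = set()
    for parent in upstream:
        sources |= trace_to_sources(parent, lineage, seen)
    return sources


print(sorted(trace_to_sources("reports.revenue.total_usd", COLUMN_LINEAGE)))
# → ['raw.fx_rates.usd_rate', 'raw.orders.amount']
```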

Interactive lineage visualization

The DataHub lineage explorer renders your full data graph as an interactive data lineage diagram. Zoom, pan, expand nodes, and filter by time window to understand how your data moves.

  • Expand and collapse nodes to control graph depth
  • Filter lineage by time window to audit historical state
  • Share lineage views with a direct link to any asset

Downstream impact before incidents

Before renaming a column or deprecating a table, DataHub shows every downstream asset affected. Review impact across reports, models, and pipelines without running a query.

  • Downstream impact summary for any schema change
  • Filter affected assets by type, owner, or platform
  • Tag assets for review before a breaking change ships
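
Impact analysis amounts to a graph traversal over lineage edges. The sketch below assumes a hypothetical asset graph and uses a breadth-first walk to list every downstream asset along with its distance from the change. It shows the technique, not how DataHub implements it.

```python
from collections import deque

# Hypothetical asset graph: upstream asset -> assets that read from it.
DOWNSTREAM = {
    "raw.orders": ["marts.orders"],
    "marts.orders": ["reports.revenue", "ml.churn_features"],
    "reports.revenue": ["dashboard.exec_kpis"],
}


def impacted_assets(changed, downstream):
    """Breadth-first walk returning {asset: hops from the changed asset}."""
    hops = {}
    queue = deque([(changed, 0)])
    while queue:
        asset, depth = queue.popleft()
        for child in downstream.get(asset, []):
            # Skip assets already reached by a shorter or equal path.
            if child not in hops and child != changed:
                hops[child] = depth + 1
                queue.append((child, depth + 1))
    return hops


for asset, distance in impacted_assets("raw.orders", DOWNSTREAM).items():
    print(f"{asset}: {distance} hop(s) downstream")
```

Sorting the result by hop count gives a natural review order: direct consumers first, dashboards at the end of the chain last.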

How it works

Connect, contextualize, activate

Three steps to automated data lineage documentation across your existing stack. No pipeline rewrites required.

Connect your data stack

Deploy DataHub and connect your warehouses, pipelines, and BI tools using pre-built connectors. Lineage extraction begins without changes to your existing workflows.

  • Connect Snowflake, dbt, Airflow, and 57 more sources
  • No pipeline rewrites or instrumentation required
  • OpenLineage events ingest natively from Spark and Airflow
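
Connecting a source comes down to a recipe that names a source and a sink. Below is a minimal sketch of a Snowflake recipe expressed as a Python dict. The account ID, warehouse, and environment variable names are hypothetical, and the config keys shown are assumptions to check against the Snowflake connector documentation for your DataHub version.

```python
import json
import os


def build_recipe(account_id, warehouse, gms_server="http://localhost:8080"):
    """Return an ingestion recipe as a plain dict (source -> sink)."""
    return {
        "source": {
            "type": "snowflake",
            "config": {
                "account_id": account_id,
                "warehouse": warehouse,
                # Credentials come from the environment; variable names are hypothetical.
                "username": os.environ.get("SNOWFLAKE_USER", ""),
                "password": os.environ.get("SNOWFLAKE_PASS", ""),
            },
        },
        # Sink pointing at a DataHub metadata service; URL is a local default.
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


recipe = build_recipe("acme-xy12345", "ANALYTICS_WH")
print(json.dumps(recipe["sink"], indent=2))
```

With the `acryl-datahub` package installed, a dict like this can be passed to `datahub.ingestion.run.pipeline.Pipeline.create(...)` and run. That call is left out so the sketch runs without extra dependencies.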

Contextualize with metadata

DataHub enriches each asset with ownership, classification, and schema history. Column-level lineage is parsed automatically from SQL logs and transformation definitions.

  • SQL parsing extracts column paths across joins and CTEs
  • Ownership and classification applied at the asset level
  • Schema history tracked automatically across every change

Activate lineage across your team

Engineers, analysts, and data owners query lineage through the UI or API. Impact analysis, audit trails, and dependency graphs are available to every stakeholder.

  • GraphQL and REST API for programmatic lineage access
  • Downstream impact visible before any change ships
  • Audit trails available for compliance and governance reviews
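
For programmatic access, a downstream lineage lookup can be expressed as a single GraphQL request. The sketch below only builds the request with the standard library and does not send it. The base URL, token, and dataset name are hypothetical, and the query's field names are an assumption based on DataHub's public GraphQL schema, so verify them against your version.

```python
import json
import urllib.request

# Assumed query shape; confirm field names against your DataHub GraphQL schema.
LINEAGE_QUERY = """
query downstream($urn: String!) {
  searchAcrossLineage(
    input: {urn: $urn, direction: DOWNSTREAM, query: "*", start: 0, count: 25}
  ) {
    searchResults { degree entity { urn type } }
  }
}
"""


def dataset_urn(platform, name, env="PROD"):
    """Build a dataset URN in DataHub's urn:li:dataset format."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def build_lineage_request(base_url, token, urn):
    """Build (but do not send) a POST request for downstream lineage."""
    body = json.dumps({"query": LINEAGE_QUERY, "variables": {"urn": urn}}).encode()
    return urllib.request.Request(
        f"{base_url}/api/graphql",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_lineage_request(
    "http://localhost:9002", "hypothetical-token", dataset_urn("snowflake", "db.schema.orders")
)
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. Each result's `degree` is its hop distance from the starting asset.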

Enterprise ready

Built for enterprise-grade scale and security

DataHub supports self-hosted and managed deployment, role-based access control, SSO, and audit logging. It runs in the environments your security team already approves.

Supported integrations

  • Snowflake
  • BigQuery
  • Redshift
  • dbt
  • Airflow
  • Spark
  • Looker
  • Tableau
  • Power BI
  • Kafka
  • Glue
  • Hive

Deployment options

  • Self-hosted on Kubernetes
  • Managed cloud via Acryl Data
  • REST and GraphQL API access

Security and compliance

  • Role-based access control and SSO
  • Full audit logging for compliance reviews
  • Runs in your approved infrastructure environment

Peer reviewed

Trusted by modern data teams

Gartner Peer Insights, verified review

Key outcome: Schema change visibility

"DataHub gave us visibility into our data pipelines that we did not have before. Column-level lineage changed how our team handles schema changes."

Data Engineering Lead
Reviewed on Gartner Peer Insights

Frequently asked questions

How long does it take to get started?

Most teams connect their first source and see lineage within a day. Pre-built connectors handle extraction configuration, and the DataHub team supports onboarding for complex environments.

Which platforms does DataHub support?

DataHub ships with connectors for 60+ platforms including Snowflake, BigQuery, Redshift, dbt, Airflow, Spark, Looker, Tableau, and Power BI. The connector framework is open and extensible.

Does DataHub support column-level lineage?

Yes. DataHub parses SQL query logs to extract column-level lineage automatically. You can trace any field from its raw source through transformations to its downstream BI consumers.

What is OpenLineage, and does DataHub support it?

OpenLineage is an open standard for lineage event collection, used widely with Spark and Airflow. DataHub ingests OpenLineage events natively, so pipelines instrumented with the standard feed lineage automatically.

What is the difference between self-hosted and managed DataHub?

Self-hosted DataHub runs on your own infrastructure using the open source project. Managed DataHub, offered by Acryl Data, adds hosted infrastructure, enterprise support, and additional governance features.

Ready to see your full data lineage?

Book a demo and a DataHub engineer will map lineage through your actual stack. No slides, no generic walkthrough.

Apache 2.0 open source
60+ pre-built connectors
Self-hosted or managed deployment