Open source metadata management platform
Your pipelines pass. Your dashboards break. DataHub gives your team the lineage, context, and governance to find out why before the standup.
- Connect 80+ sources: Snowflake, dbt, Looker, Kafka, and more
- Automated column-level lineage from source to dashboard
- Fine-grained governance without rebuilding your stack
See DataHub govern your data
A DataHub engineer will scope the demo to your environment.
What does fragmented metadata cost you?
Every hour your team spends hunting schemas or tracing broken lineage is an hour not spent building. The invisible cost compounds fast.
Lineage gaps at 3 a.m.
A dashboard breaks. No one can trace which upstream table changed or who owns it. Your team loses hours they do not have.
Governance you cannot prove
Auditors ask who accessed what and when. Without a metadata layer, the answer is a spreadsheet and a prayer.
Stale docs, wrong answers
Static wikis go out of date within days. Engineers stop trusting them. New hires make decisions on bad context.
Stack sprawl, no single view
Your data infrastructure spans multiple tools and clouds. Without a unified metadata layer, no one sees the full picture.
A better way to manage your metadata
DataHub gives platform engineers a production-ready open source data catalog with governance, lineage, and observability built in from day one.
Data catalog for data governance
A unified asset inventory across every source in your stack. Search, tag, and document datasets, pipelines, dashboards, and ML models in one place your team will actually use.
- Full-text search across all metadata assets
- Automated schema and ownership discovery
- Business glossary linked to physical assets
Automated lineage across your stack
Column-level lineage captured automatically from Snowflake, dbt, Spark, Airflow, and the rest of the 80+ supported sources. No manual mapping. No stale diagrams. Trace any broken asset back to its root in seconds.
- Column-level lineage, not just table-level
- Cross-platform lineage stitched automatically
- Impact analysis before any schema change
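To make impact analysis concrete, here is a minimal sketch of the idea: walk a column-level lineage graph downstream from the column you plan to change and list everything that depends on it. This is an illustration only, not DataHub's implementation or API. The `LINEAGE` mapping, the asset names, and the `downstream_impact` helper are all hypothetical.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its list.
# All asset and column names are invented for illustration.
LINEAGE = {
    "snowflake.raw.orders.amount": ["dbt.stg_orders.amount"],
    "dbt.stg_orders.amount": ["dbt.fct_revenue.total_amount"],
    "dbt.fct_revenue.total_amount": ["looker.revenue_dashboard.total"],
    "snowflake.raw.orders.status": ["dbt.stg_orders.status"],
}


def downstream_impact(column, lineage):
    """Return every column reachable downstream of `column`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # guard against revisiting shared or cyclic nodes
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Which assets would a change to the raw `amount` column touch?
impacted = downstream_impact("snowflake.raw.orders.amount", LINEAGE)
for asset in impacted:
    print(asset)
```

In this toy graph, a change to the raw `amount` column reaches the staging model, the revenue fact table, and the dashboard field, while the `status` branch is unaffected.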
Open source data governance at scale
Role-based access control, fine-grained policies, and audit logs built into the platform. Governance that your compliance team can verify and your engineers can enforce without a separate tool.
- Fine-grained RBAC with policy inheritance
- Full audit trail for access and changes
- Tag-based classification for sensitive data
An open platform engineers control
Apache 2.0 licensed. Deploy on your infrastructure via Helm and Kubernetes. Extend via GraphQL and OpenAPI. No vendor lock-in and no black-box behavior, backed by an open source community with 12,000+ GitHub stars.
- Apache 2.0 license, self-hosted or managed
- GraphQL and OpenAPI for full extensibility
- Helm chart deployment on Kubernetes
How it works
Three steps to a governed, searchable, and trusted metadata layer across your entire data platform.
Connect your data sources
Contextualize every asset
Activate governance org-wide
Built for enterprise-grade scale and security
DataHub is deployed by enterprise data teams that need production-grade reliability, security controls, and flexible deployment options.
Deployment options
Security and access control
Extensibility and integrations
Trusted by modern data teams
Platform engineers and data leaders at enterprise organizations rely on DataHub to govern their metadata at scale.
Gartner Peer Insights
Verified review
Outcome
Unified metadata layer across a complex multi-cloud stack
"DataHub gave us the lineage and governance layer we had been trying to build manually for two years. We finally have a single place where engineers and analysts can trust what they are looking at."
Frequently asked questions about open source metadata management
Real questions from platform engineers and data leaders evaluating DataHub.
Take control of your metadata
You will speak with a DataHub engineer about your specific environment, not a generic walkthrough. Bring your stack, your questions, and your constraints.



