Open Source Catalog

Data Governance Built for Enterprise Scale

Your data teams are making decisions on data they cannot fully trust. DataHub gives you the lineage, access controls, and context to govern confidently at scale.

  • Automated lineage across 90+ sources, from Snowflake to dbt
  • Policy-based access control down to the column level
  • One catalog your analysts, engineers, and auditors all trust

See DataHub govern your data

Talk with a DataHub engineer about your specific environment.

Trusted by modern data teams

The real cost

What does data governance failure actually cost?

Ungoverned data does more than slow teams down. It creates audit exposure, erodes trust, and turns every data incident into a leadership conversation.

Lineage gaps at audit time

When an auditor asks where a number came from, your team spends days reconstructing a trail that should have been automatic.

Access sprawl across the stack

Permissions granted for a project never get revoked. Sensitive data stays visible to people who no longer need it.

No shared definition of correct

Finance says revenue is one number. Product says another. Both are pulling from the same warehouse. Neither team is wrong about their query.

Incident triage without context

A dashboard breaks. No one knows which upstream table changed, who owns it, or what depends on it. The investigation starts from zero.

The platform

A better way to govern your data

DataHub gives your team the lineage, access controls, and shared vocabulary to make governance part of how you work, not a separate process.

End-to-end lineage you can act on

DataHub tracks table-level and column-level lineage automatically across your entire stack. When something breaks, you see what changed, what depends on it, and who owns it.

  • Column-level lineage across Snowflake, dbt, Looker, and more
  • Upstream and downstream impact analysis before you make changes
  • Lineage propagation for tags, glossary terms, and ownership

Access control that fits your org

DataHub's policy-based access control lets you define who can view, edit, or manage any asset, scoped by domain, tag, container, or specific URN. RBAC roles give you a starting point; policies give you precision.

  • Platform and metadata policies with granular resource targeting
  • Out-of-the-box roles: Admin, Editor, Reader
  • Scope permissions by domain, tag, container, or URN

A business glossary teams will use

Define terms once and attach them to the assets that use them. When analysts, engineers, and stakeholders share a vocabulary, fewer decisions get made on misunderstood data.

  • Glossary terms linked directly to columns and datasets
  • Ownership and stewardship assigned per term
  • Terms propagated through lineage to downstream assets

Discovery that finds trusted data

DataHub's search understands metadata, lineage, and usage context. Teams find verified, documented assets instead of guessing which table is current.

  • Search across datasets, dashboards, pipelines, and more
  • Usage signals show which assets teams actively rely on
  • Filter by domain, owner, tag, and certification status

Process

How it works

Three steps from your existing stack to governed, trusted data across your organization.

Connect your sources

  • 90+ pre-built connectors, including Snowflake, dbt, Looker, and Kafka
  • Ingestion runs on a schedule or on push, keeping your catalog current
  • Works with your existing pipelines, with no rebuilding required

Contextualize with metadata

  • Attach ownership, glossary terms, tags, and documentation to every asset
  • Context propagates through lineage so downstream assets inherit what they need
  • Stewardship assignments make ownership visible across the org

Activate governance at scale

  • Enforce access policies and trigger lineage-aware incident triage
  • Give auditors a verifiable, automatically maintained data trail
  • Governance becomes part of how your team works, not a separate process

Enterprise ready

Built for enterprise-grade governance

DataHub is deployed by teams that need auditability, scale, and control. Whether you run on-prem or in the cloud, the architecture is designed to meet you where you are.

90+ pre-built connectors

Ingest metadata from the tools your team already uses, including warehouses, BI platforms, orchestrators, and streaming systems.

Flexible deployment options

Run DataHub as a managed cloud service or deploy it in your own environment. Both options share the same core architecture and API surface.

API-first architecture

Every action in DataHub is available via GraphQL and REST APIs. Automate metadata workflows and integrate governance into your existing pipelines.

Open source foundation

DataHub's core platform is open source. The community has contributed connectors, integrations, and features that benefit every user.

Customer voice

Trusted by modern data teams

Outcome: Central source of truth for data assets across the organization

"DataHub has become the central place where our data teams understand what data exists, where it comes from, and who is responsible for it."

Gartner Peer Insights Reviewer, Financial Services

FAQ

Frequently asked questions about data governance

How long does it take to get started with DataHub?

Most teams complete their first ingestion within a day using pre-built connectors. Full deployment timelines depend on the number of sources and your internal review process. DataHub's documentation covers common configurations for the most widely used warehouses and BI tools.

How does DataHub handle access control?

DataHub uses a policy-based model. You define who can view, edit, or manage metadata assets, scoped by domain, tag, container, or specific URN. Out-of-the-box RBAC roles cover most starting configurations, and custom policies let you add precision where your org structure requires it.

What sources does DataHub connect to?

DataHub ships with 90+ connectors covering data warehouses, BI tools, orchestration platforms, streaming systems, and more. The connector library is actively maintained by both the DataHub team and the open source community, with new integrations added regularly.

What is the difference between DataHub open source and DataHub Cloud?

The open source project gives you the full metadata platform to self-host. DataHub Cloud adds managed infrastructure, enterprise support, and additional governance features built for larger teams. Both share the same core architecture and API surface, so teams that start on open source can migrate to Cloud without rebuilding their integrations.

How do we measure the impact of data governance?

Common indicators include reduced time spent on audit requests, fewer data incidents reaching leadership, and faster onboarding for new analysts. DataHub's usage and lineage data can help you track these over time. The right metrics depend on your team's starting point and the governance problems you are solving first.

Ready to govern your data with confidence?

See how DataHub works in your environment. A DataHub engineer will walk you through lineage, access control, and discovery using your actual stack.

  • No generic walkthrough
  • Scoped to your environment
  • Talk with an engineer, not a script