Data catalog software built for platform engineers
Your pipelines pass. Your dashboards break anyway. DataHub connects 60+ sources, tracks lineage to the column, and surfaces quality issues before your stakeholders do.
- Connect Snowflake, dbt, Airflow, and more through 60+ pre-built connectors, without rebuilding pipelines
- Column-level lineage across your full stack, from ingestion to BI
- Catch data quality failures before they reach a dashboard or standup
See DataHub in your environment
A DataHub engineer will walk through your specific stack, not a generic script.
What does metadata debt cost your team?
Every hour spent tracing a broken pipeline is an hour not spent building. The cost compounds quietly until it surfaces in an incident.
Lineage gaps slow every incident
When a pipeline breaks, engineers spend hours tracing downstream impact by hand. Without automated lineage, that manual search repeats in every incident.
Schema changes break downstream assets
A renamed column in Snowflake silently breaks three dashboards and two ML features. No alert fires. You find out in the standup.
No single source of truth
Definitions live in Confluence, Slack, and tribal knowledge. Different teams trust different numbers. Decisions stall while people argue about the data.
Governance is manual and fragile
Access policies live in spreadsheets. Compliance reviews require manual audits. One personnel change and a sensitive dataset is exposed.
A better way to manage your metadata
DataHub gives platform engineers the automation, depth, and extensibility that legacy catalog tools were never designed to provide.
Automated metadata discovery
DataHub crawls your stack and builds a unified metadata graph automatically. No manual tagging sprints, no stale wikis.
- 60+ pre-built ingestion connectors, maintained and versioned
- Scheduled and event-driven ingestion pipelines
- Custom source support via the open ingestion framework
Column-level lineage end to end
Trace any field from its source table through every transformation to the BI layer. Understand impact before you make a change, not after.
- Automated lineage from dbt, Spark, Airflow, and SQL parsing
- Upstream and downstream impact analysis in one view
- Cross-platform lineage across warehouses, lakes, and BI tools
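To make upstream and downstream impact analysis concrete, here is a minimal sketch of the idea in plain Python. It is not DataHub's implementation or API: the field names, the `EDGES` list, and the `downstream_impact` helper are all hypothetical, and the lineage graph is assumed to be a simple list of column-to-column edges.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges: (upstream_field, downstream_field).
# In a real catalog these would come from automated lineage extraction.
EDGES = [
    ("snowflake.raw.orders.amount", "dbt.stg_orders.amount"),
    ("dbt.stg_orders.amount", "dbt.fct_revenue.total_amount"),
    ("dbt.fct_revenue.total_amount", "looker.revenue_dashboard.total"),
    ("snowflake.raw.orders.id", "dbt.stg_orders.order_id"),
]


def downstream_impact(edges, start):
    """Return every field reachable downstream of `start`, in breadth-first order."""
    graph = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)

    seen = {start}
    impacted = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph[node]:
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Which fields are affected if raw.orders.amount is renamed?
impacted = downstream_impact(EDGES, "snowflake.raw.orders.amount")
for field in impacted:
    print(field)
```

Running the same traversal over reversed edges would give the upstream view, which is how one lookup can answer both "what breaks if I change this?" and "where did this number come from?".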
Data quality assertions and tracking
Define quality assertions on your datasets and track pass or fail status over time. Surface incidents in the catalog before they reach a stakeholder.
- Native assertions for freshness, volume, schema, and custom SQL
- Incident tracking linked to affected downstream assets
- Integrates with Great Expectations, dbt tests, and Soda
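As an illustration of what freshness and volume assertions check, the sketch below evaluates both against fixed sample values. It is a conceptual example only, not DataHub's assertion API: `AssertionResult`, `check_freshness`, `check_volume`, and the thresholds are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class AssertionResult:
    name: str
    passed: bool
    detail: str


def check_freshness(last_updated, max_age, now):
    """Pass if the dataset was updated within `max_age` of `now`."""
    age = now - last_updated
    return AssertionResult("freshness", age <= max_age, f"age={age}")


def check_volume(row_count, min_rows, max_rows):
    """Pass if the row count falls inside the expected range."""
    ok = min_rows <= row_count <= max_rows
    return AssertionResult("volume", ok, f"rows={row_count}")


# Fixed sample values so the example is deterministic.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
results = [
    check_freshness(
        last_updated=datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc),
        max_age=timedelta(hours=6),
        now=now,
    ),
    check_volume(row_count=120, min_rows=1_000, max_rows=50_000),
]

# Names of the assertions that failed in this run.
failed = [r.name for r in results if not r.passed]
print(failed)
```

Recording each result with a timestamp is what turns a one-off check into the pass or fail history described above.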
API-first platform automation
DataHub exposes every capability through GraphQL and REST APIs. Automate metadata workflows, build internal tooling, and integrate with your existing platform stack.
- Full GraphQL and OpenAPI surface for every metadata operation
- Python SDK for programmatic ingestion and enrichment
- Webhook and event stream support for real-time metadata updates
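A rough sketch of what a call against the GraphQL API might look like, using only the Python standard library. Several details here are assumptions to verify against your own deployment: the `/api/graphql` path, the `dataset(urn: ...)` query and its `urn` and `name` fields, bearer-token authentication, and the example host, token placeholder, and dataset URN. The request is built but not sent.

```python
import json
import urllib.request


def build_graphql_request(base_url, token, urn):
    """Build (but do not send) a POST request that looks up one dataset by URN."""
    # Assumed query shape; check it against the GraphQL schema of your deployment.
    query = "query getDataset($urn: String!) { dataset(urn: $urn) { urn name } }"
    body = json.dumps({"query": query, "variables": {"urn": urn}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/graphql",  # assumed endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed token-based auth
        },
        method="POST",
    )


# Hypothetical host, token placeholder, and dataset URN.
request = build_graphql_request(
    "http://localhost:8080",
    "<your-access-token>",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.public.orders,PROD)",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect and unit test before it touches a live instance.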
Three steps to a governed data platform
DataHub works with your existing stack. No rip-and-replace, no rebuilding pipelines, no months-long implementation projects.
Connect your existing sources
Contextualize with lineage and quality
Activate governance across your org
Built for enterprise security and scale
DataHub is deployed by some of the largest data teams in the world. It is designed to run at scale, on your infrastructure, with the security controls your organization requires.
Deployment options
Security and access control
API and extensibility
What platform engineers say about DataHub
"DataHub gave us column-level lineage across our entire warehouse and BI layer. We went from spending half a day tracing a broken pipeline to knowing the impact in minutes. The API-first design meant we could wire it into our existing platform tooling without rebuilding anything."
Frequently asked questions about data catalog software
Bring order to your metadata stack
DataHub connects your sources, maps your lineage, and surfaces quality issues before they become incidents. A DataHub engineer will walk through your specific environment, not a generic demo script.



