Smart anomaly detection

Data anomaly detection that knows your data

For platform engineers tired of threshold babysitting. DataHub's ML-powered Smart Assertions learn your data's patterns and surface real anomalies before they hit production.

  • Detects volume, freshness, column, and custom SQL anomalies
  • Learns seasonality: Monday spikes, quiet weekends, and quarter-end surges
  • Scales across Snowflake, BigQuery, Redshift, and Databricks

See it live with your stack

A DataHub engineer will walk through your environment, not a generic script.

Trusted by modern data teams

The problem

Why do data anomalies keep slipping through?

Static thresholds miss seasonal patterns. Manual rules don't scale. By the time your team finds out, the dashboard is already wrong.

Threshold fatigue at scale

Thousands of tables. Hundreds of thresholds. Every one of them wrong the moment the data changes cadence or volume.

Seasonality blindness

A Monday spike isn't an anomaly. A quiet weekend isn't a failure. Static rules can't tell the difference, so your on-call queue fills with noise.

Scale without coverage

Writing manual rules for every new table isn't a strategy. Coverage gaps grow faster than your team can close them.

Delayed detection

By the time a broken dashboard surfaces a data issue, downstream consumers have already acted on bad numbers.

The solution

A better way to detect data anomalies

Smart Assertions replace brittle thresholds with ML-powered monitoring that learns your data's normal patterns across every dimension.

Volume anomaly detection

Smart Assertions track historical load patterns per table and flag deviations that fall outside learned norms, without requiring a manually set threshold.

  • Learns expected row counts per table and time window
  • Flags drops and spikes relative to historical baselines
  • Covers batch, streaming, and incremental load patterns
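
To make the idea concrete, here is a minimal sketch of baseline-relative volume detection. It is purely illustrative, not DataHub's actual model: the `volume_anomaly` helper and the z-score threshold are assumptions, and a real learned baseline would also model trend and seasonality.

```python
from statistics import mean, stdev

def volume_anomaly(history, current, z_threshold=3.0):
    """Flag a row count that deviates from a learned baseline.

    history: recent daily row counts for one table. Illustrative stand-in
    for a learned baseline; not DataHub's actual model.
    """
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is a deviation
    return abs(current - mu) / sigma > z_threshold

# A sudden drop relative to the baseline is flagged; normal jitter is not.
counts = [10_200, 9_950, 10_480, 10_110, 10_305, 9_890, 10_150]
print(volume_anomaly(counts, 2_000))   # True: far below the learned norm
print(volume_anomaly(counts, 10_250))  # False: within normal variation
```

The point of a learned baseline over a static threshold is that the same code works unchanged for a table loading 10 thousand rows a day or 10 billion.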

Freshness anomaly detection

DataHub monitors update cadence per asset and alerts when a table hasn't refreshed within its expected window, accounting for known schedule variations.

  • Tracks last-updated timestamps across all monitored assets
  • Adapts expected cadence to day-of-week patterns
  • Routes alerts to the owning team via Slack or PagerDuty
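
As a minimal illustration of the freshness idea (not DataHub's implementation), a staleness check reduces to comparing the time since a table's last update against its expected cadence plus some slack; the `is_stale` helper and the 50% slack factor below are arbitrary choices for the sketch.

```python
from datetime import datetime, timedelta

def is_stale(last_updated: datetime, expected_cadence: timedelta,
             now: datetime, slack: float = 0.5):
    """A table is stale when the time since its last update exceeds the
    expected cadence by more than `slack` (50% here, an arbitrary choice)."""
    return now - last_updated > expected_cadence * (1 + slack)

now = datetime(2024, 6, 3, 12, 0)
daily = timedelta(days=1)

print(is_stale(datetime(2024, 6, 2, 11, 0), daily, now))  # False: 25h ago
print(is_stale(datetime(2024, 6, 1, 9, 0), daily, now))   # True: 51h ago
```

In practice the expected cadence itself is learned per asset rather than hard-coded, which is what lets the check adapt to day-of-week schedule variations.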

Column metric anomaly detection

Monitor null rates, distinct counts, and numeric distributions at the column level. DataHub learns what normal looks like and alerts when values drift outside that range.

  • Tracks null rate, uniqueness, and value distribution
  • Detects schema drift and unexpected type changes
  • Configurable sensitivity per column or dataset
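
One of these column metrics, null rate, can be sketched in a few lines. This is an illustration of the drift-detection idea only; the `null_rate_drift` helper and its 5-percentage-point tolerance are assumptions, not DataHub's API.

```python
def null_rate(values):
    """Fraction of values in a column sample that are null."""
    return sum(v is None for v in values) / len(values)

def null_rate_drift(baseline_rates, current_rate, tolerance=0.05):
    """Flag when the current null rate drifts more than `tolerance`
    (5 percentage points, an arbitrary choice) from the learned average."""
    expected = sum(baseline_rates) / len(baseline_rates)
    return abs(current_rate - expected) > tolerance

history = [0.01, 0.02, 0.015, 0.01]  # learned normal null rates, ~1-2%
todays = [None, "a", "b", None, None, "c", None, "d", "e", "f"]
print(null_rate_drift(history, null_rate(todays)))  # True: 40% vs ~1.4%
```

Per-column tolerances matter because a 5% null rate may be routine for an optional field and catastrophic for a primary key.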

Custom SQL anomaly detection

When built-in monitors don't cover your business rules, Custom SQL Assertions let you define exactly what to measure and what counts as an anomaly for your data.

  • Define assertions using any SQL your warehouse supports
  • Set pass/fail conditions on returned numeric values
  • Schedule runs independently of pipeline cadence
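
Conceptually, a custom SQL assertion is a query that returns a single numeric value plus a pass/fail condition on that value. The sketch below illustrates that pattern using Python's built-in sqlite3 as a stand-in warehouse; the `run_sql_assertion` helper and the `orders` table are hypothetical, not DataHub's API.

```python
import sqlite3

def run_sql_assertion(conn, sql, condition):
    """Run a query that returns a single numeric value, then apply a
    pass/fail condition to it. Illustrative only, not DataHub's API."""
    value = conn.execute(sql).fetchone()[0]
    return condition(value), value

# A toy table and a business rule the built-in monitors would not cover:
# no order may have a negative amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 19.99), (2, 42.50), (3, -5.00)])

passed, value = run_sql_assertion(
    conn,
    "SELECT COUNT(*) FROM orders WHERE amount < 0",
    condition=lambda n: n == 0,  # assertion passes only with zero violations
)
print(passed, value)  # False 1: one negative-amount row violates the rule
```

Because the assertion is just SQL plus a condition, it can encode any rule your warehouse can express, from referential integrity to revenue reconciliation.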

How it works

Three steps from connection to continuous anomaly detection across your data platform.

Connect your data warehouse

Connects to Snowflake, BigQuery, Redshift, and Databricks
Uses your existing credentials and permissions model
No pipeline rebuilds or schema changes required

Contextualize assets with metadata

Maps tables to owners, domains, and downstream consumers
Every alert carries the context needed to act on it
Ownership follows your existing org structure

Activate Smart Assertions at scale

Enable ML-powered monitoring at the dataset level
DataHub learns patterns within the first observation window
Anomalies surface before they reach downstream consumers

Enterprise ready

Built for enterprise-grade scale and security

Flexible deployment, role-based access control, and broad platform support for detecting anomalies across your entire data estate.

Flexible deployment

Managed cloud service or self-hosted in your VPC
Your data never leaves your environment unless you choose
Apache 2.0 licensed open source core

Role-based access control

Assign monitor ownership by team, domain, or data product
Alert routing rules set by owner, domain, or severity
Permissions follow your existing org structure

Supported platforms

Snowflake, BigQuery, Redshift, and Databricks
dbt, Airflow, Kafka, and 80+ additional connectors
Alerts via Slack, PagerDuty, Teams, or any webhook

Social proof

Trusted by modern data teams

Reduced time investigating data issues

"DataHub gives our team a single place to understand data lineage, ownership, and quality. The observability features have reduced the time we spend investigating data issues."

Verified Reviewer, Gartner Peer Insights

FAQ

Frequently asked questions about data anomaly detection

How quickly can we get started?

Most teams complete their first connection and activate Smart Assertions within a single session. Time to first alert depends on how much historical data DataHub has available to learn from; teams with longer data history see more accurate baselines from the start.

How does anomaly detection handle seasonality?

Smart Assertions learn day-of-week and time-of-day patterns from your historical data. A lower row count on a Sunday is evaluated against Sunday baselines, not a flat weekly average, so datasets with known seasonal patterns can be monitored without manual configuration.
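
The day-of-week baseline idea can be sketched in a few lines. This is a rough illustration, not DataHub's actual model: the `observe`/`is_anomaly` helpers and the 30% tolerance are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Bucket observations by weekday, so a Sunday value is compared only
# against past Sundays, never against a flat weekly average.
history = defaultdict(list)

def observe(day: date, row_count: int):
    history[day.weekday()].append(row_count)

def is_anomaly(day: date, row_count: int, tolerance=0.3):
    """Flag a count more than `tolerance` (30%) from that weekday's mean."""
    baseline = history[day.weekday()]
    if not baseline:
        return False  # no baseline yet for this weekday
    expected = mean(baseline)
    return abs(row_count - expected) / expected > tolerance

# Four weeks of history: busy Mondays (~10k rows), quiet Sundays (~2k).
for week in range(4):
    observe(date(2024, 1, 1) + timedelta(weeks=week), 10_000)  # Mondays
    observe(date(2024, 1, 7) + timedelta(weeks=week), 2_000)   # Sundays

print(is_anomaly(date(2024, 2, 4), 2_100))  # False: normal for a Sunday
print(is_anomaly(date(2024, 2, 5), 2_100))  # True: same count, but a Monday
```

The same row count is normal on one day and anomalous on another, which is exactly the distinction static thresholds cannot make.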

Can we tune sensitivity to reduce false positives?

Yes. Sensitivity is configurable at the dataset or column level, and you can mark known events, such as a planned backfill, so DataHub excludes them from baseline calculations. Your team keeps control over signal quality without disabling monitoring entirely.

Which platforms are supported?

DataHub supports Snowflake, BigQuery, Redshift, and Databricks for Smart Assertions today. Custom SQL Assertions work on any warehouse that accepts standard SQL queries. The full connector list is available on the DataHub website.

Where do alerts go?

Alerts route to Slack, PagerDuty, Microsoft Teams, or any webhook endpoint. Routing rules can be set by dataset owner, domain, or severity level, and each alert includes the asset context, ownership information, and the specific anomaly that triggered it.

Catch data anomalies before they reach production

DataHub's Smart Assertions monitor your data continuously, so your team spends less time investigating incidents and more time building. No long implementation required.

We'll walk through your environment in the demo. No commitment required.

Request a Demo