Data Engineering · Monitoring · Observability · Data Quality

Data Pipeline Monitoring: What to Track and Why

March 20, 2026 · 7 min read · By Hybridyn Engineering

The most dangerous data pipeline failure is the one nobody notices. A pipeline that crashes loudly gets fixed in minutes. A pipeline that silently produces wrong data can corrupt weeks of business decisions before anyone realizes.

Monitoring is not optional. It's the difference between "our data is reliable" and "we think our data is probably fine."

What to Monitor

1. Pipeline Execution Status

The basics: did the pipeline run? Did it succeed or fail? How long did it take?

Metrics to track:

  • Run status — success, failure, running, skipped
  • Duration — how long each run takes (and trend over time)
  • Schedule adherence — did it start on time?
  • Retry count — how many retries before success (or ultimate failure)

Why it matters: Duration creep is an early warning sign. If a pipeline that took 5 minutes now takes 25, something changed — data volume grew, a query became inefficient, or a source system is slower.
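A duration-creep check like the one described above can be sketched in a few lines. This is an illustrative example, not a specific product feature; the function name, window size, and 2x threshold are assumptions you would tune to your own pipelines.

```python
# Sketch: flag duration creep by comparing the latest run against a
# rolling baseline of recent runs. Names and thresholds are illustrative.
from statistics import mean

def duration_alert(durations_min, threshold=2.0, window=10):
    """Return True if the latest run took more than `threshold` times
    the average of the previous `window` runs."""
    if len(durations_min) < 2:
        return False  # not enough history to form a baseline
    baseline = mean(durations_min[-(window + 1):-1])
    return durations_min[-1] > threshold * baseline

# A pipeline that usually takes ~5 minutes suddenly takes 25: alert fires.
history = [5, 5, 6, 5, 5, 25]
print(duration_alert(history))  # True
```

Comparing against a rolling window rather than a fixed number means the baseline adapts as data volume grows gradually, while still catching sudden jumps.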

2. Data Volume

How much data did the pipeline process?

Metrics to track:

  • Row counts — input rows, output rows, filtered rows, error rows
  • Byte volume — data size processed
  • Row count ratios — output/input ratio (should be stable across runs)

Why it matters: A pipeline that usually processes 50,000 rows but suddenly processes 500 has a problem. Either the source is empty (extraction failure) or the filter logic changed. Both need investigation.
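A minimal volume check covering both failure modes above (empty source and ratio drift) might look like this. The function name and the 50% tolerance are assumptions for illustration.

```python
# Sketch: check row-count sanity for a single run. The expected
# output/input ratio would come from historical runs.
def volume_check(input_rows, output_rows, expected_ratio, tolerance=0.5):
    """Return a list of issues: empty input, or an output/input ratio
    that deviates from the historical norm by more than `tolerance`."""
    issues = []
    if input_rows == 0:
        issues.append("zero input rows: possible extraction failure")
        return issues
    ratio = output_rows / input_rows
    if abs(ratio - expected_ratio) > tolerance * expected_ratio:
        issues.append(
            f"ratio {ratio:.2f} deviates from expected {expected_ratio:.2f}"
        )
    return issues

# 500 input rows where 50,000 is normal, and only 50 survive the filter:
print(volume_check(500, 50, expected_ratio=0.9))
```

A healthy run (say 50,000 in, 45,000 out against an expected ratio of 0.9) returns an empty list, so the check is cheap to run after every load.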

3. Data Freshness

How old is the data in your destination?

Metrics to track:

  • Last successful run — when did the pipeline last complete successfully?
  • Data latency — time between event occurrence and availability in the destination
  • SLA compliance — is data available within the promised window?

Why it matters: A dashboard showing "updated 3 hours ago" when the SLA is 15 minutes means downstream consumers are making decisions on stale data.
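A freshness/SLA check reduces to comparing the dataset's last-update timestamp against the promised window. This sketch uses a hypothetical `freshness_status` helper; the `now` parameter exists only to make the check testable.

```python
# Sketch: is a dataset within its freshness SLA?
from datetime import datetime, timedelta, timezone

def freshness_status(last_updated, sla_minutes, now=None):
    """Return (is_fresh, lag_minutes) for a dataset given its SLA."""
    now = now or datetime.now(timezone.utc)
    lag_minutes = (now - last_updated).total_seconds() / 60
    return lag_minutes <= sla_minutes, lag_minutes

# The scenario from the text: updated 3 hours ago, SLA is 15 minutes.
now = datetime(2026, 3, 20, 12, 0, tzinfo=timezone.utc)
fresh, lag = freshness_status(now - timedelta(minutes=180), 15, now=now)
print(fresh, lag)  # False 180.0
```

Running this per critical dataset gives you exactly the freshness table described in the dashboard section below: dataset, lag, and SLA pass/fail.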

4. Data Quality

Is the data correct?

Metrics to track:

  • Null rates — percentage of nulls in critical columns
  • Uniqueness — are IDs actually unique?
  • Value distribution — has the distribution of values changed unexpectedly?
  • Schema compliance — does the output match the expected schema?
  • Referential integrity — do foreign keys resolve?

Why it matters: A pipeline can succeed (status: green) while producing garbage data. Quality checks catch issues that execution monitoring misses.
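Two of the quality checks above, null rates and ID uniqueness, can be expressed as a small batch-level report. The `quality_report` helper and its field names are hypothetical; real deployments would typically use a framework like Great Expectations or dbt tests instead.

```python
# Sketch: null rates on critical columns plus ID uniqueness,
# computed over a batch of dict-shaped rows.
def quality_report(rows, id_field, critical_fields):
    """Return a dict of quality metrics for one batch of rows."""
    report = {}
    total = len(rows)
    for field in critical_fields:
        nulls = sum(1 for r in rows if r.get(field) is None)
        report[f"null_rate:{field}"] = nulls / total if total else 0.0
    ids = [r.get(id_field) for r in rows]
    report["ids_unique"] = len(ids) == len(set(ids))
    return report

rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": None},  # duplicate id AND a null in a critical column
]
print(quality_report(rows, "id", ["email"]))
```

The key property is that this runs *after* a "successful" load and can fail the run even when execution status is green.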

Alert Strategy

Not everything needs an alert. Too many alerts cause fatigue — the team ignores them all, including the critical ones.

Tier 1: Page Immediately

  • Pipeline failure after all retries exhausted
  • SLA breach on critical pipelines
  • Data volume drops to zero
  • Source system connection failure

Tier 2: Alert Within Business Hours

  • Duration exceeding 2x normal
  • Data volume deviation greater than 50%
  • Quality rule failures above threshold
  • Schedule delays over 30 minutes

Tier 3: Report Weekly

  • Gradual duration increases
  • Minor quality score changes
  • Retry frequency trends
  • Resource utilization patterns
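The three tiers above boil down to a routing decision: where does each alert go? A sketch, with hypothetical channel names standing in for your paging service, chat integration, and reporting job:

```python
# Sketch: route an alert to a destination by tier. Channel names
# ("pager", "team-channel", "weekly-digest") are placeholders.
def route_alert(alert):
    """Tier 1 pages immediately, Tier 2 goes to the team channel
    during business hours, everything else lands in the weekly digest."""
    tier = alert["tier"]
    if tier == 1:
        return "pager"
    if tier == 2:
        return "team-channel"
    return "weekly-digest"

print(route_alert({"tier": 1, "msg": "data volume dropped to zero"}))  # pager
```

Keeping the routing table this explicit makes it easy to audit which conditions can actually wake someone up, which is the main defense against alert fatigue.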

Building a Monitoring Dashboard

A good pipeline monitoring dashboard answers three questions at a glance:

  1. Is everything running? — Status overview of all pipelines
  2. Is anything failing? — Failed pipelines with error details
  3. Is the data fresh? — Freshness indicators for critical datasets

Essential Dashboard Panels

Pipeline Health Overview: A grid showing every pipeline with green/yellow/red status. Sort by severity so problems are always at the top.

Recent Failures: A table of failed runs with pipeline name, error message, failure time, and a link to logs. This is what the team looks at first thing every morning.

Duration Trends: A line chart showing pipeline duration over the past 30 days. Spikes and upward trends are immediately visible.

Data Freshness: A table showing each critical dataset, when it was last updated, and whether it meets its SLA.

Common Monitoring Mistakes

1. Only Monitoring Execution

"The pipeline ran successfully" is not the same as "the data is correct." A pipeline can extract zero rows, transform nothing, load an empty table, and report success. You need data quality checks, not just execution checks.

2. Alert Fatigue

If the team gets 50 alerts a day, they'll ignore all of them. Be ruthless about alert thresholds. A quality rule that fires on 0.1% null rate in a column that's naturally nullable isn't helpful.

3. No Historical Context

"The pipeline took 45 minutes" means nothing without context. Is that normal? Was it 5 minutes last week? Monitoring without baselines is guessing.

4. Missing Dependencies

Pipeline B depends on Pipeline A. Pipeline A fails. Pipeline B runs on stale data and "succeeds." Without dependency-aware monitoring, B's success hides A's failure.
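Dependency-aware status can be computed by propagating upstream failures downstream. This is an illustrative sketch, assuming a simple `deps` map of pipeline name to upstream names; it distinguishes a pipeline that failed itself from one that "succeeded" on stale upstream data.

```python
# Sketch: a pipeline is only truly healthy if it succeeded AND every
# upstream dependency did. `deps` maps pipeline -> list of upstreams.
def effective_status(pipeline, statuses, deps):
    """Return 'failed', 'stale', or 'success' for a pipeline."""
    if statuses.get(pipeline) != "success":
        return "failed"
    for upstream in deps.get(pipeline, []):
        if effective_status(upstream, statuses, deps) != "success":
            return "stale"  # ran, but on outdated upstream data
    return "success"

# The scenario from the text: A fails, B "succeeds" on stale data.
statuses = {"A": "failure", "B": "success"}
deps = {"B": ["A"]}
print(effective_status("B", statuses, deps))  # stale
```

Surfacing "stale" as its own state on the dashboard is what stops B's green checkmark from hiding A's failure.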

Monitoring in Practice

F-Pulse includes built-in pipeline monitoring with run history, duration tracking, status alerts, and row count monitoring. For enterprise teams, D-Pulse adds an SLA engine, platform health scores, and integration with Prometheus and Grafana for custom dashboards.

The key insight is that monitoring should be built into the pipeline tool, not bolted on afterward. When monitoring is an afterthought, gaps are inevitable. When it's native, every pipeline gets baseline coverage automatically.

Summary

Monitor pipeline execution (did it run?), data volume (did it process the right amount?), data freshness (is it timely?), and data quality (is it correct?). Set alerts at three tiers to avoid fatigue. Build dashboards that answer the three critical questions. And choose tools where monitoring is built in, not bolted on.

Build data pipelines visually

F-Pulse is open source. Try it in under 3 minutes.