Building a Medallion Architecture ETL Pipeline — Bronze, Silver, Gold Explained
The medallion architecture — Bronze, Silver, Gold — has become the default pattern for organizing data in lakehouses and data platforms. It's simple to explain, hard to get wrong, and flexible enough for teams of any size.
This guide walks through the architecture, the design decisions that matter, and how to implement it with a visual pipeline builder.
What Is the Medallion Architecture?
The medallion architecture organizes data into three layers based on quality and readiness:
Bronze (Raw): Data as it arrived. No transformations, no cleaning. The system of record for your pipeline. If something goes wrong downstream, you can always replay from Bronze.
Silver (Cleaned): Data that has been typed, deduplicated, validated, and standardized. Business logic starts here — renaming columns, casting types, filtering bad records.
Gold (Business-Ready): Aggregated, joined, and modeled data ready for dashboards, reports, and APIs. These are your data products — the tables that business users query directly.
Why Three Layers?
Replayability
Bronze stores raw data. If your Silver transform has a bug, fix it and rerun — the source data hasn't changed.
Quality Gates
Each layer boundary is a checkpoint. You can assert row counts, null rates, uniqueness, and freshness before data moves to the next layer.
Team Boundaries
Different teams can own different layers. Data engineers own Bronze → Silver. Analytics engineers own Silver → Gold. This maps cleanly to RBAC and approval workflows.
Designing Your Bronze Layer
Bronze should be append-only and schema-flexible. Store data in Parquet or JSON, partitioned by ingestion date.
Key decisions:
- Partitioning: By
ingest_dateorsource_system/ingest_date - Format: Parquet (columnar, compressed) for structured data. JSON for semi-structured.
- Retention: Keep Bronze data for at least 90 days. Some teams keep it indefinitely — storage is cheap, reprocessing is not.
- Schema: Don't enforce strict schemas at Bronze. Let the Silver layer handle validation.
Common Sources
- Database CDC (Debezium, WAL-based)
- SaaS API pulls (Stripe, HubSpot, Salesforce)
- Event streams (Kafka, Redpanda)
- File drops (S3, SFTP, GCS)
Designing Your Silver Layer
Silver is where you make data reliable. Every record should be:
- Correctly typed (strings to dates, ints to decimals)
- Deduplicated (by primary key + timestamp)
- Validated (null checks, range checks, referential integrity)
- Standardized (consistent naming, timezone normalization)
SQL Transform Pattern
SELECT
CAST(id AS INTEGER) AS customer_id,
TRIM(LOWER(email)) AS email,
CAST(created_at AS TIMESTAMP) AS created_at,
COALESCE(status, 'unknown') AS status
FROM source_table
WHERE id IS NOT NULL
AND email IS NOT NULL
Data Quality Gates
Between Bronze → Silver, assert:
- Row count within expected range (±20% of prior run)
- Critical columns have <1% null rate
- Primary key uniqueness holds
- Timestamps are within valid ranges
If any gate fails, the pipeline should halt and alert — not silently pass bad data downstream.
Designing Your Gold Layer
Gold tables are business-facing. They should be:
- Named in business terms (
monthly_revenue, notagg_txn_amt_m) - Documented with column descriptions
- Versioned (add columns, never remove without migration)
- Optimized for query patterns (partitioned by common filter columns)
Common Gold Patterns
Fact tables: Events with foreign keys. orders, page_views, support_tickets.
Dimension tables: Slowly changing entities. customers, products, regions.
Aggregate tables: Pre-computed rollups. daily_revenue_by_region, weekly_active_users.
Wide tables: Denormalized joins for dashboard performance. order_detail_with_customer_and_product.
Implementing with F-Pulse
F-Pulse includes a Medallion ETL template that sets up the full three-layer pipeline:
- Source Node — Connect to your database, API, or file system
- Bronze Sink — Write raw data to S3/MinIO as Parquet, partitioned by date
- Transform Node — SQL transform with the expression editor (references upstream data as
source_table) - Data Quality Node — Assert row counts, null rates, and uniqueness
- Silver Sink — Write cleaned data to the silver bucket
- Aggregate Transform — Join and aggregate for business use
- Gold Sink — Write final tables
Each node shows live data preview — you can inspect the output at every stage before promoting to production.
Operational Patterns
Backfill
When you change a Silver transform, you need to reprocess historical Bronze data. Design your pipeline to accept a date range parameter. F-Pulse's scheduling supports backfill runs out of the box.
Late-Arriving Data
Data from APIs often arrives late. Your Silver dedup logic should handle re-processing the same record with an updated timestamp. Use ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) to keep the latest version.
Schema Evolution
Bronze is append-only, so new columns appear naturally. Silver transforms should handle missing columns with COALESCE or conditional logic. Gold tables should be additive — add columns, don't remove them.
Anti-Patterns to Avoid
- Skipping Bronze: Writing cleaned data directly means you can't replay from raw. Always land raw data first.
- Too many layers: Some teams add "Platinum" or "Diamond" layers. Stick with three — complexity kills maintainability.
- No quality gates: Moving data between layers without validation is just copying files. The gates are the point.
- Gold tables with technical names: If a business user can't understand the table name, it's not Gold yet.
Getting Started
The fastest way to build a medallion pipeline:
- Download F-Pulse (
docker compose up -d) - Open the pipeline builder and select the Medallion ETL template
- Configure your source (database, API, or file)
- Set up Bronze, Silver, and Gold sinks pointing to your storage
- Add data quality gates between each layer
- Schedule and monitor
F-Pulse is free and open source under the MIT license. The Medallion ETL template is included by default. Get started here.
Build data pipelines visually
F-Pulse is open source. Try it in under 3 minutes.