Tags: performance, benchmarks, DuckDB, data engineering

F-Pulse Performance Benchmarks — DuckDB Execution on a Single Machine

April 15, 2026 · 8 min read · By Hybridyn

When evaluating a data pipeline tool, features matter — but performance matters more. A tool that looks great but can't handle your workload is a demo, not a solution.

This post shares real performance numbers from F-Pulse running on a single machine. No cherry-picked numbers, no "up to" claims. Just what you can expect when you run docker compose up -d and start building pipelines.

Test Environment

All benchmarks were run on a commodity machine:

  • CPU: Intel i7-12700 (12 cores)
  • RAM: 32 GB DDR4
  • Storage: NVMe SSD
  • OS: Ubuntu 22.04 / Docker
  • F-Pulse: v1.0.0 with DuckDB execution engine

No cloud instances, no distributed compute, no Spark cluster. Just one machine.

Benchmark 1: CSV Ingest + Transform + Output

Pipeline: CSV Source (1M rows) → Filter → Transform (SQL) → Aggregate → Parquet Output

| Dataset Size | Rows | Pipeline Time | Memory Peak | Output Size |
|---|---|---|---|---|
| 10 MB | 100K | 0.8s | 120 MB | 2.1 MB |
| 100 MB | 1M | 3.2s | 380 MB | 18 MB |
| 1 GB | 10M | 28s | 1.8 GB | 165 MB |
| 5 GB | 50M | 2m 15s | 4.2 GB | 820 MB |

Key insight: DuckDB's columnar engine processes 1M rows in ~3 seconds including the full ETL pipeline. For most team workloads (under 10M rows), F-Pulse on a single machine is fast enough that distributed compute adds complexity without benefit.

Benchmark 2: Database-to-Database (PostgreSQL → PostgreSQL)

Pipeline: DB Source (SELECT * with filter) → Transform → Deduplicate → DB Sink (UPSERT)

| Source Rows | Pipeline Time | Rows/Second | Memory Peak |
|---|---|---|---|
| 50K | 1.4s | 35,714 | 90 MB |
| 500K | 8.1s | 61,728 | 340 MB |
| 2M | 31s | 64,516 | 1.1 GB |
| 10M | 2m 40s | 62,500 | 3.4 GB |

Key insight: The bottleneck is the database sink (UPSERT), not the pipeline engine. DuckDB processes transforms faster than most databases can write. Batch size tuning on the sink (1000-5000 rows) is the main optimization lever.
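As a sketch of that batching lever: a minimal, stdlib-only chunking helper plus a hypothetical sink loop. The table, columns, and SQL placeholders below are invented for illustration; F-Pulse's actual sink implementation is not shown here.

```python
from itertools import islice

def batches(rows, size=1000):
    """Yield successive chunks of `size` rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def upsert_in_batches(cursor, rows, size=1000):
    """Hypothetical sink loop: one executemany UPSERT per batch,
    instead of a round trip per row."""
    sql = ("INSERT INTO target (id, val) VALUES (%s, %s) "
           "ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val")
    for chunk in batches(rows, size):
        cursor.executemany(sql, chunk)
```

Tuning `size` between 1000 and 5000 trades per-statement overhead against transaction/lock pressure on the target database.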

Benchmark 3: Concurrent Pipeline Execution

Test: Run N pipelines simultaneously (each: 100K rows, 4 transform nodes)

| Concurrent Pipelines | Total Time | Avg per Pipeline | Memory | CPU Usage |
|---|---|---|---|---|
| 1 | 0.9s | 0.9s | 140 MB | 12% |
| 5 | 2.1s | 0.42s | 480 MB | 55% |
| 10 | 3.8s | 0.38s | 860 MB | 82% |
| 25 | 8.5s | 0.34s | 1.9 GB | 95% |
| 50 | 17.2s | 0.34s | 3.6 GB | 98% |

Key insight: F-Pulse's worker pool handles 25+ concurrent pipelines on a single machine with near-linear scaling in total time. Per-pipeline time actually decreases as concurrency rises, because I/O waits overlap across pipelines. At 50 concurrent pipelines the CPU is saturated, but memory stays manageable at 3.6 GB.
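The I/O-overlap effect is easy to reproduce with a plain thread pool. This stdlib-only sketch stands in for F-Pulse's worker pool; the sleep simulates I/O wait, not real pipeline work:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(n):
    """Stand-in for one pipeline run; real runs mix I/O and DuckDB work."""
    time.sleep(0.01)  # simulated I/O wait, which overlaps across workers
    return n

with ThreadPoolExecutor(max_workers=10) as pool:
    start = time.perf_counter()
    results = list(pool.map(run_pipeline, range(10)))
    elapsed = time.perf_counter() - start
# 10 "pipelines" finish in roughly one sleep interval, not ten,
# because the waits run concurrently.
```

Serially, ten 0.01s waits would take 0.1s; with ten workers the wall-clock time collapses toward a single wait.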

Benchmark 4: Per-Node Preview Latency

One of F-Pulse's key features is live data preview at every node. How fast is it?

| Dataset Size | Preview Latency (per node) |
|---|---|
| 1K rows | 12ms |
| 10K rows | 45ms |
| 100K rows | 180ms |
| 1M rows | 620ms |

Key insight: Preview feels instantaneous (under 200ms) for datasets up to 100K rows, and even at 1M rows it loads in under a second. This is what makes "see every step" practical rather than merely aspirational.

When F-Pulse Is Enough (And When It's Not)

F-Pulse is more than enough for:

  • Datasets up to 50M rows on a single machine
  • 25+ concurrent pipelines
  • Sub-second preview for interactive development
  • Daily/hourly batch ETL for most teams
  • CDC replication from production databases

You need distributed compute (D-Pulse with Spark/Trino) when:

  • Datasets exceed 50M rows regularly
  • You need sub-minute processing for multi-GB datasets
  • You're joining datasets that don't fit in memory
  • You need multi-node parallelism for compliance-driven SLAs

The "Don't Start with Spark" Argument

Most teams reach for Spark too early. Here's why that's usually a mistake:

  1. Spark's minimum overhead is 10-30 seconds just to initialize a job. F-Pulse processes 1M rows in 3 seconds total.
  2. Spark requires a cluster — even EMR Serverless or Databricks has cold start latency and cost.
  3. Spark's sweet spot is 100M+ rows. Below that, DuckDB on a single machine is faster, cheaper, and simpler.
  4. You can always upgrade later. F-Pulse pipelines use an engine-agnostic IR — the same pipeline definition runs on DuckDB locally or Spark/Trino via D-Pulse. No rewrite needed.

The right approach: start with F-Pulse + DuckDB. If you hit scale limits, upgrade to D-Pulse with distributed engines. Don't pay the Spark complexity tax until the data demands it.
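To make the engine-agnostic IR idea concrete, here is a toy sketch. The node types and SQL rendering below are invented for illustration — this is not F-Pulse's actual IR — but the shape is the point: the same node list can be rendered for any engine that speaks SQL.

```python
from dataclasses import dataclass

# Toy engine-neutral IR nodes (illustrative names, not F-Pulse's schema)
@dataclass
class Filter:
    predicate: str

@dataclass
class Aggregate:
    group_by: str
    measure: str

def to_sql(source: str, nodes) -> str:
    """Render an IR node list to plain SQL by wrapping subqueries."""
    sql = f"SELECT * FROM {source}"
    for node in nodes:
        if isinstance(node, Filter):
            sql = f"SELECT * FROM ({sql}) AS t WHERE {node.predicate}"
        elif isinstance(node, Aggregate):
            sql = (f"SELECT {node.group_by}, {node.measure} "
                   f"FROM ({sql}) AS t GROUP BY {node.group_by}")
    return sql
```

The same node list could be handed to a local DuckDB runner today and a Spark or Trino runner later, without touching the pipeline definition.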

Memory Management

DuckDB is efficient with memory, but large datasets still need attention:

  • Streaming execution: DuckDB processes data in batches, not all-at-once
  • Spill to disk: When memory pressure is high, DuckDB spills intermediate results to disk
  • Configurable limits: Set FPULSE_MAX_MEMORY to cap DuckDB's memory usage
  • Per-node isolation: Each pipeline step executes independently, releasing memory between steps
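One way to wire in the FPULSE_MAX_MEMORY cap is a Compose override. The service name and the size-string format below are assumptions — check your own compose file:

```yaml
# docker-compose.override.yml — cap DuckDB's memory usage
# (service name "fpulse" and the "4GB" format are assumptions)
services:
  fpulse:
    environment:
      - FPULSE_MAX_MEMORY=4GB
```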

How to Run Your Own Benchmarks

F-Pulse includes a sample dataset in data/samples/ for quick testing:

  1. Start F-Pulse: docker compose up -d
  2. Open the builder: http://localhost:5174
  3. Drag a CSV Source → point to data/samples/orders.csv
  4. Add transforms (Filter, Aggregate, etc.)
  5. Check the execution log for timing at each node

For larger tests, generate data with:

-- In the Transform node's SQL editor:
SELECT
  ROW_NUMBER() OVER () AS id,
  'customer_' || (RANDOM() * 1000)::INT AS customer,
  RANDOM() * 500 AS amount,
  CURRENT_DATE - (RANDOM() * 365)::INT AS order_date
FROM GENERATE_SERIES(1, 1000000)
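If you prefer generating a benchmark file outside the builder, here is a stdlib-only Python equivalent of the SQL above; the file name and row count are just examples:

```python
import csv
import random
from datetime import date, timedelta

def write_sample(path, rows=1_000_000):
    """Write a synthetic orders CSV mirroring the SQL generator above."""
    today = date.today()
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id", "customer", "amount", "order_date"])
        for i in range(1, rows + 1):
            w.writerow([
                i,                                          # id
                f"customer_{random.randrange(1000)}",       # customer
                round(random.uniform(0, 500), 2),           # amount
                today - timedelta(days=random.randrange(365)),  # order_date
            ])
```

Point a CSV Source node at the resulting file to rerun Benchmark 1 on your own hardware.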

F-Pulse is free and open source. Run your own benchmarks in 3 minutes: Download here.
