F-Pulse Performance Benchmarks — DuckDB Execution on a Single Machine
When evaluating a data pipeline tool, features matter — but performance matters more. A tool that looks great but can't handle your workload is a demo, not a solution.
This post shares real performance numbers from F-Pulse running on a single machine. No cherry-picked numbers, no "up to" claims. Just what you can expect when you run `docker compose up -d` and start building pipelines.
Test Environment
All benchmarks were run on a commodity machine:
- CPU: Intel i7-12700 (12 cores)
- RAM: 32 GB DDR4
- Storage: NVMe SSD
- OS: Ubuntu 22.04 / Docker
- F-Pulse: v1.0.0 with DuckDB execution engine
No cloud instances, no distributed compute, no Spark cluster. Just one machine.
Benchmark 1: CSV Ingest + Transform + Output
Pipeline: CSV Source (1M rows) → Filter → Transform (SQL) → Aggregate → Parquet Output
| Dataset Size | Rows | Pipeline Time | Memory Peak | Output Size |
|---|---|---|---|---|
| 10 MB | 100K | 0.8s | 120 MB | 2.1 MB |
| 100 MB | 1M | 3.2s | 380 MB | 18 MB |
| 1 GB | 10M | 28s | 1.8 GB | 165 MB |
| 5 GB | 50M | 2m 15s | 4.2 GB | 820 MB |
Key insight: DuckDB's columnar engine processes 1M rows in ~3 seconds including the full ETL pipeline. For most team workloads (under 10M rows), F-Pulse on a single machine is fast enough that distributed compute adds complexity without benefit.
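The shape of this benchmark maps naturally onto a single DuckDB statement. A minimal sketch for illustration only — the file, column, and output names here are hypothetical, not taken from the benchmark itself:

```sql
-- Sketch of the benchmark's shape: CSV in, filter, aggregate, Parquet out.
-- 'orders.csv' and its columns are illustrative assumptions.
COPY (
    SELECT
        customer,
        COUNT(*)    AS order_count,
        SUM(amount) AS total_amount
    FROM read_csv_auto('orders.csv')  -- CSV Source (schema inferred)
    WHERE amount > 0                  -- Filter
    GROUP BY customer                 -- Aggregate
) TO 'orders_summary.parquet' (FORMAT PARQUET);  -- Parquet Output
```

Because DuckDB plans this as one columnar query, the filter and aggregate run in a streaming fashion over the CSV scan — which is why the whole pipeline finishes in seconds rather than materializing each step.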
Benchmark 2: Database-to-Database (PostgreSQL → PostgreSQL)
Pipeline: DB Source (SELECT * with filter) → Transform → Deduplicate → DB Sink (UPSERT)
| Source Rows | Pipeline Time | Rows/Second | Memory Peak |
|---|---|---|---|
| 50K | 1.4s | 35,714 | 90 MB |
| 500K | 8.1s | 61,728 | 340 MB |
| 2M | 31s | 64,516 | 1.1 GB |
| 10M | 2m 40s | 62,500 | 3.4 GB |
Key insight: The bottleneck is the database sink (UPSERT), not the pipeline engine. DuckDB processes transforms faster than most databases can write. Batch size tuning on the sink (1000-5000 rows) is the main optimization lever.
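The sink-side UPSERT is presumably PostgreSQL's standard `INSERT ... ON CONFLICT` pattern, issued in batches. A minimal sketch with hypothetical table and column names:

```sql
-- Hypothetical batch as the DB Sink would issue it (PostgreSQL dialect).
-- Per the tuning advice above, rows are sent 1000-5000 at a time;
-- only two example rows are shown here.
INSERT INTO orders (id, customer, amount, order_date)
VALUES
    (1, 'customer_42', 19.99, '2024-01-15'),
    (2, 'customer_7',  54.10, '2024-01-16')
ON CONFLICT (id) DO UPDATE
SET customer   = EXCLUDED.customer,
    amount     = EXCLUDED.amount,
    order_date = EXCLUDED.order_date;
```

Larger batches amortize round-trip and transaction overhead; past a few thousand rows, lock contention and WAL pressure on the target database typically erase the gains, which is consistent with the 1000-5000 row sweet spot above.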
Benchmark 3: Concurrent Pipeline Execution
Test: Run N pipelines simultaneously (each: 100K rows, 4 transform nodes)
| Concurrent Pipelines | Total Time | Avg per Pipeline | Memory | CPU Usage |
|---|---|---|---|---|
| 1 | 0.9s | 0.9s | 140 MB | 12% |
| 5 | 2.1s | 0.42s | 480 MB | 55% |
| 10 | 3.8s | 0.38s | 860 MB | 82% |
| 25 | 8.5s | 0.34s | 1.9 GB | 95% |
| 50 | 17.2s | 0.34s | 3.6 GB | 98% |
Key insight: F-Pulse's worker pool handles 25+ concurrent pipelines on a single machine with linear scaling. The per-pipeline time actually decreases with concurrency due to I/O overlap. At 50 concurrent pipelines, CPU is saturated but memory stays manageable.
Benchmark 4: Per-Node Preview Latency
One of F-Pulse's key features is live data preview at every node. How fast is it?
| Dataset Size | Preview Latency (per node) |
|---|---|
| 1K rows | 12ms |
| 10K rows | 45ms |
| 100K rows | 180ms |
| 1M rows | 620ms |
Key insight: Preview feels instantaneous (under 200ms) for datasets up to 100K rows. Even at 1M rows, the preview loads in under a second. This is what makes "see every step" practical, not just aspirational.
When F-Pulse Is Enough (And When It's Not)
F-Pulse is more than enough for:
- Datasets up to 50M rows on a single machine
- 25+ concurrent pipelines
- Sub-second preview for interactive development
- Daily/hourly batch ETL for most teams
- CDC replication from production databases
You need distributed compute (D-Pulse with Spark/Trino) when:
- Datasets exceed 50M rows regularly
- You need sub-minute processing for multi-GB datasets
- You're joining datasets that don't fit in memory
- You need multi-node parallelism for compliance-driven SLAs
The "Don't Start with Spark" Argument
Most teams reach for Spark too early. Here's why:
- Spark's minimum overhead is 10-30 seconds just to initialize a job. F-Pulse processes 1M rows in 3 seconds total.
- Spark requires a cluster — even EMR Serverless or Databricks has cold start latency and cost.
- Spark's sweet spot is 100M+ rows. Below that, DuckDB on a single machine is faster, cheaper, and simpler.
- You can always upgrade later. F-Pulse pipelines use an engine-agnostic IR — the same pipeline definition runs on DuckDB locally or Spark/Trino via D-Pulse. No rewrite needed.
The right approach: start with F-Pulse + DuckDB. If you hit scale limits, upgrade to D-Pulse with distributed engines. Don't pay the Spark complexity tax until the data demands it.
Memory Management
DuckDB is efficient with memory, but large datasets still need attention:
- Streaming execution: DuckDB processes data in batches, not all-at-once
- Spill to disk: When memory pressure is high, DuckDB spills intermediate results to disk
- Configurable limits: Set `FPULSE_MAX_MEMORY` to cap DuckDB's memory usage
- Per-node isolation: Each pipeline step executes independently, releasing memory between steps
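Under the hood, `FPULSE_MAX_MEMORY` presumably maps onto DuckDB's own session settings — that mapping is an assumption, but the DuckDB settings themselves are standard:

```sql
-- DuckDB's native controls for the behaviors described above.
SET memory_limit   = '8GB';               -- hard cap on working memory
SET temp_directory = '/tmp/duckdb_spill'; -- where intermediates spill to disk
```

With a limit set, operators like large sorts and hash aggregates spill to the temp directory instead of failing, trading some speed for predictable memory use.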
How to Run Your Own Benchmarks
F-Pulse includes a sample dataset in `data/samples/` for quick testing:
- Start F-Pulse: `docker compose up -d`
- Open the builder: `http://localhost:5174`
- Drag a CSV Source → point to `data/samples/orders.csv`
- Add transforms (Filter, Aggregate, etc.)
- Check the execution log for timing at each node
For larger tests, generate data with:

```sql
-- In the Transform node's SQL editor:
SELECT
    ROW_NUMBER() OVER () AS id,
    'customer_' || (RANDOM() * 1000)::INT AS customer,
    RANDOM() * 500 AS amount,
    CURRENT_DATE - (RANDOM() * 365)::INT AS order_date
FROM GENERATE_SERIES(1, 1000000);
```
F-Pulse is free and open source. Run your own benchmarks in 3 minutes: Download here.
Build data pipelines visually
F-Pulse is open source. Try it in under 3 minutes.