Tags: performance, benchmarks, DuckDB, data engineering

F-Pulse Performance Benchmarks — DuckDB Execution on a Single Machine

April 15, 2026 · 8 min read · By Hybridyn

When evaluating a data pipeline tool, features matter — but performance matters more. A tool that looks great but can't handle your workload is a demo, not a solution.

This post shares real performance numbers from F-Pulse running on a single machine. No cherry-picked numbers, no "up to" claims. Just what you can expect when you run docker compose up -d and start building pipelines.

Test Environment

All benchmarks were run on a commodity machine:

  • CPU: Intel i7-12700 (12 cores)
  • RAM: 32 GB DDR4
  • Storage: NVMe SSD
  • OS: Ubuntu 22.04 / Docker
  • F-Pulse: v1.0.0 with DuckDB execution engine

No cloud instances, no distributed compute, no Spark cluster. Just one machine.

Benchmark 1: CSV Ingest + Transform + Output

Pipeline: CSV Source (1M rows) → Filter → Transform (SQL) → Aggregate → Parquet Output

| Dataset Size | Rows | Pipeline Time | Memory Peak | Output Size |
|---|---|---|---|---|
| 10 MB | 100K | 0.8s | 120 MB | 2.1 MB |
| 100 MB | 1M | 3.2s | 380 MB | 18 MB |
| 1 GB | 10M | 28s | 1.8 GB | 165 MB |
| 5 GB | 50M | 2m 15s | 4.2 GB | 820 MB |

Key insight: DuckDB's columnar engine processes 1M rows in ~3 seconds including the full ETL pipeline. For most team workloads (under 10M rows), F-Pulse on a single machine is fast enough that distributed compute adds complexity without benefit.

Benchmark 2: Database-to-Database (PostgreSQL → PostgreSQL)

Pipeline: DB Source (SELECT * with filter) → Transform → Deduplicate → DB Sink (UPSERT)

| Source Rows | Pipeline Time | Rows/Second | Memory Peak |
|---|---|---|---|
| 50K | 1.4s | 35,714 | 90 MB |
| 500K | 8.1s | 61,728 | 340 MB |
| 2M | 31s | 64,516 | 1.1 GB |
| 10M | 2m 40s | 62,500 | 3.4 GB |

Key insight: The bottleneck is the database sink (UPSERT), not the pipeline engine. DuckDB processes transforms faster than most databases can write. Batch size tuning on the sink (1000-5000 rows) is the main optimization lever.
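As a sketch of that batching lever: a minimal, stdlib-only chunking helper plus a hypothetical sink loop. The table, columns, and SQL placeholders below are invented for illustration; F-Pulse's actual sink implementation is not shown here.

```python
from itertools import islice

def batches(rows, size=1000):
    """Yield successive chunks of `size` rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def upsert_in_batches(cursor, rows, size=1000):
    """Hypothetical sink loop: one executemany UPSERT per batch,
    instead of a round trip per row."""
    sql = ("INSERT INTO target (id, val) VALUES (%s, %s) "
           "ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val")
    for chunk in batches(rows, size):
        cursor.executemany(sql, chunk)
```

Tuning `size` between 1000 and 5000 trades per-statement overhead against transaction/lock pressure on the target database.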

Benchmark 3: Concurrent Pipeline Execution

Test: Run N pipelines simultaneously (each: 100K rows, 4 transform nodes)

| Concurrent Pipelines | Total Time | Avg per Pipeline | Memory | CPU Usage |
|---|---|---|---|---|
| 1 | 0.9s | 0.9s | 140 MB | 12% |
| 5 | 2.1s | 0.42s | 480 MB | 55% |
| 10 | 3.8s | 0.38s | 860 MB | 82% |
| 25 | 8.5s | 0.34s | 1.9 GB | 95% |
| 50 | 17.2s | 0.34s | 3.6 GB | 98% |

Key insight: F-Pulse's worker pool handles 25+ concurrent pipelines on a single machine with near-linear scaling in total time. Per-pipeline time actually decreases as concurrency rises, because I/O waits overlap across pipelines. At 50 concurrent pipelines the CPU is saturated, but memory stays manageable at 3.6 GB.
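The I/O-overlap effect is easy to reproduce with a plain thread pool. This stdlib-only sketch stands in for F-Pulse's worker pool; the sleep simulates I/O wait, not real pipeline work:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(n):
    """Stand-in for one pipeline run; real runs mix I/O and DuckDB work."""
    time.sleep(0.01)  # simulated I/O wait, which overlaps across workers
    return n

with ThreadPoolExecutor(max_workers=10) as pool:
    start = time.perf_counter()
    results = list(pool.map(run_pipeline, range(10)))
    elapsed = time.perf_counter() - start
# 10 "pipelines" finish in roughly one sleep interval, not ten,
# because the waits run concurrently.
```

Serially, ten 0.01s waits would take 0.1s; with ten workers the wall-clock time collapses toward a single wait.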

Benchmark 4: Per-Node Preview Latency

One of F-Pulse's key features is live data preview at every node. How fast is it?

| Dataset Size | Preview Latency (per node) |
|---|---|
| 1K rows | 12ms |
| 10K rows | 45ms |
| 100K rows | 180ms |
| 1M rows | 620ms |

Key insight: Preview feels instantaneous (under 200ms) for datasets up to 100K rows, and even at 1M rows it loads in under a second. This is what makes "see every step" practical rather than merely aspirational.

When F-Pulse Is Enough (And When It's Not)

F-Pulse is more than enough for:

  • Datasets up to 50M rows on a single machine
  • 25+ concurrent pipelines
  • Sub-second preview for interactive development
  • Daily/hourly batch ETL for most teams
  • CDC replication from production databases

You need distributed compute (D-Pulse with Spark/Trino) when:

  • Datasets exceed 50M rows regularly
  • You need sub-minute processing for multi-GB datasets
  • You're joining datasets that don't fit in memory
  • You need multi-node parallelism for compliance-driven SLAs

The "Don't Start with Spark" Argument

Most teams reach for Spark too early. Here's why that's usually a mistake:

  1. Spark's minimum overhead is 10-30 seconds just to initialize a job. F-Pulse processes 1M rows in 3 seconds total.
  2. Spark requires a cluster — even EMR Serverless or Databricks has cold start latency and cost.
  3. Spark's sweet spot is 100M+ rows. Below that, DuckDB on a single machine is faster, cheaper, and simpler.
  4. You can always upgrade later. F-Pulse pipelines use an engine-agnostic IR — the same pipeline definition runs on DuckDB locally or Spark/Trino via D-Pulse. No rewrite needed.

The right approach: start with F-Pulse + DuckDB. If you hit scale limits, upgrade to D-Pulse with distributed engines. Don't pay the Spark complexity tax until the data demands it.
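To make the engine-agnostic IR idea concrete, here is a toy sketch. The node types and SQL rendering below are invented for illustration — this is not F-Pulse's actual IR — but the shape is the point: the same node list can be rendered for any engine that speaks SQL.

```python
from dataclasses import dataclass

# Toy engine-neutral IR nodes (illustrative names, not F-Pulse's schema)
@dataclass
class Filter:
    predicate: str

@dataclass
class Aggregate:
    group_by: str
    measure: str

def to_sql(source: str, nodes) -> str:
    """Render an IR node list to plain SQL by wrapping subqueries."""
    sql = f"SELECT * FROM {source}"
    for node in nodes:
        if isinstance(node, Filter):
            sql = f"SELECT * FROM ({sql}) AS t WHERE {node.predicate}"
        elif isinstance(node, Aggregate):
            sql = (f"SELECT {node.group_by}, {node.measure} "
                   f"FROM ({sql}) AS t GROUP BY {node.group_by}")
    return sql
```

The same node list could be handed to a local DuckDB runner today and a Spark or Trino runner later, without touching the pipeline definition.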

Memory Management

DuckDB is efficient with memory, but large datasets still need attention:

  • Streaming execution: DuckDB processes data in batches, not all-at-once
  • Spill to disk: When memory pressure is high, DuckDB spills intermediate results to disk
  • Configurable limits: Set FPULSE_MAX_MEMORY to cap DuckDB's memory usage
  • Per-node isolation: Each pipeline step executes independently, releasing memory between steps
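One way to wire in the FPULSE_MAX_MEMORY cap is a Compose override. The service name and the size-string format below are assumptions — check your own compose file:

```yaml
# docker-compose.override.yml — cap DuckDB's memory usage
# (service name "fpulse" and the "4GB" format are assumptions)
services:
  fpulse:
    environment:
      - FPULSE_MAX_MEMORY=4GB
```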

How to Run Your Own Benchmarks

F-Pulse includes a sample dataset in data/samples/ for quick testing:

  1. Start F-Pulse: docker compose up -d
  2. Open the builder: http://localhost:5174
  3. Drag a CSV Source → point to data/samples/orders.csv
  4. Add transforms (Filter, Aggregate, etc.)
  5. Check the execution log for timing at each node

For larger tests, generate data with:

-- In the Transform node's SQL editor:
SELECT
  ROW_NUMBER() OVER () AS id,
  'customer_' || (RANDOM() * 1000)::INT AS customer,
  RANDOM() * 500 AS amount,
  CURRENT_DATE - (RANDOM() * 365)::INT AS order_date
FROM GENERATE_SERIES(1, 1000000)
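If you prefer generating a benchmark file outside the builder, here is a stdlib-only Python equivalent of the SQL above; the file name and row count are just examples:

```python
import csv
import random
from datetime import date, timedelta

def write_sample(path, rows=1_000_000):
    """Write a synthetic orders CSV mirroring the SQL generator above."""
    today = date.today()
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id", "customer", "amount", "order_date"])
        for i in range(1, rows + 1):
            w.writerow([
                i,                                          # id
                f"customer_{random.randrange(1000)}",       # customer
                round(random.uniform(0, 500), 2),           # amount
                today - timedelta(days=random.randrange(365)),  # order_date
            ])
```

Point a CSV Source node at the resulting file to rerun Benchmark 1 on your own hardware.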

F-Pulse is free and open source. Run your own benchmarks in 3 minutes: Download here.
