Tags: Ollama, local LLM, AI, F-Pulse, privacy, data engineering, qwen

Build AI-Assisted Data Pipelines Without Sending Anything to OpenAI

May 29, 2026 · 8 min read · By Hybridyn Engineering

Most "AI for data" tools have the same flaw: to help you, they need to see your data. Or at least your schema. Or at minimum your queries. The pitch is good. The default architecture is "ship the customer's schema to a third-party model and trust the privacy policy."

F-Pulse takes the opposite position. The AI Copilot ships in every OSS install and runs locally on Ollama by default. No API key. No cloud roundtrip. No schema, no credentials, no query traffic ever leaving the host unless you explicitly opt in to a cloud provider and pass your own key.

This post is the architecture and setup for that local-default mode.

The three pillars that force local-first

F-Pulse's product principles are explicit:

Determinism — the pipeline kernel is deterministic. The AI suggests; the system enforces. RBAC, policy, schema validation, and idempotency checks all run before anything touches data. No LLM is ever in the data path.
Auditability — every run is traced and replayable. Each execution captures a SHA-256 snapshot of the IR that ran. An auditor can replay any pipeline six months later from the snapshot.
Sovereignty — local models, local data, local control. No data leaves the host by default.
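To make the auditability pillar concrete, here is a minimal sketch of fingerprinting an IR snapshot with SHA-256. It assumes the IR is a JSON-serialisable dict. `ir_fingerprint` and the sample IRs are hypothetical, not F-Pulse code, and serialising with sorted keys is our choice so that logically identical IRs always produce the same digest.

```python
import hashlib
import json


def ir_fingerprint(ir: dict) -> str:
    """Return the SHA-256 hex digest of a pipeline IR (hypothetical helper).

    Keys are sorted and separators fixed, so two dicts with the same
    content always serialise, and therefore hash, identically.
    """
    canonical = json.dumps(ir, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


ir_a = {"steps": [{"type": "source", "name": "stripe"}], "version": 1}
ir_b = {"version": 1, "steps": [{"name": "stripe", "type": "source"}]}

print(ir_fingerprint(ir_a) == ir_fingerprint(ir_b))  # True
```

Storing the digest alongside the snapshot lets a later replay confirm that it is running exactly the IR that was audited.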

Local-first AI isn't a side feature of those pillars. It's a consequence. The moment your AI Copilot routinely sends schema or query traffic to a third party, sovereignty is gone. So the default has to be a local model that can actually drive the agent loop.

The local model floor

Not every small model can drive an agent loop. Tool-use requires structured JSON output, multi-turn coherence, and instruction-following good enough to stop when the verification step fails. As of 2026-05-19, the floor for reliable F-Pulse agent use is qwen2.5:7b on Ollama. Sub-7B Qwen 2.5 models advertise tool support but can't reliably drive the loop.

The footprint:

~4.7 GB on disk (Q4_K_M quantization)
~6 GB RAM at runtime
30–60 seconds per agent turn on CPU; sub-10 seconds on a consumer GPU
No GPU required

This is a laptop-class workload. A MacBook Air with 16 GB RAM runs it. A $20/month VPS with a CPU upgrade runs it. The point of the floor is reachability — you should not need a frontier-model budget to get AI-assisted pipeline work.

What the local Copilot can actually do

The Copilot runs a bounded agent loop whose tools are organised in three tiers. Most useful work happens in the read tier, which the local model handles cleanly:

Read tier (21 tools, the most commonly used listed here), with no confirmation needed and full audit:

workspace_overview — top-level counts
list_pipelines / list_projects / list_schedules / list_alerts / list_executions
list_catalog — available node types + connectors
inspect_connections — connection metadata (never credential values)
get_running_executions — what's running now
get_user_role — caller's role + env + tier
get_installation_health — health score + prioritised punch list in one call
summarize_pipeline — plain-English pipeline summary
recall_history — RAG search across executions, definitions, catalog, docs
query_metrics — execution metrics aggregation

Safe-write tier (4 tools) — standard RBAC, inline preview, idempotency cache:

compose_report — drafts a markdown/PDF report
draft_pipeline_from_intent — turns plain English into a pipeline IR draft
draft_alert_rule — turns "alert me if X" into an alert rule draft
modify_pipeline_step — edits a step's params

High-impact-write tier (1 tool) — strict RBAC, mandatory confirmation card, dry-run-by-default for the first 3 successful runs:

apply_pipeline_draft — promotes a draft into the workspace

For typical day-to-day work, such as "what's running", "summarize this pipeline", "why did this fail", or "draft a pipeline for Stripe charges to Postgres", a local 7B model drives the loop reliably.
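The tier rules can be sketched as a small gate in front of tool execution. This is a hypothetical sketch, not F-Pulse's implementation: the `TIERS` table, the `ToolGate` class, and the per-user, per-tool success counter are our assumptions, and RBAC is not modelled. Only the tier names, the mandatory confirmation, and the "first 3 successful runs default to dry-run" rule come from the text above.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Tier per tool. Tool names come from the lists above; the mapping code is illustrative.
TIERS: Dict[str, str] = {
    "list_pipelines": "read",
    "summarize_pipeline": "read",
    "draft_pipeline_from_intent": "safe_write",
    "modify_pipeline_step": "safe_write",
    "apply_pipeline_draft": "high_impact_write",
}

DRY_RUN_RUNS = 3  # successful runs that default to dry-run


class ToolGate:
    """Hypothetical gate: confirmation plus dry-run default for high-impact writes."""

    def __init__(self) -> None:
        # (user, tool) -> successful high-impact runs recorded so far
        self.successes: Dict[Tuple[str, str], int] = defaultdict(int)

    def plan(self, user: str, tool: str, confirmed: bool = False) -> dict:
        tier = TIERS[tool]  # unknown tools raise KeyError, so nothing runs
        if tier != "high_impact_write":
            return {"tier": tier, "dry_run": False}
        if not confirmed:
            raise PermissionError(f"{tool} needs an approved confirmation card")
        dry_run = self.successes[(user, tool)] < DRY_RUN_RUNS
        return {"tier": tier, "dry_run": dry_run}

    def record_success(self, user: str, tool: str) -> None:
        self.successes[(user, tool)] += 1
```

Keeping the counter keyed on both user and tool matches the workflow example later in the post, where a user's first use of `apply_pipeline_draft` runs as a dry run.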

The fast lane bypasses the LLM entirely

A common misconception about agentic interfaces is that every interaction costs a model invocation. F-Pulse ships 11+ rule-based intents that bypass the LLM completely for sub-1-second answers:

Greetings — "hi", "hello"
Help — "what can you do", "help"
Product info — "what is F-Pulse", "what tier am I on"
Workspace state — "give me an overview", "list pipelines", "list projects", "list schedules"
Recent activity — "what failed today", "what's running now"
Catalog — "what node types are supported"

The fast lane has a reasoning gate — prompts containing "why", "explain", "compare", "diagnose", "should I", "walk me through" fall through to the LLM. So "list pipelines" is instant; "why did pipeline X fail" goes to qwen2.5:7b.

This is the right architecture for both cost and latency: many routine interactions don't need a model at all.
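A fast lane with a reasoning gate can be sketched as an ordered set of pattern checks. The regular expressions and intent names below are illustrative assumptions, not F-Pulse's rules; only the gate phrases ("why", "explain", "compare", "diagnose", "should I", "walk me through") and a few sample intents are taken from the text. The gate runs first, so any reasoning phrase sends the prompt to the model even if it also matches a fast intent.

```python
import re

# Phrases that signal reasoning; any match sends the prompt to the LLM.
REASONING_GATE = re.compile(
    r"\b(why|explain|compare|diagnose|should i|walk me through)\b",
    re.IGNORECASE,
)

# A few rule-based intents (illustrative subset; patterns are assumptions).
FAST_INTENTS = [
    ("greeting", re.compile(r"^\s*(hi|hello)\b", re.IGNORECASE)),
    ("help", re.compile(r"\b(help|what can you do)\b", re.IGNORECASE)),
    ("list_pipelines", re.compile(r"\blist pipelines\b", re.IGNORECASE)),
    ("running_now", re.compile(r"\bwhat'?s running now\b", re.IGNORECASE)),
    ("failed_today", re.compile(r"\bwhat failed today\b", re.IGNORECASE)),
]


def route(prompt: str) -> str:
    """Return a fast-lane intent name, or "llm" if the model is needed."""
    if REASONING_GATE.search(prompt):
        return "llm"
    for intent, pattern in FAST_INTENTS:
        if pattern.search(prompt):
            return intent
    return "llm"


print(route("list pipelines"))           # list_pipelines
print(route("why did pipeline X fail"))  # llm
```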

Setup in 5 commands

The full local-AI stack:

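# 1. Install Ollama (this script is for Linux; on macOS or Windows, download the app from https://ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull the floor model
ollama pull qwen2.5:7b

# 3. Confirm Ollama is serving
curl http://localhost:11434/api/tags

# 4. Bring up F-Pulse OSS (run from the F-Pulse checkout that contains the compose file)
docker compose up -d

# 5. Open the builder (open is macOS; use xdg-open on Linux)
open http://localhost:5174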

F-Pulse auto-detects the local Ollama endpoint. Open Settings → AI Providers, confirm Ollama is selected with qwen2.5:7b, and the Copilot is ready. No API key entered. Nothing routed through a third party.
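Step 3 can also be scripted, for example in a pre-flight check. This sketch is not part of F-Pulse; it assumes the Ollama `/api/tags` response lists pulled models under `"models"`, each with a `"name"` field, as in Ollama's API, and `has_model` and `fetch_tags` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"
REQUIRED_MODEL = "qwen2.5:7b"


def has_model(tags_payload: dict, model: str = REQUIRED_MODEL) -> bool:
    """True if `model` appears in an /api/tags response body."""
    names = {m.get("name") for m in tags_payload.get("models", [])}
    return model in names


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """GET the local Ollama model list (raises OSError if Ollama is not serving)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        ok = has_model(fetch_tags())
    except OSError as exc:  # connection refused, timeout, etc.
        print(f"Ollama not reachable: {exc}")
    else:
        print("qwen2.5:7b ready" if ok else "run: ollama pull qwen2.5:7b")
```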

When you'd flip to cloud

Local-default doesn't mean local-only. Some workloads do benefit from frontier-model quality:

Complex SQL generation across denormalized warehouses — a frontier model handles edge cases the local 7B sometimes misses
Long-form root-cause writeups for postmortem documents — Claude/GPT-4 class models produce better prose
Cross-pipeline lineage reasoning at scale — bigger context windows help

For those cases, F-Pulse ships 9 cloud provider integrations as opt-in escape hatches: Anthropic, OpenAI, OpenRouter, Gemini, DeepSeek, Groq, Mistral, Azure, and Custom (OpenAI-compatible). Pick one in Settings, enter your key (the key is encrypted at rest with Fernet — AES-128-CBC + HMAC-SHA256 — and never appears in LLM prompts or responses), and the Copilot uses it instead.

Choosing a cloud provider explicitly means prompts and tool inputs leave the host. F-Pulse makes that an active opt-in, not the default.
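The "local unless explicitly opted out" rule can be sketched as a resolver that falls back to Ollama and refuses any cloud provider without the user's own key. The lowercase provider IDs, `ProviderChoice`, and `resolve_provider` are hypothetical names chosen for illustration; only the provider list and the default model come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# The nine opt-in cloud integrations named above (IDs are illustrative).
CLOUD_PROVIDERS = {
    "anthropic", "openai", "openrouter", "gemini", "deepseek",
    "groq", "mistral", "azure", "custom",
}


@dataclass
class ProviderChoice:
    provider: str
    model: str
    leaves_host: bool  # True means prompts and tool inputs go off-box


def resolve_provider(selected: Optional[str] = None,
                     api_key: Optional[str] = None,
                     model: Optional[str] = None) -> ProviderChoice:
    """Local Ollama unless a cloud provider is explicitly selected with a key."""
    if selected is None or selected == "ollama":
        return ProviderChoice("ollama", model or "qwen2.5:7b", leaves_host=False)
    if selected not in CLOUD_PROVIDERS:
        raise ValueError(f"unknown provider: {selected}")
    if not api_key:
        raise ValueError(f"{selected} is opt-in and needs your own API key")
    if not model:
        raise ValueError("choose a model for the cloud provider")
    return ProviderChoice(selected, model, leaves_host=True)
```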

The governance scaffolding that survives jailbreaks

The Copilot ships with safeguards that live below the prompt — the runtime enforces them regardless of what the model decides to do:

Sanitization gateway — PII, credentials, API keys, connection strings are stripped before the LLM sees data
Idempotency cache — write tools key on (tool_name + args + tenant); duplicate calls within the TTL replay the cached result instead of re-executing
Dry-run-by-default — new high-impact-write tools force dry-run for the first three successful runs before unlocking live mode
Confirmation card — every write surfaces a before/after preview the user must approve
Trace store — every run persisted with replay-safe step records (input/output hashes, never raw values)
Prompt signing — HMAC integrity check on the system prompt; tampered prompts refuse to load
Tool-tier RBAC — 4 roles × 2 envs × 3 tiers gating who can call which tier
Wallet caps — per-user daily token cap, per-request cap, max-iterations cap, wall-clock cap (300s for local, 120s for cloud, override via FPULSE_AGENT_WALL_CLOCK_S)
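As one example of a runtime-enforced safeguard, here is a minimal sketch of the idempotency cache: calls are keyed on tool name, arguments, and tenant, and a duplicate within the TTL replays the stored result. The class name, the 300-second default TTL, and the injectable clock are assumptions for illustration; the post does not state the real TTL.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class IdempotencyCache:
    """Replay a write tool's result for duplicate calls within a TTL (sketch)."""

    def __init__(self, ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s  # hypothetical default
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key(tool_name: str, args: dict, tenant: str) -> str:
        # Canonical JSON, so {"a": 1, "b": 2} and {"b": 2, "a": 1} share a key.
        raw = json.dumps([tool_name, args, tenant], sort_keys=True,
                         separators=(",", ":"))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def run(self, tool_name: str, args: dict, tenant: str,
            fn: Callable[[], Any]) -> Any:
        k = self.key(tool_name, args, tenant)
        now = self.clock()
        hit = self._entries.get(k)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # duplicate within TTL: replay, do not re-execute
        result = fn()
        self._entries[k] = (now, result)
        return result
```

Including the tenant in the key keeps one workspace's cached result from ever being replayed to another.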

A model that decides to ignore the rules can't — the rules are enforced by the runtime, not by the prompt.
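Prompt signing is the clearest case of a rule the model cannot talk its way around. Below is a minimal sketch with Python's standard `hmac` module. The choice of SHA-256 as the HMAC hash, the function names, and how the key is stored are our assumptions; the post only says an HMAC integrity check guards the system prompt.

```python
import hashlib
import hmac


def sign_prompt(prompt: str, key: bytes) -> str:
    """HMAC-SHA256 tag over the system prompt text."""
    return hmac.new(key, prompt.encode("utf-8"), hashlib.sha256).hexdigest()


def load_prompt(prompt: str, signature: str, key: bytes) -> str:
    """Return the prompt only if its signature verifies; refuse otherwise."""
    expected = sign_prompt(prompt, key)
    # compare_digest avoids leaking, via timing, how many characters matched.
    if not hmac.compare_digest(expected, signature):
        raise RuntimeError("system prompt failed integrity check; refusing to load")
    return prompt
```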

The RAG layer (also local)

The Copilot's product knowledge runs through a local RAG layer:

Local embeddings via Ollama nomic-embed-text (768-dim)
sqlite-vec workspace-scoped vector store
recall_history tool — searches execution failures, pipeline definitions, catalog, docs
Daily 03:00 UTC indexer re-indexes failures (last 30 days), pipelines, catalog, docs
Three-layer knowledge architecture — session context (Layer 1) + product RAG (Layer 2) + live workspace state (Layer 3)

Embeddings and retrieval are local. The model never needs to see your full pipeline definitions to answer questions about them; it sees only the relevant retrieved chunks.
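Workspace-scoped retrieval comes down to "filter by workspace, rank by similarity, keep the top k". The sketch below is a simplification under stated assumptions: toy 3-dimensional vectors stand in for nomic-embed-text's 768-dimensional embeddings, and a plain Python scan with cosine similarity stands in for sqlite-vec's indexed search. `recall` and the sample store are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Chunk = Tuple[str, str, List[float]]  # (workspace_id, text, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall(query_vec: Sequence[float], chunks: List[Chunk],
           workspace_id: str, k: int = 2) -> List[str]:
    """Top-k chunk texts for one workspace; other workspaces are never scored."""
    scoped = [c for c in chunks if c[0] == workspace_id]
    scoped.sort(key=lambda c: cosine(query_vec, c[2]), reverse=True)
    return [text for _, text, _ in scoped[:k]]


store = [
    ("ws1", "run 812 failed: Postgres sink timeout", [0.9, 0.1, 0.0]),
    ("ws1", "pipeline stripe_daily: Stripe -> Postgres", [0.2, 0.9, 0.1]),
    ("ws2", "run 77 failed: S3 permission denied", [0.9, 0.0, 0.1]),
]
print(recall([1.0, 0.0, 0.0], store, "ws1", k=1))
# ['run 812 failed: Postgres sink timeout']
```

Only the returned chunk texts would be placed in the prompt, which is what keeps whole definitions out of the model's context.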

The math on local AI

A typical mid-sized team running 50 pipelines, using the Copilot 30-50 times a day across the team, pays no per-token LLM cost on the local default; the model runs on the host the team already operates. The Copilot is genuinely useful, not a feature flag for compliance theatre.

If the team upgrades to cloud for a subset of complex tasks, F-Pulse surfaces live pricing via GET /api/ai/providers/compare so the workspace owner can budget. Every LLM call is audit-logged: provider, model, tokens in/out, latency, success/error. Token wallet caps prevent surprise bills.
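Token wallet caps amount to a pre-call check against two limits. This sketch checks a request's estimated token count before the call is made. The `TokenWallet` name and the cap values are placeholders, not F-Pulse defaults, and the daily reset is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenWallet:
    """Per-user daily and per-request token caps (limits are placeholders)."""
    daily_cap: int = 200_000
    per_request_cap: int = 8_000
    spent_today: Dict[str, int] = field(default_factory=dict)

    def charge(self, user: str, tokens: int) -> int:
        """Reserve `tokens` for a call; raise before either cap is exceeded.

        Returns the tokens the user has left today. A rejected call
        leaves the wallet unchanged.
        """
        if tokens > self.per_request_cap:
            raise RuntimeError(f"request needs {tokens} tokens; cap is {self.per_request_cap}")
        spent = self.spent_today.get(user, 0)
        if spent + tokens > self.daily_cap:
            raise RuntimeError(f"{user} would exceed the daily cap of {self.daily_cap}")
        self.spent_today[user] = spent + tokens
        return self.daily_cap - self.spent_today[user]
```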

What this looks like as a daily workflow

A small concrete example.

User in chat: "Stripe charges → Postgres, daily at 6am, dedup on id"

What happens (no cloud call):

The fast lane doesn't match — this is a draft request, not a status query
Local qwen2.5:7b receives the request with Layer 1 context (user role, current page, edition, workspace state)
The model emits a draft_pipeline_from_intent tool call
F-Pulse runtime validates the tool call against the user's role
The draft pipeline IR is generated: Stripe SaaS Source → Schema Mapper → Upsert (id) → Postgres Database Sink → Schedule (cron 0 6 * * *)
Confirmation card surfaces in the chat with a before/after preview
User clicks Approve
apply_pipeline_draft runs in dry-run mode (it's the first time this user has used this tool), validates the IR end-to-end without writing data
User reviews dry-run output, clicks Live
Pipeline is created, audited, and shows up in the pipeline list

End-to-end: ~45 seconds on a CPU laptop. Zero cloud calls. Zero data leaving the host.
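To make the draft concrete, here is one way it could look as data, with a structural check of the kind a validation step might run. The IR shape, the step type strings, and `validate_draft` are hypothetical; only the step sequence, the upsert key, and the daily 6am schedule come from the example. A real dry run validates far more than this, end to end.

```python
from typing import Dict, List

# Hypothetical IR shape for the Stripe -> Postgres draft above.
draft_ir: Dict = {
    "name": "stripe_charges_to_postgres",
    "steps": [
        {"type": "stripe_saas_source", "params": {"object": "charges"}},
        {"type": "schema_mapper", "params": {}},
        {"type": "upsert", "params": {"key": ["id"]}},
        {"type": "postgres_database_sink", "params": {}},
    ],
    "schedule": {"cron": "0 6 * * *"},  # minute 0, hour 6, every day
}


def validate_draft(ir: Dict) -> List[str]:
    """Return a list of problems; an empty list means the draft looks valid."""
    problems = []
    steps = ir.get("steps", [])
    if not steps or not steps[0]["type"].endswith("_source"):
        problems.append("first step must be a source")
    if not steps or not steps[-1]["type"].endswith("_sink"):
        problems.append("last step must be a sink")
    cron = ir.get("schedule", {}).get("cron", "")
    if len(cron.split()) != 5:
        problems.append(f"cron must have 5 fields, got {cron!r}")
    return problems


print(validate_draft(draft_ir))  # []
```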

The bottom line

You should not have to pick between AI assistance and data sovereignty. The default in 2026 is starting to flip — Ollama, qwen2.5, and the broader local-LLM ecosystem have made small-model tool-use real. F-Pulse is built around that flip.

If your security team has been blocking AI tooling because "we can't ship our schema to a third party" — this is the architecture that resolves the trade-off.

F-Pulse OSS is Apache 2.0 and ships the AI Copilot with Ollama by default. Get the full stack in 3 minutes.
