Build AI-Assisted Data Pipelines Without Sending Anything to OpenAI
Most "AI for data" tools have the same flaw: to help you, they need to see your data. Or at least your schema. Or at minimum your queries. The pitch is good. The default architecture is "ship the customer's schema to a third-party model and trust the privacy policy."
F-Pulse takes the opposite position. The AI Copilot ships in every OSS install and runs locally on Ollama by default. No API key. No cloud roundtrip. No schema, no credentials, no query traffic ever leaving the host unless you explicitly opt in to a cloud provider and pass your own key.
This post is the architecture and setup for that local-default mode.
The three pillars that force local-first
F-Pulse's product principles are explicit:
- Determinism — the pipeline kernel is deterministic. The AI suggests; the system enforces. RBAC, policy, schema validation, and idempotency checks all run before anything touches data. No LLM is ever in the data path.
- Auditability — every run is traced and replayable. Each execution captures a SHA-256 snapshot of the IR that ran. An auditor can replay any pipeline six months later from the snapshot.
- Sovereignty — local models, local data, local control. No data leaves the host by default.
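To make the audit snapshot concrete: if the IR is serialized canonically (sorted keys, fixed separators), the same pipeline always hashes to the same SHA-256 digest, whatever order its fields were built in. This is a minimal sketch, assuming the IR is a JSON-serializable dict. The IR shape and the function name ir_snapshot_digest are hypothetical, not F-Pulse's actual schema or API.

```python
import hashlib
import json


def ir_snapshot_digest(ir: dict) -> str:
    """Return a SHA-256 hex digest of a pipeline IR (hypothetical shape).

    Canonical JSON (sorted keys, no insignificant whitespace) makes the
    digest independent of dict insertion order.
    """
    canonical = json.dumps(ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical IR: field names are illustrative only.
    ir_a = {"name": "stripe_to_pg", "steps": [{"type": "source", "connector": "stripe"}]}
    ir_b = {"steps": [{"connector": "stripe", "type": "source"}], "name": "stripe_to_pg"}
    print(ir_snapshot_digest(ir_a) == ir_snapshot_digest(ir_b))  # True: key order is irrelevant
```

List order still affects the digest, which is what you want for ordered pipeline steps. An auditor can recompute the digest from the stored IR and compare it with the one recorded at execution time.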
Local-first AI isn't a side feature of those pillars. It's a consequence. The moment your AI Copilot routinely sends schema or query traffic to a third party, sovereignty is gone. So the default has to be a local model that can actually drive the agent loop.
The local model floor
Not every small model can drive an agent loop. Tool-use requires structured JSON output, multi-turn coherence, and instruction-following good enough to stop when the verification step fails. As of 2026-05-19, the floor for reliable F-Pulse agent use is qwen2.5:7b on Ollama. Sub-7B Qwen 2.5 models advertise tool support but can't reliably drive the loop.
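Concretely, every agent turn ends with the runtime turning the model's text into a tool name plus arguments and rejecting anything malformed. The following is a minimal sketch of that check, assuming a simple {"name": ..., "arguments": {...}} payload. The payload shape, ToolCallError, and parse_tool_call are hypothetical, and Ollama's and F-Pulse's real tool-call formats differ in detail. A model that can't pass checks like these turn after turn can't drive the loop.

```python
import json
from typing import Any, Dict, Set, Tuple


class ToolCallError(ValueError):
    """Raised when model output is not a usable tool call."""


def parse_tool_call(raw: str, allowed_tools: Set[str]) -> Tuple[str, Dict[str, Any]]:
    """Parse {"name": ..., "arguments": {...}} and reject anything else."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ToolCallError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ToolCallError("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments", {})
    # Check the type first so an unhashable name can't crash the set lookup.
    if not isinstance(name, str) or name not in allowed_tools:
        raise ToolCallError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ToolCallError("arguments must be a JSON object")
    return name, args
```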
The footprint:
- ~4.7 GB on disk (Q4_K_M quantization)
- ~6 GB RAM at runtime
- 30–60 seconds per agent turn on CPU; sub-10 seconds on a consumer GPU
- No GPU required
This is a laptop-class workload. A MacBook Air with 16 GB RAM runs it. A $20/month VPS with a CPU upgrade runs it. The point of the floor is reachability — you should not need a frontier-model budget to get AI-assisted pipeline work.
What the local Copilot can actually do
The Copilot runs a bounded agent loop over a set of tools organised in three tiers. Most useful work happens in the read tier, which the local model handles cleanly:
Read tier (21 tools) — no confirmation needed, full audit:
- workspace_overview: top-level counts
- list_pipelines / list_projects / list_schedules / list_alerts / list_executions
- list_catalog: available node types + connectors
- inspect_connections: connection metadata (never credential values)
- get_running_executions: what's running now
- get_user_role: caller's role + env + tier
- get_installation_health: health score + prioritised punch list in one call
- summarize_pipeline: plain-English pipeline summary
- recall_history: RAG search across executions, definitions, catalog, docs
- query_metrics: execution metrics aggregation
Safe-write tier (4 tools) — standard RBAC, inline preview, idempotency cache:
- compose_report: drafts a markdown/PDF report
- draft_pipeline_from_intent: turns plain English into a pipeline IR draft
- draft_alert_rule: turns "alert me if X" into an alert rule draft
- modify_pipeline_step: edits a step's params
High-impact-write tier (1 tool) — strict RBAC, mandatory confirmation card, dry-run-by-default for the first 3 successful runs:
- apply_pipeline_draft: promotes a draft into the workspace
For typical day-to-day work — "what's running", "summarize this pipeline", "why did this fail", "draft a pipeline for Stripe charges to Postgres" — a local 7B model handles the loop without breaking a sweat.
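The tier rules can be read as a small policy table. The following is a minimal sketch, assuming a hypothetical Tier enum and a partial tool-to-tier mapping built from the names listed above. It encodes the rules as this post describes them, not F-Pulse's source.

```python
from enum import Enum


class Tier(Enum):
    READ = "read"
    SAFE_WRITE = "safe_write"
    HIGH_IMPACT_WRITE = "high_impact_write"


# Partial, illustrative mapping using tool names from the tier lists.
TOOL_TIERS = {
    "workspace_overview": Tier.READ,
    "summarize_pipeline": Tier.READ,
    "recall_history": Tier.READ,
    "draft_pipeline_from_intent": Tier.SAFE_WRITE,
    "modify_pipeline_step": Tier.SAFE_WRITE,
    "apply_pipeline_draft": Tier.HIGH_IMPACT_WRITE,
}


def requires_confirmation(tool_name: str) -> bool:
    """Only the high-impact-write tier forces a mandatory confirmation card."""
    return TOOL_TIERS[tool_name] is Tier.HIGH_IMPACT_WRITE


def uses_idempotency_cache(tool_name: str) -> bool:
    """Both write tiers go through the idempotency cache; reads do not."""
    return TOOL_TIERS[tool_name] is not Tier.READ
```

An unknown tool name raises KeyError here, which is a reasonable fail-closed default for a sketch.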
The fast lane bypasses the LLM entirely
A common misconception about agentic interfaces is that every interaction costs a model invocation. F-Pulse ships 11+ rule-based intents that bypass the LLM completely for sub-1-second answers:
- Greetings — "hi", "hello"
- Help — "what can you do", "help"
- Product info — "what is F-Pulse", "what tier am I on"
- Workspace state — "give me an overview", "list pipelines", "list projects", "list schedules"
- Recent activity — "what failed today", "what's running now"
- Catalog — "what node types are supported"
The fast lane has a reasoning gate — prompts containing "why", "explain", "compare", "diagnose", "should I", "walk me through" fall through to the LLM. So "list pipelines" is instant; "why did pipeline X fail" goes to qwen2.5:7b.
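The routing rule fits in a few lines. The following is a minimal sketch using Python's re module, assuming simple keyword patterns. The intent names, the patterns, and route_fast_lane are hypothetical and cover only a handful of the 11+ intents.

```python
import re
from typing import Optional

# Phrases that signal the user wants reasoning, so the LLM must handle it.
REASONING_GATE = re.compile(
    r"\b(why|explain|compare|diagnose|should i|walk me through)\b", re.IGNORECASE
)

# Hypothetical subset of rule-based intents: (intent name, pattern).
FAST_LANE_INTENTS = [
    ("greeting", re.compile(r"^\s*(hi|hello)\b", re.IGNORECASE)),
    ("help", re.compile(r"\b(help|what can you do)\b", re.IGNORECASE)),
    ("list_pipelines", re.compile(r"\blist pipelines\b", re.IGNORECASE)),
    ("running_now", re.compile(r"\bwhat'?s running now\b", re.IGNORECASE)),
    ("failed_today", re.compile(r"\bwhat failed today\b", re.IGNORECASE)),
]


def route_fast_lane(prompt: str) -> Optional[str]:
    """Return a fast-lane intent name, or None if the prompt should go to the LLM."""
    if REASONING_GATE.search(prompt):
        return None  # reasoning requests always fall through to the model
    for intent, pattern in FAST_LANE_INTENTS:
        if pattern.search(prompt):
            return intent
    return None
```

Checking the gate before the intents is the design choice that matters. For example, "explain what's running now" contains a fast-lane phrase but still goes to the model.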
This is the right architecture for cost and for latency: most of your interactions don't need a model at all.
Setup in 5 commands
The full local-AI stack:
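```bash
# 1. Install Ollama (https://ollama.com); this script is Linux-only, macOS and Windows use the installer from the site
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull the floor model
ollama pull qwen2.5:7b
# 3. Confirm Ollama is serving (returns the pulled models as JSON)
curl http://localhost:11434/api/tags
# 4. Bring up F-Pulse OSS (run from the directory containing its compose file)
docker compose up -d
# 5. Open the builder (`open` is macOS; use xdg-open on Linux)
open http://localhost:5174
```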
F-Pulse auto-detects the local Ollama endpoint. Open Settings → AI Providers, confirm Ollama is selected with qwen2.5:7b, and the Copilot is ready. No API key entered. Nothing routed through a third party.
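If you want to script the same check before starting the stack, you can query Ollama's /api/tags endpoint (the one curled in step 3) and look for qwen2.5:7b in the returned models list. The following is a minimal sketch using only the Python standard library. It is not how F-Pulse's auto-detection is implemented, and the function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # Ollama's default local endpoint


def installed_models(tags_json: str) -> set:
    """Extract model names from an /api/tags response body."""
    payload = json.loads(tags_json)
    return {m["name"] for m in payload.get("models", [])}


def floor_model_ready(url: str = OLLAMA_TAGS_URL, model: str = "qwen2.5:7b") -> bool:
    """True if Ollama is reachable and the floor model has been pulled."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            body = resp.read().decode("utf-8")
    except OSError:  # URLError and timeouts are OSError subclasses
        return False  # Ollama not running or not reachable
    return model in installed_models(body)
```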
When you'd flip to cloud
Local-default doesn't mean local-only. Some workloads do benefit from frontier-model quality:
- Complex SQL generation across denormalized warehouses — a frontier model handles edge cases the local 7B sometimes misses
- Long-form root-cause writeups for postmortem documents — Claude/GPT-4 class models produce better prose
- Cross-pipeline lineage reasoning at scale — bigger context windows help
For those cases, F-Pulse ships 9 cloud provider integrations as opt-in escape hatches: Anthropic, OpenAI, OpenRouter, Gemini, DeepSeek, Groq, Mistral, Azure, and Custom (OpenAI-compatible). Pick one in Settings and enter your key, and the Copilot uses it instead. The key is encrypted at rest with Fernet (AES-128-CBC + HMAC-SHA256) and never appears in LLM prompts or responses.
Choosing a cloud provider explicitly means prompts and tool inputs leave the host. F-Pulse makes that an active opt-in, not the default.
The governance scaffolding that survives jailbreaks
The Copilot ships with safeguards that live below the prompt — the runtime enforces them regardless of what the model decides to do:
- Sanitization gateway — PII, credentials, API keys, connection strings are stripped before the LLM sees data
- Idempotency cache — write tools key on (tool_name + args + tenant); duplicate calls within the TTL replay the cached result instead of re-executing
- Dry-run-by-default — new high-impact-write tools force dry-run for the first three successful runs before unlocking live mode
- Confirmation card — every write surfaces a before/after preview the user must approve
- Trace store — every run persisted with replay-safe step records (input/output hashes, never raw values)
- Prompt signing — HMAC integrity check on the system prompt; tampered prompts refuse to load
- Tool-tier RBAC — 4 roles × 2 envs × 3 tiers gating who can call which tier
- Wallet caps: per-user daily token cap, per-request cap, max-iterations cap, wall-clock cap (300s for local, 120s for cloud, override via FPULSE_AGENT_WALL_CLOCK_S)
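The idempotency cache can be sketched as a dictionary keyed on a hash of (tool_name, args, tenant) with a TTL. The following is a minimal in-memory sketch, assuming JSON-serializable args and an injectable clock. The class name, the default TTL value, and the execute callback are hypothetical. A production store would also need persistence and locking.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class IdempotencyCache:
    """Replays the cached result of a write tool called twice within the TTL."""

    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s  # arbitrary default for the sketch
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key(tool_name: str, args: dict, tenant: str) -> str:
        # Canonical JSON so {"a": 1, "b": 2} and {"b": 2, "a": 1} share a key.
        raw = json.dumps([tool_name, args, tenant], sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def call(self, tool_name: str, args: dict, tenant: str,
             execute: Callable[[dict], Any]) -> Any:
        k = self.key(tool_name, args, tenant)
        now = self.clock()
        hit = self._entries.get(k)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # duplicate within TTL: replay, don't re-execute
        result = execute(args)  # if this raises, nothing is cached
        self._entries[k] = (now, result)
        return result
```

Failed executions are never cached, so a retry after an error re-runs the tool instead of replaying the failure.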
A model that decides to ignore the rules can't — the rules are enforced by the runtime, not by the prompt.
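Prompt signing is the easiest of these safeguards to picture in code. The following is a minimal sketch with Python's standard hmac module, assuming HMAC-SHA256 and a secret held by the runtime outside the prompt itself. The post only says "HMAC integrity check", so the hash function, the function names, and the error type are assumptions.

```python
import hashlib
import hmac


def sign_prompt(prompt: str, secret: bytes) -> str:
    """HMAC-SHA256 tag over the system prompt text."""
    return hmac.new(secret, prompt.encode("utf-8"), hashlib.sha256).hexdigest()


def load_system_prompt(prompt: str, signature: str, secret: bytes) -> str:
    """Refuse to load a prompt whose tag doesn't verify."""
    expected = sign_prompt(prompt, secret)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("system prompt failed HMAC integrity check; refusing to load")
    return prompt
```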
The RAG layer (also local)
The Copilot's product knowledge runs through a local RAG layer:
- Local embeddings via Ollama nomic-embed-text (768-dim)
- sqlite-vec workspace-scoped vector store
- recall_history tool: searches execution failures, pipeline definitions, catalog, docs
- Daily 03:00 UTC indexer re-indexes failures (last 30 days), pipelines, catalog, docs
- Three-layer knowledge architecture — session context (Layer 1) + product RAG (Layer 2) + live workspace state (Layer 3)
Embeddings and retrieval are local. The model doesn't need your raw pipeline definitions in its prompt to answer questions about them; it sees only the relevant retrieved chunks.
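Retrieval itself is a nearest-neighbour search over embedding vectors. The following is a minimal sketch of the idea in pure Python. It assumes the vectors already exist (in F-Pulse they come from nomic-embed-text via Ollama) and uses brute-force cosine similarity in place of the sqlite-vec index. The toy 3-dimensional vectors stand in for real 768-dimensional ones, and the chunk texts are made up.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], chunks: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy 3-dim vectors standing in for 768-dim nomic-embed-text embeddings.
    chunks = [
        ("failure: stripe_to_pg timed out on sink", [0.9, 0.1, 0.0]),
        ("definition: stripe_to_pg upserts on id", [0.6, 0.7, 0.1]),
        ("docs: how schedules work", [0.0, 0.2, 0.9]),
    ]
    print(top_k([1.0, 0.2, 0.0], chunks, k=1))
```

Only the top-k chunk texts would be placed in the prompt, which is why the model never needs the full definitions.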
The math on local AI
A typical mid-sized team running 50 pipelines, with the Copilot used 30-50 times a day across the team, pays zero per-token LLM cost on the local default; the only cost is the host's own compute. The Copilot is genuinely useful, not a feature flag for compliance theatre.
If the team upgrades to cloud for a subset of complex tasks, F-Pulse surfaces live pricing via GET /api/ai/providers/compare so the workspace owner can budget. Every LLM call is audit-logged: provider, model, tokens in/out, latency, success/error. Token wallet caps prevent surprise bills.
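A token wallet is a running total checked before each call, fed by the same records the audit log keeps. The following is a minimal sketch, assuming in-memory per-user daily and per-request token limits. The cap values, the class names, and the choice to count tokens in plus tokens out are hypothetical. The wall-clock defaults (300 s local, 120 s cloud) and the FPULSE_AGENT_WALL_CLOCK_S override come from the governance list. The assumption that the override applies to both local and cloud is mine. The sketch leaves out the daily reset.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LLMCallRecord:
    """Audit fields named in the post: provider, model, tokens, latency, outcome."""
    provider: str
    model: str
    tokens_in: int
    tokens_out: int
    latency_ms: float
    success: bool
    error: Optional[str] = None


class TokenWallet:
    """Hypothetical per-user daily cap plus a per-request cap."""

    def __init__(self, daily_cap: int, per_request_cap: int):
        self.daily_cap = daily_cap
        self.per_request_cap = per_request_cap
        self.spent: Dict[str, int] = {}  # user -> tokens used today

    def authorize(self, user: str, estimated_tokens: int) -> bool:
        if estimated_tokens > self.per_request_cap:
            return False
        return self.spent.get(user, 0) + estimated_tokens <= self.daily_cap

    def record(self, user: str, call: LLMCallRecord) -> None:
        self.spent[user] = self.spent.get(user, 0) + call.tokens_in + call.tokens_out


def wall_clock_cap_s(provider: str) -> float:
    """300 s for local (Ollama), 120 s for cloud; the env var is assumed to override both."""
    override = os.environ.get("FPULSE_AGENT_WALL_CLOCK_S")
    if override:
        return float(override)
    return 300.0 if provider == "ollama" else 120.0
```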
What this looks like as a daily workflow
A small concrete example.
User in chat: "Stripe charges → Postgres, daily at 6am, dedup on id"
What happens (no cloud call):
- The fast lane doesn't match — this is a draft request, not a status query
- Local qwen2.5:7b receives the request with Layer 1 context (user role, current page, edition, workspace state)
- The model emits a draft_pipeline_from_intent tool call
- F-Pulse runtime validates the tool call against the user's role
- The draft pipeline IR is generated: Stripe SaaS Source → Schema Mapper → Upsert (id) → Postgres Database Sink → Schedule (cron 0 6 * * *)
- Confirmation card surfaces in the chat with a before/after preview
- User clicks Approve
- apply_pipeline_draft runs in dry-run mode (this is the user's first use of the tool) and validates the IR end-to-end without writing data
- User reviews the dry-run output and clicks Live
- Pipeline is created, audited, and shows up in the pipeline list
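The dry-run rule in this flow reduces to a per-user, per-tool counter: calls default to dry-run until that user has three successful runs of the high-impact tool. The following is a minimal sketch, assuming in-memory counters and that successful dry runs count toward unlocking. The class and method names are hypothetical. The sketch only picks the default mode; the explicit Live click in the flow sits on top of it.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple

DRY_RUN_UNLOCK_AFTER = 3  # "first 3 successful runs" from the tier description


class DryRunGate:
    """Tracks successful runs per (user, tool) to pick the default execution mode."""

    def __init__(self) -> None:
        self._successes: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def default_mode(self, user: str, tool: str) -> str:
        if self._successes[(user, tool)] < DRY_RUN_UNLOCK_AFTER:
            return "dry_run"
        return "live"

    def record_success(self, user: str, tool: str) -> None:
        self._successes[(user, tool)] += 1
```

Failed runs are never recorded, so they don't count toward unlocking live mode.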
End-to-end: ~45 seconds on a CPU laptop. Zero cloud calls. Zero data leaving the host.
The bottom line
You should not have to pick between AI assistance and data sovereignty. The default in 2026 is starting to flip — Ollama, qwen2.5, and the broader local-LLM ecosystem have made small-model tool-use real. F-Pulse is built around that flip.
If your security team has been blocking AI tooling because "we can't ship our schema to a third party" — this is the architecture that resolves the trade-off.
F-Pulse OSS is Apache 2.0 and ships the AI Copilot with Ollama by default. Get the full stack in 3 minutes.