Plan, Act, Verify: Why Data Agents Need a Loop, Not a Prompt
There are two kinds of AI tools for data teams right now, and they get conflated constantly.
The first kind is chat-with-your-data. You ask a question in natural language, the tool generates a SQL query against your warehouse, returns the result. Useful. Limited. The tool can answer, but it can't act.
The second kind is an agent. You describe an outcome, the tool figures out the steps, executes them, checks the result, and tells you what changed. Much more useful. Much more dangerous if you do it wrong.
The difference between the two is a control loop — Plan, Act, Verify — and ten guardrails that decide what the agent is and isn't allowed to do without permission. We built Pulse-Agent around this loop because every other approach we tried either underdelivered (chat-only) or overreached (autonomous agents that touched production data without asking).
This is the engineering argument for why the loop matters and what's in it.
Why "just call the LLM in a loop" doesn't work
The first version of every data agent looks the same. You give an LLM a system prompt, a list of tools (`run_sql`, `read_file`, `write_file`), and a goal. You let it call tools in a loop until it thinks it's done. Maybe you add a max iteration count. Ship it.
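That naive loop can be sketched in a few lines. Everything here (`llm`, `tools`, the reply shape) is a hypothetical stand-in, not a real provider API:

```python
# A minimal sketch of the naive tool loop described above. `llm` and
# `tools` are hypothetical stand-ins, not a real provider API.
MAX_ITERATIONS = 20

def naive_agent(llm, tools, goal):
    """Call tools until the model stops asking for them, then return its answer."""
    history = [{"role": "user", "content": goal}]
    for _ in range(MAX_ITERATIONS):
        reply = llm.chat(history, tools=list(tools))
        if reply.tool_call is None:  # the model *believes* it is done
            return reply.text
        # Note what's missing: no safety check, no plan, no verification.
        # A SELECT and a DELETE look identical to this loop.
        result = tools[reply.tool_call.name](**reply.tool_call.args)
        history.append({"role": "tool", "content": str(result)})
    return "gave up after max iterations"
```

The whole control flow is "trust whatever the model asks for," which is exactly the problem.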
This works for demos. It does not work for production data work, for three reasons.
1. The LLM doesn't know what's safe. The model sees `run_sql` as one tool. It doesn't know that `SELECT * FROM customers LIMIT 10` is fine and `DELETE FROM customers WHERE created_at < '2025-01-01'` is a career-ending event. From the model's perspective, both are SQL strings. From the database's perspective, one is a question and the other is a fire.
2. The LLM doesn't know what just happened. When a tool returns "success" or returns 1000 rows of result, the model sees that as confirmation that things went well. It doesn't notice that the wrong table got updated. It doesn't notice that the schema drift it introduced will break a downstream dashboard. It just declares success and moves on.
3. The LLM has no skin in the game. If something goes wrong, the model has no consequences. It can't be paged, it can't lose its job, it doesn't have to write the postmortem. So the only thing keeping the agent honest is the structure you put around it.
The structure is the loop.
The loop, in detail
Plan → Act → Verify. Every non-trivial agent action runs through it. Here's what each step actually does.
Plan
Before the agent touches anything, it writes a plan. The plan is a structured document, not a paragraph of prose:
- Goal: what the user asked for, in the agent's own words
- Steps: an ordered list of tool calls the agent intends to make
- Reads: which data sources, files, and tables will be read
- Writes: which data sources, files, and tables will be written
- Verification: how the agent will know if the plan worked
- Risks: anything the agent thinks could go wrong
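The plan structure above can be sketched as a small data type. Field names here are illustrative, not Pulse-Agent's actual schema:

```python
from dataclasses import dataclass, field

# A sketch of the plan as a structured contract. Field names are
# illustrative, not Pulse-Agent's actual schema.
@dataclass
class Plan:
    goal: str                # the request, in the agent's own words
    steps: list[str]         # ordered tool calls the agent intends to make
    reads: set[str]          # data sources, files, and tables to be read
    writes: set[str]         # data sources, files, and tables to be written
    verification: list[str]  # checks that decide whether the plan worked
    risks: list[str] = field(default_factory=list)

    @property
    def read_only(self) -> bool:
        # Read-only plans can auto-run; plans with writes need explicit approval.
        return not self.writes
```

The useful property is that "does this need approval?" becomes a mechanical check on the `writes` field, not a judgment call by the model.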
The plan is shown to the user before anything executes. For read-only plans, the user can let it run automatically. For plans with writes, the user reviews and approves explicitly.
The plan is also the contract. If the agent deviates from the plan during Act, it stops and re-plans. It doesn't just improvise.
This sounds bureaucratic. It is. That's the point. Bureaucracy is what stops agents from doing impressive demo things and accidentally horrifying production things.
Act
The agent executes the plan step by step. Two important properties:
Steps run in declared scope. A step that says "read from staging.events" cannot, mid-execution, decide to read from prod.customers instead. The runtime intercepts the tool call and validates it against the plan. If the agent tries to do something outside the plan, the tool call fails and the agent has to re-plan.
Writes pause for confirmation if not pre-authorized. Read-only steps run uninterrupted. Write steps (DDL, DML, file edits, pipeline triggers) check whether the user pre-approved them at plan time. If not, the runtime stops and asks. The user can approve once, approve always for this session, or reject.
This is where the difference between an agent you can leave running and an agent that needs a babysitter actually lives.
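The scope check in Act can be sketched as a runtime-level gate that runs before any tool call executes. The shapes here are illustrative:

```python
# Sketch of runtime-level scope enforcement: the runtime, not the model,
# validates every tool call against the approved plan. Shapes are illustrative.
class ScopeViolation(Exception):
    """Raised when a tool call strays outside the plan it was approved under."""

def check_scope(call_reads, call_writes, plan_reads, plan_writes):
    undeclared_reads = set(call_reads) - set(plan_reads)
    undeclared_writes = set(call_writes) - set(plan_writes)
    if undeclared_reads or undeclared_writes:
        # The call fails; the agent must re-plan rather than improvise.
        raise ScopeViolation(
            f"outside plan: reads={sorted(undeclared_reads)}, "
            f"writes={sorted(undeclared_writes)}"
        )
```

A step declared as "read from `staging.events`" that reaches for `prod.customers` fails here, before the query ever touches the warehouse.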
Verify
After Act finishes, the agent checks its own work. Verification is a separate step with separate logic — it's not the model deciding "yeah I think that went well." It's structured checks against the plan:
- Did the SQL query return rows? How many?
- Did the file write succeed? Is the resulting file the size we expected?
- Did the dbt run finish without test failures?
- Did the pipeline trigger actually start a new run?
- Does the new schema match the schema in the plan?
The verification result is reported back to the user along with what actually happened. If verification fails, the agent doesn't try to silently fix it — it surfaces the failure and asks what to do.
This is the step that catches the "the agent ran the wrong query and declared success" failure mode that pure-loop agents have. Verification is where the lying gets caught.
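Checks like the ones listed above can be sketched as a plain function over the observed outcome. Check names and outcome keys here are illustrative:

```python
# Sketch of verification as structured checks against the plan, not a
# model's self-assessment. Check names and outcome keys are illustrative.
def verify(outcome, expected_schema):
    checks = {
        "rows_returned": outcome.get("row_count", 0) > 0,
        "no_test_failures": outcome.get("test_failures", 0) == 0,
        "schema_matches_plan": outcome.get("schema") == expected_schema,
    }
    failures = [name for name, passed in checks.items() if not passed]
    # Failures are surfaced to the user, never silently patched over.
    return failures
```

The point is that `verify` has no access to the model's opinion of how things went; it only looks at what actually happened.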
Ten guardrails that aren't optional
The loop is the structure. The guardrails are the rules inside the structure. They live below the prompt, which means they survive jailbreaks — a model that decides it should ignore the rules can't, because the rules are enforced by the runtime, not the prompt.
Pulse-Agent ships with ten of them:
- No destructive SQL without per-call approval. `DROP`, `TRUNCATE`, unscoped `DELETE`, and similar statements fail unless the user explicitly approved that exact query in the plan.
- No writes to production warehouses unless tagged writable. Connections are tagged at config time. A connection without the `writable` tag rejects writes at the runtime layer, period.
- No credentials in LLM context. Secrets are stored in a vault, referenced by name in tool calls, and never appear in the model's prompt or response. The model literally cannot see them.
- No silent file edits. Every file write produces a diff, the diff is logged, and the user can review it. There is no version of "the agent quietly changed line 47 of your dbt model."
- No expensive queries without cost preview. Before running a query that could scan a large table, the agent runs `EXPLAIN` or the warehouse equivalent and shows the estimated cost. The user approves the cost before the query runs.
- No schema changes without impact preview. Before any DDL, the agent walks downstream lineage and lists what will break. You see the blast radius before you act.
- No cross-environment moves without confirmation. Promoting code or data from dev to prod requires an explicit confirmation step. The agent will not silently move things across environments.
- No calls outside the configured allowlist. External APIs, MCP servers, and webhook endpoints must be on an allowlist. The runtime blocks any call to a destination not on the list.
- No result retention beyond the session unless requested. Query results, file contents, and tool outputs are dropped when the session ends. The agent does not build a long-lived index of your data without you asking it to.
- No background actions in ambient mode without opt-in. Ambient mode (the background watcher) only acts when you explicitly enable it. By default, it observes and notifies — it doesn't do anything on its own.
These aren't suggestions. They're enforced at the runtime layer, below the prompt. A user can disable any individual guardrail per workspace if they have admin permissions and an explicit reason — but the default is on, for everything, always.
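To make "enforced at the runtime layer" concrete, the first guardrail can be sketched as a check the runtime applies to every SQL string. A real implementation would parse the statement; the regexes here are a deliberate simplification:

```python
import re

# Sketch of the destructive-SQL guardrail. A real implementation would
# parse the statement; these regexes are a deliberate simplification.
DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE)\b", re.IGNORECASE)
UNSCOPED_DELETE = re.compile(r"^\s*DELETE\s+FROM\s+\S+\s*;?\s*$", re.IGNORECASE)

def allowed(sql: str, approved_in_plan: set) -> bool:
    if sql in approved_in_plan:
        return True  # the user approved this exact statement at plan time
    return not (DESTRUCTIVE.match(sql) or UNSCOPED_DELETE.match(sql))
```

Because this runs below the prompt, a model that talks itself into running `DROP TABLE` still can't: the string never reaches the database.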
Why the loop matters more than the model
Here's the thing that gets lost in most agent discussions. The frontier LLM you pick matters far less than the loop you put around it.
A weak model inside a strong loop will outperform a frontier model inside no loop, because the loop catches the mistakes. A frontier model with no structure will produce more impressive demos and more catastrophic failures, because the failures are subtle and confident.
The model is replaceable. Pulse-Agent supports nine providers — Claude, OpenAI, Gemini, Azure OpenAI, Ollama, DeepSeek, Mistral, Groq, and custom — and you can swap between them in one config line. The loop is the same regardless of which model you pick. That's deliberate. The loop is the product. The model is a substrate.
This is also why running locally with Ollama is the default. Most data work doesn't need a frontier model. Most data work needs a competent model that can read SQL, follow a plan, and stop when the verification step fails. A 7B local model inside a strict loop can do that work without ever sending your schema to a third-party API.
What this looks like in practice
A small concrete example. The user says: "Our daily orders ETL has been running 3x slower for the last week. Find out why."
Without a loop, an agent might do something like this:
- Connect to the warehouse.
- Run a query to look at recent runs.
- See some big numbers.
- Generate an explanation that sounds plausible.
- Declare done.
With Plan → Act → Verify, the same request looks like this:
- Plan. The agent writes a plan: read the orchestrator's run history for the orders ETL, identify the slow steps, query the warehouse for query plans on the slow steps, look at the table sizes and partition stats, compare to a week ago. All reads, no writes. Verification: produce a written root cause with evidence. User approves.
- Act. The agent runs each step in order. It hits a permission error on the orchestrator API and stops. It surfaces the error. User grants the permission. Agent re-plans the same step, retries, succeeds. Continues.
- Verify. The agent checks whether it actually has enough evidence to write a root cause. It notices that the partition stats query returned no data because of a typo in the table name. It surfaces the gap, fixes the query, re-runs. Final root cause: the orders source table started writing 8x more rows per day after a feature launch on the 2nd, and the silver-layer transform is doing a full scan instead of an incremental merge.
The second version takes longer. It's also the version you can trust at 3am.
The bottom line
If you're building data agents and you don't have a Plan → Act → Verify loop with enforced guardrails, you don't have an agent. You have a chatbot with the safety off.
If you're evaluating agents and the vendor demos them by pasting a prompt into a chat box and watching the magic happen, ask three questions:
- Show me the plan before it executes.
- Show me what happens when I ask it to do something destructive.
- Show me the verification step.
If the answers are vague, the loop isn't there. If the loop isn't there, you're going to find out the hard way.
Pulse-Agent is the loop. It's free, it runs locally by default, and it works with whichever data tools you already have. You can read about the 34 built-in skills that ship inside the loop, or just download it and watch it ask permission before touching anything. That part is the whole point.