AI Engineering

Loop Engineering: Stop Prompting Your Agent, Start Writing Loops

The people who built these coding agents have quietly stopped prompting them. They write loops instead — programs that prompt the agent, check its work, and decide what to do next. Here is what loop engineering actually is, the primitives that make a loop work, and the trap that turns it into an expensive way to produce nothing.

Jithin Kumar Palepu · June 15, 2026 · 12 min read

Loop engineering is the shift from writing prompts to writing the program that prompts the agent for you. Instead of typing instructions turn by turn, you hand the system a verifiable goal and let it run a loop: observe the state, choose an action, execute it, check the result, then decide whether to continue, retry, or stop. A prompt gives an agent a task. A loop gives an agent a job with a standard.

What is loop engineering?

For two years the headline skill in AI was prompt engineering: phrase the request well enough and the model does the rest. Loop engineering moves the model from a static call-and-response tool to an active participant in an event loop. You stop being the person in the chair feeding it instructions, and instead design the system that feeds it — on a schedule, against a goal, with a check at the end.

The structure is the same every time: discover → plan → execute → verify → iterate. If verification passes, the loop stops. If it fails, the loop tries again. The whole thing only works because the goal is something a machine can check — tests pass, the code compiles, an exit code is zero, a metric moves in the right direction. That verification gate is the entire game, and it is where most loops live or die.
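As a rough illustration of a machine-checkable gate, here is a minimal sketch that treats a zero exit code as the only definition of success. The function name, the timeout, and the stand-in commands are assumptions made for the example; a real loop would pass in its own test or build command.

```python
import subprocess
import sys

def verify(command: list[str], timeout_s: float = 600.0) -> bool:
    """Return True only when the command runs and exits with status 0."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hung or missing check counts as a failure, never as a pass.
        return False
    return completed.returncode == 0

# Stand-in checks; a real loop would run its test suite or build here.
passing = verify([sys.executable, "-c", "raise SystemExit(0)"])
failing = verify([sys.executable, "-c", "raise SystemExit(1)"])
```

The design choice worth copying is the failure default: anything other than a clean exit, including a timeout, is treated as "not done".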

Why are the people who built these tools saying this?

The clearest signal came from inside the labs. Boris Cherny, who leads Claude Code at Anthropic, put it bluntly: “I don't prompt Claude anymore. I have loops that are running. They're the ones prompting Claude and figuring out what to do. My job is to write loops.” He has described running a few hundred agents that read his GitHub, Slack, and Twitter and decide what work to pick up — and disclosed a recent month in which Claude Code wrote every line of code across 259 pull requests, without him opening an IDE.

The same week, Peter Steinberger, creator of the open-source agent project OpenClaw, posted the directive that gave the movement its slogan: “You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents.” When the person who built the agent and the person who built the most-starred new repo on GitHub independently say the same thing, it is worth taking seriously — and worth being skeptical about, because both are unusually equipped to make loops behave.

This is the natural next rung on a ladder this site has been climbing: from prompt engineering, to context engineering, to harness engineering. Loop engineering is what you get when the harness stops being something you operate and becomes something that operates on its own.

What are the primitives of a working loop?

There is no single agreed definition of what a loop needs, but the most useful breakdown comes from Google engineer Addy Osmani, who splits it into six parts. Get these right and the loop runs itself; skip one and you usually find out which at 3am, in your billing dashboard.

Automations

The heartbeat. A scheduled cron job, a /goal command, a CI hook — something that starts the loop again and again without you. This is what makes a loop an actual loop.

Why it matters: Without a trigger, you have a script you ran once, not a loop.

Worktrees

Isolated branch environments (git worktrees) so two or more agents working in parallel do not step on each other. Each gets a clean checkout; results merge back when they are done.

Why it matters: Parallel agents that share a working directory will overwrite each other's code.
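To make the worktree idea concrete, here is a small sketch that builds the git commands for giving each agent its own checkout and branch. The branch naming scheme (agent/<id>), the sibling-directory layout, and the repository path are assumptions for illustration; the commands are only constructed here, not executed.

```python
def worktree_add_command(repo: str, agent_id: str, base: str = "main") -> list[str]:
    """Build the git command that gives one agent its own checkout and branch."""
    branch = f"agent/{agent_id}"                    # hypothetical naming scheme
    checkout = f"{repo.rstrip('/')}-{agent_id}"     # sibling directory per agent
    return ["git", "-C", repo, "worktree", "add", "-b", branch, checkout, base]

def worktree_remove_command(repo: str, agent_id: str) -> list[str]:
    """Build the cleanup command for when the agent's work has been merged."""
    checkout = f"{repo.rstrip('/')}-{agent_id}"
    return ["git", "-C", repo, "worktree", "remove", checkout]

# Two parallel agents, two separate working directories.
commands = [worktree_add_command("/srv/app", agent) for agent in ("fixer-1", "fixer-2")]
```

Passing each list to subprocess.run would create the checkouts; the point is that no two agents ever share a path or a branch.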

Skills

Markdown files (SKILL.md) holding persistent project knowledge and conventions the agent would otherwise guess at on every run. Write down what the loop should already know.

Why it matters: Re-deriving project context every cycle wastes tokens and invites drift.
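One way a loop might consume skills is to gather them into a preamble at the start of every run. This sketch assumes a layout with one folder per skill, each holding a SKILL.md; that layout, the function name, and the heading format are illustrative choices, not a description of how any particular agent loads skills.

```python
from pathlib import Path

def load_skills(root: Path) -> str:
    """Concatenate every SKILL.md under root into one context preamble."""
    sections = []
    for skill_file in sorted(root.rglob("SKILL.md")):
        name = skill_file.parent.name               # folder name doubles as skill name
        body = skill_file.read_text(encoding="utf-8").strip()
        sections.append(f"## Skill: {name}\n{body}")
    return "\n\n".join(sections)
```

Sorting the files keeps the preamble identical from run to run, which makes a loop's behavior easier to compare across iterations.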

Connectors

MCP servers and plugins that wire the agent into the tools you already use — GitHub, Jira, Slack, internal databases, staging APIs. Osmani counts these as a primitive distinct from skills.

Why it matters: A loop that can't touch your real systems can only ever pretend to do the work.

Sub-agents

Specialized agents that divide the labor: one explores, one implements, and a separate evaluator grades the output against a strict rubric. Splitting the writer from the checker is what keeps the loop honest.

Why it matters: The model that wrote the code is the worst possible judge of whether it's correct.
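The writer/checker split can be sketched in a few lines. Everything here is a stand-in: the Verdict type, the write_then_grade helper, and the two stub agents are hypothetical, and in a real loop each callable would wrap a separate model call with its own prompt and rubric.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    passed: bool
    feedback: str = ""

def write_then_grade(
    task: str,
    writer: Callable[[str, str], str],
    evaluator: Callable[[str, str], Verdict],
    max_rounds: int = 3,
) -> Optional[str]:
    """Alternate writer and evaluator; return the first draft that passes."""
    feedback = ""
    for _ in range(max_rounds):
        draft = writer(task, feedback)
        verdict = evaluator(task, draft)
        if verdict.passed:
            return draft
        feedback = verdict.feedback   # the writer gets the critique, not a vote
    return None                       # out of rounds: escalate, never self-approve

# Stand-in agents so the sketch runs without a model.
def stub_writer(task: str, feedback: str) -> str:
    return "def add(a, b): return a + b" if feedback else "def add(a, b): return a - b"

def stub_evaluator(task: str, draft: str) -> Verdict:
    scope: dict = {}
    exec(draft, scope)                # acceptable for a stub; sandbox real model output
    ok = scope["add"](2, 3) == 5
    return Verdict(ok, "" if ok else "add(2, 3) should be 5")

accepted = write_then_grade("implement add", stub_writer, stub_evaluator)
```

The writer never decides whether it is finished; only the evaluator's verdict ends the exchange, and running out of rounds returns None instead of a draft.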

Memory

An external tracking system — a markdown progress file, a Linear board, anything outside the single conversation that records what is done and what comes next, so the loop can pick up where it left off.

Why it matters: LLM context windows clear out; the loop's progress has to live somewhere that survives.
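A minimal sketch of external memory follows, using a JSON file rather than the markdown file or Linear board mentioned above because JSON is simpler to parse in a few lines. The file layout (a "done" list and a "todo" list) and the helper names are assumptions for the example.

```python
import json
from pathlib import Path
from typing import Optional

def load_progress(path: Path) -> dict:
    """Read the progress file, or start fresh if the loop has never run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "todo": []}

def save_progress(path: Path, progress: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(progress, indent=2), encoding="utf-8")
    tmp.replace(path)

def complete_next(path: Path) -> Optional[str]:
    """Move the next task from todo to done and persist the change."""
    progress = load_progress(path)
    if not progress["todo"]:
        return None
    task = progress["todo"].pop(0)
    progress["done"].append(task)
    save_progress(path, progress)
    return task
```

Because every step re-reads the file, a fresh process with an empty context window can resume exactly where the previous one stopped.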

What does a loop done right look like?

The cleanest public example is Andrej Karpathy's autoresearch project, released in March 2026. It is 630 lines of Python that let an AI agent run machine-learning experiments overnight with no human in the seat: read the training script, form a hypothesis, edit the code, run a short five-minute training job, evaluate the result against a single scalar metric, keep the wins, discard the losses, repeat. On Karpathy's own setup it ran roughly 700 experiments over two days on a single GPU and found an 11% training speedup through 20 optimizations he says he had missed in 20 years on that codebase.

Notice why it works. The action space is tiny — the agent may only edit one file, train.py. The verification is exact — a scalar metric on a fixed evaluation, no judgment call required. There is nothing fuzzy for the loop to drift into. That is the template, not the overnight runtime. A more ordinary version of the same shape: read yesterday's CI failures, spin up a worktree, draft a fix, run the tests, and open a pull request only if they pass.
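The keep-the-wins, discard-the-losses shape is a greedy search over a single scalar, and it fits in a short function. This is a toy sketch of that shape under stated assumptions, not the autoresearch code: the "experiment" is a number being nudged, and the metric is a made-up quadratic standing in for a validation loss.

```python
import random
from typing import Callable, Tuple

def keep_the_wins(
    baseline: float,
    propose: Callable[[float], float],
    evaluate: Callable[[float], float],
    experiments: int,
) -> Tuple[float, float]:
    """Greedy search: adopt a candidate only if its metric beats the best so far."""
    best, best_score = baseline, evaluate(baseline)
    for _ in range(experiments):
        candidate = propose(best)          # stands in for "edit the code"
        score = evaluate(candidate)        # stands in for "run the short training job"
        if score < best_score:             # lower is better, like a loss
            best, best_score = candidate, score
        # otherwise the candidate is dropped and the next edit starts from `best`
    return best, best_score

rng = random.Random(0)                     # seeded so the run is repeatable
toy_metric = lambda x: (x - 3.0) ** 2      # made-up stand-in for a validation loss
start = 10.0
best, best_score = keep_the_wins(start, lambda x: x + rng.uniform(-1.0, 1.0), toy_metric, 200)
```

Because a candidate is adopted only when the metric strictly improves, the result can never be worse than the baseline, which is the property that makes an unattended run safe to leave alone.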

Why is loopmaxxing the new tokenmaxxing?

Here is the anti-pattern, and it is already everywhere. “Tokenmaxxing” was the brute-force habit of throwing massive inference budgets or thousands of samples at a problem to force a good answer. “Loopmaxxing” is its successor: replacing thoughtful software architecture with an open-ended while (true) and assuming the agent will eventually figure it out if it just runs long enough.

It will not. The failure mode the community has nicknamed the “Ralph Wiggum loop” — plan, code, test, lint, repeat, forever — falls apart at the “Done?” decision. If the goal is fuzzy (“refactor this to be better,” “optimize the layout”), there is nothing concrete to verify against, so the agent optimizes a hallucinated metric and drifts indefinitely. One documented session burned through 47,000 tokens with the agent calling the same search tool 73 times in a row, each query slightly different, never deciding to stop.
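A failure like the 73-search session is cheap to catch mechanically. The sketch below counts consecutive calls to the same tool and trips once a streak exceeds a limit. It keys on the tool name rather than the query text, since in that session each query was slightly different. The class name and the limit of five are arbitrary assumptions.

```python
from typing import Optional

class RepeatGuard:
    """Trip when the agent calls the same tool too many times in a row."""

    def __init__(self, max_streak: int = 5) -> None:
        self.max_streak = max_streak
        self._last_tool: Optional[str] = None
        self._streak = 0

    def record(self, tool_name: str) -> bool:
        """Record one call; return True when the loop should stop."""
        if tool_name == self._last_tool:
            self._streak += 1
        else:
            self._last_tool, self._streak = tool_name, 1
        return self._streak > self.max_streak

# Replay a session shaped like the one described: one read, then 73 searches.
guard = RepeatGuard(max_streak=5)
calls = ["read_file"] + ["search"] * 73
stopped_at = next(i for i, tool in enumerate(calls) if guard.record(tool))
```

With a limit of five, the replay halts on the sixth consecutive search instead of the seventy-third.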

An agent reviewing its own sub-agents against a fuzzy goal will spin into endless retries, optimizing for metrics it invented. You end up paying for the memory and context a human engineer simply retains.

This is also why the agents-versus-workflows question still matters. A deterministic workflow is often the right answer; a loop is only worth its unpredictability when the task genuinely needs the agent to decide what to do next, and when you can verify the result.

How do you keep a loop from running away?

A loop's stopping conditions deserve the same rigor as its goal. Production systems converge on the same three guardrails plus an escape hatch:

  1. Max-iteration limits. A hard ceiling on how many times the loop may run before it gives up. The simplest backstop, and the one most often forgotten.
  2. No-progress detection. Exit when repeated iterations produce nothing new; this is what would have stopped the 73-consecutive-searches case. Track whether state actually changed, not just whether the loop ran.
  3. Token and cost budgets. A spend ceiling enforced as a hard stop, not a polite suggestion. When the budget is gone, the loop ends, full stop.
  4. Escalation paths. When the loop is stuck, hand off to a human or a different agent instead of retrying into the void.

iterations = 0
spend = 0.0
done = False
while not done:
    state  = observe()
    action = agent.decide(state)
    result = execute(action)
    iterations += 1
    spend += cost_of(result)      # hypothetical helper: what this step cost

    if verify(result):            # machine-checkable success
        done = True
    elif no_progress(result):     # state stopped changing
        escalate()
        break
    elif iterations >= MAX_ITERATIONS or spend >= BUDGET:
        escalate()                # hard ceilings, not suggestions
        break

The success condition (verify) is the one you should design first and trust most. Tests pass, output matches expected, the build is green, the metric improved. If you cannot write that function, you do not have a loop yet — you have a while (true) with a billing address.
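The three stop conditions in the loop above can live in one small object that the loop consults after each failed verification. This is a sketch under assumptions: the LoopGuard name, the state-as-dict representation, the token-based budget, and the patience of three unchanged rounds are all illustrative choices.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class LoopGuard:
    max_iterations: int
    budget_tokens: int
    patience: int = 3             # unchanged rounds tolerated before giving up
    iterations: int = 0
    spent_tokens: int = 0
    _last_fingerprint: str = ""
    _stale_rounds: int = 0

    def check(self, state: dict, tokens_used: int) -> str:
        """Return 'continue', or the reason the loop must stop."""
        self.iterations += 1
        self.spent_tokens += tokens_used
        # Fingerprint the observable state so "did anything change?" is exact.
        encoded = json.dumps(state, sort_keys=True).encode("utf-8")
        fingerprint = hashlib.sha256(encoded).hexdigest()
        if fingerprint == self._last_fingerprint:
            self._stale_rounds += 1
        else:
            self._last_fingerprint, self._stale_rounds = fingerprint, 0
        if self._stale_rounds >= self.patience:
            return "no_progress"
        if self.spent_tokens >= self.budget_tokens:
            return "budget_exhausted"
        if self.iterations >= self.max_iterations:
            return "max_iterations"
        return "continue"

# A stuck loop: the same four tests keep failing, round after round.
guard = LoopGuard(max_iterations=50, budget_tokens=10_000)
reason = "continue"
while reason == "continue":
    reason = guard.check({"failing_tests": 4}, tokens_used=100)
```

Returning a reason rather than a bare boolean gives the escalation path something to act on: a stalled loop and an exhausted budget usually call for different handoffs.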

Should you be building loops yet?

For most developers, most of the time, the honest answer is: not for everything, and not yet. Loops shine on work that is repetitive, has a clean verification signal, and benefits from running while you sleep — CI triage, dependency bumps, ML experiment sweeps, large mechanical migrations. They are a poor fit for ambiguous, taste-driven, or one-off work, which is most of what most people do.

And the part the slogans leave out: the verification burden stays yours. As Osmani warns, unattended loops create unattended mistakes, and shipping code faster than you understand it builds a quiet “comprehension debt” that compounds. The danger is not that the loop fails loudly — it is that it succeeds plausibly while you stop reading the output. The capability behind all of this is real: models like Claude Fable 5 can now run autonomously for hours, which is exactly what makes a well-built loop powerful and a badly built one expensive.

Build the loop. But build it like someone who intends to stay the engineer.
