AI Retrieval

Direct Corpus Interaction: Why Your AI Agents Need a Terminal, Not Just a Vector Database

Direct Corpus Interaction throws away the vector database and hands the agent a terminal — grep, find, sed over the raw documents. No embeddings, no index, no top-k cutoff. Here is how it works, why Claude Code already proved it, and why the honest answer is hybrid.

Jithin Kumar PalepuJune 1, 202614 min read

For years, the answer to “how do I make an LLM use my documents?” has been the same: chunk them, embed them, stuff them in a vector database, and retrieve the top-k most similar pieces. It works. It also quietly throws away anything that didn't score high enough on a fuzzy similarity metric — and you can never go back for it.

Direct Corpus Interaction (DCI) proposes something almost rude in its simplicity: delete the vector database. Give the agent a terminal instead. Let it search the raw documents the way a developer does — grep for an exact phrase, find the right file, read it, refine, search again. No embeddings. No offline index. No chunking-strategy tuning. It runs on any corpus the moment you point it at one.

If that sounds familiar, it should: it is exactly how Claude Code searches your codebase. DCI generalises that observation from code to any document corpus — and the implications for how we build retrieval systems are bigger than they first appear. This is a direct sequel to our argument in harness engineering: retrieval is just another part of the harness, and the best harness might not need a vector database at all.

The one-sentence version

DCI throws away the vector database. Instead of pre-embedding your corpus and retrieving top-k chunks by similarity, you give an agent a terminal and let it search the raw documents directly with tools like ripgrep, find, and sed — iterating, refining, and reading files the way a developer does.

The core contrast: passive top-k vs. active search

The cleanest way to understand DCI is to put it next to the pipeline almost everyone runs today.

Traditional vector RAG

chunk docs -> embed -> store in vector DB -> embed query
-> similarity search -> top-k chunks -> stuff into prompt -> generate

Retrieval here is passive and one-shot. Every piece of evidence must pass through an embedding-similarity score before the model ever reasons about it. Anything weakly matched but actually relevant gets cut at the top-k boundary — and the model has no way to go back for it. The query you embedded is the only question that ever gets asked.

Direct Corpus Interaction

agent gets a query -> runs `rg "exact phrase"` / `find` on raw files
-> reads what came back -> reformulates -> searches again
-> cross-checks across files -> answers with evidence

Retrieval here is active and iterative. The model decides how to search, sees real document content, and adapts. It can phrase-match, narrow, expand the surrounding context, and cross-reference across files — refining its hypothesis as it goes. No embeddings, no vector DB, no offline index build.

Vector RAG asks one fuzzy question and keeps the top few answers. DCI lets the agent keep asking sharper questions until it actually has the evidence.

Why are people excited about it?

The strongest real-world evidence isn't the paper — it's Claude Code. Anthropic prototyped vector RAG early for code search, then switched to plain glob / grep / read, and it worked better. DCI is the argument that this wasn't a quirk of code; it generalises to any corpus. Three reasons it holds up:

Exact-match matters. Vector similarity is fuzzy; grep "Section 12(b)" is not. Legal, financial, and code corpora are full of precise tokens — clause numbers, error codes, function names — that embeddings blur together.
Composable, multi-step search beats a fixed top-k. The agent can phrase-match, then narrow, then expand context, then cross-reference — building up evidence across turns instead of betting everything on one similarity query.
Zero index maintenance. No re-embedding when documents change, no stale index, no chunk-size tuning. Point it at the files and go.

What's inside DCI-Agent-Lite

The reference implementation, DCI-Agent-Lite, is a lightweight agent built on a small coding-agent framework (“Pi”). Three parts matter:

Bash search tools as the retrieval primitive

Instead of an embedding model and a vector store, the agent's retrieval surface is the terminal. It composes searches, reads the hits, and decides what to do next.

Why it matters — The whole retrieval layer is ripgrep, find, and sed — the same primitives a developer reaches for.

A context-management layer

Because the agent reads raw documents across many turns, it burns context quickly. DCI-Agent-Lite offers three strategies: truncation (cap each tool output), compaction (replace old results with placeholders), and summarisation (condense old history). This is exactly the context engineering problem we covered in context engineering.

Why it matters — This is the real engineering challenge — reading raw files over many turns blows through the context window fast.

A provider-agnostic LLM backend

DCI is a harness pattern, not a model feature. Any capable model that can call tools and reason over results can drive it.

Why it matters — It runs against OpenAI, Anthropic, or a local vLLM server — the technique isn't tied to one model.

An honest caveat on the numbers

The DCI paper reports beating dense-retrieval RAG baselines on multi-hop QA benchmarks — HotpotQA, 2WikiMultiHopQA, Natural Questions, and TREC-COVID. I am deliberately not quoting exact score deltas, model lists, or latency figures here, because I could not cleanly verify them from the source tables, and I would rather link you to the primary source than repeat a number I can't stand behind. Read the paper for the figures.

The tradeoff, though, is obvious and worth stating plainly: iterative searching means more LLM calls, which means higher latency and token cost than a single-pass retrieval. You are buying better, adaptive retrieval with compute. There is also a closely related concurrent paper worth reading — Interact-RAG, “Reason and Interact with the Corpus, Beyond Black-Box Retrieval.”

DCI vs. vector RAG: which, when?

DCI is not a drop-in replacement for vector RAG. They are good at different things — it is a different axis, not a strict upgrade.

Vector RAG is best at

Semantic, conceptual “what's similar” queries
Huge corpora — scales to millions of documents
Low latency: one-shot retrieval
Predictable, cheap per query

DCI is best at

Exact terms, clauses, codes, identifiers
Multi-hop, cross-document reasoning
Evolving documents (no re-indexing)
Corpora you want to query immediately

The pragmatic take: go hybrid

The place most teams land is not “DCI instead of RAG” — it is both. Keep vector search as a fast first-pass recall layer, and give the agent terminal tools to drill into the raw documents when similarity isn't enough: exact clauses, tables, cross-references, the things embeddings blur. Vector search gets you to the neighbourhood; DCI reads the actual street signs.

This is the same lesson as agents vs. workflows and the architecture wars: the answer is rarely the pure form of either extreme. Use the cheap, deterministic path for what it's good at, and escalate to the agentic path when the cheap path falls short.

What this means if you run a RAG pipeline today

If you already have a classic retrieve-then-generate stack — a document processor, page-level parsing, chunking, embeddings, generation — you don't rip it out. You add DCI as a fallback retriever: when the vector layer returns weak or ambiguous hits, hand the agent terminal tools to go read the source directly. Three practical moves:

Keep vector search as first-pass recall. It's fast and it scales — let it narrow millions of docs to a working set.
Add a sandboxed terminal over that working set. Give the agent rg/find/read scoped to the documents the user is allowed to see — governance first.
Budget the loop. Cap iterations and tool-output size, and apply truncation/compaction/summarisation so the agent doesn't blow the context window or the bill.

The bottom line

Vector RAG isn't dead — but the assumption that retrieval must mean “embed everything and fetch top-k” is. The most capable retrieval system we have evidence for — Claude Code — searches like a developer, not like a similarity index. Direct Corpus Interaction takes that seriously and gives the agent the one tool we somehow forgot to hand it: a terminal.

Vector search gets you to the neighbourhood. A terminal lets the agent read the actual street signs.

Sources

Keep Reading