
Retrieval Patterns: How to Make AI Cite Your Sources, Not Invent Them

Why 'ground it in these documents' actually works, what good source-anchoring looks like in the output, and the retrieval patterns that separate a cited answer from a confident hallucination.

Todd Burner

Founder, Hone Labs

An analyst at a research-heavy firm pasted a strategy framework into a chatbot and asked it to find the academic paper the framework came from. The model returned a clean, confident citation: author, journal, year, page range, a one-line summary of the methodology. It looked exactly like the citations she copies out of databases every week. She went to pull the PDF. The paper did not exist. Not the page range, not the volume, not the authors in that combination. The model had not found a source. It had written one.

This is the failure mode that does the most quiet damage in professional knowledge work, because it does not look like a failure. A hallucinated citation has the same surface texture as a real one — the same shape, the same confidence, the same formatting. The model is not malfunctioning when it does this. It is doing the only thing it knows how to do: generating the most plausible continuation of the text in front of it. When the text in front of it is “cite the source for this framework” and no source is actually present, the most plausible continuation is a citation that sounds right.

The instinct, when this happens, is to blame the model — to conclude it is “not smart enough yet” and wait for a better one. That instinct is wrong, and it is expensive. The fabricated citation is not primarily a model problem. It is a retrieval problem. The model wrote a source because there was no source in front of it to quote. Put the right source in front of it and the same model stops inventing and starts citing. The thesis of this piece is a single, load-bearing reframe: hallucination is a retrieval problem before it is a model problem — and retrieval is something you can control.

Why grounding works

Start with the mechanism, because everything downstream follows from it. A large language model does not retrieve answers from a database of facts. It generates text one token at a time, choosing what is statistically likely to come next given everything in its context window — the conversation, the instructions, and whatever source material you have supplied. It has a finely tuned sense of plausible and implausible. It has no native sense of true and false.

That single fact explains both the hallucinated citation and its cure. When the model must produce a specific claim — a statistic, a quote, a source, a date — it will reach for whatever makes the surrounding text coherent. If the real answer is sitting in its context window, the most coherent continuation is the real answer. If the real answer is absent, the most coherent continuation is a fabrication that fits the pattern. The model fills the gap either way. Your only lever is whether the gap exists.

Grounding closes the gap. The umbrella technique is retrieval-augmented generation: before the model answers, a retrieval step searches a corpus you control, pulls the passages most relevant to the question, and places them in the context window. Now the model is not composing from the diffuse statistical average of its training data. It is writing from specific text it can see. The effect is not subtle. The same model, on the same task, swings from confidently wrong to reliably anchored depending entirely on whether you fed it the truth.
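The retrieve-then-answer loop described above can be sketched in a few lines. This is a toy under stated assumptions, not how any particular product works: the `CORPUS` entries, passage ids, and the `retrieve` and `build_prompt` helpers are all hypothetical, and plain word overlap stands in for the meaning-based search a real retrieval layer would use.

```python
import re
from collections import Counter

# Hypothetical toy corpus: passage id -> passage text.
CORPUS = {
    "proposal-2023#pricing": "The retainer is billed monthly at a fixed fee agreed in the statement of work.",
    "style-guide#voice": "Write in plain language and avoid jargon in client deliverables.",
    "board-report-q2#risks": "The main risk this quarter is delayed onboarding of two enterprise clients.",
}

# Small illustrative stopword list so filler words do not count as matches.
STOPWORDS = {"a", "and", "at", "how", "in", "is", "of", "the", "to"}

def tokenize(text: str) -> Counter:
    """Lowercase the text and count its non-stopword tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def retrieve(question: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (id, text) passages ranked by word overlap with the question.

    Assumption: lexical overlap is a stand-in for semantic search.
    """
    q = tokenize(question)
    scored = []
    for pid, text in corpus.items():
        overlap = sum((q & tokenize(text)).values())
        if overlap > 0:
            scored.append((overlap, pid, text))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(pid, text) for _, pid, text in scored[:k]]

def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Place the retrieved passages in front of the question."""
    sources = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    return f"Sources:\n{sources}\n\nQuestion: {question}"

question = "How is the retainer billed?"
print(build_prompt(question, retrieve(question, CORPUS)))
```

The point of the sketch is the ordering: the passages are selected and placed in the prompt before any text is generated, so the model has something specific to write from.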

This is also why “just ground it in these documents” actually works as an instruction — when there is a real retrieval layer behind it. It is not a magic phrase. It is a description of a mechanism. You are not asking the model to try harder. You are changing what it is writing from.

In Hone Studio

This is the platform's native ground. Enable Knowledge Base mode and the Assistant searches your uploaded documents to ground every answer in your frameworks, proposals, and approved materials — with inline citations and a sources panel showing exactly which entries were retrieved. The Knowledge Base searches by meaning, not just keywords, so the retrieval step finds the right passage even when your question doesn't use the document's exact wording. The model still writes the answer. What changes is that it writes it from your sources instead of from the statistical average of the internet.

Anatomy of a grounded answer

It helps to know what you are looking for, because grounding is visible in the output if you know its shape. A genuinely grounded answer has three structural parts, in order, for every load-bearing claim: a claim, a source marker that points somewhere, and — crucially — a retrievable passage that the marker actually leads to. An ungrounded answer has the first part, sometimes the second, and almost never the third. The fabricated citation in the opening scene had a claim and a marker. It had nothing the marker pointed to.
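One way to make the three-part shape concrete is to check whether each claim's marker actually leads to a passage. The sketch below is illustrative only: the `Claim` type, the marker format, and the `classify` helper are assumptions, not part of any tool named in this piece.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str
    marker: Optional[str] = None  # hypothetical id such as "proposal-2023#pricing"

def classify(claim: Claim, passages: dict[str, str]) -> str:
    """Report which of the three structural parts a claim has."""
    if claim.marker is None:
        return "no marker"          # claim only
    if claim.marker not in passages:
        return "dangling marker"    # claim plus a marker that points at nothing
    return "resolves"               # claim, marker, and a retrievable passage

passages = {"proposal-2023#pricing": "The retainer is billed monthly at a fixed fee."}
claims = [
    Claim("The retainer is billed monthly.", "proposal-2023#pricing"),
    Claim("The framework comes from a 2011 journal article.", "journal-2011#p44"),
    Claim("Clients prefer fixed fees."),
]
for claim in claims:
    print(classify(claim, passages), "|", claim.text)
```

A fabricated citation like the one in the opening scene would land in the "dangling marker" bucket. Note that "resolves" only means the marker leads somewhere; it does not show that the passage supports the claim.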

The difference between the two is not cosmetic. It changes the entire economics of using the output. Lay them side by side across the dimensions that matter for high-stakes work.

| Dimension | Ungrounded answer | Grounded answer |
| --- | --- | --- |
| Accuracy | Plausible by construction; true only by coincidence. Fills gaps with confident invention. | Anchored to specific passages it can quote. Gaps surface as “not in the source” rather than fabrication. |
| Verifiability | You must re-derive every claim from scratch; the cost of checking approaches the cost of doing it by hand. | Each claim links to the passage behind it. Checking collapses from “research it again” to “open the citation.” |
| Trust | Earned only after exhaustive review; one fabricated detail poisons confidence in the whole. | Earned at the passage level: you trust the claims you've traced and flag the ones you haven't. |
| Failure shape | Silent. Looks identical to a correct answer until someone checks. | Loud. A broken or thin citation is visible before the answer ships. |

The row that earns its keep is the last one. Ungrounded errors are silent: they are indistinguishable from correct output until a human does the work of checking. Grounded errors are loud: a citation that points at a thin or off-topic passage is a flag you can see before the deliverable goes out the door. Grounding does not make the model infallible. It makes the model's failures visible, which for high-stakes work is most of the battle. A 2025 randomized controlled trial from METR found that experienced open-source developers took longer to finish tasks with AI tools even while believing they were faster, and the verification burden was a large part of where the time went. Visible failures are cheaper to verify than silent ones. That is the whole argument for grounding, stated as time.

Five retrieval patterns

Knowing that grounding works is not the same as knowing how to invoke it. The patterns below are the practical moves that separate a cited answer from a confident hallucination. They compound — each one tightens the loop the previous one opened — and the first three work even in a plain chat window with documents pasted in, while the last two reward a real retrieval layer.

Pattern 01

Name the corpus

Tell the model explicitly which body of material it is answering from. “Using only the three proposals I've uploaded” is a different instruction than a bare question. Naming the corpus does two things: it scopes the retrieval, and it gives the model permission to say “that isn't in these documents” instead of reaching outside them to fill the gap.

Pattern 02

Force source-anchoring

Require a source marker on every load-bearing claim, not as a footnote afterthought but as a condition of the answer. “For each recommendation, cite the specific document and section it draws on.” A claim that cannot be anchored is a claim the model is composing rather than retrieving — and now you can see which is which.

Pattern 03

Ask for quote-then-claim

Invert the usual order. Have the model quote the relevant passage first, then state the claim it supports. “Quote the sentence, then tell me what it means for our position.” This is the single highest-leverage pattern, because it forces the retrieval to happen before the generation. A model that has to paste the actual text before interpreting it cannot fabricate the text without the fabrication being obvious.
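Quote-then-claim also makes a cheap mechanical check possible: confirm that the quoted text really appears in the source. A minimal sketch, assuming you have the source text on hand; `normalize` and `quote_is_verbatim` are hypothetical helper names, and the normalization rules are one reasonable choice among several.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse runs of whitespace."""
    for curly, straight in (("\u201c", '"'), ("\u201d", '"'), ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, straight)
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_is_verbatim(quote: str, source: str) -> bool:
    """True only if the non-empty quote appears in the source after normalization."""
    q = normalize(quote)
    return bool(q) and q in normalize(source)

source = "The retainer is billed monthly\n at a fixed fee agreed in the statement of work."
print(quote_is_verbatim("billed monthly at a fixed fee", source))
print(quote_is_verbatim("billed quarterly at a variable rate", source))
```

A check like this catches invented quotes. It says nothing about whether the claim drawn from a real quote is sound, so it speeds up the human review rather than replacing it.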

Pattern 04

Constrain to retrieved-only

Close the escape hatch. “Answer only from the retrieved material. Do not supplement with general knowledge.” Without this, a model that finds partial support in your corpus will cheerfully top it off with plausible-sounding background — and the seam between what was retrieved and what was invented is invisible in the final text. Anthropic's prompting and retrieval guidance makes the same point: explicit instructions to stay within provided context materially reduce confabulation.

Pattern 05

Flag the gaps when the corpus is silent

Make absence a first-class output. “If the documents don't cover something I asked about, say so explicitly rather than inferring.” The most dangerous answers are the ones where the corpus was silent and the model spoke anyway. An honest “the uploaded materials don't address Q3 pricing” is worth more than a confident paragraph invented to cover the hole.
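Where a retrieval step returns relevance scores, the same idea can be enforced before the model is called at all. The sketch below assumes a retriever that yields `(score, passage_id)` pairs; the `answer_or_gap` helper and the `min_score` threshold of 0.5 are illustrative assumptions, and a sensible cutoff depends entirely on the scoring method in use.

```python
def answer_or_gap(topic: str, scored_passages: list[tuple[float, str]], min_score: float = 0.5) -> dict:
    """Return the supporting passage ids, or an explicit gap notice if none clear the bar.

    scored_passages: (relevance score, passage id) pairs from any retriever.
    min_score: hypothetical cutoff; tune it for your own scoring method.
    """
    supported = [pid for score, pid in scored_passages if score >= min_score]
    if not supported:
        return {"status": "gap", "message": f"The uploaded materials don't address {topic}."}
    return {"status": "grounded", "sources": supported}

print(answer_or_gap("Q3 pricing", [(0.12, "style-guide#voice")]))
print(answer_or_gap("retainer billing", [(0.81, "proposal-2023#pricing"), (0.2, "style-guide#voice")]))
```

Returning the gap as structured output, rather than leaving it to the model's phrasing, keeps the silence of the corpus visible to whoever reviews the answer.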

Notice what these five patterns have in common. None of them is about clever phrasing or magic words. Every one of them is about controlling what the model writes from and forcing the retrieval step into the open where you can inspect it. That is the difference between prompt tricks, which have reached diminishing returns, and retrieval discipline, which has not. McKinsey's State of AI research finds that the organizations actually capturing value are the ones redesigning how work meets the model — not the ones hunting for a better incantation. Retrieval discipline is that redesign, at the level of a single answer.
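For readers who assemble prompts in code, the five patterns can be folded into one instruction block. This is a sketch rather than a recommended wording: `grounded_prompt` is a hypothetical helper, the rule text paraphrases the examples above, and the bracketed passage ids are an assumed convention.

```python
def grounded_prompt(question: str, corpus_name: str, passages: dict[str, str]) -> str:
    """Assemble one prompt that applies the five patterns to the supplied passages."""
    rules = [
        f"Using only {corpus_name}, answer the question below.",                    # 01 name the corpus
        "For each claim, cite the bracketed source id it draws on.",                # 02 force source-anchoring
        "Quote the supporting sentence first, then state the claim.",               # 03 quote-then-claim
        "Answer only from the sources provided. Do not supplement with general knowledge.",  # 04 retrieved-only
        "If the sources don't cover part of the question, say so explicitly rather than inferring.",  # 05 flag gaps
    ]
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    sources = "\n".join(f"[{pid}] {text}" for pid, text in passages.items())
    return f"Rules:\n{numbered}\n\nSources:\n{sources}\n\nQuestion: {question}"

print(grounded_prompt(
    "What is the fee structure?",
    "the three proposals I've uploaded",
    {"proposal-a#fees": "The fee is fixed and billed monthly."},
))
```

The rules do nothing on their own. They only constrain the model if the `Sources` section actually contains the relevant passages, which is the retrieval step's job.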

What citations can and can't promise

Here is where careful practitioners and careless ones part ways. A citation is a powerful thing, but it is precise about what it guarantees, and the precision matters. A citation tells you where to look. It does not tell you the claim is correct, the passage is being read in context, or the inference drawn from it is sound. Citations are assistive: they locate and support. They are not authoritative, and treating them as a stamp of correctness reintroduces exactly the silent-failure problem grounding was supposed to fix.

The assistive line

A citation that points to a real passage can still support a wrong claim — if the passage is misread, taken out of context, or stretched past what it actually says. Grounding collapses the cost of verification; it does not remove the need for it. The discipline is to treat every citation as an invitation to check, not a substitute for checking. Open the passage. Confirm it says what the answer claims it says. The win is that this now takes seconds instead of an hour — not that you get to skip it.

This is not a hedge; it is the correct mental model, and it is what keeps grounding honest. The goal of retrieval is not to remove the human from the loop. It is to move the human from re-deriving every claim to spot-checking the ones that matter — to turn verification from an act of reconstruction into an act of confirmation. Every AI output is still a draft until a person with judgment signs off on it. What changes is how fast that person can do their job, and how much of their attention is left for the claims that actually carry weight.

In Hone Studio

In Knowledge Base mode the Assistant attaches inline citations and a sources panel to its answers — source markers link each claim back to the specific document passage it drew on, and the panel shows which Knowledge Base entries were retrieved as context. Those citations are deliberately framed as assistive: they trace claims back to your source material so a person can verify fast, not as a guarantee of correctness. The posture is human-in-the-loop by design — every output is a draft for review. The citation is the fast path to the source. The judgment stays yours.

When you outgrow copy-paste

The first three patterns work in any chat window: paste your documents into the prompt, then name the corpus, force anchoring, and ask for quote-then-claim. For a single short document and a one-off question, that is genuinely enough, and you should not over-engineer it. But the copy-paste approach has hard limits, and you hit them faster than you expect.

The first limit is volume. You cannot paste a hundred-page proposal archive, three years of board reports, and a style guide into a chat window — there is not room, and even where there is, the model's attention thins across very long inputs. The second limit is selection: pasting requires you to already know which document holds the answer. The entire value of retrieval is finding the right passage when you don't know where it lives. The third limit is repetition. Every new conversation starts from an empty window. You re-assemble the same context — the same documents, the same voice notes, the same constraints — every single time, and that re-assembly is a tax you pay on every task forever.

A real retrieval layer dissolves all three. It indexes the whole corpus, searches it by meaning to surface the right passages for this question, and stands as persistent infrastructure rather than something you rebuild per session. Patterns four and five — constrain-to-retrieved and flag-the-gaps — only become fully reliable when there is a genuine retrieval step to constrain and a known corpus whose silence is meaningful. That is the line where pasting documents into a chat stops being a workaround and starts being a bottleneck.

Stage 3 · Confident hallucination

No corpus in front of the model. It fills every gap with plausible invention. Failures are silent.

Stage 2 · Paste-and-pray

You paste the document you already guessed was right. Works for one short source; breaks on volume and selection.

Stage 1 · Real retrieval

The corpus is indexed. The right passage finds the question. Citations are attached.

Each inner layer suppresses a failure mode that the layer outside it could not.

The durable skill

Two years from now the models will be better at saying “I don't know.” The hallucination rates will keep falling. Some of the patterns above will feel less necessary on routine tasks because the defaults will have improved. It would be a mistake to read this piece as a fixed set of prompt recipes that the next model generation will retire.

The durable skill is not the five patterns. It is the reflex underneath them: treating sources as inputs to the work, not as afterthoughts to be retrofitted onto a finished answer. The careless practitioner generates first and looks for citations second — and gets a fabricated paper that doesn't exist. The disciplined practitioner decides what the model is allowed to write from before it writes a word, and the citation falls out of the process because the source was there from the start.

The opening scene had the order backwards. The analyst handed the model a finished framework and asked it to find the source afterward. The model obliged, the way it always will, by composing a source to match. Reverse the order, source first and claim second, and the same model that invents becomes a model that cites. The fabricated citation was never a sign the technology wasn't ready. It was a sign the source wasn't in the room. Put it in the room. That is the whole craft.
