Why answer engines need to cite — and how
There is a small but important argument that we keep having in the answer-engine space: are citations a UX feature or a system invariant? People who treat them as a UX feature ship them as tooltips, small superscripts that fade in on hover, optional footnotes. People who treat them as a system invariant make the citation part of the data path — every chunk that enters the prompt carries a stable source identifier, and every assertion in the response must be tied to one.
We are firmly in the second camp. This post is about why, and what the plumbing looks like.
What an answer without a citation actually is
A paragraph of text from a model, with no source, is a confident summary of something the model thinks it remembers. It might be correct. It might be a clean paraphrase of three sources you fed it. It might also be a confident interpolation between the training distribution and the prompt — which is to say, a hallucination wearing a paragraph as a costume.
You cannot tell the difference by reading the paragraph. The defining property of modern LLM output is that it is fluent whether or not it is grounded. Ten years ago, fluency was the constraint. It is not the constraint any more. Groundedness is the constraint, and groundedness is not visible at the surface.
Citations are the visible part of groundedness. A citation says: this clause came from this URL, byte range so-and-so, retrieved at this time. A click should land the reader on the source paragraph, ideally highlighted. If the citation cannot be verified — if it leads to a 404, or to a paragraph that does not actually support the claim — then the system has lied, and a human can catch it.
This is the only mechanism by which an answer engine can be safely deployed inside an organisation that cares about being right. Not auditing later. Not eval suites. The citation, rendered next to the claim, audited by the person reading.
What “citation as invariant” means in code
Treating citations as a system invariant has three consequences for the code.
One. Every chunk that enters the prompt has to carry a stable identifier and a source URL, all the way through the pipeline. There is no point at which a chunk is “the text” without its provenance. In Polymathy, this is enforced by the response shape itself: the handler returns a HashMap<u64, (String, String)> where the key is the chunk ID, the first string is the URL, and the second is the chunk text. You cannot get the text without getting the URL. The seam is built so you cannot lose the provenance even by accident.
Two. The prompt template that turns chunks into a paragraph has to mark each chunk with its identifier inside the prompt, and instruct the model to cite using those identifiers. The format does not matter — [1], <src id="chunk_7"/>, whatever — as long as it is parseable on the way out. The model’s output is then a sequence of sentences interleaved with citation markers; the renderer maps each marker back to the URL through the chunk ID. This is mundane work but it has to be there.
Three. The renderer has to refuse to render a claim with no citation. That sounds extreme; it isn’t. If the model produced a sentence without a citation marker, either a chunk went into the prompt without its identifier, or the model hallucinated. Both cases warrant a visible warning, not a silent fluent sentence. The strict version of this is to drop uncited sentences entirely; the gentler version is to render them with a visible “no source” badge so the reader knows what they are looking at.
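The third consequence can be sketched as a small renderer policy. This is a minimal illustration, not Polymathy code: the `[7]`-style marker format, the names `UncitedPolicy`, `cited_ids`, and `render`, and the rule that a marker only counts if its ID exists in the chunk map are all assumptions made for the example. Sentence splitting is left to the caller.

```rust
use std::collections::HashMap;

/// chunk_id -> (source URL, chunk text), the response shape described above.
pub type ChunkMap = HashMap<u64, (String, String)>;

/// How to treat a sentence that carries no usable citation (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum UncitedPolicy {
    /// Strict: omit the sentence from the rendered answer.
    Drop,
    /// Gentle: keep the sentence but attach a visible badge.
    Badge,
}

/// Extract chunk IDs from markers of the assumed form `[123]`.
/// Bracketed text that is not a number is ignored.
pub fn cited_ids(sentence: &str) -> Vec<u64> {
    let mut ids = Vec::new();
    let mut rest = sentence;
    while let Some(open) = rest.find('[') {
        let after = &rest[open + 1..];
        let Some(close) = after.find(']') else { break };
        if let Ok(id) = after[..close].parse::<u64>() {
            ids.push(id);
        }
        // Continue scanning just past the '[' we handled.
        rest = after;
    }
    ids
}

/// Render sentences, treating a sentence as grounded only if at least one
/// of its markers resolves to a chunk that was actually in the map.
pub fn render(sentences: &[&str], chunks: &ChunkMap, policy: UncitedPolicy) -> Vec<String> {
    let mut out = Vec::new();
    for s in sentences {
        let grounded = cited_ids(s).iter().any(|id| chunks.contains_key(id));
        if grounded {
            out.push(s.to_string());
        } else if policy == UncitedPolicy::Badge {
            out.push(format!("{} [no source]", s));
        }
        // UncitedPolicy::Drop: the sentence is omitted entirely.
    }
    out
}
```

Checking markers against the chunk map, rather than merely checking that a marker is present, also catches the case where the model invents an ID that was never in the prompt.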
The shape that makes this easy
Polymathy’s /v1/search endpoint exists because we noticed this pattern repeating across three or four projects. Every team building an answer experience ended up writing the same thing: take query, hit metasearch, take URLs, fetch each one in parallel, chunk the bytes, assign chunk IDs, pair with source URLs, hand off downstream. Every team also ended up with subtle bugs in the chunk-to-URL mapping, because the mapping was an afterthought added at the last layer instead of baked in at the first.
So we baked it in at the first. The response from Polymathy is a map of chunk_id to [url, text]. The chunk ID is a u64, sequential, with no semantic meaning. The URL is the source URL exactly as it appeared in the SearxNG response: not the URL after redirects, not the URL with tracking parameters stripped, but the URL the retrieval layer reported. The text is the chunk as produced by the content processor.
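To make the shape concrete, here is a minimal sketch of assigning chunk IDs and pairing them with source URLs in a single step. It is not Polymathy's actual implementation: `build_chunk_map` is a hypothetical helper, the input is assumed to be already fetched and chunked, and starting the sequence at 0 is an assumption.

```rust
use std::collections::HashMap;

/// chunk_id -> (source URL, chunk text), the shape described in the post.
pub type ChunkMap = HashMap<u64, (String, String)>;

/// Hypothetical helper: pair every chunk with the URL it came from at the
/// moment its ID is assigned, so the text never exists without provenance.
/// `pages` is a list of (url as reported by retrieval, chunks of that page).
pub fn build_chunk_map(pages: &[(&str, Vec<&str>)]) -> ChunkMap {
    let mut map = ChunkMap::new();
    // Sequential IDs with no semantic meaning (starting at 0 is an assumption).
    let mut next_id: u64 = 0;
    for (url, chunks) in pages {
        for text in chunks {
            // The URL is stored untouched: no redirect following,
            // no stripping of tracking parameters.
            map.insert(next_id, (url.to_string(), text.to_string()));
            next_id += 1;
        }
    }
    map
}
```

Because the URL is attached in the same `insert` that creates the ID, there is no later layer where the chunk-to-URL mapping has to be rebuilt.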
Downstream, you stitch the chunk text into your prompt template along with the chunk ID. The model emits paragraphs with citation markers referencing those chunk IDs. Your renderer maps each marker back to the URL via the chunk map you got from Polymathy. The mapping is never lost because it never has to be reconstructed.
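The downstream half might look like the sketch below. It is one possible way to do it under stated assumptions: the `[id]` marker format, the instruction wording, and the function names `build_context` and `resolve` are all invented for illustration and are not part of Polymathy.

```rust
use std::collections::HashMap;

/// chunk_id -> (source URL, chunk text), as returned by the retrieval layer.
pub type ChunkMap = HashMap<u64, (String, String)>;

/// Render the chunks into a prompt section, each tagged with its chunk ID.
pub fn build_context(chunks: &ChunkMap) -> String {
    let mut ids: Vec<&u64> = chunks.keys().collect();
    // HashMap iteration order is unspecified; sort for a stable prompt.
    ids.sort();
    let mut prompt = String::from("Cite sources as [id] after each claim.\n");
    for id in ids {
        let (_url, text) = &chunks[id];
        prompt.push_str(&format!("[{}] {}\n", id, text));
    }
    prompt
}

/// Map a cited chunk ID back to its URL. `None` means the marker does not
/// correspond to any chunk that was handed to the model.
pub fn resolve(chunks: &ChunkMap, id: u64) -> Option<&str> {
    chunks.get(&id).map(|(url, _text)| url.as_str())
}
```

The renderer only ever calls `resolve` against the same map that produced the prompt, which is why the mapping is a lookup and never a reconstruction.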
This is not a clever idea. It is the unclever idea, written down.
What citation gets you that ranking does not
Ranking — the BM25, the dense vectors, the rerankers — answers the question “which of these documents is most relevant to the query?”. Citation answers a different question: “which sentence in the answer paragraph is supported by which source?”. These are not the same question, and conflating them is how teams ship answer engines that look great in demos and are useless in production.
A perfectly ranked retrieval can still produce an ungrounded answer if the synthesis layer is allowed to interpolate. Conversely, a mediocre retrieval can still produce a trustworthy answer if every claim is bound to a real source: the answer might be incomplete, but any error in it can be traced to a source the reader can check, and the user can tell when it is incomplete because they can see which sources were actually used.
So when we say Polymathy is a piece of answer-engine infrastructure, we mean it specifically as a piece of the citation pipeline. It does not produce the citation marker. It does not render the citation. It does carry the URL all the way from SearxNG to the chunk map, so that when the layer above it does produce a citation, the URL is right there, unambiguous, never reconstructed.
What we are explicitly not doing
We are not implementing a citation format in Polymathy. The choice between [1], footnotes, inline anchors, or a sidebar of sources is downstream — it belongs to the team building the answer UI, and they will know more about their users than we will. We are not implementing a verification step that checks whether the cited URL actually supports the claim — that is research-grade work and there are people doing it better. We are not implementing a freshness signal that says “this source was retrieved 14 seconds ago” — again, downstream.
What we are doing is making sure the URL never gets lost between the retrieval response and the chunk handed to your prompt template. That is one small invariant, enforced by the response shape. Everything else about citation is the next team’s problem, and the cleaner that seam is, the easier that team’s life becomes.
A small request
If you are building an internal answer engine and you find yourself rendering paragraphs without citations because “the model is usually right,” please rethink. The model is sometimes right. The citation is what tells your users when. Build the plumbing for citations first, before anything else, even before the model is good. Then the model can get better around the citation invariant instead of in spite of it.
That is the whole reason Polymathy ships the way it does.
Filed under Notes. Source on GitHub; docs at docs.skelfresearch.com/polymathy.