
From ten blue links to one cited paragraph: the shift in internal search

For about a decade, the shape of internal search inside companies was stable. You typed a query into a box on an intranet portal, and you got back ten blue links to a docs site, a Confluence space, a few Slack threads, and the occasional ticket. The ranking was probably tuned by someone who once read a Lucene whitepaper. The interaction model came straight from 2004 Google.

That model is now visibly breaking. Not because the underlying retrieval stopped working — BM25 is still doing exactly what it always did — but because the people on the other side of the search box have changed their expectations. They have spent two years pasting questions into ChatGPT and getting a paragraph back. When they then turn to the company wiki and get ten blue links, the experience feels worse than it did in 2019. The links did not get worse. The baseline got better.

This piece is about the shift from ten blue links to one cited paragraph, what it actually requires under the hood, and why we ended up writing Polymathy as a tiny Rust service whose only job is to sit between a search engine and a content processor.

The old contract

The old internal-search contract was: you ask, I rank, you click, you read, you decide. The search engine’s job ended at the ranking. Everything after — opening tabs, scanning paragraphs, reconciling contradictions between three docs pages — was the user’s problem. That was fine when the corpus was small and the user was an expert.

It worked because the user did the integration step in their head. They knew which doc was authoritative, which Slack thread was a stale opinion, which ticket was actually closed. The search engine never had to know any of that. It just had to rank.

The ranking was tractable because relevance signals were stable: term frequency, click-through, recency, source weight. You could tune a config file and ship a measurable improvement. There was a whole job description — search relevance engineer — built on this premise.

What changed

What changed is not that ranking stopped working. What changed is that the cost of synthesising the answer from ranked sources collapsed. An LLM with a few thousand tokens of context can read three docs pages, notice they disagree about the default port, surface the disagreement, and link out to each source. The synthesis step that used to take the user three minutes now takes the system three seconds.

Once that synthesis is cheap, the ten-link UI starts to feel like making users do unpaid labour. Why am I clicking through five tabs to find out our default port is 8080? Why isn’t the system just telling me, with the three sources it pulled it from?

This is the answer-engine shift. It is not really a UI redesign. It is a shift in where the integration work happens — from the user’s head to the server. The retrieval layer (rank these documents) is still there. The answer layer (read them, produce a paragraph, cite your sources) sits on top.

What an answer engine actually needs

If you peel back the marketing, an answer engine for an internal corpus needs four things:

  1. A way to get a small set of relevant documents for a query. This is classical retrieval. BM25, dense vectors, hybrid, whatever. The point is: given a query, hand me ten candidate sources.
  2. A way to extract usable text from those sources. Strip the nav. Strip the footer. Get the body. Chunk it sensibly. If you skip this step, you end up stuffing 40k tokens of HTML chrome into a prompt and wondering why the model hallucinates.
  3. A way to embed the chunks so you can do a second, finer-grained ranking inside the candidate set. Cosine similarity against the query embedding, or something more elaborate.
  4. A way to produce a sourced paragraph from the surviving chunks. This is the LLM call. It is also the place where the citation contract gets enforced: every claim should map to a chunk, every chunk should map to a URL.
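Step 3 is the easiest of the four to show in code. What follows is a minimal sketch of the second-pass ranking, assuming the embeddings have already been computed somewhere else. The `EmbeddedChunk` type, its fields and the `rerank` function are names invented for this illustration; they are not Polymathy's types or any library's API.

```rust
/// A chunk of extracted text plus its precomputed embedding.
/// Hypothetical type: the field names are assumptions made for this sketch.
#[derive(Debug, Clone)]
pub struct EmbeddedChunk {
    pub chunk_id: u32,
    pub source_url: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity between two vectors.
/// Returns 0.0 when the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every chunk against the query embedding and keep the `top_k`
/// best, highest score first. Returns `(chunk_id, score)` pairs.
pub fn rerank(query: &[f32], chunks: &[EmbeddedChunk], top_k: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = chunks
        .iter()
        .map(|c| (c.chunk_id, cosine_similarity(query, &c.embedding)))
        .collect();
    // total_cmp is a total order on f32, so a stray NaN cannot panic the sort.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}
```

Calling `rerank(&query_embedding, &chunks, 5)` hands back at most five `(chunk_id, score)` pairs. The "something more elaborate" option would replace `cosine_similarity` and leave the rest alone.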

Most teams trying to build this end up writing the same middleware for step 2, the gap between steps 1 and 3: a thing that takes the URLs out of the retrieval response, fetches each one, hands the bytes to a chunker, and assembles a payload for the LLM to chew on. That middleware is annoying to write and very annoying to write well: parallel fetches, timeouts per source, partial-failure handling, content-type sniffing, the eternal question of how big a chunk should be.
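To make the fan-out problem concrete, here is a rough sketch of parallel fetches with a deadline and partial-failure handling, using only the standard library. It is not how Polymathy does it. The `fetch_all` function, the `FetchOutcome` enum and the injected `fetch` callback are all hypothetical, and the callback stands in for whatever HTTP client you actually use. Because every fetch starts at the same moment, one shared deadline doubles as the per-source timeout.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// What happened to one source. Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
pub enum FetchOutcome {
    Fetched(String),
    Failed(String),
    TimedOut,
}

/// Fetch every URL on its own thread and wait at most `timeout` in total.
/// One slow or broken source never blocks or discards the others.
pub fn fetch_all<F>(urls: &[String], timeout: Duration, fetch: F) -> HashMap<String, FetchOutcome>
where
    F: Fn(&str) -> Result<String, String> + Send + Clone + 'static,
{
    let (tx, rx) = mpsc::channel();
    for url in urls {
        let tx = tx.clone();
        let url = url.clone();
        let fetch = fetch.clone();
        thread::spawn(move || {
            let result = fetch(&url);
            // The receiver may already have given up; a failed send is fine.
            let _ = tx.send((url, result));
        });
    }
    // Drop our own sender so the channel disconnects once all workers finish.
    drop(tx);

    // Every source starts as TimedOut and is overwritten if a result arrives.
    let mut outcomes: HashMap<String, FetchOutcome> = urls
        .iter()
        .map(|u| (u.clone(), FetchOutcome::TimedOut))
        .collect();

    let deadline = Instant::now() + timeout;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok((url, Ok(body))) => {
                outcomes.insert(url, FetchOutcome::Fetched(body));
            }
            Ok((url, Err(reason))) => {
                outcomes.insert(url, FetchOutcome::Failed(reason));
            }
            // Either the deadline passed or every worker has reported in.
            Err(_) => break,
        }
    }
    outcomes
}
```

The sketch never cancels a worker that misses the deadline; the thread keeps running detached until its fetch returns. A production version would want real cancellation and a cap on concurrency, which is a large part of why this middleware is tedious to write well.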

That middleware is what Polymathy is. It is not the retrieval layer. It is not the LLM. It is the fetch-and-chunk shim in the middle, exposed as a single HTTP endpoint, with an OpenAPI spec, written in Rust because the rest of our stack is in Rust and we are tired of debugging GIL-flavoured slowdowns under fan-out load.
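For the chunking half of the shim, one common baseline is a fixed window of words with some overlap between neighbours. The sketch below illustrates that baseline only. This piece does not describe Polymathy's actual chunking strategy, and `chunk_words` and its parameters are names made up here.

```rust
/// Split `text` into chunks of at most `max_words` words, where each chunk
/// repeats the last `overlap` words of the previous one.
/// Panics if `max_words` is zero or `overlap` is not smaller than `max_words`.
pub fn chunk_words(text: &str, max_words: usize, overlap: usize) -> Vec<String> {
    assert!(
        max_words > 0 && overlap < max_words,
        "overlap must be smaller than max_words"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    // Each new chunk starts this many words after the previous one.
    let step = max_words - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_words).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

The overlap is there so that a sentence straddling a boundary still appears whole in at least one chunk. Splitting on headings or paragraphs usually gives better chunks than counting words, at the cost of needing to understand the document's structure.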

The piece that quietly matters: citations

The thing that moves an internal-search rollout from “cute demo” to “people actually use this for compliance work” is citations. Not as a footer, not as a tooltip, but as the load-bearing part of the answer. Every assertion in the paragraph has to be checkable, with one click, against the source it came from.

This is the part where the cited-paragraph model differs most sharply from the consumer chatbot UX. In a consumer chatbot, citations are decorative. In internal search, citations are the product. They are what lets a security engineer trust the answer about MFA requirements; they are what lets a legal team forward the result to a regulator; they are what lets a new hire know whether the docs page they are reading is the current authority or a 2021 fossil.

The plumbing for this is unglamorous. You need to keep, for every chunk you put into the prompt, a stable reference back to the source URL and ideally the byte range. You need to make the LLM emit citations inline. You need to render them in a way that does not lie — if the model paraphrased two sources into one sentence, both citations need to be there.

Polymathy’s contract makes this easier than it might be: the response from /v1/search is a map of chunk_id to [source_url, content]. The chunk ID is the thing you stitch into the prompt; the URL is the thing you render under the citation marker. Nothing more, nothing less. Whatever LLM call you make downstream has access to both, by construction.
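As a sketch of how that map might be used downstream: tag each chunk with its ID when building the prompt, then resolve the IDs the model cites back to URLs and refuse any ID that was never in the prompt. Several things here are assumptions rather than anything Polymathy defines: the `Chunks` alias (with string IDs) as an in-memory model of the response, the `[chunk_id]` marker format, and both function names.

```rust
use std::collections::BTreeMap;

/// Assumed in-memory model of the response: chunk_id -> (source_url, content).
pub type Chunks = BTreeMap<String, (String, String)>;

/// Build the context section of a prompt, one line per chunk, each line
/// prefixed with its ID so the model can cite it as `[chunk_id]`.
pub fn build_context(chunks: &Chunks) -> String {
    let mut out = String::new();
    for (id, (_url, content)) in chunks {
        out.push_str(&format!("[{id}] {content}\n"));
    }
    out
}

/// Collect every `[chunk_id]` marker in the model's answer, in order of
/// first appearance, paired with its source URL.
/// Any bracketed ID that is not in `chunks` is treated as an error, so a
/// citation the model made up never reaches the rendered answer.
pub fn resolve_citations(answer: &str, chunks: &Chunks) -> Result<Vec<(String, String)>, String> {
    let mut cited: Vec<(String, String)> = Vec::new();
    let mut rest = answer;
    while let Some(open) = rest.find('[') {
        let after = &rest[open + 1..];
        let Some(close) = after.find(']') else { break };
        let id = &after[..close];
        match chunks.get(id) {
            Some((url, _content)) => {
                // Record each cited chunk once, even if it is cited repeatedly.
                if !cited.iter().any(|(seen, _)| seen == id) {
                    cited.push((id.to_string(), url.clone()));
                }
            }
            None => return Err(format!("answer cites unknown chunk `{id}`")),
        }
        rest = &after[close + 1..];
    }
    Ok(cited)
}
```

The strictness is deliberate: this version rejects any bracketed text that is not a known chunk ID, including an innocent "[sic]". That trade-off suits compliance work, but a real renderer would probably want a marker format that cannot collide with ordinary prose.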

The next layer up

The shift is not finished. Going from ten blue links to one cited paragraph is the first move. The next move — the one we are watching for in internal search — is the move from one paragraph to a small, structured answer object. A spec sheet. A decision tree. A draft policy doc with the supporting passages in the margin.

That shift will need different middleware again. Not just fetch-and-chunk, but fetch-and-chunk-and-classify, where the classifier knows the structure of the answer it is supposed to fill. Polymathy in its current form will not be that thing. But the seam it carves out — the boundary between “here are some URLs” and “here are some chunks” — is one of the few seams that survives every refactor of the stack above it.

That is why it gets to be its own small service.


Filed under Notes. Source on GitHub; docs at docs.skelfresearch.com/polymathy.