Motivation
The whole point of AskMyDocs is a grounded answer: every claim traceable to retrieved, cited context, or else an honest refusal. A chat turn is therefore a fixed pipeline, not a single vector lookup.
The chat turn
Hybrid retrieval
Retrieval never relies on vector similarity alone:
- Vector search over pgvector and keyword search over a Postgres FTS GIN index run in parallel, each over-retrieving about 3× the final k.
- The Reranker fuses the two result sets with the shipped default weights 0.55·vector + 0.25·keyword + 0.05·heading (configurable via kb.reranking.*). It then applies a canonical boost and a status penalty, so human-accepted canonical docs outrank auto docs, which in turn outrank raw docs. This ordering is the anti-hallucination firewall.
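The fusion step can be sketched roughly as follows. This is a minimal TypeScript illustration under stated assumptions, not the actual Reranker. The 0.55 / 0.25 / 0.05 weights and the ~3× over-retrieval factor come from the text above. Everything else is a hypothetical placeholder: the Hit shape, the [0, 1] score normalisation, and the CANONICAL_BOOST and STATUS_PENALTY magnitudes.

```typescript
type DocStatus = "canonical" | "auto" | "raw";

// One hit from either retriever (hypothetical shape).
interface Hit {
  chunkId: string;
  score: number; // retriever-specific score, assumed normalised to [0, 1]
  headingScore: number; // hypothetical heading-match signal in [0, 1]
  status: DocStatus;
}

interface FusionWeights {
  vector: number;
  keyword: number;
  heading: number;
}

// Shipped defaults quoted in the docs (configurable via kb.reranking.*).
const DEFAULT_WEIGHTS: FusionWeights = { vector: 0.55, keyword: 0.25, heading: 0.05 };

// HYPOTHETICAL magnitudes: the real boost and penalty values are not documented here.
const CANONICAL_BOOST = 0.1;
const STATUS_PENALTY: Record<DocStatus, number> = { canonical: 0, auto: 0.05, raw: 0.15 };

// Each retriever over-fetches about 3x the final k before fusion.
const OVER_RETRIEVE_FACTOR = 3;
function poolSize(k: number): number {
  return k * OVER_RETRIEVE_FACTOR;
}

interface Merged {
  vector: number;
  keyword: number;
  headingScore: number;
  status: DocStatus;
}

function fuseAndRank(
  vectorHits: Hit[],
  keywordHits: Hit[],
  k: number,
  w: FusionWeights = DEFAULT_WEIGHTS,
): { chunkId: string; score: number }[] {
  // Null-prototype object so chunk ids cannot collide with Object.prototype keys.
  const merged: Record<string, Merged> = Object.create(null);

  // Union of both result sets; a chunk missing from one retriever scores 0 there.
  function upsert(h: Hit): Merged {
    let m = merged[h.chunkId];
    if (m === undefined) {
      m = { vector: 0, keyword: 0, headingScore: h.headingScore, status: h.status };
      merged[h.chunkId] = m;
    }
    return m;
  }
  for (const h of vectorHits) upsert(h).vector = h.score;
  for (const h of keywordHits) upsert(h).keyword = h.score;

  const scored = Object.keys(merged).map((id) => {
    const m = merged[id];
    const base = w.vector * m.vector + w.keyword * m.keyword + w.heading * m.headingScore;
    const boost = m.status === "canonical" ? CANONICAL_BOOST : 0;
    return { chunkId: id, score: base + boost - STATUS_PENALTY[m.status] };
  });
  // Highest fused score first, then keep the final k.
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}
```

With this scheme, a chunk found by only one retriever scores 0 on the other signal, so a strong keyword-only match can still surface. Overriding kb.reranking.* would roughly correspond to passing a different FusionWeights object.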
Graph expansion + anti-repetition
After reranking, two config-gated steps fold in institutional memory:
- GraphExpander walks 1 hop of kb_edges from the canonical seeds and adds the neighbours under a 📎 RELATED CONTEXT block.
- RejectedApproachInjector surfaces dismissed options under a ⚠ REJECTED APPROACHES block so the model stops re-proposing them.
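A one-hop expansion plus block rendering might look like the sketch below. It is a sketch under stated assumptions. KbEdge is a hypothetical shape with source and target document ids, and the code follows outgoing edges only. The real kb_edges schema, its direction handling, and the exact block rendering are not specified on this page. The same renderBlock helper could equally produce the ⚠ REJECTED APPROACHES block.

```typescript
// HYPOTHETICAL edge shape: the real kb_edges columns are not described here.
interface KbEdge {
  source: string; // document id the edge starts from
  target: string; // document id the edge points to
}

// Walk exactly one hop along outgoing edges from the seed documents and
// return new neighbours in first-seen order, excluding the seeds themselves.
function oneHopNeighbours(seeds: string[], edges: KbEdge[]): string[] {
  // Null-prototype lookup tables so ids like "constructor" are handled safely.
  const isSeed: Record<string, boolean> = Object.create(null);
  for (const s of seeds) isSeed[s] = true;

  const seen: Record<string, boolean> = Object.create(null);
  const neighbours: string[] = [];
  for (const e of edges) {
    if (!isSeed[e.source] || isSeed[e.target] || seen[e.target]) continue;
    seen[e.target] = true;
    neighbours.push(e.target);
  }
  return neighbours;
}

// Render a typed prompt block; an empty list produces no block at all.
function renderBlock(header: string, items: string[]): string {
  if (items.length === 0) return "";
  return [header].concat(items.map((item) => `- ${item}`)).join("\n");
}
```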
The typed prompt + citations
The prompt is composed from resources/views/prompts/kb_rag.blade.php with typed
blocks (⚠ rejected, 📎 related, and the primary ## Context). The response carries:
- answer: the grounded text;
- citations: the exact chunks that grounded it;
- meta: provider, model, latency, retrieved-chunk count, and the filters echoed back.
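The sketch below shows one way to type the response and to assemble the prompt blocks. It is written in TypeScript rather than Blade, and several parts are illustrative assumptions: the inner field names of Citation and ChatMeta, the trailing "Question:" line, and the choice to place blocks in the listed order with empty blocks dropped. The citationsAreGrounded helper (a hypothetical name) expresses the contract that every citation points at a chunk that was actually retrieved.

```typescript
// Illustrative shapes only: the documented response has answer, citations and meta,
// but the fields inside Citation and ChatMeta here are assumptions.
interface Citation {
  chunkId: string;
  title: string;
}

interface ChatMeta {
  provider: string;
  model: string;
  latencyMs: number;
  chunksRetrieved: number;
  filters: Record<string, unknown>; // echoed back from the request
}

interface ChatResponse {
  answer: string;
  citations: Citation[];
  meta: ChatMeta;
}

// Assemble the typed blocks: rejected, related, then the primary ## Context.
// Empty optional blocks are dropped; the trailing "Question:" line is hypothetical.
function composePrompt(
  rejectedBlock: string,
  relatedBlock: string,
  contextChunks: string[],
  question: string,
): string {
  const parts = [
    rejectedBlock,
    relatedBlock,
    "## Context\n" + contextChunks.join("\n\n"),
    "Question: " + question,
  ];
  return parts.filter((p) => p.length > 0).join("\n\n");
}

// Grounding check: every citation must point at a chunk that was actually retrieved.
function citationsAreGrounded(res: ChatResponse, retrievedChunkIds: string[]): boolean {
  return res.citations.every((c) => retrievedChunkIds.indexOf(c.chunkId) !== -1);
}
```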
Filters
The chat request accepts a filters object (project keys, tags, source types,
date windows, evidence tiers, explicit doc_ids). Legacy callers using the bare
{question, project_key} payload keep working: project_key is wrapped into
filters.project_keys internally.
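Here is a minimal sketch of the legacy-payload normalisation. Only project_keys and doc_ids appear in the text above; the other filter key names are hypothetical. The sketch also assumes that a legacy project_key is merged into any existing project_keys list rather than overwriting it.

```typescript
// HYPOTHETICAL key names, except project_keys and doc_ids, which the text mentions.
interface ChatFilters {
  project_keys?: string[];
  tags?: string[];
  source_types?: string[];
  date_from?: string; // assumed ISO date
  date_to?: string; // assumed ISO date
  evidence_tiers?: string[];
  doc_ids?: number[];
}

interface ChatRequest {
  question: string;
  project_key?: string; // legacy field
  filters?: ChatFilters;
}

interface NormalisedRequest {
  question: string;
  filters: ChatFilters;
}

// Fold the legacy project_key into filters.project_keys without mutating the input.
function normaliseRequest(req: ChatRequest): NormalisedRequest {
  const filters: ChatFilters = { ...(req.filters ?? {}) };
  const legacyKey = req.project_key;
  if (legacyKey) {
    const keys = filters.project_keys ?? [];
    // Assumption: merge with any explicit project_keys instead of overwriting them.
    filters.project_keys = keys.indexOf(legacyKey) === -1 ? keys.concat([legacyKey]) : keys;
  }
  return { question: req.question, filters };
}
```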
The refusal contract
When retrieval surfaces nothing relevant above the threshold, the controller returns a deterministic refusal rather than a fabricated answer or an HTTP error.
The refusal carries a typed refusal_reason (e.g. no_relevant_context); this
machine-readable reason is never localized, only the human-visible body is. Every refusal also increments a
content-gap rollup so editors know what to write next. The refusal path also
short-circuits the expensive LLM call, which tests verify by asserting that the
provider shouldNotReceive('chat').
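The contract can be sketched as a function that checks relevance before it touches the provider. This is a sketch, not the controller, and the following pieces are hypothetical: the synchronous LlmProvider interface, the in-memory contentGaps rollup keyed by normalised question, and the localise callback that stands in for translating the human-visible body.

```typescript
// Synchronous for brevity; a real provider call would be asynchronous.
interface LlmProvider {
  chat(prompt: string): string;
}

interface ScoredChunk {
  chunkId: string;
  score: number;
  text: string;
}

type ChatOutcome =
  | { kind: "answer"; answer: string; citations: string[] }
  | { kind: "refusal"; refusal_reason: "no_relevant_context"; message: string };

// HYPOTHETICAL in-memory stand-in for the content-gap rollup.
const contentGaps: Record<string, number> = Object.create(null);

function answerOrRefuse(
  question: string,
  chunks: ScoredChunk[],
  threshold: number,
  provider: LlmProvider,
  localise: (reason: string) => string, // translates only the human-visible body
): ChatOutcome {
  const relevant = chunks.filter((c) => c.score > threshold);

  if (relevant.length === 0) {
    // Record the gap, then return before the provider is ever called.
    const key = question.trim().toLowerCase();
    contentGaps[key] = (contentGaps[key] ?? 0) + 1;
    return {
      kind: "refusal",
      refusal_reason: "no_relevant_context", // machine-readable, never localised
      message: localise("no_relevant_context"),
    };
  }

  const prompt = relevant.map((c) => c.text).join("\n\n") + "\n\nQuestion: " + question;
  return { kind: "answer", answer: provider.chat(prompt), citations: relevant.map((c) => c.chunkId) };
}
```

Both outcomes would be sent with HTTP 200. A spy provider that counts chat() calls should see zero calls on the refusal path, which is the property that the shouldNotReceive('chat') tests assert.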
Streaming UI
The React chat at /app/chat streams over SSE using the Vercel AI SDK v6
UIMessageChunk wire format. It supports stop, regenerate, branch, inline edit,
a token-cost meter, suggested follow-ups, and inline citations. The stateless JSON
API (POST /api/kb/chat) is the headless equivalent.
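For orientation, the SSE framing can be sketched as follows. The chunk types (start, text-start, text-delta, text-end, finish) and the data: [DONE] terminator follow the AI SDK's documented UI message stream protocol as described for v5. The exact v6 field names are an assumption to verify against the SDK documentation. Citation and follow-up chunks are omitted, and streamAnswer is a hypothetical helper.

```typescript
// Subset of chunk types, modelled on the AI SDK UI message stream protocol (assumed for v6).
type UiChunk =
  | { type: "start"; messageId: string }
  | { type: "text-start"; id: string }
  | { type: "text-delta"; id: string; delta: string }
  | { type: "text-end"; id: string }
  | { type: "finish" };

// One SSE event per chunk: a "data:" line followed by a blank line.
function sseFrame(chunk: UiChunk | "[DONE]"): string {
  const payload = chunk === "[DONE]" ? chunk : JSON.stringify(chunk);
  return `data: ${payload}\n\n`;
}

// Turn a sequence of text deltas into the full frame sequence for one message.
function streamAnswer(messageId: string, deltas: string[]): string[] {
  const textId = `${messageId}-text`;
  const chunks: UiChunk[] = [
    { type: "start", messageId },
    { type: "text-start", id: textId },
  ];
  for (const d of deltas) chunks.push({ type: "text-delta", id: textId, delta: d });
  chunks.push({ type: "text-end", id: textId }, { type: "finish" });
  // The stream ends with the literal [DONE] sentinel.
  return chunks.map(sseFrame).concat([sseFrame("[DONE]")]);
}
```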
Gotchas & operations
- Logging never breaks the user path: ChatLogManager::log() is wrapped in try/catch; never hoist logging into the hot path.
- A refusal is not an error: map it to a 200 with the typed reason, never a 4xx/5xx, and never an empty answer.
- New retrieval services must honour the reranker’s canonical boost and status penalty (or add an ADR explaining the deviation).
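The first gotcha can be illustrated with a TypeScript analogue of the try/catch around ChatLogManager::log(). ChatLogger, onLogError and respondWithBestEffortLog are hypothetical names. The only point of the sketch is that a throwing logger cannot change the response the user receives.

```typescript
// HYPOTHETICAL names; a TypeScript analogue of the try/catch around ChatLogManager::log().
interface ChatLogEntry {
  question: string;
  answer: string;
}

interface ChatLogger {
  log(entry: ChatLogEntry): void;
}

interface ChatHttpResponse {
  status: number;
  answer: string;
}

function respondWithBestEffortLog(
  question: string,
  answer: string,
  logger: ChatLogger,
  onLogError: (err: unknown) => void = () => {},
): ChatHttpResponse {
  try {
    logger.log({ question, answer });
  } catch (err) {
    // A logging failure is only reported on the side; it never reaches the user.
    onLogError(err);
  }
  return { status: 200, answer };
}
```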