Motivation
The quality of a grounded answer is bounded by the quality of retrieval. A single-signal vector search is brittle: it misses lexical-exact matches (acronyms, error codes, identifiers), weights a stale blog post like a ratified standard, and has no notion of related decisions. AskMyDocs’ retrieval is a multi-signal pipeline with a trust gradient: it over-retrieves, fuses several signals, applies a canonical boost and a status penalty, then optionally expands along the graph and injects dismissed approaches.
Theory & background
The pipeline composes four ideas:
- Over-retrieval + rerank. Embedding similarity alone is a coarse filter. Retrieving a wider candidate set (candidate_multiplier × limit) and then reranking lets weaker-vector-but-strong-keyword matches survive.
- Weighted-sum fusion. Multiple normalised signals (vector, keyword, heading, tag, recency, …) combine linearly into one comparable score.
- Trust gradient. The canonical layer adds a priority boost and a status penalty, so a ratified accepted decision outranks a superseded one and an auto-compiled page sits below human-curated content.
- Graph + counter-evidence. Beyond the top-k, a 1-hop graph walk pulls in structurally related docs, and a separate channel injects rejected approaches so the model stops re-proposing dismissed options.
Design
KbSearchService::searchWithContext() is the entry point. Its core,
search(), runs semantic search (and optionally FTS when
KB_HYBRID_SEARCH_ENABLED=true), assembles the candidate set, and hands it to
the Reranker. searchWithContext() then layers graph expansion and
rejected-approach injection on top and returns a typed SearchResult.
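The control flow described above can be sketched in a few lines. This is a language-neutral illustration in Python, not the PHP service itself: the function name, the callable parameters, and the returned dictionary shape are all hypothetical stand-ins for search(), the Reranker, GraphExpander, and RejectedApproachInjector.

```python
from typing import Callable, Dict, List

Chunk = Dict[str, object]  # hypothetical chunk shape, e.g. {"id": 1, "score": 0.8}


def search_with_context(
    query: str,
    limit: int,
    candidate_multiplier: int,
    retrieve: Callable[[str, int], List[Chunk]],
    rerank: Callable[[str, List[Chunk]], List[Chunk]],
    expand: Callable[[List[Chunk]], List[Chunk]],
    pick_rejected: Callable[[str], List[Chunk]],
) -> Dict[str, List[Chunk]]:
    # Over-retrieve: request a wider window than the caller wants back.
    candidates = retrieve(query, candidate_multiplier * limit)
    # Rerank the whole window, then keep only the requested top-k.
    primary = rerank(query, candidates)[:limit]
    # Layer the two optional channels on top of the primary set.
    return {
        "primary": primary,
        "expanded": expand(primary),
        "rejected": pick_rejected(query),
    }
```

Passing the stages in as callables keeps the sketch independent of any storage layer; in the real service these are presumably concrete collaborators rather than parameters.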
The reranker fusion
The Reranker::rerank() score for a chunk is a linear combination of normalised
signals plus the canonical adjustment.
Signal weights and their defaults (config/kb.php):
| Signal | Weight (default) | Env |
|---|---|---|
| vector | 0.55 | KB_RERANK_VECTOR_WEIGHT |
| keyword | 0.25 | KB_RERANK_KEYWORD_WEIGHT |
| heading | 0.05 | KB_RERANK_HEADING_WEIGHT |
| tag overlap | 0.05 | KB_RERANK_TAG_OVERLAP_WEIGHT |
| preamble | 0.05 | KB_RERANK_PREAMBLE_WEIGHT |
| recency | 0.02 | KB_RERANK_RECENCY_WEIGHT |
| status active | 0.02 | KB_RERANK_STATUS_WEIGHT |
| mention boost | 0.50 | KB_RERANK_MENTION_BOOST_WEIGHT |
Canonical adjustment (kb.canonical.*): priority_weight 0.001 (multiplies
retrieval_priority 0–100), superseded_penalty 0.40, deprecated_penalty
0.40, archived_penalty 0.60, auto_tier_penalty 0.02. Vector scores are
min-max normalised before fusion when KB_RERANK_NORMALIZE_SCORES=true. After
fusion, kb.diversification.max_chunks_per_doc (default 3) caps how many chunks
of a single document can occupy the top-k.
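As a rough illustration of how these numbers interact, the sketch below fuses signals with the default weights, applies the canonical adjustment, and caps chunks per document. It is written in Python for brevity and is not the Reranker's actual code. The dictionary keys, function names, and chunk shape are hypothetical, and two behaviours are assumptions: that the mention boost is simply one more additive term, and that penalties are subtracted from the fused score.

```python
from typing import Dict, List

# Default weights and canonical settings as listed above.
WEIGHTS = {
    "vector": 0.55, "keyword": 0.25, "heading": 0.05, "tag_overlap": 0.05,
    "preamble": 0.05, "recency": 0.02, "status_active": 0.02, "mention": 0.50,
}
PRIORITY_WEIGHT = 0.001
STATUS_PENALTY = {"superseded": 0.40, "deprecated": 0.40, "archived": 0.60}
AUTO_TIER_PENALTY = 0.02


def min_max(values: List[float]) -> List[float]:
    """Min-max normalise raw vector scores into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]  # assumption: a flat list maps to 1.0
    return [(v - lo) / (hi - lo) for v in values]


def fused_score(signals: Dict[str, float], status: str,
                retrieval_priority: int, auto_tier: bool = False) -> float:
    """Weighted sum of signals, plus priority boost, minus penalties."""
    base = sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())
    boost = PRIORITY_WEIGHT * retrieval_priority  # retrieval_priority is 0-100
    penalty = STATUS_PENALTY.get(status, 0.0)
    if auto_tier:
        penalty += AUTO_TIER_PENALTY
    return base + boost - penalty


def cap_per_doc(ranked: List[dict], max_chunks_per_doc: int = 3) -> List[dict]:
    """Keep at most max_chunks_per_doc chunks per document, preserving order."""
    seen: Dict[str, int] = {}
    kept = []
    for chunk in ranked:
        doc = chunk["doc_id"]
        if seen.get(doc, 0) < max_chunks_per_doc:
            kept.append(chunk)
            seen[doc] = seen.get(doc, 0) + 1
    return kept
```

With identical signals, an accepted doc at priority 50 scores higher than a superseded doc at priority 100, because the 0.40 penalty is larger than the extra 0.05 of priority boost.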
Hybrid fusion mode
When hybrid search is enabled, semantic and FTS result lists are merged via Reciprocal Rank Fusion (score = weight / (rrf_k + rank), KB_RRF_K=60,
KB_HYBRID_SEMANTIC_WEIGHT=0.70, KB_HYBRID_FTS_WEIGHT=0.30) before reranking.
The meta.search_strategy.fusion_method field records which path ran
(rerank_weighted_sum | rrf | semantic_only).
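The RRF formula above can be sketched as follows. This is a minimal Python illustration using the documented defaults; the function name and the plain list-of-ids inputs are hypothetical, and 1-based ranks are an assumption, since the source does not say whether ranking starts at 0 or 1.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_merge(semantic_ids: List[str], fts_ids: List[str], rrf_k: int = 60,
              semantic_weight: float = 0.70, fts_weight: float = 0.30) -> List[str]:
    """Merge two ranked id lists with weighted Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for weight, ranked in ((semantic_weight, semantic_ids), (fts_weight, fts_ids)):
        for rank, chunk_id in enumerate(ranked, start=1):  # assumed 1-based rank
            # Each list contributes weight / (rrf_k + rank) for the ids it holds.
            scores[chunk_id] += weight / (rrf_k + rank)
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)
```

A chunk that appears in both lists accumulates two contributions, so a chunk ranked third semantically but first in FTS can overtake one that only the semantic list returned.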
Graph expansion & rejected injection
GraphExpander::expand() walks a fixed 1 hop of kb_edges from the
canonical seed docs in the primary set, restricted to the edge-type allow-list
(KB_GRAPH_EXPANSION_EDGE_TYPES) and capped at KB_GRAPH_EXPANSION_MAX_NODES=20.
(The KB_GRAPH_EXPANSION_HOPS config key exists but the expander is currently
1-hop regardless of its value.) Expanded chunks carry
metadata.origin='graph_expansion'. RejectedApproachInjector::pick()
vector-correlates the query against rejected-approach canonical docs (summary
chunks only) and returns up to KB_REJECTED_INJECTION_MAX_DOCS=3 above
KB_REJECTED_MIN_SIMILARITY=0.40. Both degrade to empty for a tenant with no
canonical docs.
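Both channels reduce to small filtering steps, sketched below in Python. This is an illustration under stated assumptions, not the GraphExpander or RejectedApproachInjector code: the edge tuple shape, the function names, and the example edge types used in the tests are hypothetical; edges are followed in the outgoing direction only; and the similarity threshold is treated as inclusive.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical shape of a kb_edges row: (from_doc, to_doc, edge_type).
Edge = Tuple[str, str, str]


def expand_one_hop(seed_docs: Iterable[str], edges: Iterable[Edge],
                   allowed_edge_types: Set[str], max_nodes: int = 20) -> List[dict]:
    """Collect docs exactly one allowed edge away from the seed docs."""
    seeds = set(seed_docs)
    seen = set(seeds)  # never re-add a seed or a doc already expanded
    expanded: List[dict] = []
    for src, dst, edge_type in edges:
        if len(expanded) >= max_nodes:
            break
        if src in seeds and edge_type in allowed_edge_types and dst not in seen:
            seen.add(dst)
            expanded.append({"doc_id": dst,
                             "metadata": {"origin": "graph_expansion"}})
    return expanded


def pick_rejected(similarity_by_doc: Dict[str, float],
                  min_similarity: float = 0.40, max_docs: int = 3) -> List[str]:
    """Return the most similar rejected-approach docs at or above the threshold."""
    eligible = [(doc, sim) for doc, sim in similarity_by_doc.items()
                if sim >= min_similarity]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in eligible[:max_docs]]
```

With no seeds or no rejected-approach docs, both functions return an empty list, which mirrors the degrade-to-empty behaviour described above.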
Data model / contract
SearchResult (app/Services/Kb/Retrieval/SearchResult.php):
meta carries the top-level counts (primary_count / expanded_count /
rejected_count / runner_up_count) and retrieval_ms, a nested
meta.search_strategy object (semantic_enabled, fts_enabled,
fusion_method, graph_expansion_enabled, rejected_injection_enabled,
filters_applied), and a nested meta.retrieval_stats object
(candidates_pre_threshold, candidates_post_threshold, min_score_used,
max_score_used). The prompt is composed from these typed blocks — a
⚠ REJECTED APPROACHES section, a 📎 RELATED CONTEXT section, and the primary
## Context.
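To make the prompt composition concrete, here is a small Python sketch that assembles the three blocks. The section headings are taken from the text above; everything else is an assumption for illustration, including the function name, the use of plain strings for chunks, the ordering (rejected, then related, then primary), and the choice to omit empty optional sections.

```python
from typing import List


def compose_prompt(primary: List[str], expanded: List[str],
                   rejected: List[str]) -> str:
    """Join the typed blocks of a search result into one prompt body."""
    sections = []
    if rejected:
        # Counter-evidence first, so dismissed options are visible up front.
        sections.append("⚠ REJECTED APPROACHES\n" + "\n".join(rejected))
    if expanded:
        sections.append("📎 RELATED CONTEXT\n" + "\n".join(expanded))
    # The primary context block is always present, even when empty.
    sections.append("## Context\n" + "\n".join(primary))
    return "\n\n".join(sections)
```

For a tenant without canonical docs, expanded and rejected are empty, so the output collapses to the primary context block alone.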
Decision rationale (ADR-style)
- Why over-retrieve then rerank, not just top-k by cosine? Pure cosine drops lexical-exact matches whose embeddings sit slightly further out. A 3× candidate window costs one wider SQL read and lets keyword and heading signals rescue them. Rejected as alternative: raising default_min_similarity (loses recall uniformly).
- Why a linear weighted sum, not a learned reranker? Transparency and tunability. Every weight is a config knob an operator can reason about and the kb:benchmark harness validates; a learned cross-encoder is a future option but would forfeit the auditable trust gradient.
- Why bake canonical status into the score, not filter on it? Filtering would hide superseded decisions entirely; penalising keeps them retrievable (for “why did we move off X?”) while ensuring the current decision ranks first. See ADR 0001 (canonical layer) and grounding & evidence tiers.
Worked example
Gotchas & operations
- Weights are tuned together. Raising vector_weight without lowering the rest skews the whole gradient; re-run kb:benchmark after changes.
- Penalties dominate boosts by design. A 0.40 status penalty outweighs the ≤0.10 canonical priority boost, so a superseded doc never outranks an active peer on priority alone.
- No canonical docs → identical to plain hybrid RAG. The boost, graph expansion, and rejected injection all no-op.
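The "penalties dominate boosts" claim follows directly from the defaults, as this short Python check shows. The variable names are illustrative, and it assumes the boost is added and the penalty subtracted, as the surrounding text implies.

```python
PRIORITY_WEIGHT = 0.001
MAX_RETRIEVAL_PRIORITY = 100  # retrieval_priority ranges over 0-100
PENALTIES = {"superseded": 0.40, "deprecated": 0.40, "archived": 0.60}

# Largest possible canonical boost: about 0.10.
max_boost = PRIORITY_WEIGHT * MAX_RETRIEVAL_PRIORITY

# Net canonical adjustment for a top-priority doc in each penalised status.
net = {status: max_boost - penalty for status, penalty in PENALTIES.items()}

# Every value is negative: even the maximum boost cannot offset a penalty.
all_negative = all(value < 0 for value in net.values())
```

If priority_weight were ever raised above 0.004, the maximum boost would exceed the 0.40 penalties and this guarantee would no longer hold.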
Canonical graph
The nodes/edges graph expansion walks.
Grounding & evidence tiers
How the trust gradient maps to evidence tiers.