Guard Design Patterns for Heuristic Answer Selection in QA Pipelines
Seven guard patterns for heuristic answer selection: subset guards, similarity threshold, reasoning dump rejection, non-ASCII filter, place-type guard, function word guard, and 1-token expansion guard. Each was discovered through a specific regression.
lesson_learned guard-patterns heuristics answer-selection qa-pipeline fol regression-prevention hotpotqa
Created 2/20/2026, 5:49:40 AM
Content
Developed 7 guard patterns for selective FOL answer selection and completion that together achieved EM +14%, F1 +5.1% on HotpotQA. Each guard was discovered through a specific regression:

(1) Subset guards (Rule 1a/1b): If the FOL tokens are a strict subset of the baseline tokens, prefer the baseline, because FOL drops qualifiers like 'New Hampshire', 'US$', and 'and English folk-song'. If the baseline tokens are a strict subset of the FOL tokens, also prefer the baseline, because FOL adds noise like 'graffiti artists'.

(2) Similarity threshold (0.70): Above the threshold, the FOL answer is a formatting improvement on the baseline. Below it, the two answers genuinely disagree.

(3) Reasoning dump guard: Reject FOL answers longer than 100 characters or starting with 'From the triples...'. This catches LLM reasoning leaking into answers.

(4) Non-ASCII guard: Reject FOL answers containing foreign alphabetic characters that the baseline doesn't have. This catches Wikipedia foreign-language content.

(5) Place-type guard: Don't expand answers ending with place words (mansion, castle, stadium). This prevents 'Gracie Mansion' → 'Archibald Gracie Mansion'.

(6) Function word guard: In FOL-guided contraction, block the contraction if ALL dropped words are function words (and, or, but), since these are structurally important in lists.

(7) 1-token guard: For single-word answers, the extra expansion must be a title/honorific or a single-letter initial.

Key implementation: selective_fol_answer() and complete_answer_with_triples() in src/fol/fol_qa_prompt.py.
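The seven guards can be sketched roughly as follows. This is a minimal illustration, not the code in src/fol/fol_qa_prompt.py. The function names (select_answer, allow_expansion, allow_contraction), the regex tokenizer, the use of difflib as the similarity metric, the honorific list, and the choice to fall back to the baseline when similarity is below 0.70 are all assumptions made for the example. Only the thresholds and the example word lists come from the lesson.

```python
import re
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.70                      # from the lesson
MAX_ANSWER_CHARS = 100                           # from the lesson
PLACE_WORDS = {"mansion", "castle", "stadium"}   # examples from the lesson
FUNCTION_WORDS = {"and", "or", "but"}            # examples from the lesson
HONORIFICS = {"sir", "dr", "mr", "mrs", "ms"}    # assumption: illustrative list


def tokens(text):
    """Lowercased word tokens (assumed tokenizer; the real one may differ)."""
    return re.findall(r"\w+", text.lower())


def is_reasoning_dump(answer):
    """Guard 3: overly long answers or ones that start like LLM reasoning."""
    return (len(answer) > MAX_ANSWER_CHARS
            or answer.lower().startswith("from the triples"))


def has_new_non_ascii_letters(fol, baseline):
    """Guard 4: FOL contains non-ASCII letters that the baseline lacks."""
    base = {c for c in baseline if c.isalpha() and not c.isascii()}
    return any(c.isalpha() and not c.isascii() and c not in base for c in fol)


def select_answer(baseline, fol):
    """Pick between the baseline and FOL answers using guards 1 to 4."""
    if is_reasoning_dump(fol) or has_new_non_ascii_letters(fol, baseline):
        return baseline
    b, f = set(tokens(baseline)), set(tokens(fol))
    if f < b or b < f:  # Guard 1 (Rule 1a/1b): strict subset either way
        return baseline
    # Guard 2: similarity metric is an assumption (character-level difflib ratio).
    sim = SequenceMatcher(None, baseline.lower(), fol.lower()).ratio()
    # Assumption: below the threshold the answers disagree, so keep the baseline.
    return fol if sim >= SIMILARITY_THRESHOLD else baseline


def allow_expansion(answer, expanded):
    """Guards 5 and 7: decide whether an answer may be expanded."""
    a, e = tokens(answer), tokens(expanded)
    if a and a[-1] in PLACE_WORDS:  # Guard 5: place-type answers stay as-is
        return False
    if len(a) == 1:  # Guard 7: single-word answers accept only titles/initials
        extra = [t for t in e if t not in a]
        return bool(extra) and all(
            t in HONORIFICS or (len(t) == 1 and t.isalpha()) for t in extra
        )
    return True


def allow_contraction(answer, contracted):
    """Guard 6: block a contraction that drops only function words."""
    kept = set(tokens(contracted))
    dropped = [t for t in tokens(answer) if t not in kept]
    return not (dropped and all(t in FUNCTION_WORDS for t in dropped))
```

In this sketch the rejection guards run before the similarity check, so a leaked reasoning string or foreign-script answer never reaches the threshold comparison. The ordering is a design choice of the example, not something the lesson specifies.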