How would you mitigate hallucinations in an LLM response?

Question

Accepted Answer

## Mitigating Hallucinations in LLM Responses

Hallucinations occur when an LLM generates plausible-sounding but factually incorrect content. They are a fundamental property of current LLMs: you cannot eliminate them, but you can reduce and detect them.

### Root Causes

| Cause | Description |
|-------|-------------|
| **Training distribution gaps** | The model generates from learned patterns, not from a store of verified facts |
| **High temperature** | More sampling randomness makes unlikely (and often wrong) continuations more probable |
| **Long context confusion** | The model loses track of details in very long prompts |
| **Out-of-domain queries** | The model guesses when it doesn't know |
| **Ambiguous prompts** | The model fills ambiguity with invented detail |

### Mitigation Strategies

**1. Grounding with RAG (Most Effective)**

```python
GROUNDED_PROMPT = '''Answer the question using ONLY the information in the context below.
If the answer is not explicitly stated in the context, respond with:
"I don't have enough information to answer that confidently."

Context:
{context}

Question: {question}

Answer (based only on context above):'''
```

**2. Set Temperature to 0**

```python
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,  # Your existing list of chat messages
    temperature=0,  # Near-greedy decoding: minimises sampling-driven hallucination, but outputs can still be wrong
)
```

**3. Self-Consistency Checking**

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def self_consistent_answer(question: str, n: int = 5) -> str:
    '''Sample n answers; return the most central one only if the samples broadly agree.'''
    if n < 2:
        raise ValueError("n must be at least 2 to compare answers")

    # call_llm is assumed to be your own wrapper that returns the answer text
    answers = [call_llm(question, temperature=0.3) for _ in range(n)]

    # Simple: character-level similarity between answers
    # In production: use embedding similarity clustering
    best_answer = None
    best_score = -1.0
    for i, candidate in enumerate(answers):
        # Exclude by index, not value, so identical answers still count as agreement
        avg_sim = sum(
            similarity(candidate, other) for j, other in enumerate(answers) if j != i
        ) / (n - 1)
        if avg_sim > best_score:
            best_score = avg_sim
            best_answer = candidate

    return best_answer if best_score > 0.7 else "Could not reach consensus on an answer."
```

**4. Explicit Uncertainty Instruction**

```python
UNCERTAINTY_PROMPT = '''When answering:
- If you are confident (> 90%): Answer directly.
- If uncertain (50-90%): Prefix with "I believe..."
- If guessing (< 50%): Say "I'm not certain, but..." or decline to answer.
Never invent specific numbers, dates, names, or citations.'''
```

**5. Citation Requirements**

```python
# <source_name> and <N> are filled in by the model, not by str.format()
CITATION_PROMPT = '''Answer the question and cite the specific source chunk for each claim.
Format: [claim] (Source: <source_name>, paragraph <N>)
If you cannot cite a source, do not make the claim.'''
```

**6. Post-hoc Hallucination Detection**

```python
import json

from openai import OpenAI

client = OpenAI()


def detect_hallucination(answer: str, context: str, question: str) -> dict:
    '''Use an LLM to check whether the answer is supported by the context.'''
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f'''Is every claim in the answer supported by the context?

Context: {context}

Question: {question}

Answer: {answer}

Respond as JSON: {{"supported": true/false, "unsupported_claims": ["..."]}}'''
        }],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
```

### Detection & Monitoring

```python
import logging
import random

logger = logging.getLogger(__name__)


def monitored_rag_call(question: str, context: str) -> str:
    # generate_answer is assumed to be your own RAG generation function
    answer = generate_answer(question, context)

    # Sample 10% of responses for hallucination detection
    if random.random() < 0.10:
        result = detect_hallucination(answer, context, question)
        # Treat a missing "supported" key as unsupported so malformed verdicts are also logged
        if not result.get("supported", False):
            logger.warning(
                "hallucination_detected question=%r unsupported=%r",
                question,
                result.get("unsupported_claims", []),
            )

    return answer
```

### Mitigation Priority

| Technique | Effectiveness | Cost | Implementation |
|-----------|---------------|------|----------------|
| RAG grounding | Very high | Medium | Build retrieval pipeline |
| Temperature = 0 | High | None | One parameter |
| Explicit uncertainty | Medium | None | Prompt engineering |
| Self-consistency | High | n× tokens (5× at n = 5) | More API calls |
| Citation requirement | High | Low | Prompt engineering |
| Post-hoc detection | Detection only | One extra call per sampled response (10% sampled above) | Eval framework |

> **The hallucination paradox:** The more convincingly an LLM writes, the harder it is to spot hallucinations. Always prioritise grounding in verified sources over fluency.
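
The citation requirement (strategy 5) can also be checked mechanically before paying for an LLM-based detector. The following is a minimal sketch, not part of the original answer: it assumes the model writes one claim per line in the `(Source: name, paragraph N)` format, and that retrieved chunks are tracked by source name and paragraph number. `Chunk` and `check_citations` are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass

# Matches the citation format requested by the citation prompt,
# e.g. "(Source: policy.md, paragraph 2)".
CITATION_RE = re.compile(r"\(Source:\s*([^,()]+?),\s*paragraph\s+(\d+)\)")


@dataclass(frozen=True)
class Chunk:
    """A retrieved chunk. Field names are illustrative assumptions."""
    source_name: str
    paragraph: int
    text: str


def check_citations(answer: str, chunks: list[Chunk]) -> dict:
    """Flag claims with no citation and citations to chunks that were never retrieved."""
    known = {(c.source_name, c.paragraph) for c in chunks}
    uncited_claims = []
    unknown_citations = []

    # Assumption: each non-empty line of the answer is one claim.
    for line in answer.splitlines():
        claim = line.strip()
        if not claim:
            continue
        citations = CITATION_RE.findall(claim)
        if not citations:
            uncited_claims.append(claim)
        for name, paragraph in citations:
            key = (name.strip(), int(paragraph))
            if key not in known:
                unknown_citations.append(key)

    return {
        "ok": not uncited_claims and not unknown_citations,
        "uncited_claims": uncited_claims,
        "unknown_citations": unknown_citations,
    }


if __name__ == "__main__":
    chunks = [Chunk("policy.md", 2, "Refunds are processed within 5 business days.")]
    answer = (
        "Refunds take 5 business days (Source: policy.md, paragraph 2)\n"
        "Refunds are paid in cash (Source: faq.md, paragraph 9)\n"
        "Most customers are happy with this."
    )
    print(check_citations(answer, chunks))
```

This only verifies that each cited chunk exists; it does not verify that the chunk actually supports the claim, so it complements the LLM-based check in strategy 6 instead of replacing it.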


