How would you mitigate hallucinations in an LLM response?
#gen-ai #llm #safety
Answer
Mitigating Hallucinations in LLM Responses
Hallucinations occur when an LLM generates plausible-sounding content that is factually incorrect or not supported by its sources. They are a fundamental property of current LLMs: you cannot eliminate them, but you can reduce and detect them.
Root Causes
| Cause | Description |
|---|---|
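| Training distribution gaps | Model reproduces patterns from training data rather than looking up facts, so gaps get filled with plausible text |
| High temperature | More sampling randomness makes low-probability (often wrong) tokens more likely |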
| Long context confusion | Model loses track in very long prompts |
| Out-of-domain queries | Model guesses when it doesn't know |
| Ambiguous prompts | Model fills ambiguity with invented detail |
Mitigation Strategies
1. Grounding with RAG (Most Effective)
```python
GROUNDED_PROMPT = '''Answer the question using ONLY the information in the context below.
If the answer is not explicitly stated in the context, respond with:
"I don't have enough information to answer that confidently."

Context:
{context}

Question: {question}

Answer (based only on context above):'''
```
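To illustrate how a template like the one above might be filled, here is a minimal sketch. The `build_grounded_prompt` helper and the numbered-chunk layout are assumptions made for illustration, and the retriever that produces `chunks` is assumed to exist elsewhere. The template is repeated in compact form only so the sketch runs on its own.

```python
# Compact copy of the grounded template so this sketch is self-contained.
GROUNDED_PROMPT = (
    "Answer the question using ONLY the information in the context below.\n"
    "If the answer is not explicitly stated in the context, respond with:\n"
    "\"I don't have enough information to answer that confidently.\"\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Answer (based only on context above):"
)


def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Number each retrieved chunk and substitute it into the template (hypothetical helper)."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return GROUNDED_PROMPT.format(context=context, question=question)


# Made-up chunks standing in for retriever output.
prompt = build_grounded_prompt(
    "When was the policy last updated?",
    ["The refund policy was last updated in March.", "Refunds take 5 business days."],
)
print(prompt)
```

Numbering the chunks also gives the model something concrete to refer to if citations are requested later.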
2. Set Temperature to 0
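```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,  # the chat history, assumed to be built elsewhere
    temperature=0,  # Near-greedy decoding: removes sampling randomness, not knowledge gaps
)
```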
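Why lowering temperature reduces randomness can be seen in a toy softmax. The sketch below uses made-up logits for three candidate tokens and treats temperature 0 as a plain argmax. That is an assumption for illustration: real inference stacks may handle 0 differently and can still show small run-to-run variation.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.5, 1.0, 0.3, 0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature falls, probability mass concentrates on the top-scoring token. If that token is itself wrong, the model will simply be wrong more consistently, which is why this setting complements grounding rather than replacing it.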
3. Self-Consistency Checking
```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def self_consistent_answer(question: str, n: int = 5) -> str:
    '''Generate N answers; return the most central one only if the answers largely agree.'''
    if n < 2:
        raise ValueError("n must be at least 2 to compare answers")

    # call_llm is assumed to be defined elsewhere and to return the answer text
    answers = [call_llm(question, temperature=0.3) for _ in range(n)]

    # Simple: character-level similarity between answers
    # In production: use embedding similarity clustering

    # Find answer most similar to all others (centroid)
    best_answer = None
    best_score = -1.0
    for i, candidate in enumerate(answers):
        # Skip by index, not by value, so identical answers count as full agreement
        avg_sim = sum(
            similarity(candidate, other) for j, other in enumerate(answers) if j != i
        ) / (n - 1)
        if avg_sim > best_score:
            best_score = avg_sim
            best_answer = candidate

    return best_answer if best_score > 0.7 else "Could not reach consensus on an answer."
```
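For short factual answers, a plain majority vote is a simpler variant of the same idea. The sketch below is an illustration only: `normalise`, `majority_vote`, and the `min_share` threshold are hypothetical names and values, and the sample list stands in for repeated LLM outputs.

```python
from collections import Counter
from typing import Optional


def normalise(answer: str) -> str:
    """Lowercase, trim, and drop trailing punctuation so trivial variants match."""
    return answer.strip().lower().rstrip(".!?")


def majority_vote(answers: list[str], min_share: float = 0.6) -> Optional[str]:
    """Return the most common normalised answer if its share reaches min_share."""
    if not answers:
        return None
    counts = Counter(normalise(a) for a in answers)
    winner, votes = counts.most_common(1)[0]
    return winner if votes / len(answers) >= min_share else None


samples = ["Paris.", "paris", "Paris", "Lyon", "Paris"]  # stand-ins for sampled outputs
print(majority_vote(samples))                    # agreement: a winner is returned
print(majority_vote(["Paris", "Lyon", "Nice"]))  # no agreement: None
```

Exact-match voting suits short answers such as names, numbers, or labels. Longer free-text answers need a similarity measure, since they rarely match exactly.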
4. Explicit Uncertainty Instruction
```python
UNCERTAINTY_PROMPT = '''When answering:
- If you are confident (> 90%): Answer directly.
- If uncertain (50-90%): Prefix with "I believe..."
- If guessing (< 50%): Say "I'm not certain, but..." or decline to answer.

Never invent specific numbers, dates, names, or citations.'''
```
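Downstream code can use the hedging prefixes requested by this prompt, for example to route low-confidence answers to review. The following is a minimal sketch; `confidence_band` and its three labels are hypothetical. The band only reflects what the model said about itself, so it should be treated as a signal rather than a measured probability.

```python
def confidence_band(answer: str) -> str:
    """Map the hedging prefix requested by the prompt to a coarse band (hypothetical helper)."""
    # Normalise case and curly apostrophes before matching prefixes.
    text = answer.strip().lower().replace("\u2019", "'")
    if text.startswith("i'm not certain"):
        return "low"
    if text.startswith("i believe"):
        return "medium"
    return "high"


for reply in (
    "The capital of France is Paris.",
    "I believe the library was first released in 2019.",
    "I'm not certain, but the limit may be 100 requests per minute.",
):
    print(confidence_band(reply), "|", reply)
```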
5. Citation Requirements
```python
CITATION_PROMPT = '''Answer the question and cite the specific source chunk for each claim.
Format: [claim] (Source: {source_name}, paragraph {N})
If you cannot cite a source, do not make the claim.'''
```
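A citation requirement is more useful when the citations are checked, because a model can invent sources too. The sketch below assumes answers follow the `(Source: name, paragraph N)` format from the prompt above. `unknown_citations` and the `retrieved` mapping are hypothetical names for illustration.

```python
import re

# Matches "(Source: <name>, paragraph <N>)" as requested by the citation prompt.
CITATION_RE = re.compile(r"\(Source:\s*([^,()]+?),\s*paragraph\s+(\d+)\)")


def unknown_citations(
    answer: str, paragraphs_per_source: dict[str, int]
) -> list[tuple[str, int]]:
    """Return cited (source, paragraph) pairs that do not exist in the retrieved set."""
    missing = []
    for name, para in CITATION_RE.findall(answer):
        name, para_num = name.strip(), int(para)
        # An unknown source has 0 paragraphs, so any paragraph number fails the range check.
        if not 1 <= para_num <= paragraphs_per_source.get(name, 0):
            missing.append((name, para_num))
    return missing


retrieved = {"handbook.pdf": 12, "faq.md": 4}  # hypothetical source -> paragraph count
answer = (
    "Refunds take 5 days (Source: faq.md, paragraph 2). "
    "Fees are waived (Source: pricing.md, paragraph 1)."
)
print(unknown_citations(answer, retrieved))
```

This only verifies that a cited location exists. Whether the cited paragraph actually supports the claim still needs a separate check.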
6. Post-hoc Hallucination Detection
```python
import json

from openai import OpenAI


def detect_hallucination(answer: str, context: str, question: str) -> dict:
    '''Use an LLM to check whether the answer is supported by the context.'''
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f'''Is every claim in the answer supported by the context?

Context: {context}
Question: {question}
Answer: {answer}

Respond as JSON: {{"supported": true/false, "unsupported_claims": ["..."]}}'''
        }],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
```
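The judge model's reply is itself LLM output, so it may be worth validating before anything depends on it. The sketch below is one possible approach under assumptions: `parse_verdict` is a hypothetical helper, and failing closed (treating unparseable output as unsupported) is a design choice, not a requirement.

```python
import json


def parse_verdict(raw) -> dict:
    """Parse the judge's JSON reply and fail closed if it is malformed or incomplete."""
    fallback = {"supported": False, "unsupported_claims": ["<unparseable judge output>"]}
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):  # TypeError covers a missing (None) reply
        return fallback
    if not isinstance(data, dict) or not isinstance(data.get("supported"), bool):
        return fallback
    claims = data.get("unsupported_claims", [])
    if not isinstance(claims, list):
        claims = [claims]
    return {"supported": data["supported"], "unsupported_claims": [str(c) for c in claims]}


print(parse_verdict('{"supported": true, "unsupported_claims": []}'))
print(parse_verdict("not json at all"))
```

Failing closed means a broken judge response is logged as a possible hallucination rather than silently passing.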
Detection & Monitoring
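```python
import random

import structlog  # assumption: the keyword-argument logging call below follows structlog's API

logger = structlog.get_logger()


def monitored_rag_call(question: str, context: str) -> str:
    # generate_answer is assumed to be defined elsewhere (the RAG generation step)
    answer = generate_answer(question, context)

    # Sample 10% of responses for hallucination detection
    if random.random() < 0.10:
        result = detect_hallucination(answer, context, question)
        if not result["supported"]:
            logger.warning(
                "hallucination_detected",
                question=question,
                unsupported=result["unsupported_claims"],
            )

    return answer
```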
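Individual warnings are easier to act on when they are aggregated into a rate that can be tracked over time. The sketch below is illustrative only: `HallucinationMonitor` and its fields are hypothetical, and the list of verdicts stands in for real detector results.

```python
import random
from dataclasses import dataclass


@dataclass
class HallucinationMonitor:
    """Tracks how many sampled responses the detector flagged (hypothetical helper)."""

    sample_rate: float = 0.10
    checked: int = 0
    flagged: int = 0

    def should_check(self, rng: random.Random) -> bool:
        """Decide whether this response falls into the sampled fraction."""
        return rng.random() < self.sample_rate

    def record(self, supported: bool) -> None:
        """Count one checked response and whether it was flagged as unsupported."""
        self.checked += 1
        if not supported:
            self.flagged += 1

    @property
    def flagged_rate(self) -> float:
        """Share of checked responses that were flagged; 0.0 before any checks."""
        return self.flagged / self.checked if self.checked else 0.0


monitor = HallucinationMonitor()
for supported in [True, True, False, True]:  # stand-in detector verdicts
    monitor.record(supported)
print(f"flagged {monitor.flagged}/{monitor.checked} ({monitor.flagged_rate:.0%})")
```

Passing a seeded `random.Random` into `should_check` keeps the sampling reproducible in tests.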
Mitigation Priority
| Technique | Effectiveness | Cost | Implementation |
|---|---|---|---|
| RAG grounding | Very high | Medium | Build retrieval pipeline |
| Temperature = 0 | Moderate (cuts sampling randomness, not knowledge gaps) | None | One parameter |
| Explicit uncertainty | Medium | None | Prompt engineering |
| Self-consistency | High | 5× tokens | More API calls |
| Citation requirement | High | Low | Prompt engineering |
| Post-hoc detection | Detection only | One extra LLM call per sampled response (10% sampled) | Eval framework |
The hallucination paradox: The more convincingly an LLM writes, the harder it is to spot hallucinations. Always prioritise grounding in verified sources over fluency.