Concept #51 | Hard | advanced-topics

How would you mitigate hallucinations in an LLM response?

#gen-ai #llm #safety

Answer

Mitigating Hallucinations in LLM Responses

Hallucinations occur when an LLM generates plausible-sounding content that is factually incorrect or unsupported by its sources. They are a fundamental property of current LLMs: you cannot eliminate them, but you can reduce how often they occur and detect many of those that remain.

Root Causes

Cause | Description
Training distribution gaps | Model generates from patterns, not facts
High temperature | More randomness = more hallucinations
Long context confusion | Model loses track in very long prompts
Out-of-domain queries | Model guesses when it doesn't know
Ambiguous prompts | Model fills ambiguity with invented detail

Mitigation Strategies

1. Grounding with RAG (Most Effective)

python
GROUNDED_PROMPT = '''Answer the question using ONLY the information in the context below.
If the answer is not explicitly stated in the context, respond with:
"I don't have enough information to answer that confidently."

Context:
{context}

Question: {question}

Answer (based only on context above):'''
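
As a rough illustration of how such a template might be filled, the sketch below numbers the retrieved chunks and joins them into the context slot. The helper name `build_grounded_prompt`, the shortened template, and the sample chunks are all hypothetical; the retrieval step itself is assumed to happen elsewhere.

```python
# Hypothetical helper: fills a grounding template with retrieved chunks.
# TEMPLATE is a shortened stand-in for GROUNDED_PROMPT above.
TEMPLATE = (
    "Answer the question using ONLY the information in the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
)


def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Number each chunk so the model (and a reviewer) can refer back to it."""
    context = "\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return TEMPLATE.format(context=context, question=question)


prompt = build_grounded_prompt(
    "When was the service launched?",
    ["The service launched in 2019.", "It supports three regions."],
)
print(prompt)
```

Numbering the chunks is a design choice rather than a requirement; it makes it easier to ask for citations later and to trace a claim back to its source.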

2. Set Temperature to 0

python
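from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
messages = [{"role": "user", "content": "When was the service launched?"}]  # example input

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    temperature=0,  # Near-greedy decoding: less sampling randomness, not a guarantee of factual accuracy
)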
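
To see why a lower temperature reduces randomness, it can help to look at how temperature rescales token probabilities. The sketch below applies a temperature-scaled softmax to made-up logits; the values are purely illustrative and not taken from any real model. A temperature of exactly 0 cannot be divided by, and is typically treated as greedy selection of the top token.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; T=0 corresponds to greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative values only
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature falls, probability mass concentrates on the highest-scoring token. This narrows the range of outputs but does not make the top token correct.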

3. Self-Consistency Checking

python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def self_consistent_answer(question: str, n: int = 5, threshold: float = 0.7) -> str:
    '''Sample n answers; return the one most similar to the rest if agreement is high enough.'''
    if n < 2:
        raise ValueError("n must be at least 2 to compare answers")

    # call_llm is assumed to be defined elsewhere and to return the answer text
    answers = [call_llm(question, temperature=0.3) for _ in range(n)]

    # Simple: character-level similarity between answers
    # In production: use embedding similarity clustering
    best_answer, best_score = None, -1.0
    for i, candidate in enumerate(answers):
        # Compare by index, not value, so identical answers still count as agreement
        others = [a for j, a in enumerate(answers) if j != i]
        avg_sim = sum(similarity(candidate, other) for other in others) / len(others)
        if avg_sim > best_score:
            best_score, best_answer = avg_sim, candidate

    return best_answer if best_score > threshold else "Could not reach consensus on an answer."
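
For short-form answers (a number, a name, a yes/no), a literal majority vote is a simpler alternative to similarity scoring. The sketch below assumes the sampled answers have already been collected; `majority_answer` and the normalisation rule are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Optional


def normalise(answer: str) -> str:
    """Lowercase and strip whitespace and trailing punctuation so trivial variants match."""
    return answer.strip().lower().rstrip(".!")


def majority_answer(answers: list[str], min_share: float = 0.5) -> Optional[str]:
    """Return the most common normalised answer if its share exceeds min_share, else None."""
    if not answers:
        return None
    counts = Counter(normalise(a) for a in answers)
    top, votes = counts.most_common(1)[0]
    return top if votes / len(answers) > min_share else None


print(majority_answer(["Paris", "paris.", "Paris", "Lyon", "Paris"]))
print(majority_answer(["1947", "1950", "1948", "1949"]))
```

Returning `None` when no answer wins a majority lets the caller decide whether to decline, retry, or escalate, rather than silently picking one of several conflicting answers.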

4. Explicit Uncertainty Instruction

python
UNCERTAINTY_PROMPT = '''When answering:
- If you are confident (> 90%): Answer directly.
- If uncertain (50-90%): Prefix with "I believe..."
- If guessing (< 50%): Say "I'm not certain, but..." or decline to answer.
Never invent specific numbers, dates, names, or citations.'''
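
If the model follows the prefixes requested above, downstream code can route answers by stated confidence. The sketch below is one hypothetical way to do that; `confidence_tier` and the tier names are invented for illustration, and a model's self-reported confidence is not guaranteed to be well calibrated.

```python
def confidence_tier(answer: str) -> str:
    """Map an answer to a tier based on the hedging prefix the prompt asks for."""
    # Normalise case and curly apostrophes before matching prefixes
    text = answer.strip().lower().replace("\u2019", "'")
    if text.startswith(("i'm not certain", "i am not certain")):
        return "guess"
    if text.startswith("i believe"):
        return "uncertain"
    return "confident"


replies = (
    "I believe the limit is 128 tokens.",
    "I'm not certain, but it may be 2019.",
    "The capital is Canberra.",
)
for reply in replies:
    print(confidence_tier(reply), "->", reply)
```

A "guess" tier could, for example, trigger a retrieval retry or a human review step instead of being shown to the user directly.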

5. Citation Requirements

python
CITATION_PROMPT = '''Answer the question and cite the specific source chunk for each claim.
Format: [claim] (Source: <source_name>, paragraph <N>)
If you cannot cite a source, do not make the claim.'''
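
A citation instruction is only useful if the citations are checked. The sketch below parses citations in the format requested above and flags any that name a source not present among the retrieved documents. The function name, the regular expression, and the sample source names are assumptions for illustration.

```python
import re

# Matches "(Source: <name>, paragraph <number>)" as requested in the prompt above.
CITATION_RE = re.compile(r"\(Source:\s*([^,()]+),\s*paragraph\s+(\d+)\)")


def find_unknown_citations(answer: str, known_sources: set[str]) -> list[tuple[str, int]]:
    """Return (source, paragraph) pairs whose source was not among the retrieved documents."""
    cited = [(name.strip(), int(par)) for name, par in CITATION_RE.findall(answer)]
    return [(name, par) for name, par in cited if name not in known_sources]


answer = (
    "Refunds take 5 days (Source: refund_policy.md, paragraph 2). "
    "Shipping is free (Source: pricing_2021.pdf, paragraph 7)."
)
print(find_unknown_citations(answer, {"refund_policy.md", "faq.md"}))
```

This only verifies that a cited source exists; checking that the cited paragraph actually supports the claim needs a separate step, such as the post-hoc detection described in this answer.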

6. Post-hoc Hallucination Detection

python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def detect_hallucination(answer: str, context: str, question: str) -> dict:
    '''Use an LLM judge to check whether the answer is supported by the context.'''
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
            f'''Is every claim in the answer supported by the context?
Question: {question}
Context: {context}
Answer: {answer}
Respond as JSON: {{"supported": true/false, "unsupported_claims": ["..."]}}'''
        }],
        response_format={"type": "json_object"},  # JSON mode; the prompt must mention JSON
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
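
Even with JSON mode, it is safer not to assume the judge's output has the expected shape. The sketch below is a hypothetical `parse_verdict` helper that treats malformed or incomplete output as unsupported, so a failure of the checker errs on the side of flagging the answer.

```python
import json


def parse_verdict(raw) -> dict:
    """Parse the judge's JSON; fall back to 'not supported' if it is malformed."""
    fallback = {"supported": False, "unsupported_claims": [], "parse_error": True}
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):  # TypeError covers raw being None
        return fallback
    if not isinstance(data, dict) or not isinstance(data.get("supported"), bool):
        return fallback
    claims = data.get("unsupported_claims", [])
    if not isinstance(claims, list):
        claims = [str(claims)]
    return {"supported": data["supported"], "unsupported_claims": claims, "parse_error": False}


print(parse_verdict('{"supported": false, "unsupported_claims": ["Founded in 1990"]}'))
print(parse_verdict("not json"))
```

The extra `parse_error` field is an illustrative addition that lets monitoring distinguish "the judge found a problem" from "the judge's output could not be read".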

Detection & Monitoring

python
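import logging
import random

logger = logging.getLogger(__name__)


def monitored_rag_call(question: str, context: str, sample_rate: float = 0.10) -> str:
    # generate_answer is assumed to be defined elsewhere (the RAG generation step)
    answer = generate_answer(question, context)

    # Sample a fraction of responses (10% by default) for hallucination detection
    if random.random() < sample_rate:
        result = detect_hallucination(answer, context, question)
        if not result.get("supported", False):
            # Standard-library logging takes %-style arguments, not keyword fields
            logger.warning("hallucination_detected question=%r unsupported=%r",
                question, result.get("unsupported_claims", []))

    return answer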
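
Logged warnings become more useful when aggregated into a rate that can be tracked over time. The sketch below keeps a running tally of audited responses; `HallucinationStats` is a hypothetical name, and in practice these counts would more likely be sent to a metrics system than held in memory.

```python
from dataclasses import dataclass


@dataclass
class HallucinationStats:
    """Running counts over the sampled (audited) responses only."""
    audited: int = 0
    flagged: int = 0

    def record(self, supported: bool) -> None:
        self.audited += 1
        if not supported:
            self.flagged += 1

    @property
    def rate(self) -> float:
        """Share of audited responses with at least one unsupported claim."""
        return self.flagged / self.audited if self.audited else 0.0


stats = HallucinationStats()
for supported in (True, True, False, True):
    stats.record(supported)
print(f"{stats.flagged}/{stats.audited} flagged, rate={stats.rate:.2f}")
```

Because only a sample is audited, the rate is an estimate; with few audited responses it can swing widely from one period to the next.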

Mitigation Priority

Technique | Effectiveness | Cost | Implementation
RAG grounding | Very high | Medium | Build retrieval pipeline
Temperature = 0 | High | None | One parameter
Explicit uncertainty | Medium | None | Prompt engineering
Self-consistency | High | 5× tokens | More API calls
Citation requirement | High | Low | Prompt engineering
Post-hoc detection | Detection only | 10% overhead | Eval framework
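
The cost column can be made concrete with a little arithmetic. The sketch below estimates the number of LLM calls per user request relative to a single plain call, under the simplifying assumption (not taken from the table) that every call, including the judge call, costs about the same. The function name and parameters are hypothetical.

```python
def relative_calls(n_samples: int = 1, audit_rate: float = 0.0) -> float:
    """Expected LLM calls per request: n_samples generations plus a judge call on a sampled fraction."""
    if n_samples < 1 or not 0.0 <= audit_rate <= 1.0:
        raise ValueError("n_samples must be >= 1 and audit_rate within [0, 1]")
    return n_samples + audit_rate


print(relative_calls())                 # single plain call
print(relative_calls(n_samples=5))      # self-consistency with 5 samples
print(relative_calls(audit_rate=0.10))  # post-hoc check on 10% of responses
```

If the judge runs on a cheaper model than the generator, the real overhead of post-hoc detection would be lower than this estimate suggests.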

The hallucination paradox: The more convincingly an LLM writes, the harder it is to spot hallucinations. Always prioritise grounding in verified sources over fluency.