Design a prompt for sentiment analysis. What could go wrong?
#gen-ai#prompt-engineering
Answer
Designing Effective Prompts for Sentiment Analysis
Sentiment analysis is a common first task when deploying LLMs. Here's how to design robust prompts and handle the failure modes.
Basic Prompt Design
pythonfrom openai import OpenAI from enum import Enum import json client = OpenAI() class Sentiment(str, Enum): POSITIVE = "positive" NEGATIVE = "negative" NEUTRAL = "neutral" MIXED = "mixed" SYSTEM_PROMPT = '''You are a sentiment analysis expert for an e-commerce platform. Classify the sentiment of customer reviews. Consider: - Overall tone, not just individual words - Sarcasm and irony (e.g. "Oh great, another broken product" = Negative) - Mixed sentiments (praise one aspect, criticise another = Mixed) Respond ONLY with valid JSON matching this schema: {"sentiment": "positive|negative|neutral|mixed", "confidence": 0.0-1.0, "reasoning": "brief explanation"}''' def analyze_sentiment(review: str) -> dict: response = client.chat.completions.create( model="gpt-4o", messages=[ {"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": f"Review: {review}"} ], temperature=0, response_format={"type": "json_object"} ) return json.loads(response.choices[0].message.content) # Test result = analyze_sentiment("Product is amazing but delivery took 3 weeks — unacceptable!") print(result) # {"sentiment": "mixed", "confidence": 0.95, "reasoning": "Positive product quality, negative delivery experience"}
Handling Failure Modes
| Failure Mode | Example | Fix |
|---|---|---|
| Sarcasm misclassified | "Oh great, another defect 🙄" → Positive | Add sarcasm instruction + examples |
| Domain-specific terms | "This knife has terrible flex" (flex = good for bakers) | Add domain context to system prompt |
| Mixed sentiment collapsed | "Love the product, hate the price" → Positive | Explicitly define Mixed class |
| JSON parsing failure | LLM outputs extra text | Use text |
| Multilingual input | French review misclassified | Add: "Reviews may be in any language" |
| Emoji-heavy reviews | "😍😍😍" | Include emoji examples in few-shot |
Production-Ready Implementation
pythonimport json from typing import Optional def safe_analyze_sentiment(review: str, fallback: Optional[str] = None) -> dict: try: result = analyze_sentiment(review) # Validate response schema assert result["sentiment"] in ["positive", "negative", "neutral", "mixed"] assert 0.0 <= result["confidence"] <= 1.0 return result except (json.JSONDecodeError, KeyError, AssertionError) as e: # Retry with simplified prompt simplified = client.chat.completions.create( model="gpt-4o", messages=[{ "role": "user", "content": f"Rate this review as positive, negative, or neutral ONLY: {review}" }], temperature=0 ) sentiment = simplified.choices[0].message.content.strip().lower() return {"sentiment": sentiment, "confidence": 0.7, "reasoning": "Simplified fallback"}
Prompt Engineering Best Practices for Classification
- Be explicit about edge cases (sarcasm, mixed sentiment, emojis)
- Define your classes precisely — what separates Neutral from Mixed?
- Use structured output (JSON) to prevent parsing errors
- Set for deterministic classificationtext
temperature=0 - Include few-shot examples for ambiguous cases
- Version your prompts — small changes can significantly affect accuracy
Key lesson: Always validate and sanitise user input before embedding it in a prompt. A user can inject "Ignore all previous instructions" — treat user content as untrusted data, not trusted instructions.