How does RAG work with Pinecone vector database?
#gen-ai #rag #vector-db
Answer
RAG with Pinecone Vector Database
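Pinecone is a managed vector database optimized for production-scale semantic search. In a RAG pipeline it stores document embeddings; at query time, the question is embedded, the nearest chunks are retrieved from Pinecone, and those chunks are passed to an LLM as context. The implementation below uses OpenAI embeddings for indexing and retrieval and Claude for generation.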
Setup
```bash
pip install pinecone-client openai anthropic
```
Step 1: Create Pinecone Index
```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-pinecone-api-key")

pc.create_index(
    name="rag-kb",
    dimension=1536,  # OpenAI ada-002 embedding size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("rag-kb")
```
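For intuition about what `metric="cosine"` ranks by, here is a minimal pure-Python sketch of cosine similarity. The `cosine_similarity` helper is illustrative only and is not part of the Pinecone client; Pinecone computes scores on the server.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal
```

Because the score depends only on direction, not magnitude, two texts with similar meaning score close to 1 regardless of their length.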
Step 2: Index Documents
```python
from openai import OpenAI
import uuid

# Reads OPENAI_API_KEY from the environment
openai_client = OpenAI()

def get_embedding(text: str) -> list[float]:
    """Embed a single string with the model that matches the index dimension (1536)."""
    return openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    ).data[0].embedding

documents = [
    {"text": "Pinecone is a managed vector database.", "source": "docs"},
    {"text": "RAG combines retrieval with generation.", "source": "docs"},
    {"text": "Embeddings map text to high-dimensional vectors.", "source": "docs"},
]

# Keep the original text in metadata so it can be returned at query time
vectors = [
    {
        "id": str(uuid.uuid4()),
        "values": get_embedding(doc["text"]),
        "metadata": {"text": doc["text"], "source": doc["source"]}
    }
    for doc in documents
]

index.upsert(vectors=vectors, namespace="main")
```
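With more than a handful of documents, a single `upsert` call can become too large, so it is common to send vectors in batches. The sketch below is one way to do that; the `batched` helper and the batch size of 100 are assumptions for illustration, not values taken from the walkthrough above.

```python
from typing import Iterator


def batched(items: list[dict], batch_size: int = 100) -> Iterator[list[dict]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# With the `vectors` list and `index` from Step 2 (requires a live index):
# for batch in batched(vectors):
#     index.upsert(vectors=batch, namespace="main")

demo = [{"id": str(i)} for i in range(250)]
print([len(batch) for batch in batched(demo)])  # [100, 100, 50]
```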
Step 3: RAG Query
```python
from anthropic import Anthropic

# Reads ANTHROPIC_API_KEY from the environment
anthropic_client = Anthropic()

def rag_with_pinecone(question: str) -> str:
    # Embed the question with the same model used for indexing
    query_vec = get_embedding(question)

    # Semantic search
    results = index.query(
        namespace="main",
        vector=query_vec,
        top_k=3,
        include_metadata=True
    )

    # Keep only matches above the score threshold
    chunks = [
        m["metadata"]["text"]
        for m in results["matches"]
        if m["score"] > 0.75
    ]

    if not chunks:
        return "No relevant information found."

    context = "\n".join(chunks)

    # Generate with Claude
    response = anthropic_client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}]
    )
    return response.content[0].text

print(rag_with_pinecone("What is Pinecone used for?"))
```
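The filtering and prompt-building logic in Step 3 can be pulled into pure functions so it can be tested without calling any API. This is a sketch under that assumption; `select_chunks` and `build_prompt` are hypothetical helper names, and the numbered `[1]`, `[2]` context labels are an addition not present in the original prompt.

```python
def select_chunks(matches: list[dict], min_score: float = 0.75,
                  max_chunks: int = 3) -> list[str]:
    """Keep the text of matches scoring above `min_score`, best first."""
    kept = [m for m in matches if m["score"] > min_score]
    kept.sort(key=lambda m: m["score"], reverse=True)
    return [m["metadata"]["text"] for m in kept[:max_chunks]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Build the user message, numbering each chunk so answers can cite it."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hand-written stand-ins for Pinecone matches, for illustration only
fake_matches = [
    {"score": 0.91, "metadata": {"text": "Pinecone is a managed vector database."}},
    {"score": 0.42, "metadata": {"text": "Unrelated text."}},
    {"score": 0.80, "metadata": {"text": "RAG combines retrieval with generation."}},
]
print(build_prompt("What is Pinecone?", select_chunks(fake_matches)))
```

Keeping these steps separate also makes it easy to tune the threshold against real queries before wiring in the generation call.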
Production Considerations
| Consideration | Best Practice |
|---|---|
| Chunking | Split docs into 256-512 token chunks before indexing |
| Score threshold | Discard matches whose similarity score is below 0.75 |
| Namespaces | Separate namespaces for different domains |
| Metadata | Store source, date, doc_id for filtering |
| Updates | Upsert with same ID to update existing vectors |
| Hybrid search | Combine dense vectors in Pinecone with BM25 sparse vectors |
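The chunking, metadata, and update rows above can be combined in one small sketch. Two assumptions to note: chunk size is measured in whitespace-separated words as a rough stand-in for tokens (a real tokenizer would be more accurate), and the `overlap` parameter, the `chunk_id` scheme, and the `build_records` helper are hypothetical. Step 2 uses random UUIDs, which create new vectors on every run; deriving the ID from `doc_id` and chunk position means re-indexing the same document overwrites its existing vectors instead.

```python
import hashlib


def chunk_words(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    """Split text into word windows of `chunk_size`, sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


def chunk_id(doc_id: str, chunk_index: int) -> str:
    """Deterministic ID: the same document and position always map to the same ID."""
    return hashlib.sha256(f"{doc_id}:{chunk_index}".encode("utf-8")).hexdigest()[:32]


def build_records(doc_id: str, text: str, source: str, date: str) -> list[dict]:
    """Build upsert-ready records; "values" is left for the embedding step."""
    return [
        {
            "id": chunk_id(doc_id, i),
            "metadata": {"text": chunk, "source": source, "date": date,
                         "doc_id": doc_id, "chunk_index": i},
        }
        for i, chunk in enumerate(chunk_words(text))
    ]


sample = " ".join(f"w{i}" for i in range(600))
records = build_records("guide-001", sample, "docs", "2024-01-01")
print(len(records), records[0]["id"])
```

Each record would still need a `"values"` entry, for example from `get_embedding(record["metadata"]["text"])`, before being passed to `index.upsert`.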
Metadata Filtering Example
```python
# Only search within a specific category
results = index.query(
    namespace="main",
    vector=query_vec,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "product_docs"}}  # Pinecone metadata filter
)
```
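When several conditions apply at once, the filter dictionary can be assembled programmatically. The sketch below uses Pinecone's `$eq`, `$in`, `$gte`, and `$and` filter operators; the `build_filter` helper is hypothetical, and the numeric `year` metadata field is an assumption (range operators compare numbers, so a date stored as a string would need a numeric companion field).

```python
from typing import Optional


def build_filter(source: Optional[str] = None,
                 doc_ids: Optional[list[str]] = None,
                 min_year: Optional[int] = None) -> dict:
    """Combine the conditions that were provided into one metadata filter."""
    clauses = []
    if source is not None:
        clauses.append({"source": {"$eq": source}})
    if doc_ids:
        clauses.append({"doc_id": {"$in": list(doc_ids)}})
    if min_year is not None:
        clauses.append({"year": {"$gte": min_year}})
    if not clauses:
        return {}  # no conditions given
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


print(build_filter(source="product_docs", min_year=2023))
```

The result can be passed as `filter=build_filter(...) or None` so that an empty filter falls back to an unfiltered query.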
Cost Estimate
| Component | Cost |
|---|---|
| Pinecone (starter) | Free up to 1M vectors |
| OpenAI embedding (text-embedding-ada-002) | $0.0001 per 1K tokens |
| Claude generation | $15 per 1M output tokens |
| Per 1,000 queries | ~$0.50 to $5, depending on document size |
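To see how the per-1,000-queries figure comes together, here is a rough calculator. It takes the embedding and output prices from the table as defaults. The input-token price is not listed above, so `input_price_per_m` is a required argument, and the `3.0` used in the example is a placeholder rather than a quoted price. The token counts are likewise illustrative, and prompt-template overhead is ignored.

```python
def cost_per_1000_queries(question_tokens: int, context_tokens: int,
                          output_tokens: int, input_price_per_m: float,
                          embed_price_per_1k: float = 0.0001,
                          output_price_per_m: float = 15.0) -> float:
    """Estimate the USD cost of 1,000 RAG queries (embedding + generation)."""
    # Only the question is embedded at query time
    embed_cost = question_tokens / 1_000 * embed_price_per_1k
    # The LLM reads the question plus the retrieved context
    input_cost = (question_tokens + context_tokens) / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return 1_000 * (embed_cost + input_cost + output_cost)


# Placeholder inputs: 20-token question, 500 tokens of context, 200-token answer
estimate = cost_per_1000_queries(20, 500, 200, input_price_per_m=3.0)
print(f"${estimate:.2f} per 1,000 queries")
```

With these inputs the generation step dominates and the embedding cost is negligible, which is why the estimate scales mainly with how much retrieved context is sent per query.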