Concept #88 · Medium · extended-ai-concepts

How does RAG work with Pinecone vector database?

#gen-ai #rag #vector-db

Answer

RAG with Pinecone Vector Database

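Pinecone is a managed vector database built for semantic search at production scale. In a RAG pipeline it stores the document embeddings: at query time the question is embedded, the nearest chunks are retrieved from Pinecone, and those chunks are passed to an LLM as context. The steps below walk through a minimal end-to-end implementation.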

Setup

bash
# The Pinecone SDK is now published as "pinecone" (formerly "pinecone-client")
pip install pinecone openai anthropic

Step 1: Create Pinecone Index

python
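import os
from pinecone import Pinecone, ServerlessSpec

# Read the key from the environment rather than hard-coding it
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the index only if it does not exist yet, so the script can be re-run
if "rag-kb" not in pc.list_indexes().names():
    pc.create_index(
        name="rag-kb",
        dimension=1536,        # output size of OpenAI text-embedding-ada-002
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index("rag-kb")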
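To make the `metric="cosine"` setting concrete, here is a minimal pure-Python sketch of what a cosine score measures and what a top-k search returns. It is illustrative only: the function names and toy vectors are made up for this example, and a production index would not scan every stored vector exhaustively like this.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def brute_force_top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3):
    """Score every stored vector against the query and keep the k best."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 2-dimensional "embeddings" (hypothetical data)
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = brute_force_top_k([1.0, 0.1], toy_vectors, k=2)
print([vec_id for vec_id, _ in top])  # ids ordered from most to least similar
```

The score depends only on direction, not on vector length, which is why cosine is a common default for text embeddings.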

Step 2: Index Documents

python
from openai import OpenAI
import uuid

openai_client = OpenAI()

def get_embedding(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model="text-embedding-ada-002", input=text
    ).data[0].embedding

documents = [
    {"text": "Pinecone is a managed vector database.", "source": "docs"},
    {"text": "RAG combines retrieval with generation.", "source": "docs"},
    {"text": "Embeddings map text to high-dimensional vectors.", "source": "docs"},
]

vectors = [
    {
        "id": str(uuid.uuid4()),
        "values": get_embedding(doc["text"]),
        "metadata": {"text": doc["text"], "source": doc["source"]}
    }
    for doc in documents
]

index.upsert(vectors=vectors, namespace="main")
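Two refinements are worth considering once the corpus grows. Random `uuid4` IDs create duplicate vectors every time the indexing script is re-run, and a single `upsert` call does not scale to large document sets. The sketch below shows one possible approach: content-derived IDs and fixed-size batches. The helper names and the batch size of 100 are assumptions for illustration, not values taken from the Pinecone documentation.

```python
import hashlib
from typing import Iterator

def stable_id(text: str, source: str) -> str:
    """Derive a deterministic ID from the content, so re-indexing the same
    chunk overwrites the existing vector instead of adding a duplicate."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]

def batched(items: list, batch_size: int = 100) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Intended use with the index from Step 1 (not executed here):
# for batch in batched(vectors, batch_size=100):
#     index.upsert(vectors=batch, namespace="main")

print(stable_id("Pinecone is a managed vector database.", "docs"))
print(list(batched(list(range(5)), batch_size=2)))
```

With `stable_id` in place of `uuid.uuid4()`, the same text from the same source always maps to the same vector ID.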

Step 3: RAG Query

python
from anthropic import Anthropic

anthropic_client = Anthropic()

def rag_with_pinecone(question: str) -> str:
    # Embed question
    query_vec = get_embedding(question)

    # Semantic search
    results = index.query(
        namespace="main",
        vector=query_vec,
        top_k=3,
        include_metadata=True
    )

    # Filter by score threshold
    chunks = [
        m["metadata"]["text"]
        for m in results["matches"]
        if m["score"] > 0.75
    ]

    if not chunks:
        return "No relevant information found."

    context = "\n".join(chunks)

    # Generate with Claude
    response = anthropic_client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content":
            f"Answer using only this context:\n{context}\n\nQuestion: {question}"}]
    )
    return response.content[0].text

print(rag_with_pinecone("What is Pinecone used for?"))
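The filtering and context-assembly step can be pulled out into a small pure function, which makes it testable without calling Pinecone. The sketch below assumes matches shaped like the `results["matches"]` entries used above (a `score` plus a `metadata` dict). The function name and the numbered, source-labelled format are illustrative choices, not part of any API.

```python
def build_context(matches: list[dict], min_score: float = 0.75) -> str:
    """Keep matches above the score cutoff and format them as numbered,
    source-labelled lines for the prompt."""
    kept = [m for m in matches if m["score"] > min_score]
    kept.sort(key=lambda m: m["score"], reverse=True)  # best match first
    return "\n".join(
        f"[{i}] ({m['metadata'].get('source', 'unknown')}) {m['metadata']['text']}"
        for i, m in enumerate(kept, start=1)
    )

# Hypothetical matches for demonstration
sample_matches = [
    {"score": 0.91, "metadata": {"text": "Pinecone is a managed vector database.", "source": "docs"}},
    {"score": 0.62, "metadata": {"text": "Embeddings map text to high-dimensional vectors.", "source": "docs"}},
]
print(build_context(sample_matches))
```

Labelling each chunk with its source lets the prompt ask the model to cite which chunk it relied on. An empty return value signals that nothing passed the cutoff.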

Production Considerations

| Consideration | Best Practice |
| --- | --- |
| Chunking | Split docs into 256-512 token chunks before indexing |
| Score threshold | Discard matches scoring below a tuned cutoff (0.75 in the example above) |
| Namespaces | Use separate namespaces for different domains |
| Metadata | Store source, date, and doc_id for filtering |
| Updates | Upsert with the same ID to update an existing vector |
| Hybrid search | Combine Pinecone dense vectors with BM25 sparse vectors |
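As an illustration of the chunking recommendation, here is a minimal sketch of a sliding-window chunker with overlap. It counts whitespace-separated words as a rough proxy for tokens, which is an assumption of this example. A real tokenizer would give exact counts, and the default sizes here are placeholders to tune.

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words, where each window
    repeats the last `overlap` words of the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some words twice.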

Metadata Filtering Example

python
# Only search within a specific category
results = index.query(
    namespace="main",
    vector=query_vec,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "product_docs"}}  # Pinecone metadata filter
)
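Filters can also combine several conditions. The sketch below builds a filter dictionary using the `$in`, `$gte`, and `$and` operators. The `published_at` field (a Unix timestamp stored as a number) and the helper name are hypothetical and would need to match whatever metadata was actually stored at indexing time.

```python
from typing import Optional

def build_filter(sources: list[str], min_timestamp: Optional[int] = None) -> dict:
    """Build a metadata filter: source must be one of `sources`, and
    optionally published_at (hypothetical field) must be >= min_timestamp."""
    clauses = [{"source": {"$in": sources}}]
    if min_timestamp is not None:
        clauses.append({"published_at": {"$gte": min_timestamp}})
    # A single clause needs no $and wrapper
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}

print(build_filter(["product_docs", "faq"]))
print(build_filter(["product_docs"], min_timestamp=1_700_000_000))

# Intended use (not executed here):
# index.query(namespace="main", vector=query_vec, top_k=5,
#             include_metadata=True, filter=build_filter(["product_docs"]))
```

Building the filter in one place keeps query code readable and makes the filter logic easy to unit test.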

Cost Estimate

| Component | Cost |
| --- | --- |
| Pinecone (starter) | Free up to 1M vectors |
| OpenAI embedding | $0.0001 per 1K tokens |
| Claude generation | $3/M input tokens, $15/M output tokens |
| Per 1000 queries | ~$0.50-$5 depending on doc size |
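The per-query figure can be checked with a short calculation. The sketch below uses the embedding and generation prices from the table ($0.0001 per 1K embedding tokens, $3 per million input tokens, $15 per million output tokens). The token counts per query are assumed values for illustration and should be replaced with measured ones.

```python
EMBED_PRICE_PER_1K = 0.0001   # USD per 1K embedding tokens (from the table)
CLAUDE_INPUT_PER_M = 3.0      # USD per 1M input tokens (from the table)
CLAUDE_OUTPUT_PER_M = 15.0    # USD per 1M output tokens (from the table)

def cost_per_query(question_tokens: int, prompt_tokens: int, output_tokens: int) -> float:
    """Cost of one RAG query: embed the question, then one generation call."""
    embed = question_tokens / 1_000 * EMBED_PRICE_PER_1K
    gen_in = prompt_tokens / 1_000_000 * CLAUDE_INPUT_PER_M
    gen_out = output_tokens / 1_000_000 * CLAUDE_OUTPUT_PER_M
    return embed + gen_in + gen_out

# Assumed workload: 20-token question, 500-token prompt (context plus question),
# 100-token answer
per_1000 = 1000 * cost_per_query(question_tokens=20, prompt_tokens=500, output_tokens=100)
print(f"${per_1000:.2f} per 1000 queries")  # prints "$3.00 per 1000 queries"
```

Under these assumptions generation dominates the bill and query-time embedding is negligible, so context size (`top_k` times chunk length) is the main cost lever.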