How does RAG work with Pinecone vector database?

Question

Accepted Answer

## RAG with Pinecone Vector Database

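Pinecone is a managed vector database optimized for production-scale semantic search. The walkthrough below builds a minimal RAG pipeline on top of it: create an index, embed and upsert documents, then retrieve matching chunks at query time and pass them to Claude as context.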

### Setup

```bash
pip install pinecone openai anthropic
```

### Step 1: Create Pinecone Index

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-pinecone-api-key")

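# Create the index only if it does not exist yet, so the script can be re-run
if "rag-kb" not in pc.list_indexes().names():
    pc.create_index(
        name="rag-kb",
        dimension=1536,        # OpenAI ada-002 embedding size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )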

index = pc.Index("rag-kb")
```

### Step 2: Index Documents

```python
from openai import OpenAI
import uuid

openai_client = OpenAI()

def get_embedding(text: str) -> list[float]:
    return openai_client.embeddings.create(
        model="text-embedding-ada-002", input=text
    ).data[0].embedding

documents = [
    {"text": "Pinecone is a managed vector database.", "source": "docs"},
    {"text": "RAG combines retrieval with generation.", "source": "docs"},
    {"text": "Embeddings map text to high-dimensional vectors.", "source": "docs"},
]

vectors = [
    {
        "id": str(uuid.uuid4()),
        "values": get_embedding(doc["text"]),
        "metadata": {"text": doc["text"], "source": doc["source"]}
    }
    for doc in documents
]

index.upsert(vectors=vectors, namespace="main")
```
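For more than a handful of documents, one embedding request per document gets slow. A minimal sketch of batching, assuming the same `text-embedding-ada-002` model as above: the helper name `embed_and_upsert` and the default batch size of 100 are illustrative choices, not values taken from the Pinecone or OpenAI documentation. The clients are passed in as arguments so the function can be tried with stand-ins.

```python
import uuid
from typing import Any


def embed_and_upsert(openai_client: Any, index: Any, documents: list[dict],
                     namespace: str = "main", batch_size: int = 100) -> int:
    """Embed documents in batches and upsert them; returns the number of vectors written."""
    written = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        # The embeddings endpoint accepts a list of inputs and returns one
        # embedding per input, each tagged with the index of its input.
        response = openai_client.embeddings.create(
            model="text-embedding-ada-002",
            input=[doc["text"] for doc in batch],
        )
        embeddings = sorted(response.data, key=lambda item: item.index)
        vectors = [
            {
                "id": str(uuid.uuid4()),
                "values": item.embedding,
                "metadata": {"text": doc["text"], "source": doc["source"]},
            }
            for doc, item in zip(batch, embeddings)
        ]
        # One upsert call per batch instead of one per document
        index.upsert(vectors=vectors, namespace=namespace)
        written += len(vectors)
    return written
```

With the objects from the steps above, the call would be `embed_and_upsert(openai_client, index, documents)`.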

### Step 3: RAG Query

```python
from anthropic import Anthropic

anthropic_client = Anthropic()

def rag_with_pinecone(question: str) -> str:
    # Embed question
    query_vec = get_embedding(question)

    # Semantic search
    results = index.query(
        namespace="main",
        vector=query_vec,
        top_k=3,
        include_metadata=True
    )

    # Filter by score threshold
    chunks = [
        m["metadata"]["text"]
        for m in results["matches"]
        if m["score"] > 0.75
    ]

    if not chunks:
        return "No relevant information found."

    context = "\n".join(chunks)

    # Generate with Claude
    response = anthropic_client.messages.create(
        model="claude-opus-4-6",
        max_tokens=512,
        messages=[{"role": "user", "content":
            f"Answer using only this context:\n{context}\n\nQuestion: {question}"}]
    )
    return response.content[0].text

print(rag_with_pinecone("What is Pinecone used for?"))
```

### Production Considerations

| Consideration | Best Practice |
|-------------|--------------|
| **Chunking** | Split docs into 256-512 token chunks before indexing |
| **Score threshold** | Discard matches with similarity below 0.75 |
| **Namespaces** | Separate namespaces for different domains |
| **Metadata** | Store source, date, doc_id for filtering |
| **Updates** | Upsert with same ID to update existing vectors |
| **Hybrid search** | Combine Pinecone dense + BM25 sparse |
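
The chunking, metadata, and update rows can be combined in one small helper. This is a sketch under stated assumptions: it counts whitespace-separated words as a rough stand-in for tokens (a real tokenizer would be more accurate), and the names `chunk_words`, `chunk_records`, and the `doc_id#n` ID scheme are hypothetical. Deterministic IDs mean that re-indexing the same document overwrites its old vectors instead of adding duplicates, which random `uuid4` IDs would do.

```python
def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


def chunk_records(doc_id: str, text: str, source: str, **chunk_kwargs) -> list[dict]:
    """Build upsert-ready records (embeddings still to be added) with stable IDs."""
    return [
        {
            "id": f"{doc_id}#{n}",  # same document + position -> same ID on every run
            "metadata": {"text": chunk, "source": source, "doc_id": doc_id, "chunk": n},
        }
        for n, chunk in enumerate(chunk_words(text, **chunk_kwargs))
    ]
```

Each record still needs a `"values"` entry holding the chunk's embedding before it is passed to `index.upsert`. If a document gets shorter, chunks with higher numbers from the previous version remain in the index and have to be deleted separately.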

### Metadata Filtering Example

```python
# Only search within a specific category
results = index.query(
    namespace="main",
    vector=query_vec,
    top_k=5,
    include_metadata=True,
    filter={"source": {"$eq": "product_docs"}}  # Pinecone metadata filter
)
```
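
Filters can also combine several conditions. The helper below is a hypothetical sketch (`build_filter` is not part of the Pinecone client) that assembles a filter dict from optional arguments using Pinecone's `$in`, `$gte`, `$eq`, and `$and` operators. It assumes dates were stored as a numeric Unix timestamp under a metadata key named `timestamp`, since range operators compare numbers.

```python
from typing import Optional


def build_filter(sources: Optional[list[str]] = None,
                 min_timestamp: Optional[int] = None,
                 doc_id: Optional[str] = None) -> Optional[dict]:
    """Return a metadata filter dict, or None when no condition was given."""
    clauses = []
    if sources:
        clauses.append({"source": {"$in": sources}})
    if min_timestamp is not None:
        clauses.append({"timestamp": {"$gte": min_timestamp}})
    if doc_id is not None:
        clauses.append({"doc_id": {"$eq": doc_id}})

    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    # All conditions must hold
    return {"$and": clauses}
```

The result is passed as the `filter` argument, for example `index.query(..., filter=build_filter(sources=["product_docs"], min_timestamp=1700000000))`.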

### Cost Estimate

| Component | Cost |
|-----------|------|
| Pinecone (starter) | Free up to 1M vectors |
| OpenAI embedding | $0.0001 per 1K tokens |
| Claude generation | $3/M input, $15/M output |
| Per 1000 queries | ~$0.50-5 depending on doc size |
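
The per-query figure follows from simple arithmetic on the prices in the table. A rough sketch, assuming those prices as defaults and ignoring any Pinecone usage charges; the function name and the token counts in the example are illustrative, not measurements.

```python
def estimate_query_cost(question_tokens: int, context_tokens: int, output_tokens: int,
                        embed_price_per_1k: float = 0.0001,
                        input_price_per_m: float = 3.0,
                        output_price_per_m: float = 15.0) -> float:
    """Estimate the USD cost of one RAG query from token counts."""
    # Embedding the question
    embedding = question_tokens / 1_000 * embed_price_per_1k
    # The generation prompt contains both the question and the retrieved context
    generation_in = (question_tokens + context_tokens) / 1_000_000 * input_price_per_m
    generation_out = output_tokens / 1_000_000 * output_price_per_m
    return embedding + generation_in + generation_out


# Example: 50-token question, three 250-token chunks, 150-token answer
per_thousand = 1000 * estimate_query_cost(50, 750, 150)
print(f"${per_thousand:.2f} per 1000 queries")
```

With these example numbers the estimate is roughly $4.7 per 1000 queries, near the top of the range in the table. Generation dominates the total; the embedding cost is negligible by comparison.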
