What is Semantic Search?

Question

Accepted Answer

## What is Semantic Search?

Semantic search finds results based on **meaning and intent**, not just exact keyword matches. It uses vector embeddings to represent text as high-dimensional vectors, then measures similarity between vectors to surface conceptually related content — even when different words are used.

### How It Works

1. **Embed** every document once with an embedding model (e.g., `text-embedding-3-small`), and embed each incoming query with the same model
2. **Index** the document vectors in a vector index or database (FAISS, Pinecone, Chroma)
3. **Search** by computing cosine similarity or dot product between the query vector and the stored vectors
4. **Return** the top-K most similar documents
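
The search and return steps can be sketched in plain NumPy. This is a minimal illustration, not how a vector store implements it: the function name `top_k_cosine` and the 3-dimensional vectors are made up for the example, whereas real embeddings have hundreds of dimensions.

```python
import numpy as np

def top_k_cosine(query_vec, doc_vecs, k=2):
    """Return (indices, scores) of the k documents most similar to the query."""
    # Normalize to unit length so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Sort by descending score and keep the first k.
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy "embeddings" (hypothetical values, not produced by a real model).
doc_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])
query_vec = np.array([0.9, 0.1, 0.0])

indices, scores = top_k_cosine(query_vec, doc_vecs, k=2)
print(indices.tolist())  # [0, 1]
```

A brute-force scan like this is exact but compares the query against every document, which is why larger corpora usually rely on an approximate nearest-neighbor index.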

### Semantic Search vs Keyword Search

| Feature | Keyword Search (BM25/TF-IDF) | Semantic Search |
|---------|-------------------------------|-----------------|
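| Matching | Term overlap (exact or stemmed words) | Meaning / intent |
| "Car" vs "Automobile" | No match (without a synonym list) | Match |
| Handles synonyms | Only with manual synonym expansion | Yes |
| Speed | Faster (inverted-index lookup) | Slower (embedding pass plus nearest-neighbor search) |
| Relevance | Lexical | Conceptual |
| Infrastructure | Elasticsearch, Solr | Vector index or DB (FAISS, Pinecone) |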
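
The "Car" vs "Automobile" row can be made concrete with a deliberately crude lexical scorer. This is only a sketch of term overlap, far simpler than BM25 or TF-IDF; the helper `keyword_overlap` and the sample strings are hypothetical.

```python
def keyword_overlap(query, doc):
    """Count the lowercase tokens shared by the query and the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["used automobile for sale", "car wash near me"]
query = "cheap car"

overlaps = [keyword_overlap(query, d) for d in docs]
print(overlaps)  # [0, 1]
```

The listing about an automobile scores zero because it shares no token with the query, even though it is arguably the more relevant result. An embedding-based comparison would be expected to place "car" and "automobile" close together.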

### Code Example

```python
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Document corpus
docs = [
    "How to train a neural network",
    "Best practices for LLM fine-tuning",
    "Introduction to gradient descent",
]

# Embed and index documents. Unit-length vectors make L2 ranking
# equivalent to cosine-similarity ranking.
doc_embeddings = model.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatL2(doc_embeddings.shape[1])  # exact (brute-force) L2 index
index.add(np.asarray(doc_embeddings, dtype="float32"))  # FAISS expects float32

# Query by meaning, not keywords
query = "How do I teach a model new skills?"
query_vec = model.encode([query], normalize_embeddings=True)

distances, indices = index.search(query_vec, k=2)
for i in indices[0]:
    print(docs[i])
# Prints the two nearest documents; the exact ranking depends on the model.
```

### Use Cases

* **RAG pipelines** — retrieve relevant documents for LLM context grounding
* **Enterprise search** — search internal knowledge bases by meaning
* **Recommendation systems** — find similar products, articles, or users
* **Duplicate detection** — identify near-duplicate content semantically
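
As one illustration of the duplicate-detection use case, the sketch below flags pairs of vectors whose cosine similarity exceeds a cutoff. The function name, the toy 2-dimensional vectors, and the 0.95 threshold are all assumptions for the example; a real threshold would need tuning on your own data.

```python
import numpy as np

def near_duplicate_pairs(vecs, threshold=0.95):
    """Return index pairs (i, j), i < j, with cosine similarity >= threshold."""
    normed = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = normed @ normed.T  # pairwise cosine similarities
    n = len(vecs)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sims[i, j] >= threshold]

# Toy vectors (hypothetical): the first two point in almost the same direction.
vecs = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])

pairs = near_duplicate_pairs(vecs)
print(pairs)  # [(0, 1)]
```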

### Hybrid Search (Best of Both Worlds)

Combine semantic + keyword search using **Reciprocal Rank Fusion (RRF)**:

```python
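# LangChain hybrid retriever example.
# Assumes bm25_retriever and faiss_retriever were built beforehand, e.g. a
# BM25Retriever and a FAISS vector store's .as_retriever().
from langchain.retrievers import EnsembleRetriever

retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, faiss_retriever],
    weights=[0.4, 0.6],  # relative weights: 0.4 keyword, 0.6 semantic
)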
```
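
The fusion step itself is small enough to sketch without any framework. The version below is an unweighted Reciprocal Rank Fusion, assuming each retriever returns an ordered list of document IDs; the function name and the `doc_*` IDs are hypothetical, and `k=60` is the commonly used smoothing constant. It is not LangChain's implementation, which additionally applies the per-retriever weights.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["doc_a", "doc_b", "doc_c"]   # e.g. from BM25
semantic_ranking = ["doc_b", "doc_d", "doc_a"]  # e.g. from a vector index

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only rank positions, it sidesteps the problem that BM25 scores and cosine similarities live on different scales.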

> **Best Practice:** Use hybrid search in production RAG systems — semantic handles meaning, keyword handles exact matches (product IDs, names, codes).

Popular tools: [FAISS](https://faiss.ai), [Pinecone](https://www.pinecone.io), [Weaviate](https://weaviate.io), [Chroma](https://www.trychroma.com), [Qdrant](https://qdrant.tech)
