How would you structure a production LangChain application?

Question

Accepted Answer

## Structuring a Production LangChain Application

A production LangChain app needs clear separation of concerns, configuration management, error handling, and observability — not just a working demo.

### Recommended Project Structure

```
my_rag_app/
├── src/
│   ├── config.py          # All configuration (env vars, model settings)
│   ├── chains/
│   │   ├── rag_chain.py   # RAG pipeline
│   │   └── agent.py       # Agent definition
│   ├── retrieval/
│   │   ├── indexer.py     # Document ingestion + indexing
│   │   └── retriever.py   # Retrieval logic
│   ├── memory/
│   │   └── store.py       # Conversation memory
│   └── api/
│       └── routes.py      # FastAPI endpoints
├── tests/
│   ├── test_rag.py
│   └── test_retrieval.py
├── data/
│   └── documents/
├── .env
└── pyproject.toml
```

### Configuration Module (`config.py`)

```python
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Load values from .env; real environment variables take precedence
    model_config = SettingsConfigDict(env_file=".env")

    # LLM
    openai_api_key: str
    llm_model: str = "gpt-4o"
    llm_temperature: float = 0.0
    llm_max_tokens: int = 2048

    # Retrieval
    embedding_model: str = "text-embedding-3-small"
    top_k: int = 5
    chunk_size: int = 512
    chunk_overlap: int = 64

    # Vector Store
    pinecone_api_key: str = ""
    pinecone_index: str = "production"


@lru_cache()
def get_settings() -> Settings:
    # Cached so the environment and .env file are parsed once per process
    return Settings()
```
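
One consequence of wrapping `get_settings()` in `lru_cache` is that the environment is read once per process. That is usually what you want in production, but a test that changes an environment variable has to call `cache_clear()` before the new value is visible. The sketch below shows the behaviour with a stdlib stand-in: `DemoSettings` and `get_demo_settings` are hypothetical names used in place of the pydantic `Settings` class so the example runs without extra dependencies.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical stdlib stand-in for the pydantic Settings class."""
    llm_model: str


@lru_cache()
def get_demo_settings() -> DemoSettings:
    # The environment is read on the first call only; later calls hit the cache.
    return DemoSettings(llm_model=os.environ.get("LLM_MODEL", "gpt-4o"))


os.environ["LLM_MODEL"] = "gpt-4o"
get_demo_settings.cache_clear()
first = get_demo_settings()

os.environ["LLM_MODEL"] = "test-model"
cached = get_demo_settings()        # still the object built before the change

get_demo_settings.cache_clear()     # what a test fixture would do
refreshed = get_demo_settings()     # re-reads the environment
```

In a test suite the `cache_clear()` call typically lives in a fixture that runs before each test touching configuration.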

### RAG Chain (`chains/rag_chain.py`)

```python
from langchain.chains import RetrievalQA
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from src.config import get_settings

settings = get_settings()

SYSTEM_TEMPLATE = '''You are a helpful assistant. Answer using ONLY the provided context.
If the answer is not in the context, say "I don't have enough information."

Context:
{context}'''


def build_rag_chain(retriever):
    """Build the question-answering chain around an injected retriever."""
    llm = ChatOpenAI(
        model=settings.llm_model,
        temperature=settings.llm_temperature,
        max_tokens=settings.llm_max_tokens,
        api_key=settings.openai_api_key,
    )

    # The "stuff" chain fills {context} with retrieved documents
    # and {question} with the user's query.
    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_TEMPLATE),
        ("human", "{question}"),
    ])

    # RetrievalQA is a legacy chain in recent LangChain releases;
    # pin the LangChain version you tested against.
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type_kwargs={"prompt": prompt},
        return_source_documents=True,
    )
```

### API Layer with FastAPI

```python
import logging
import time

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

logger = logging.getLogger(__name__)

app = FastAPI(title="RAG API", version="1.0.0")

# Assumption: rag_chain is built once at startup, e.g.
# rag_chain = build_rag_chain(retriever), using chains/rag_chain.py


class QueryRequest(BaseModel):
    question: str
    user_id: str | None = None


class QueryResponse(BaseModel):
    answer: str
    sources: list[str]
    latency_ms: int


@app.post("/query", response_model=QueryResponse)
async def query_endpoint(request: QueryRequest):
    start = time.perf_counter()
    try:
        # ainvoke keeps the event loop free while the LLM call is in flight
        result = await rag_chain.ainvoke({"query": request.question})
    except Exception:
        # Log the traceback server-side; don't leak internals to clients
        logger.exception("RAG query failed")
        raise HTTPException(status_code=500, detail="Failed to answer the query")

    latency = int((time.perf_counter() - start) * 1000)
    return QueryResponse(
        answer=result["result"],
        sources=[doc.metadata.get("source", "") for doc in result["source_documents"]],
        latency_ms=latency,
    )
```
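
An `async def` endpoint shares one event loop with every other request, so a blocking call inside it stalls them all. Where a component only exposes a synchronous method, one option is to push that call onto a worker thread with `asyncio.to_thread`. The sketch below is a minimal illustration, not the chain's real API: `blocking_chain_call` is a hypothetical stand-in that sleeps to simulate provider latency.

```python
import asyncio
import time


def blocking_chain_call(question: str) -> dict:
    """Hypothetical stand-in for a synchronous chain invocation."""
    time.sleep(0.2)  # simulates network latency to the LLM provider
    return {"result": f"answer to: {question}"}


async def answer(question: str) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_chain_call, question)


async def main() -> tuple[list[dict], float]:
    start = time.perf_counter()
    # Three requests overlap instead of running back to back.
    results = await asyncio.gather(*(answer(f"q{i}") for i in range(3)))
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(main())
```

Threads help with I/O-bound waits like this one; they do not make CPU-bound work faster.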

### LangSmith for Observability

```python
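import os

# Prefer setting these in .env or the deployment environment rather than in code.
# setdefault leaves any value that is already configured untouched.
os.environ.setdefault("LANGCHAIN_TRACING_V2", "true")
os.environ.setdefault("LANGCHAIN_PROJECT", "my-rag-production")

# LANGCHAIN_API_KEY ("ls_...") must also be set; supply it through the
# environment or a secret manager, and never commit a real key.

# With tracing enabled, chain invocations are traced automatically
# View at: smith.langchain.com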
```
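
A misconfigured tracing setup is easy to miss until you go looking for a trace that was never recorded, so it can help to check the variables at startup and fail fast. The sketch below is one way to do that with the standard library. The function names are hypothetical, and treating `LANGCHAIN_PROJECT` as required is a policy choice of this sketch rather than something LangSmith itself demands.

```python
import os

# Assumption: this project wants an explicit project name as well as a key.
REQUIRED_TRACING_VARS = ("LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_tracing_vars(env: dict[str, str]) -> list[str]:
    """Return required tracing variables that are unset or empty.

    Nothing is required when tracing is switched off.
    """
    if env.get("LANGCHAIN_TRACING_V2", "").lower() != "true":
        return []
    return [name for name in REQUIRED_TRACING_VARS if not env.get(name)]


def check_tracing_config() -> None:
    """Raise at startup instead of silently running without traces."""
    missing = missing_tracing_vars(dict(os.environ))
    if missing:
        raise RuntimeError(f"Tracing enabled but unset: {', '.join(missing)}")
```

Calling `check_tracing_config()` once during application startup is enough; taking the environment as a plain dict in `missing_tracing_vars` keeps the check itself easy to unit-test.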

### Key Production Principles

| Concern | Approach |
|---------|---------|
| **Configuration** | Pydantic Settings + env files |
| **Error handling** | Try/except at API boundary, not inside chains |
| **Observability** | LangSmith tracing or custom callbacks |
| **Testing** | Unit-test each component; integration-test the full chain |
| **Versioning** | Pin all dependency versions (`poetry.lock`) |
| **Rate limits** | Async + semaphore for batch processing |
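
The rate-limit row is the one that most often needs a concrete shape. A minimal sketch of the async-plus-semaphore pattern follows, assuming a cap of three concurrent calls; `fake_llm_call` is a hypothetical stand-in for an async chain invocation, and the `peak` counter exists only to make the limit observable.

```python
import asyncio


async def fake_llm_call(question: str) -> str:
    """Hypothetical stand-in for an async chain call."""
    await asyncio.sleep(0.01)
    return question.upper()


async def run_batch(questions: list[str], max_concurrency: int = 3) -> tuple[list[str], int]:
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0  # highest number of simultaneous calls observed

    async def limited(question: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once max_concurrency calls are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_llm_call(question)
            finally:
                in_flight -= 1

    # gather preserves input order in its results
    answers = await asyncio.gather(*(limited(q) for q in questions))
    return list(answers), peak


answers, peak = asyncio.run(run_batch([f"q{i}" for i in range(10)]))
```

A semaphore caps concurrency, not requests per minute; if the provider enforces a per-minute quota you would still need retry with backoff on top of this.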

> **Most important lesson:** Keep your chains as simple as possible. Every layer of abstraction adds debugging complexity. Only abstract when you have clear duplication.
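
Simple components with injected dependencies are also what makes the unit-testing row practical. The sketch below shows the idea in pytest style with no vector store or LLM involved. `FakeRetriever` and `format_context` are hypothetical names for illustration and are not part of LangChain.

```python
class FakeRetriever:
    """Hypothetical stand-in that returns fixed chunks without a vector store."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def retrieve(self, query: str) -> list[str]:
        return self.docs


def format_context(docs: list[str]) -> str:
    """Hypothetical helper: join non-empty chunks into the prompt's context block."""
    return "\n\n".join(doc.strip() for doc in docs if doc.strip())


def test_format_context_skips_empty_chunks() -> None:
    retriever = FakeRetriever(["  first chunk ", "", "second chunk"])
    context = format_context(retriever.retrieve("anything"))
    assert context == "first chunk\n\nsecond chunk"


def test_format_context_handles_no_results() -> None:
    assert format_context(FakeRetriever([]).retrieve("anything")) == ""
```

Tests like these run in milliseconds and need no API key, which leaves the slower integration tests to cover the full chain.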
