Concept #27 · Medium · python-for-gen-ai

How would you structure a production LangChain application?

#gen-ai #langchain

Answer

Structuring a Production LangChain Application

A production LangChain app needs clear separation of concerns, configuration management, error handling, and observability, not just a working demo.

Recommended Project Structure

text
my_rag_app/
├── src/
│   ├── config.py          # All configuration (env vars, model settings)
│   ├── chains/
│   │   ├── rag_chain.py   # RAG pipeline
│   │   └── agent.py       # Agent definition
│   ├── retrieval/
│   │   ├── indexer.py     # Document ingestion + indexing
│   │   └── retriever.py   # Retrieval logic
│   ├── memory/
│   │   └── store.py       # Conversation memory
│   └── api/
│       └── routes.py      # FastAPI endpoints
├── tests/
│   ├── test_rag.py
│   └── test_retrieval.py
├── data/
│   └── documents/
├── .env
└── pyproject.toml

Configuration Module (config.py)

python
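from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Load values from .env; real environment variables take precedence.
    # extra="ignore" lets .env hold keys this class does not declare
    # (for example the LangSmith variables).
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # LLM
    openai_api_key: str  # required: instantiation fails if it is missing
    llm_model: str = "gpt-4o"
    llm_temperature: float = 0.0
    llm_max_tokens: int = 2048

    # Retrieval
    embedding_model: str = "text-embedding-3-small"
    top_k: int = 5
    chunk_size: int = 512
    chunk_overlap: int = 64

    # Vector Store
    pinecone_api_key: str = ""
    pinecone_index: str = "production"


@lru_cache()
def get_settings() -> Settings:
    # Cached so the environment is parsed once per process
    return Settings()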
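
Because `get_settings()` is wrapped in `lru_cache`, the environment is read once per process. That is usually what you want in production, but it can be surprising in tests that change environment variables. The sketch below illustrates the behavior with a plain dataclass standing in for the Pydantic `Settings` class (an assumption, made only so the example runs without third-party packages). `DemoSettings`, `get_demo_settings`, and the model name `test-model` are hypothetical.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    """Hypothetical stand-in for the Pydantic Settings class."""
    llm_model: str


@lru_cache()
def get_demo_settings() -> DemoSettings:
    # Evaluated once per process; later calls return the cached object.
    return DemoSettings(llm_model=os.environ.get("LLM_MODEL", "gpt-4o"))


os.environ["LLM_MODEL"] = "gpt-4o"
first = get_demo_settings()

# Changing the environment afterwards does not affect the cached settings...
os.environ["LLM_MODEL"] = "test-model"
second = get_demo_settings()

# ...until the cache is cleared, which is what a test fixture would do.
get_demo_settings.cache_clear()
third = get_demo_settings()
```

In a test suite, calling `cache_clear()` in a setup or teardown step is one way to keep overridden environment variables from leaking between tests.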

RAG Chain (chains/rag_chain.py)

python
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA  # legacy chain; deprecated in recent LangChain releases
from langchain_core.prompts import ChatPromptTemplate
from src.config import get_settings

settings = get_settings()

SYSTEM_TEMPLATE = '''You are a helpful assistant. Answer using ONLY the provided context.
If the answer is not in the context, say "I don't have enough information."

Context:
{context}'''

def build_rag_chain(retriever):
    llm = ChatOpenAI(
        model=settings.llm_model,
        temperature=settings.llm_temperature,
        max_tokens=settings.llm_max_tokens,  # cap response length using the configured limit
        api_key=settings.openai_api_key,
    )

    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_TEMPLATE),
        ("human", "{question}"),
    ])

    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type_kwargs={"prompt": prompt},
        return_source_documents=True,
    )

API Layer with FastAPI

python
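import logging
import time

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from src.chains.rag_chain import build_rag_chain
# Assumed helper: retrieval/retriever.py is not shown in this answer
from src.retrieval.retriever import get_retriever

logger = logging.getLogger(__name__)

app = FastAPI(title="RAG API", version="1.0.0")

# Build the chain once at startup rather than on every request
rag_chain = build_rag_chain(get_retriever())

class QueryRequest(BaseModel):
    question: str
    user_id: str | None = None

class QueryResponse(BaseModel):
    answer: str
    sources: list[str]
    latency_ms: int

@app.post("/query", response_model=QueryResponse)
async def query_endpoint(request: QueryRequest):
    start = time.perf_counter()
    try:
        # ainvoke avoids blocking the event loop while the LLM call is in flight
        result = await rag_chain.ainvoke({"query": request.question})
    except Exception:
        # Log the full error server-side; do not leak internals to the client
        logger.exception("RAG query failed")
        raise HTTPException(status_code=500, detail="Failed to answer the query")

    latency = int((time.perf_counter() - start) * 1000)
    return QueryResponse(
        answer=result["result"],
        sources=[doc.metadata.get("source", "") for doc in result["source_documents"]],
        latency_ms=latency,
    )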
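
One way to make the endpoint logic testable without a live LLM is to move the chain call into a plain function that receives the chain as a parameter, then pass a fake in tests. The sketch below uses only the standard library and a synchronous `invoke` for brevity. `ChainLike`, `answer_question`, `FakeDoc`, and `FakeChain` are hypothetical names, not part of LangChain or FastAPI.

```python
import time
from typing import Any, Protocol


class ChainLike(Protocol):
    """Anything exposing the invoke() shape the endpoint relies on."""

    def invoke(self, inputs: dict[str, Any]) -> dict[str, Any]: ...


def answer_question(chain: ChainLike, question: str) -> dict[str, Any]:
    """Run the chain and shape the result like QueryResponse."""
    start = time.perf_counter()
    result = chain.invoke({"query": question})
    latency_ms = int((time.perf_counter() - start) * 1000)
    return {
        "answer": result["result"],
        "sources": [d.metadata.get("source", "") for d in result["source_documents"]],
        "latency_ms": latency_ms,
    }


class FakeDoc:
    """Minimal document stub carrying only a metadata dict."""

    def __init__(self, source: str) -> None:
        self.metadata = {"source": source}


class FakeChain:
    """Deterministic stand-in for the real RAG chain."""

    def invoke(self, inputs: dict[str, Any]) -> dict[str, Any]:
        return {
            "result": f"echo: {inputs['query']}",
            "source_documents": [FakeDoc("a.md"), FakeDoc("b.md")],
        }


response = answer_question(FakeChain(), "What is RAG?")
```

With this split, the route handler only translates between HTTP and `answer_question`, so the response-shaping logic can be unit-tested in isolation.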

LangSmith for Observability

python
import os

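# Enable LangSmith tracing. In production, set these in .env or the
# deployment environment instead of in code, and never commit the API key.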
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
os.environ["LANGCHAIN_PROJECT"] = "my-rag-production"

# Every chain invocation is now automatically traced
# View at: smith.langchain.com

Key Production Principles

| Concern | Approach |
| --- | --- |
| Configuration | Pydantic Settings + env files |
| Error handling | Try/except at API boundary, not inside chains |
| Observability | LangSmith tracing or custom callbacks |
| Testing | Unit-test each component; integration-test the full chain |
| Versioning | Pin all dependency versions (poetry.lock) |
| Rate limits | Async + semaphore for batch processing |
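
The "async + semaphore" approach to rate limits can be sketched as follows. This is a minimal illustration using only `asyncio`. `fake_llm_call` is a hypothetical placeholder for an async chain call, and `process_batch` and the concurrency limit of 3 are assumptions chosen for the example.

```python
import asyncio


async def fake_llm_call(question: str) -> str:
    """Hypothetical placeholder for an async chain call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer to: {question}"


async def process_batch(questions: list[str], max_concurrency: int = 3) -> list[str]:
    # The semaphore caps how many calls are in flight at any moment.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(question: str) -> str:
        async with semaphore:
            return await fake_llm_call(question)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(limited(q) for q in questions))


answers = asyncio.run(process_batch([f"q{i}" for i in range(10)]))
```

A semaphore bounds concurrency, not requests per minute. If the provider enforces a strict per-minute quota, it would likely need to be combined with retries and backoff.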

Most important lesson: Keep your chains as simple as possible. Every layer of abstraction adds debugging complexity. Only abstract when you have clear duplication.