How would you structure a production LangChain application?
#gen-ai #langchain
Answer
Structuring a Production LangChain Application
A production LangChain app needs clear separation of concerns, configuration management, error handling, and observability, not just a working demo.
Recommended Project Structure
```text
my_rag_app/
├── src/
│   ├── config.py              # All configuration (env vars, model settings)
│   ├── chains/
│   │   ├── rag_chain.py       # RAG pipeline
│   │   └── agent.py           # Agent definition
│   ├── retrieval/
│   │   ├── indexer.py         # Document ingestion + indexing
│   │   └── retriever.py       # Retrieval logic
│   ├── memory/
│   │   └── store.py           # Conversation memory
│   └── api/
│       └── routes.py          # FastAPI endpoints
├── tests/
│   ├── test_rag.py
│   └── test_retrieval.py
├── data/
│   └── documents/
├── .env
└── pyproject.toml
```
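Configuration Module (`config.py`)
```python
# config.py
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # pydantic-settings v2 style config (replaces the deprecated inner `class Config`).
    # extra="ignore" lets .env hold unrelated keys (e.g. the LangSmith variables)
    # without failing validation.
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # LLM
    openai_api_key: str
    llm_model: str = "gpt-4o"
    llm_temperature: float = 0.0
    llm_max_tokens: int = 2048

    # Retrieval
    embedding_model: str = "text-embedding-3-small"
    top_k: int = 5
    chunk_size: int = 512
    chunk_overlap: int = 64

    # Vector Store
    pinecone_api_key: str = ""
    pinecone_index: str = "production"


@lru_cache()
def get_settings() -> Settings:
    # Cached so the environment and .env file are read only once per process.
    return Settings()
```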
RAG Chain (`chains/rag_chain.py`)
```python
# chains/rag_chain.py
# Note: RetrievalQA is a legacy chain that newer LangChain releases deprecate;
# it is kept here because the API layer relies on its "query"/"result" keys.
from langchain.chains import RetrievalQA
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from src.config import get_settings

settings = get_settings()

SYSTEM_TEMPLATE = '''You are a helpful assistant. Answer using ONLY the provided context.
If the answer is not in the context, say "I don't have enough information."

Context:
{context}'''


def build_rag_chain(retriever):
    """Build a RetrievalQA chain around the given retriever."""
    llm = ChatOpenAI(
        model=settings.llm_model,
        temperature=settings.llm_temperature,
        api_key=settings.openai_api_key,
    )
    # The default "stuff" chain fills {context} with the retrieved documents
    # and {question} with the user query.
    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_TEMPLATE),
        ("human", "{question}"),
    ])
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type_kwargs={"prompt": prompt},
        return_source_documents=True,
    )
```
API Layer with FastAPI
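```python
import logging
import time

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from src.chains.rag_chain import build_rag_chain
# Assumption: retriever.py exposes a factory like this; it is not shown in this answer.
from src.retrieval.retriever import get_retriever

logger = logging.getLogger(__name__)

app = FastAPI(title="RAG API", version="1.0.0")

# Build the chain once at import time rather than on every request.
rag_chain = build_rag_chain(get_retriever())


class QueryRequest(BaseModel):
    question: str
    user_id: str | None = None


class QueryResponse(BaseModel):
    answer: str
    sources: list[str]
    latency_ms: int


@app.post("/query", response_model=QueryResponse)
async def query_endpoint(request: QueryRequest):
    start = time.perf_counter()
    try:
        # ainvoke avoids blocking the event loop while the LLM call is in flight.
        result = await rag_chain.ainvoke({"query": request.question})
    except Exception as exc:
        # Log the full error server-side; do not leak internals to the client.
        logger.exception("RAG query failed")
        raise HTTPException(status_code=500, detail="Failed to answer the query.") from exc
    latency = int((time.perf_counter() - start) * 1000)
    return QueryResponse(
        answer=result["result"],
        sources=[doc.metadata.get("source", "") for doc in result["source_documents"]],
        latency_ms=latency,
    )
```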
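One way to keep the endpoint easy to unit-test is to pull the response shaping into a plain function that needs neither FastAPI nor an LLM. The sketch below is illustrative only: `shape_response` and `fake_result` are hypothetical helpers, not part of LangChain or of the app above, and they assume the result dictionary carries the same `result` and `source_documents` keys the endpoint reads.

```python
from types import SimpleNamespace


def shape_response(result: dict, latency_ms: int) -> dict:
    """Map a RetrievalQA-style result dict onto the QueryResponse fields."""
    return {
        "answer": result["result"],
        "sources": [
            doc.metadata.get("source", "") for doc in result["source_documents"]
        ],
        "latency_ms": latency_ms,
    }


def fake_result(answer: str, sources: list[str]) -> dict:
    """Hypothetical test double: objects that only carry a `metadata` dict."""
    docs = [SimpleNamespace(metadata={"source": s}) for s in sources]
    return {"result": answer, "source_documents": docs}


if __name__ == "__main__":
    payload = shape_response(fake_result("42", ["faq.md", "guide.md"]), latency_ms=7)
    print(payload)
```

A test in `tests/test_rag.py` could then assert on the returned dictionary directly, leaving the full chain to a separate integration test.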
LangSmith for Observability
```python
import os

# Enable LangSmith tracing (set in .env)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
os.environ["LANGCHAIN_PROJECT"] = "my-rag-production"

# Every chain invocation is now automatically traced
# View at: smith.langchain.com
```
Key Production Principles
| Concern | Approach |
|---|---|
| Configuration | Pydantic Settings + env files |
| Error handling | Try/except at API boundary, not inside chains |
| Observability | LangSmith tracing or custom callbacks |
| Testing | Unit-test each component; integration-test the full chain |
| Versioning | Pin all dependency versions |
| Rate limits | Async + semaphore for batch processing |
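
The "async + semaphore" row can be sketched roughly as follows. This is a minimal illustration using only the standard library: `fake_llm_call` is a hypothetical stand-in for an async chain call, and `run_batch` and `max_concurrency` are names chosen here for the example, not LangChain APIs.

```python
import asyncio


async def fake_llm_call(question: str) -> str:
    """Hypothetical stand-in for an async chain call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer to: {question}"


async def run_batch(questions, max_concurrency=3, call=fake_llm_call):
    """Run `call` over all questions with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(question):
        # Each task waits here until one of the slots is free.
        async with semaphore:
            return await call(question)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(q) for q in questions))


if __name__ == "__main__":
    answers = asyncio.run(run_batch([f"q{i}" for i in range(10)]))
    print(len(answers), "answers")
```

The semaphore caps concurrent requests, which helps stay under a provider's rate limit; it does not by itself handle retries when the provider still returns a rate-limit error.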
Most important lesson: Keep your chains as simple as possible. Every layer of abstraction adds debugging complexity. Only abstract when you have clear duplication.