Can you explain in detail the folder structure to follow in a Python project, along with a flow diagram?
#python #folder-structure #project-structure #architecture #best-practices #packaging #gen-ai
Answer
Python Folder Structure Best Practices
A well-organised folder structure makes your code maintainable, testable, and scalable. The right structure depends on the project type: script, library, web app, or Gen AI application.
1. Basic Python Project Structure
The `src` layout below follows the recommendations of the Python Packaging Authority (PyPA):
```text
my_project/
├── src/
│   └── my_package/
│       ├── __init__.py        # makes it a Python package
│       ├── main.py            # entry point
│       ├── config.py          # configuration & settings
│       └── utils.py           # shared utility functions
├── tests/
│   ├── __init__.py
│   ├── test_main.py
│   └── test_utils.py
├── docs/                      # documentation
├── pyproject.toml             # project metadata & dependencies
├── README.md
├── .env                       # environment variables (never commit)
└── .gitignore
```
```toml
# pyproject.toml
[project]
name = "my_package"
version = "1.0.0"
requires-python = ">=3.11"
dependencies = [
    "openai>=1.0",
    "langchain>=0.2",
]

[project.optional-dependencies]
dev = ["pytest", "ruff", "mypy"]
```
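To try the layout in this section quickly, the skeleton can be generated with a few lines of standard-library Python. This is only a sketch: the `scaffold` helper, the `LAYOUT` list, and the `docs/.gitkeep` placeholder are assumptions for illustration, and `.env` is deliberately left out because it holds secrets and should be created by hand.

```python
from pathlib import Path

# Files from the basic project tree, relative to the project root.
# "my_package" is the placeholder package name used in this answer.
LAYOUT = [
    "src/my_package/__init__.py",
    "src/my_package/main.py",
    "src/my_package/config.py",
    "src/my_package/utils.py",
    "tests/__init__.py",
    "tests/test_main.py",
    "tests/test_utils.py",
    "docs/.gitkeep",  # assumption: placeholder so the empty docs/ directory exists
    "pyproject.toml",
    "README.md",
    ".gitignore",
]


def scaffold(root: Path) -> list[Path]:
    """Create the skeleton under `root`; files that already exist are left untouched."""
    created = []
    for rel in LAYOUT:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold(Path("my_project"))` a second time returns an empty list, so the helper is safe to re-run on an existing project.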
2. Gen AI Application Structure
The recommended structure for a production Gen AI app (RAG, agents, LLM APIs):
```text
gen_ai_app/
├── src/
│   └── app/
│       ├── __init__.py
│       ├── api/                    # API layer (FastAPI routes)
│       │   ├── __init__.py
│       │   ├── routes.py           # endpoint definitions
│       │   └── schemas.py          # Pydantic request/response models
│       ├── core/                   # app-wide config & infrastructure
│       │   ├── __init__.py
│       │   ├── config.py           # settings (env vars, constants)
│       │   └── logging.py          # logging setup
│       ├── models/                 # AI model wrappers
│       │   ├── __init__.py
│       │   ├── llm.py              # LLM client (OpenAI, Anthropic, etc.)
│       │   └── embedder.py         # embedding model client
│       ├── pipelines/              # end-to-end AI workflows
│       │   ├── __init__.py
│       │   ├── rag.py              # RAG pipeline
│       │   └── ingestion.py        # document ingestion pipeline
│       ├── services/               # business logic layer
│       │   ├── __init__.py
│       │   ├── vector_store.py     # vector DB operations
│       │   └── retriever.py        # retrieval logic
│       └── utils/                  # shared helpers
│           ├── __init__.py
│           ├── text.py             # text processing utilities
│           └── validators.py       # input validation
├── tests/
│   ├── unit/
│   │   ├── test_pipelines.py
│   │   └── test_services.py
│   └── integration/
│       └── test_api.py
├── data/
│   ├── raw/                        # original source documents
│   ├── processed/                  # cleaned, chunked documents
│   └── embeddings/                 # cached embeddings
├── notebooks/                      # Jupyter notebooks for exploration
│   ├── 01_data_exploration.ipynb
│   └── 02_model_evaluation.ipynb
├── configs/
│   ├── dev.yaml                    # development config
│   └── prod.yaml                   # production config
├── scripts/
│   ├── ingest_docs.py              # one-off ingestion script
│   └── evaluate_pipeline.py        # evaluation script
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── pyproject.toml
├── .env.example                    # template (commit this)
├── .env                            # actual secrets (never commit)
├── README.md
└── .gitignore
```
3. Key Files Explained
`core/config.py`: Centralised Settings
```python
from functools import lru_cache

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Load values from .env; real environment variables take precedence.
    model_config = SettingsConfigDict(env_file=".env")

    # LLM
    openai_api_key: str
    openai_model: str = "gpt-4o"
    temperature: float = 0.7
    max_tokens: int = 2048

    # Vector DB
    chroma_persist_dir: str = "./data/embeddings"
    embedding_model: str = "text-embedding-3-small"

    # App
    app_name: str = "GenAI App"
    debug: bool = False
    log_level: str = "INFO"


@lru_cache
def get_settings() -> Settings:
    """Build the settings once and reuse the same instance everywhere."""
    return Settings()


# Usage anywhere in the app:
#   settings = get_settings()
#   print(settings.openai_model)  # "gpt-4o" unless overridden in the environment
```
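`Settings` defines `log_level`, and the project tree lists `core/logging.py` for logging setup, but that file is not shown. A minimal sketch of what it could contain, using only the standard library. The function name `setup_logging`, the log format, and the `"app"` logger name are assumptions, not part of the original answer.

```python
import logging
import sys


def setup_logging(level: str = "INFO") -> logging.Logger:
    """Configure the root logger and return a logger for the app (hypothetical helper)."""
    root = logging.getLogger()
    root.setLevel(level.upper())

    # Attach a handler only if none exists, so repeated calls do not duplicate output.
    if not root.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        root.addHandler(handler)

    return logging.getLogger("app")
```

It would typically be called once at startup, for example `logger = setup_logging(get_settings().log_level)`.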
`models/llm.py`: LLM Wrapper
```python
from openai import OpenAI

from app.core.config import get_settings

settings = get_settings()


class LLMClient:
    def __init__(self):
        self._client = OpenAI(api_key=settings.openai_api_key)
        self._model = settings.openai_model

    def generate(self, prompt: str, system: str = "You are a helpful assistant.") -> str:
        response = self._client.chat.completions.create(
            model=self._model,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": prompt},
            ],
            temperature=settings.temperature,
            max_tokens=settings.max_tokens,
        )
        # message.content may be None, so fall back to an empty string.
        return response.choices[0].message.content or ""
```
`pipelines/rag.py`: RAG Pipeline
```python
from app.models.llm import LLMClient
from app.models.embedder import EmbedderClient
from app.services.retriever import Retriever


class RAGPipeline:
    def __init__(self):
        self.llm = LLMClient()
        self.embedder = EmbedderClient()
        self.retriever = Retriever()

    def run(self, query: str) -> str:
        # 1. Embed the query
        query_vector = self.embedder.embed(query)

        # 2. Retrieve relevant docs
        docs = self.retriever.search(query_vector, k=5)

        # 3. Build augmented prompt
        context = "\n\n".join(docs)
        prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"

        # 4. Generate response
        return self.llm.generate(prompt)
```
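The pipeline calls `Retriever.search(query_vector, k=5)` and joins the result as strings, but `services/retriever.py` itself is not shown. Below is a hypothetical in-memory stand-in that ranks stored texts by cosine similarity. A real service would presumably delegate to the vector store instead; the `add` method and the internal `_index` list are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class Retriever:
    """Hypothetical in-memory version of services/retriever.py."""

    # Each entry pairs an embedding vector with the text it was computed from.
    _index: list[tuple[list[float], str]] = field(default_factory=list)

    def add(self, vector: list[float], text: str) -> None:
        self._index.append((vector, text))

    def search(self, query_vector: list[float], k: int = 5) -> list[str]:
        """Return the texts of the k most similar stored vectors, best match first."""
        ranked = sorted(
            self._index,
            key=lambda item: cosine_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]
```

Keeping the similarity search behind `search()` means `RAGPipeline` does not change when the in-memory list is swapped for a real vector database.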
`api/routes.py`: FastAPI Routes
```python
from fastapi import APIRouter, Depends

from app.api.schemas import QueryRequest, QueryResponse
from app.pipelines.rag import RAGPipeline

router = APIRouter(prefix="/api/v1")


def get_pipeline() -> RAGPipeline:
    return RAGPipeline()


# Plain `def`: FastAPI runs it in a threadpool, so the blocking
# pipeline call does not stall the event loop.
@router.post("/query", response_model=QueryResponse)
def query(request: QueryRequest, pipeline: RAGPipeline = Depends(get_pipeline)):
    answer = pipeline.run(request.question)
    return QueryResponse(answer=answer)
```
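The route imports `QueryRequest` and `QueryResponse` from `api/schemas.py`, which is listed in the tree but not shown. Judging only from how the route uses them (`request.question` and `QueryResponse(answer=...)`), a minimal version might look like the sketch below. The `min_length=1` check is an assumption added for illustration.

```python
from pydantic import BaseModel, Field


class QueryRequest(BaseModel):
    """Request body for POST /api/v1/query."""

    # Assumed validation rule: reject empty questions before they reach the pipeline.
    question: str = Field(min_length=1)


class QueryResponse(BaseModel):
    """Response body returned by the endpoint."""

    answer: str
```

With these models, FastAPI rejects a request with a missing or empty `question` with a validation error before `pipeline.run` is called.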
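4. Request Flow Diagram
Based on the files in section 3, a request passes through the layers in this order:
Client → `POST /api/v1/query` (`api/routes.py`) → `RAGPipeline.run` (`pipelines/rag.py`) → `EmbedderClient.embed` (`models/embedder.py`) → `Retriever.search` (`services/retriever.py`) → prompt assembly → `LLMClient.generate` (`models/llm.py`) → `QueryResponse` returned to the client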
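The order in which a request crosses the layers can be checked with a small trace. The sketch below mirrors the route and `RAGPipeline.run` from section 3, but replaces the embedder, retriever, and LLM with stand-in functions that make no network calls. All function names and the dummy return values here are assumptions for illustration.

```python
def embed(text: str, trace: list[str]) -> list[float]:
    """Stand-in for models/embedder.py."""
    trace.append("models/embedder.py: embed query")
    return [float(len(text))]  # dummy one-dimensional "embedding"


def search(query_vector: list[float], k: int, trace: list[str]) -> list[str]:
    """Stand-in for services/retriever.py."""
    trace.append("services/retriever.py: search")
    return ["Paris is the capital of France."][:k]


def generate(prompt: str, trace: list[str]) -> str:
    """Stand-in for models/llm.py."""
    trace.append("models/llm.py: generate")
    return f"(answer generated from a {len(prompt)}-character prompt)"


def rag_run(query: str, trace: list[str]) -> str:
    """Mirrors RAGPipeline.run: embed, retrieve, build the prompt, generate."""
    trace.append("pipelines/rag.py: run")
    query_vector = embed(query, trace)
    docs = search(query_vector, 5, trace)
    context = "\n\n".join(docs)
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt, trace)


def handle_query(question: str) -> tuple[str, list[str]]:
    """Mirrors the POST /api/v1/query route: delegate to the pipeline, return the answer."""
    trace = ["api/routes.py: POST /api/v1/query"]
    answer = rag_run(question, trace)
    trace.append("api/routes.py: return QueryResponse")
    return answer, trace
```

Printing the second element of `handle_query("What is the capital of France?")` lists the hops in order: route, pipeline, embedder, retriever, LLM, and back to the route.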
5. LangChain / Agent Project Structure
For more complex multi-agent Gen AI systems:
```text
agent_project/
└── src/
    └── app/
        ├── agents/
        │   ├── researcher.py       # research agent
        │   ├── writer.py           # content writer agent
        │   └── reviewer.py         # review/critique agent
        ├── tools/
        │   ├── web_search.py       # Tavily / SerpAPI tool
        │   ├── code_executor.py    # Python REPL tool
        │   └── file_reader.py      # document reading tool
        ├── memory/
        │   ├── conversation.py     # conversation buffer memory
        │   └── vector_memory.py    # long-term vector memory
        └── chains/
            ├── summarise.py        # summarisation chain
            └── qa_chain.py         # Q&A chain
```
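One reason to keep `tools/` separate from `agents/` is that tools can then be plain functions that any agent looks up by name. The following is a framework-free sketch of that idea and does not use LangChain's API. The `TOOLS` registry, the `tool` decorator, `run_tool`, and the `word_count` tool are all hypothetical.

```python
from pathlib import Path
from typing import Callable

# Registry mapping a tool name to a plain function (hypothetical design).
TOOLS: dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Decorator that registers a function under `name` in TOOLS."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = func
        return func
    return register


@tool("file_reader")  # would live in tools/file_reader.py
def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


@tool("word_count")  # illustrative extra tool, not part of the tree
def word_count(text: str) -> str:
    return str(len(text.split()))


def run_tool(name: str, argument: str) -> str:
    """What an agent would call once a tool and its input have been chosen."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name!r}")
    return TOOLS[name](argument)
```

For example, `run_tool("word_count", "one two three")` returns `"3"`. Agents in `agents/` would depend only on `run_tool`, not on individual tool modules.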
6. `.gitignore` for Python Gen AI Projects
```text
# Python
__pycache__/
*.py[cod]
*.egg-info/
dist/
build/
.venv/
venv/

# Environment & secrets
.env
*.key
*.pem

# Data & models
data/raw/
data/processed/
*.pkl
*.bin
*.safetensors
chroma_db/

# Notebooks
.ipynb_checkpoints/

# IDE
.vscode/
.idea/

# OS
.DS_Store
Thumbs.db
```
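Because `.env` is ignored while `.env.example` is committed, the template has to be kept in sync by hand or by a small script. One possible helper, assuming simple `KEY=value` lines, is sketched below. The function name `make_env_example` is hypothetical, and lines without `=` are passed through unchanged, so the output should be reviewed before committing.

```python
def make_env_example(env_text: str) -> str:
    """Blank out the values in .env content, keeping keys, comments and blank lines."""
    out = []
    for line in env_text.splitlines():
        stripped = line.strip()
        # Comments, blank lines and anything that is not KEY=value pass through as-is.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            out.append(line)
            continue
        key = stripped.split("=", 1)[0].strip()
        out.append(f"{key}=")
    return "\n".join(out) + "\n"
```

A typical use would be `Path(".env.example").write_text(make_env_example(Path(".env").read_text()))`, run from the project root.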
7. Folder Structure Rules
| Rule | Why |
|---|---|
| One `__init__.py` per package directory | Makes the directory a Python package |
| Keep `config.py` in `core/` | Single source of truth for all settings |
| Separate `pipelines/` from `services/` | Pipelines orchestrate; services execute |
| Put secrets in `.env` | Security: never commit API keys |
| Use the `src/` layout | Prevents accidental imports of the non-installed package |
| Separate `tests/unit/` from `tests/integration/` | Different speed and setup requirements |
| Keep notebooks in `notebooks/` | Exploration only, not production code |
| Use text | Keeps text |
Quick Reference
Pro tip: Start with the Gen AI app structure even for small projects. Adding layers later is painful; removing them is easy. A good folder structure is the first step to a production-ready Gen AI system.
Learn more in the Python Packaging User Guide and the Cookiecutter Data Science project template.