Explain async/await in Python. Why is it important for API-heavy applications?
Answer
Async/Await in Python
Async/await lets a single thread run many I/O-bound operations concurrently, without spawning extra threads. Instead of sitting idle while it waits for a network response, the event loop switches to other coroutines that are ready to run, which can dramatically improve throughput for API-heavy applications.
The Basics
```python
import asyncio

async def fetch_completion(prompt: str) -> str:
    # 'await' yields control back to the event loop while waiting
    await asyncio.sleep(1)  # Simulate network I/O
    return f"Response to: {prompt}"

async def main():
    result = await fetch_completion("What is RAG?")
    print(result)

asyncio.run(main())
```
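To see that `await` really hands control back to the event loop, here is a small sketch that uses `asyncio.sleep` as a stand-in for network latency. The `worker` and `interleave_demo` names and the delays are illustrative assumptions, not part of any API:

```python
import asyncio

async def worker(name: str, delay: float, log: list[str]) -> None:
    log.append(f"{name} start")
    # While this coroutine is suspended here, the event loop runs the other one
    await asyncio.sleep(delay)
    log.append(f"{name} end")

async def interleave_demo() -> list[str]:
    log: list[str] = []
    # Both coroutines are scheduled together; the shorter sleep finishes first
    await asyncio.gather(worker("slow", 0.2, log), worker("fast", 0.1, log))
    return log

events = asyncio.run(interleave_demo())
print(events)  # ['slow start', 'fast start', 'fast end', 'slow end']
```

"fast" starts while "slow" is still waiting, which is exactly the overlap that makes concurrent API calls cheap.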
Why It Matters for LLM Applications
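LLM API calls typically take 1–30 seconds each. Without async, 100 requests processed one after another take roughly 100 times as long as a single request. With async, you can issue all of them concurrently and wait for them together, so the total wall-clock time approaches that of the slowest individual call (as long as the provider's rate limits allow that many requests in flight).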
```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # Reads OPENAI_API_KEY from the environment

async def single_completion(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""

async def batch_completions(prompts: list[str]) -> list[str]:
    # Issue ALL requests concurrently.
    # asyncio.gather returns results in the same order as its arguments,
    # so no manual re-sorting by index is needed.
    tasks = [single_completion(p) for p in prompts]
    return await asyncio.gather(*tasks)

# 10 calls of ~3s each finish in roughly 3s instead of ~30s
prompts = [f"Summarise topic {i}" for i in range(10)]
answers = asyncio.run(batch_completions(prompts))
```
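The speed-up is a property of the event loop, not of the OpenAI client. This offline sketch replaces the API call with a hypothetical `fake_completion` coroutine (a fixed 0.1s sleep, an assumption standing in for real latency) and times sequential against concurrent execution:

```python
import asyncio
import time

async def fake_completion(prompt: str, latency: float = 0.1) -> str:
    # Hypothetical stand-in for an LLM call: waits, then echoes the prompt
    await asyncio.sleep(latency)
    return f"answer to {prompt}"

async def run_sequential(prompts: list[str]) -> list[str]:
    # Each call starts only after the previous one has finished
    return [await fake_completion(p) for p in prompts]

async def run_concurrent(prompts: list[str]) -> list[str]:
    # All calls are in flight at once; gather keeps results in input order
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

prompts = [f"topic {i}" for i in range(10)]

start = time.perf_counter()
seq = asyncio.run(run_sequential(prompts))
seq_time = time.perf_counter() - start

start = time.perf_counter()
conc = asyncio.run(run_concurrent(prompts))
conc_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

Expect roughly 1.0s for the sequential run and about 0.1s for the concurrent one, with identical, identically ordered results.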
Concurrency with Rate Limiting
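Most APIs enforce rate limits, and firing hundreds of requests at once can trigger HTTP 429 (Too Many Requests) errors. Use an `asyncio.Semaphore` to cap how many requests are in flight at the same time (note that this limits concurrency, not requests per minute):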
```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def rate_limited_completion(prompt: str, semaphore: asyncio.Semaphore) -> str:
    async with semaphore:  # At most max_concurrent coroutines run this block at once
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content or ""

async def process_batch(prompts: list[str], max_concurrent: int = 5) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [rate_limited_completion(p, semaphore) for p in prompts]
    return await asyncio.gather(*tasks)
```
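A quick way to check that a semaphore really caps concurrency is to count how many coroutines are inside it at once. This sketch uses hypothetical `limited_call` and `measure_peak` helpers, with `asyncio.sleep` in place of real API calls:

```python
import asyncio

async def limited_call(i: int, sem: asyncio.Semaphore, state: dict[str, int]) -> int:
    async with sem:
        # Track how many calls are inside the semaphore simultaneously
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.05)  # Simulated request latency
        state["active"] -= 1
    return i

async def measure_peak(n: int, max_concurrent: int) -> tuple[list[int], int]:
    sem = asyncio.Semaphore(max_concurrent)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(limited_call(i, sem, state) for i in range(n)))
    return results, state["peak"]

results, peak = asyncio.run(measure_peak(20, 5))
print(f"peak concurrency: {peak}")  # peak concurrency: 5
```

All 20 calls still complete and come back in order, but never more than 5 are in flight at any moment.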
Streaming Async Responses
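With `stream=True`, the async client returns chunks as they are generated, so you can render tokens immediately instead of waiting for the whole response:
```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def stream_response(prompt: str) -> None:
    # stream=True returns an async iterator of ChatCompletionChunk objects
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(stream_response("Explain transformers in detail"))
```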
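`async for` works with any asynchronous iterator, not just the SDK's stream object. The sketch below uses a hypothetical async generator, `fake_token_stream`, to mimic chunks arriving over the network. This is handy for testing streaming display logic without an API key:

```python
import asyncio
from collections.abc import AsyncIterator

async def fake_token_stream(text: str, delay: float = 0.01) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming API: yields one word at a time
    for word in text.split():
        await asyncio.sleep(delay)  # Simulated gap between network chunks
        yield word + " "

async def consume(answer: str) -> str:
    parts: list[str] = []
    async for token in fake_token_stream(answer):
        print(token, end="", flush=True)  # Render each piece as soon as it arrives
        parts.append(token)
    print()
    return "".join(parts)

full = asyncio.run(consume("Transformers use self-attention"))
```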
Sync vs Async Comparison
| Aspect | Synchronous | Async |
|---|---|---|
| 10 API calls (3s each) | ~30s | ~3s |
| Complexity | Simple | Moderate |
| Best for | Single requests, scripts | Batch processing, web servers |
| Concurrency model | One call at a time (threads needed to overlap I/O) | Single thread, cooperative event loop |
| Frameworks | Flask | FastAPI, aiohttp |
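The ~3s figure in the table assumes every request can be in flight at once. As a rough, idealized model (assuming N calls of equal latency t and a concurrency cap of k, such as the semaphore in the rate-limiting example, and ignoring network variance and per-minute quotas), the wall-clock time is:

$$
T \approx \left\lceil \frac{N}{k} \right\rceil t
\qquad\text{e.g. } N = 10,\ t = 3\,\text{s}:\quad
k = 1 \Rightarrow T \approx 30\,\text{s},\quad
k = 5 \Rightarrow T \approx 6\,\text{s},\quad
k \ge 10 \Rightarrow T \approx 3\,\text{s}
$$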
When to use async: any time your application makes multiple API calls, database queries, or other I/O operations that can overlap. FastAPI supports async natively: define your route handlers with `async def` so they run on the event loop and can await I/O concurrently.
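One caveat before converting handlers to `async def`: the event loop runs on a single thread, so a blocking call such as `time.sleep` (or a synchronous HTTP client) inside a coroutine stalls every other coroutine. This stdlib-only sketch (the `ticker`, handler, and `max_gap` names are hypothetical) measures the longest pause seen by a concurrently running coroutine, with and without offloading the blocking call via `asyncio.to_thread`:

```python
import asyncio
import time
from collections.abc import Awaitable, Callable

async def ticker(ticks: list[float], n: int = 5, interval: float = 0.05) -> None:
    # Records timestamps; large gaps mean the event loop was not free to run it
    for _ in range(n):
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)

async def blocking_handler() -> None:
    time.sleep(0.3)  # Blocking call: freezes the whole event loop

async def non_blocking_handler() -> None:
    # Runs the blocking call in a worker thread; the loop stays responsive
    await asyncio.to_thread(time.sleep, 0.3)

async def max_gap(handler: Callable[[], Awaitable[None]]) -> float:
    ticks: list[float] = []
    await asyncio.gather(ticker(ticks), handler())
    return max(b - a for a, b in zip(ticks, ticks[1:]))

blocked_gap = asyncio.run(max_gap(blocking_handler))
free_gap = asyncio.run(max_gap(non_blocking_handler))
print(f"max gap with time.sleep: {blocked_gap:.2f}s, with to_thread: {free_gap:.2f}s")
```

With `time.sleep` the ticker is frozen for about 0.3s; with `asyncio.to_thread` its ticks stay roughly 0.05s apart.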