Concept #21 · Medium · python-for-gen-ai

Explain async/await in Python. Why is it important for API-heavy applications?

#gen-ai #python

Answer

Async/Await in Python

Async/await lets a single thread run many I/O-bound operations concurrently. Instead of sitting idle while it waits for a network response, the event loop runs other coroutines, which substantially improves throughput for API-heavy applications.

The Basics

python
import asyncio

async def fetch_completion(prompt: str) -> str:
    # 'await' yields control back to the event loop while waiting
    await asyncio.sleep(1)  # Simulate network I/O
    return f"Response to: {prompt}"

async def main():
    result = await fetch_completion("What is RAG?")
    print(result)

asyncio.run(main())
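To make the "yields control" point concrete, here is a minimal sketch in which two coroutines share one thread. The `worker` and `demo` names are illustrative only, and the delays stand in for network latency.

```python
import asyncio

async def worker(name: str, delay: float, log: list[str]) -> None:
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # control returns to the event loop here
    log.append(f"{name} done")

async def demo() -> list[str]:
    log: list[str] = []
    # Both coroutines run concurrently on the same thread
    await asyncio.gather(worker("A", 0.2, log), worker("B", 0.1, log))
    return log

events = asyncio.run(demo())
print(events)  # ['A start', 'B start', 'B done', 'A done']
```

B starts before A has finished because A's `await` hands control back to the event loop, and B finishes first because its simulated wait is shorter.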

Why It Matters for LLM Applications

LLM API calls typically take 1–30 seconds. Processed serially, 100 requests take roughly 100 times as long as a single request. With async, you can issue the requests concurrently, so the total wall-clock time approaches that of the slowest single request, provided the provider's rate limits allow it.

python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def single_completion(prompt: str, idx: int) -> tuple[int, str]:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return idx, response.choices[0].message.content

async def batch_completions(prompts: list[str]) -> list[str]:
    # Issue ALL requests concurrently
    tasks = [single_completion(p, i) for i, p in enumerate(prompts)]
    results = await asyncio.gather(*tasks)

    # gather() already returns results in the order the awaitables were passed,
    # so no re-sorting is needed; just drop the index
    return [text for _, text in results]

# 10 API calls in roughly 3s instead of 30s (assuming ~3s per call and no throttling)
prompts = [f"Summarise topic {i}" for i in range(10)]
answers = asyncio.run(batch_completions(prompts))
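One detail worth checking without spending API credits: `asyncio.gather` returns results in the order the awaitables were passed in, not the order they complete. The sketch below uses a hypothetical `fake_completion` stand-in with simulated latency rather than a real client.

```python
import asyncio

async def fake_completion(prompt: str, delay: float) -> str:
    # Hypothetical stand-in for an API call; delay simulates latency
    await asyncio.sleep(delay)
    return f"Response to: {prompt}"

async def ordered_batch() -> list[str]:
    prompts = ["first", "second", "third"]
    delays = [0.3, 0.1, 0.2]  # the first prompt is the slowest to finish
    coros = [fake_completion(p, d) for p, d in zip(prompts, delays)]
    return await asyncio.gather(*coros)

ordered = asyncio.run(ordered_batch())
print(ordered)
```

Even though "first" completes last, it still appears first in the returned list.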

Concurrency with Rate Limiting

APIs enforce rate limits. A semaphore caps the number of in-flight requests, which keeps bursts under control (it bounds concurrency, not requests per second):

python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def rate_limited_completion(prompt: str, semaphore: asyncio.Semaphore) -> str:
    async with semaphore:  # Max N concurrent requests
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

async def process_batch(prompts: list[str], max_concurrent: int = 5) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [rate_limited_completion(p, semaphore) for p in prompts]
    return await asyncio.gather(*tasks)
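The effect of the semaphore can be observed offline by counting how many simulated calls are active at once. This is a sketch under assumptions: `limited_call` and `measure_peak` are hypothetical helpers, and `asyncio.sleep` stands in for the request.

```python
import asyncio

async def limited_call(sem: asyncio.Semaphore, state: dict[str, int]) -> None:
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.05)  # simulated request latency
        state["active"] -= 1

async def measure_peak(n_calls: int, max_concurrent: int) -> int:
    sem = asyncio.Semaphore(max_concurrent)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(limited_call(sem, state) for _ in range(n_calls)))
    return state["peak"]

peak = asyncio.run(measure_peak(n_calls=20, max_concurrent=5))
print(peak)  # 5
```

All 20 calls are scheduled immediately, but no more than five are ever inside the `async with` block at the same time.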

Streaming Async Responses

python
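import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def stream_response(prompt: str) -> None:
    # stream=True makes create() return an async iterator of chunks
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip them
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_response("Explain transformers in detail"))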
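The `async for` pattern used for streaming is not specific to any SDK. As a rough illustration, the sketch below consumes a hypothetical `fake_token_stream` async generator that yields words with a small delay, standing in for chunks arriving over the network.

```python
import asyncio
from typing import AsyncIterator

async def fake_token_stream(text: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming API: yields one word at a time
    for word in text.split():
        await asyncio.sleep(0.01)  # simulated gap between chunks
        yield word + " "

async def collect_stream(text: str) -> str:
    parts: list[str] = []
    async for token in fake_token_stream(text):
        print(token, end="", flush=True)  # show output as it arrives
        parts.append(token)
    print()
    return "".join(parts)

streamed = asyncio.run(collect_stream("attention is all you need"))
```

Each pass through the loop awaits the next chunk, so other coroutines can run while the stream is idle.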

Sync vs Async Comparison

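| Aspect | Synchronous | Async |
| --- | --- | --- |
| 10 API calls (3s each) | ~30s | ~3s |
| Complexity | Simple | Moderate |
| Best for | Single requests, scripts | Batch processing, web servers |
| Thread safety | Threads share one GIL per interpreter (CPython); shared state needs locks | Single thread, event loop; tasks switch only at `await` |
| Frameworks | Flask | FastAPI, aiohttp |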
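The timing row can be reproduced at small scale. The sketch below is an approximation: sleeps stand in for API latency, and `DELAY` is scaled down from about 3s to 0.05s so it runs quickly. Actual numbers depend on the machine.

```python
import asyncio
import time

DELAY = 0.05   # scaled-down stand-in for a ~3s API call
N_CALLS = 10

def sync_calls() -> float:
    start = time.perf_counter()
    for _ in range(N_CALLS):
        time.sleep(DELAY)  # each call blocks until it finishes
    return time.perf_counter() - start

async def async_calls() -> float:
    start = time.perf_counter()
    # All simulated calls wait at the same time
    await asyncio.gather(*(asyncio.sleep(DELAY) for _ in range(N_CALLS)))
    return time.perf_counter() - start

sync_elapsed = sync_calls()
async_elapsed = asyncio.run(async_calls())
print(f"sync: {sync_elapsed:.2f}s, async: {async_elapsed:.2f}s")
```

The synchronous total grows with the number of calls, while the async total stays close to the duration of a single call.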

When to use async: Any time your application makes multiple API calls, database queries, or other I/O operations that can overlap. FastAPI supports async natively: define your route handlers with `async def`, and the server can handle other requests while a handler awaits I/O.