Explain async/await in Python. Why is it important for API-heavy applications?

Question

Accepted Answer

## Async/Await in Python

**Async/await** lets a single thread run many I/O-bound operations concurrently. While one coroutine waits on a network response, the event loop runs other coroutines, which can substantially improve throughput for API-heavy applications.

### The Basics

```python
import asyncio

async def fetch_completion(prompt: str) -> str:
    # 'await' yields control back to the event loop while waiting
    await asyncio.sleep(1)  # Simulate network I/O
    return f"Response to: {prompt}"

async def main():
    result = await fetch_completion("What is RAG?")
    print(result)

asyncio.run(main())
```
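One detail worth seeing in isolation: calling an `async def` function does not run it. It only creates a coroutine object, and the body executes once that object is awaited (or scheduled on the event loop). The sketch below reuses `fetch_completion` from the example above with a shorter simulated delay; the delay value is an arbitrary choice for illustration.

```python
import asyncio

async def fetch_completion(prompt: str) -> str:
    await asyncio.sleep(0.01)  # Simulated network I/O (shortened for the demo)
    return f"Response to: {prompt}"

async def main() -> str:
    # Calling the function only builds a coroutine object; nothing runs yet
    coro = fetch_completion("What is RAG?")
    assert asyncio.iscoroutine(coro)

    # Awaiting the coroutine runs its body and produces the return value
    return await coro

result = asyncio.run(main())
print(result)
```

Forgetting the `await` is a common mistake: the variable then holds a coroutine object rather than the string, and Python emits a "coroutine was never awaited" warning.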

### Why It Matters for LLM Applications

LLM API calls typically take 1–30 seconds. Processed serially, 100 requests take roughly the sum of their individual latencies. With async, you can issue the requests concurrently, so total wall-clock time approaches that of the slowest single request (provider rate limits permitting).

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def single_completion(prompt: str, idx: int) -> tuple[int, str]:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return idx, response.choices[0].message.content

async def batch_completions(prompts: list[str]) -> list[str]:
    # Issue ALL requests concurrently
    tasks = [single_completion(p, i) for i, p in enumerate(prompts)]
    results = await asyncio.gather(*tasks)

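    # asyncio.gather() already returns results in submission order, so this
    # sort is only a safeguard (useful if you later switch to as_completed)
    results.sort(key=lambda x: x[0])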
    return [r[1] for r in results]

# At ~3s per call, 10 calls finish in roughly 3s instead of ~30s (rate limits permitting)
prompts = [f"Summarise topic {i}" for i in range(10)]
answers = asyncio.run(batch_completions(prompts))
```
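The timing difference can be checked without an API key. This is a minimal sketch that swaps the real API call for `asyncio.sleep`; `fake_completion`, `run_serial`, `run_concurrent`, and `timed` are hypothetical helper names, and the 0.2s delay is an arbitrary stand-in for network latency.

```python
import asyncio
import time

async def fake_completion(prompt: str, delay: float = 0.2) -> str:
    # Hypothetical stand-in for a network call to an LLM API
    await asyncio.sleep(delay)
    return f"Response to: {prompt}"

async def run_serial(prompts: list[str]) -> list[str]:
    # Each call finishes before the next one starts
    return [await fake_completion(p) for p in prompts]

async def run_concurrent(prompts: list[str]) -> list[str]:
    # All calls are in flight at once; gather() preserves input order
    return await asyncio.gather(*(fake_completion(p) for p in prompts))

def timed(coro):
    # Run a coroutine to completion and report wall-clock seconds
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

prompts = [f"Summarise topic {i}" for i in range(5)]
serial_results, serial_secs = timed(run_serial(prompts))
concurrent_results, concurrent_secs = timed(run_concurrent(prompts))
print(f"serial: {serial_secs:.2f}s, concurrent: {concurrent_secs:.2f}s")
```

With five simulated calls, the serial run takes about five times the per-call delay, while the concurrent run takes about one delay. Both return the same results in the same order.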

### Concurrency with Rate Limiting

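APIs enforce rate limits. A semaphore caps the number of requests in flight at once. Note that it bounds concurrency, not requests per minute, so a strict per-minute quota may still call for retries or backoff: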

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def rate_limited_completion(prompt: str, semaphore: asyncio.Semaphore) -> str:
    async with semaphore:  # Max N concurrent requests
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content

async def process_batch(prompts: list[str], max_concurrent: int = 5) -> list[str]:
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [rate_limited_completion(p, semaphore) for p in prompts]
    return await asyncio.gather(*tasks)
```
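To confirm that the semaphore really bounds concurrency, the sketch below replaces the API call with a simulated delay and records the peak number of tasks inside the `async with` block. The names `limited_call`, `run_batch`, and the `stats` dictionary are hypothetical, introduced only for this illustration.

```python
import asyncio

async def limited_call(idx: int, semaphore: asyncio.Semaphore, stats: dict) -> int:
    async with semaphore:  # At most max_concurrent tasks get past this line
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.05)  # Simulated network I/O
        stats["in_flight"] -= 1
        return idx

async def run_batch(n: int, max_concurrent: int) -> tuple[list[int], int]:
    semaphore = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(limited_call(i, semaphore, stats) for i in range(n))
    )
    return results, stats["peak"]

results, peak = asyncio.run(run_batch(n=20, max_concurrent=5))
print(f"peak concurrency: {peak}")
```

No lock is needed around the counter updates because all tasks run on one thread and can only be interrupted at an `await`.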

### Streaming Async Responses

```python
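import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def stream_response(prompt: str):
    # stream=True makes create() return an async iterator of chunks
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        # Some chunks carry no choices or no text delta; skip them
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(stream_response("Explain transformers in detail"))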
```
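The `async for` mechanics can be tried without network access. The following sketch assumes a hypothetical async generator, `fake_token_stream`, that yields words with a short delay in place of real API chunks; `collect_stream` is likewise an illustrative name.

```python
import asyncio
from typing import AsyncIterator

async def fake_token_stream(text: str, delay: float = 0.01) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streamed response: one word per "chunk"
    for token in text.split():
        await asyncio.sleep(delay)  # Simulated wait for the next chunk
        yield token + " "

async def collect_stream(text: str) -> str:
    pieces = []
    # async for awaits each item, letting other tasks run in between
    async for piece in fake_token_stream(text):
        print(piece, end="", flush=True)
        pieces.append(piece)
    return "".join(pieces).strip()

streamed = asyncio.run(collect_stream("Transformers use self-attention"))
```

Because each iteration awaits the next chunk, a server can stream several responses at once on a single thread.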

### Sync vs Async Comparison

| Aspect | Synchronous | Async |
|--------|-------------|-------|
| **10 API calls (3s each)** | ~30s | ~3s |
| **Complexity** | Simple | Moderate |
| **Best for** | Single requests, scripts | Batch processing, web servers |
| **Thread safety** | Threads share one GIL; shared state needs locks | Single thread; tasks switch only at `await` |
| **Frameworks** | Flask | FastAPI, aiohttp |

> **When to use async:** Any time your application makes multiple API calls, database queries, or other I/O operations that can overlap. FastAPI supports async natively: define route handlers as `async def` and `await` non-blocking clients inside them, so the server can interleave requests while each one waits on I/O.
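One caveat: `async def` only helps if the code inside actually yields to the event loop. The sketch below contrasts a coroutine that calls the blocking `time.sleep` with one that awaits `asyncio.sleep`. The handler names and `time_handlers` are hypothetical and the delays are arbitrary.

```python
import asyncio
import time

async def blocking_handler() -> None:
    # time.sleep never yields, so the whole event loop stalls here
    time.sleep(0.2)

async def cooperative_handler() -> None:
    # asyncio.sleep yields, so other coroutines run during the wait
    await asyncio.sleep(0.2)

async def time_handlers(handler, n: int = 3) -> float:
    # Run n copies of the handler "concurrently" and measure wall-clock time
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

blocking_secs = asyncio.run(time_handlers(blocking_handler))
cooperative_secs = asyncio.run(time_handlers(cooperative_handler))
print(f"blocking: {blocking_secs:.2f}s, cooperative: {cooperative_secs:.2f}s")
```

The blocking version runs its three handlers back to back, while the cooperative version overlaps them. In practice this means using async-capable clients (such as `AsyncOpenAI`) inside `async def` handlers rather than their synchronous counterparts.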
