Is Python single-threaded or multi-threaded? Explain with examples. How does thread locking work in Python?

Question

Accepted Answer

## Python Threading: Single-Threaded vs Multi-Threaded

Python **supports multi-threading** via the `threading` module, but in CPython (the reference implementation) the **Global Interpreter Lock (GIL)** lets only one thread run Python bytecode at a time, so threads give no parallel speedup for CPU-bound tasks. Understanding this distinction is critical for writing performant Python code, especially in Gen AI pipelines.

---

### The Global Interpreter Lock (GIL)

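The GIL is a mutex in CPython that allows **only one thread to execute Python bytecode at a time**, even on multi-core CPUs. CPython 3.13 added an experimental free-threaded build that can run without the GIL, but the default build still uses it.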

| Aspect | Single-Threaded | Multi-Threaded (with GIL) | Multi-Processing |
|--------|----------------|--------------------------|------------------|
| **Concurrency** | None | Yes (concurrent, not parallel) | Yes (true parallelism) |
| **CPU-bound tasks** | Baseline | No speedup (GIL bottleneck) | Up to near-linear speedup (limited by core count and inter-process overhead) |
| **I/O-bound tasks** | Blocking | Significant speedup | Speedup (but heavier) |
| **Memory** | Single space | Shared memory | Separate memory per process |
| **Overhead** | None | Low | High (process creation) |

---

### Example 1: Multi-Threading for I/O-Bound Tasks

Threading shines when threads spend most of their time **waiting for I/O** (API calls, file reads, network requests), because CPython releases the GIL while a thread is blocked on I/O.

```python
import threading
import time
import requests

def fetch_url(url: str, results: list, index: int) -> None:
    """Fetch a URL - I/O bound task where threading helps."""
    response = requests.get(url, timeout=10)  # Timeout so a stalled request cannot hang the thread
    results[index] = len(response.content)
    print(f"Thread {index}: {url} -> {len(response.content)} bytes")

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

# --- Sequential (slow) ---
start = time.time()
for url in urls:
    requests.get(url, timeout=10)
print(f"Sequential: {time.time() - start:.2f}s")  # ~3 seconds

# --- Multi-threaded (fast) ---
start = time.time()
results = [None] * len(urls)
threads = []
for i, url in enumerate(urls):
    t = threading.Thread(target=fetch_url, args=(url, results, i))
    threads.append(t)
    t.start()

for t in threads:
    t.join()  # Wait for all threads to complete
print(f"Threaded: {time.time() - start:.2f}s")  # ~1 second
```
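
The same fan-out can be written with `concurrent.futures.ThreadPoolExecutor`, which manages thread creation and joining for you. The sketch below is a minimal illustration, not a drop-in replacement: `fake_fetch` and `fetch_all` are hypothetical names, and `time.sleep` stands in for the network call so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(item_id: int) -> str:
    """Hypothetical stand-in for a network call: sleeps instead of doing real I/O."""
    time.sleep(0.2)  # time.sleep releases the GIL, much like a blocking socket read
    return f"response-{item_id}"

def fetch_all(item_ids: list[int], max_workers: int = 4) -> list[str]:
    """Run fake_fetch concurrently; map() yields results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, item_ids))

start = time.perf_counter()
responses = fetch_all([1, 2, 3, 4])
elapsed = time.perf_counter() - start

print(responses)
print(f"Took {elapsed:.2f}s for 4 simulated requests")
```

Because the four simulated requests overlap, the total time should be close to one request's delay rather than the sum of all four.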

---

### Example 2: GIL Limitation for CPU-Bound Tasks

```python
import threading
import time

def cpu_heavy_task(n: int) -> int:
    """CPU-bound task - GIL prevents parallel execution."""
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 10_000_000

# --- Sequential ---
start = time.time()
cpu_heavy_task(N)
cpu_heavy_task(N)
print(f"Sequential: {time.time() - start:.2f}s")

# --- Multi-threaded (NOT faster due to GIL) ---
start = time.time()
t1 = threading.Thread(target=cpu_heavy_task, args=(N,))
t2 = threading.Thread(target=cpu_heavy_task, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Threaded: {time.time() - start:.2f}s")  # Same or slower!
```
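
For comparison, here is a minimal sketch of the same CPU-bound function run in separate processes with `concurrent.futures.ProcessPoolExecutor`. Each worker process has its own interpreter and its own GIL. The helper name `run_in_processes` is an assumption made for this illustration, and the actual speedup depends on how many cores are available.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy_task(n: int) -> int:
    """Same CPU-bound loop as in Example 2, repeated so this block is self-contained."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_in_processes(sizes: list[int]) -> list[int]:
    """Run one task per worker process; results come back in input order."""
    with ProcessPoolExecutor(max_workers=len(sizes)) as pool:
        return list(pool.map(cpu_heavy_task, sizes))

if __name__ == "__main__":  # Needed on platforms that start workers by re-importing this module
    N = 10_000_000
    start = time.perf_counter()
    results = run_in_processes([N, N])
    print(f"Two processes: {time.perf_counter() - start:.2f}s")
```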

> **Key Insight:** For CPU-bound work, use `multiprocessing` instead of `threading` to bypass the GIL.

---

### Thread Locking in Python

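When multiple threads access **shared data**, you need locks to prevent **race conditions**, where threads interleave their reads and writes to the same data and leave it in an incorrect state. The GIL does not prevent this: a thread can be paused between reading a value and writing it back.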

### Example 3: Race Condition Without Lock

```python
import threading

counter = 0

def increment_without_lock(n: int) -> None:
    """Unsafe: race condition on shared counter."""
    global counter
    for _ in range(n):
        counter += 1  # NOT atomic: read -> increment -> write

threads = []
for _ in range(10):
    t = threading.Thread(target=increment_without_lock, args=(100_000,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("Expected: 1,000,000")
print(f"Actual:   {counter:,}")  # Can be less than 1,000,000; how often depends on CPython version and timing
```
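
To see why `counter += 1` is not atomic, you can inspect the bytecode with the standard `dis` module. This is a small sketch: the function name `bump` is made up for the illustration, and the exact opcode names between the load and the store vary across CPython versions.

```python
import dis

counter = 0

def bump() -> None:
    global counter
    counter += 1

# Collect the opcode names CPython generates for bump().
ops = [ins.opname for ins in dis.get_instructions(bump)]
print(ops)

# The global is loaded, modified, and stored by separate instructions,
# so another thread can be scheduled between the load and the store.
load_at = ops.index("LOAD_GLOBAL")
store_at = ops.index("STORE_GLOBAL")
print(f"LOAD_GLOBAL at position {load_at}, STORE_GLOBAL at position {store_at}")
```

If a thread switch happens after one thread has loaded the old value but before it stores the new one, the other thread's increment is overwritten and lost.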

### Example 4: Thread-Safe with Lock

```python
import threading

counter = 0
lock = threading.Lock()

def increment_with_lock(n: int) -> None:
    """Safe: lock protects shared counter."""
    global counter
    for _ in range(n):
        with lock:  # Acquires lock, releases automatically
            counter += 1

threads = []
for _ in range(10):
    t = threading.Thread(target=increment_with_lock, args=(100_000,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("Expected: 1,000,000")
print(f"Actual:   {counter:,}")  # Always 1,000,000
```
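
Under the hood, `with lock:` calls `lock.acquire()` on entry and `lock.release()` on exit, even if the body raises. The sketch below spells that out and also shows the `timeout` and `blocking` arguments of `acquire()`, which let a thread give up instead of waiting forever. The helper `update_with_timeout` and the `state` dictionary are hypothetical names used only for this illustration.

```python
import threading

lock = threading.Lock()

def update_with_timeout(shared: dict, timeout: float = 1.0) -> bool:
    """Increment shared["count"] if the lock can be taken within `timeout` seconds."""
    if not lock.acquire(timeout=timeout):
        return False  # Gave up instead of blocking forever
    try:
        shared["count"] = shared.get("count", 0) + 1
        return True
    finally:
        lock.release()  # Runs even if the update raises

state = {}
print(update_with_timeout(state))                # True: the lock was free

lock.acquire()                                   # Simulate another holder of the lock
print(lock.acquire(blocking=False))              # False: a plain Lock is not reentrant
print(update_with_timeout(state, timeout=0.1))   # False: timed out after 0.1s
lock.release()

print(state)                                     # {'count': 1}
```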

---

### Types of Locks in Python

| Lock Type | Class | Use Case |
|-----------|-------|----------|
| **Basic Lock** | `threading.Lock()` | Simple mutual exclusion |
| **Reentrant Lock** | `threading.RLock()` | Same thread can acquire lock multiple times |
| **Semaphore** | `threading.Semaphore(n)` | Allow up to `n` threads simultaneously |
| **Event** | `threading.Event()` | Signal between threads (set/wait) |
| **Condition** | `threading.Condition()` | Wait for a condition with notify/wait |

### Example 5: RLock and Semaphore

```python
import threading
import time

# --- RLock: same thread can acquire multiple times ---
rlock = threading.RLock()

def recursive_task(depth: int) -> None:
    if depth <= 0:
        return
    with rlock:  # Same thread re-acquires - no deadlock
        print(f"Depth {depth}, Thread {threading.current_thread().name}")
        recursive_task(depth - 1)

recursive_task(3)

# --- Semaphore: limit concurrent access ---
semaphore = threading.Semaphore(3)  # Max 3 concurrent threads

def rate_limited_api_call(call_id: int) -> None:
    with semaphore:
        print(f"Call {call_id} started")
        time.sleep(1)  # Simulate API call
        print(f"Call {call_id} done")

threads = [threading.Thread(target=rate_limited_api_call, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Only 3 calls run at a time
```
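
The table above also lists `Event` and `Condition`, which Example 5 does not cover. Here is a minimal sketch of both, using made-up names (`ready`, `log`, `pending`, `consumed`): an `Event` lets one thread signal others, and a `Condition` lets a consumer sleep until a producer has added work.

```python
import threading
from collections import deque

# --- Event: one thread signals, another waits ---
ready = threading.Event()
log = []

def wait_for_ready() -> None:
    ready.wait()  # Blocks until ready.set() is called
    log.append("worker saw ready")

worker = threading.Thread(target=wait_for_ready)
worker.start()
log.append("main sets ready")
ready.set()
worker.join()
print(log)  # ['main sets ready', 'worker saw ready']

# --- Condition: consumer waits until the producer adds items ---
pending = deque()
cond = threading.Condition()
consumed = []

def consumer(count: int) -> None:
    for _ in range(count):
        with cond:
            # wait_for() checks the predicate first, then re-checks after each notify
            cond.wait_for(lambda: len(pending) > 0)
            consumed.append(pending.popleft())

def producer(items: list[str]) -> None:
    for item in items:
        with cond:
            pending.append(item)
            cond.notify()  # Wake one waiting consumer

c = threading.Thread(target=consumer, args=(3,))
p = threading.Thread(target=producer, args=(["a", "b", "c"],))
c.start()
p.start()
c.join()
p.join()
print(consumed)  # ['a', 'b', 'c']
```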

---

### When to Use What in Gen AI Applications

| Scenario | Approach | Why |
|----------|----------|-----|
| **Calling multiple LLM APIs** | `threading` or `asyncio` | I/O-bound — GIL released during network wait |
| **Embedding large document batches** | `multiprocessing` | CPU-bound preprocessing |
| **Shared token counter across threads** | `threading.Lock()` | Prevent race condition on counter |
| **Rate-limiting API calls** | `threading.Semaphore(n)` | Limit concurrent requests to `n` |
| **Async RAG pipeline** | `asyncio` | Best for high-concurrency I/O patterns |

> **Best Practice:** For modern Gen AI applications, prefer `asyncio` over `threading` for I/O-bound concurrency, and `multiprocessing` (or libraries like `Ray`) for CPU-bound parallelism. Use locks only when threads must share mutable state.

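As a rough illustration of the `asyncio` approach, the sketch below runs several simulated LLM requests concurrently on a single thread and caps how many are in flight with `asyncio.Semaphore`. The names `fake_llm_call` and `run_batch` are hypothetical, and `asyncio.sleep` stands in for a real API call; a real client would need its own async HTTP library.

```python
import asyncio

async def fake_llm_call(prompt: str, limiter: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical stand-in for an LLM API request."""
    async with limiter:  # At most `limit` coroutines are inside this block at once
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.1)  # Simulated network wait; yields to the event loop
        stats["active"] -= 1
        return f"answer to {prompt!r}"

async def run_batch(prompts: list[str], limit: int = 3) -> tuple[list[str], int]:
    limiter = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    # gather() returns results in the same order as the awaitables passed in
    answers = await asyncio.gather(
        *(fake_llm_call(p, limiter, stats) for p in prompts)
    )
    return list(answers), stats["peak"]

answers, peak = asyncio.run(run_batch([f"q{i}" for i in range(7)]))
print(answers[0])  # answer to 'q0'
print(peak)        # 3
```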
**Resources:**
- [Python threading documentation](https://docs.python.org/3/library/threading.html)
- [Python GIL explained](https://realpython.com/python-gil/)
- [multiprocessing documentation](https://docs.python.org/3/library/multiprocessing.html)
