Concept #193 · Medium · python-for-gen-ai

Is Python single-threaded or multi-threaded? Explain with examples. How does thread locking work in Python?

#python #threading #gil #concurrency #locks #multiprocessing

Answer

Python Threading: Single-Threaded vs Multi-Threaded

Python supports multi-threading through the `threading` module, but in CPython the Global Interpreter Lock (GIL) makes it effectively single-threaded for CPU-bound tasks. Understanding this distinction is critical for writing performant Python code, especially in Gen AI pipelines.


The Global Interpreter Lock (GIL)

The GIL is a mutex in CPython that allows only one thread to execute Python bytecode at a time, even on multi-core CPUs.

| Aspect | Single-Threaded | Multi-Threaded (with GIL) | Multi-Processing |
|---|---|---|---|
| Concurrency | None | Yes (concurrent, not parallel) | Yes (true parallelism) |
| CPU-bound tasks | Baseline | No speedup (GIL bottleneck) | Near-linear speedup, up to the number of cores |
| I/O-bound tasks | Blocking | Significant speedup | Speedup (but heavier) |
| Memory | Single space | Shared memory | Separate memory per process |
| Overhead | None | Low | High (process creation) |
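
The GIL described above can be inspected from the standard library. The sketch below reads the interval at which CPython asks the running thread to give up the GIL and, on CPython 3.13 or newer (which can optionally be built without the GIL), checks whether the GIL is active. Falling back to `True` on older versions is an assumption that the interpreter is a standard GIL build.

```python
import sys

# How often CPython asks the running thread to release the GIL (in seconds).
switch_interval = sys.getswitchinterval()
print(f"Switch interval: {switch_interval} s")

# sys._is_gil_enabled() only exists on CPython 3.13+.
# Assumption: on older versions the GIL is always on, so fall back to True.
gil_check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = gil_check() if gil_check is not None else True
print(f"GIL enabled: {gil_enabled}")
```

A short switch interval keeps threads responsive, but it does not make CPU-bound threads run in parallel: only one of them holds the GIL at any moment.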

Example 1: Multi-Threading for I/O-Bound Tasks

Threading shines when threads spend time waiting for I/O (API calls, file reads, network requests) — the GIL is released during I/O.

python
import threading
import time
import requests

def fetch_url(url: str, results: list, index: int) -> None:
    """Fetch a URL - I/O bound task where threading helps."""
    response = requests.get(url, timeout=10)  # Timeout so a stalled request cannot hang the thread
    results[index] = len(response.content)
    print(f"Thread {index}: {url} -> {len(response.content)} bytes")

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

# --- Sequential (slow) ---
start = time.time()
for url in urls:
    requests.get(url, timeout=10)
print(f"Sequential: {time.time() - start:.2f}s")  # ~3 seconds

# --- Multi-threaded (fast) ---
start = time.time()
results = [None] * len(urls)
threads = []
for i, url in enumerate(urls):
    t = threading.Thread(target=fetch_url, args=(url, results, i))
    threads.append(t)
    t.start()

for t in threads:
    t.join()  # Wait for all threads to complete
print(f"Threaded: {time.time() - start:.2f}s")  # ~1 second
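
The same pattern can be written with less bookkeeping using `concurrent.futures.ThreadPoolExecutor`, which starts, joins, and collects results from the threads for you. This is a minimal sketch, not a replacement for the example above: `fake_fetch` is a hypothetical stand-in that uses `time.sleep` instead of a real network call so that it runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(task_id: int, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a network call; sleeping releases the GIL like real I/O."""
    time.sleep(delay)
    return f"task-{task_id}"

task_ids = [0, 1, 2]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() yields results in input order, regardless of completion order.
    pooled_results = list(pool.map(fake_fetch, task_ids))
elapsed = time.perf_counter() - start

print(pooled_results)
print(f"Pooled: {elapsed:.2f}s")  # Roughly one delay, not three
```

The `with` block waits for all submitted work to finish before exiting, so no explicit `join()` calls are needed.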

Example 2: GIL Limitation for CPU-Bound Tasks

python
import threading
import time

def cpu_heavy_task(n: int) -> int:
    """CPU-bound task - GIL prevents parallel execution."""
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 10_000_000

# --- Sequential ---
start = time.time()
cpu_heavy_task(N)
cpu_heavy_task(N)
print(f"Sequential: {time.time() - start:.2f}s")

# --- Multi-threaded (NOT faster due to GIL) ---
start = time.time()
t1 = threading.Thread(target=cpu_heavy_task, args=(N,))
t2 = threading.Thread(target=cpu_heavy_task, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Threaded: {time.time() - start:.2f}s")  # Same or slower!

Key Insight: For CPU-bound work, use `multiprocessing` instead of `threading` to bypass the GIL.
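
As a rough sketch of that advice, the CPU-bound function from Example 2 can be spread across processes with `concurrent.futures.ProcessPoolExecutor`, a higher-level wrapper over `multiprocessing`. The helper name `run_in_processes` and the workload size are illustrative assumptions; the actual speedup depends on core count and process start-up cost.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy_task(n: int) -> int:
    """Same sum-of-squares loop as in Example 2."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_in_processes(n: int, workers: int = 2) -> list[int]:
    """Hypothetical helper: run one copy of the task in each worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_heavy_task, [n] * workers))

if __name__ == "__main__":  # Guard needed when worker processes re-import this module
    start = time.perf_counter()
    totals = run_in_processes(2_000_000)
    print(f"Processes: {time.perf_counter() - start:.2f}s, totals={totals}")
```

Each worker process has its own interpreter and its own GIL, which is why the tasks can run in parallel. Arguments and results are pickled between processes, so this pays off mainly when the computation outweighs the data transfer.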


Thread Locking in Python

When multiple threads access shared data, you need locks to prevent race conditions — where threads read/write data simultaneously and corrupt it.

Example 3: Race Condition Without Lock

python
import threading

counter = 0

def increment_without_lock(n: int) -> None:
    """Unsafe: race condition on shared counter."""
    global counter
    for _ in range(n):
        counter += 1  # NOT atomic: read -> increment -> write

threads = []
for _ in range(10):
    t = threading.Thread(target=increment_without_lock, args=(100_000,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("Expected: 1,000,000")
print(f"Actual:   {counter:,}")  # May be less than 1,000,000; depends on interpreter version and timing
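
To see why `counter += 1` is not atomic, it helps to look at the bytecode it compiles to. The sketch below uses the standard `dis` module; the exact instruction names vary between CPython versions, so no specific output is shown, but the load and the store of the global are always separate instructions.

```python
import dis

counter = 0

def increment_once() -> None:
    global counter
    counter += 1

# Collect the bytecode instruction names for the function body.
opnames = [ins.opname for ins in dis.get_instructions(increment_once)]
print(opnames)
```

A thread switch between the instruction that loads `counter` and the one that stores it back lets another thread's update be overwritten, which is the lost-update race in Example 3.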

Example 4: Thread-Safe with Lock

python
import threading

counter = 0
lock = threading.Lock()

def increment_with_lock(n: int) -> None:
    """Safe: lock protects shared counter."""
    global counter
    for _ in range(n):
        with lock:  # Acquires lock, releases automatically
            counter += 1

threads = []
for _ in range(10):
    t = threading.Thread(target=increment_with_lock, args=(100_000,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("Expected: 1,000,000")
print(f"Actual:   {counter:,}")  # Always 1,000,000
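
The `with lock:` form blocks until the lock is free. When waiting indefinitely is not acceptable, `Lock.acquire()` also accepts a `timeout` and returns `False` if the lock could not be taken in time. The following is a small sketch of that behaviour; the function name `try_acquire` and the 0.05 s timeout are illustrative choices.

```python
import threading

lock = threading.Lock()
attempts: list[bool] = []

def try_acquire(timeout: float) -> None:
    """Try to take the lock, giving up after `timeout` seconds."""
    acquired = lock.acquire(timeout=timeout)
    attempts.append(acquired)
    if acquired:
        lock.release()

# First attempt: the main thread holds the lock, so the worker times out.
with lock:
    worker = threading.Thread(target=try_acquire, args=(0.05,))
    worker.start()
    worker.join()

# Second attempt: the lock is free, so the worker acquires it immediately.
worker = threading.Thread(target=try_acquire, args=(0.05,))
worker.start()
worker.join()

print(attempts)  # [False, True]
```

When using `acquire()` directly, release the lock only if the call returned `True`; releasing an unheld lock raises `RuntimeError`.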

Types of Locks in Python

| Lock Type | Class | Use Case |
|---|---|---|
| Basic Lock | `threading.Lock()` | Simple mutual exclusion |
| Reentrant Lock | `threading.RLock()` | Same thread can acquire lock multiple times |
| Semaphore | `threading.Semaphore(n)` | Allow up to `n` threads simultaneously |
| Event | `threading.Event()` | Signal between threads (set/wait) |
| Condition | `threading.Condition()` | Wait for a condition with notify/wait |

Example 5: RLock and Semaphore

python
import threading
import time

# --- RLock: same thread can acquire multiple times ---
rlock = threading.RLock()

def recursive_task(depth: int) -> None:
    if depth <= 0:
        return
    with rlock:  # Same thread re-acquires - no deadlock
        print(f"Depth {depth}, Thread {threading.current_thread().name}")
        recursive_task(depth - 1)

recursive_task(3)

# --- Semaphore: limit concurrent access ---
semaphore = threading.Semaphore(3)  # Max 3 concurrent threads

def rate_limited_api_call(call_id: int) -> None:
    with semaphore:
        print(f"Call {call_id} started")
        time.sleep(1)  # Simulate API call
        print(f"Call {call_id} done")

threads = [threading.Thread(target=rate_limited_api_call, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Only 3 calls run at a time
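
The remaining two primitives from the table, `Event` and `Condition`, can be sketched in the same style. All function and variable names below are illustrative. The `Event` half shows one thread waiting for a signal; the `Condition` half shows a consumer that sleeps until a producer adds items to a shared queue.

```python
import threading
from collections import deque

# --- Event: one thread signals, another waits ---
ready = threading.Event()
log = []

def wait_then_run() -> None:
    ready.wait()  # Blocks until some thread calls ready.set()
    log.append("worker ran")

waiter = threading.Thread(target=wait_then_run)
waiter.start()
log.append("main prepared")
ready.set()
waiter.join()
print(log)  # ['main prepared', 'worker ran']

# --- Condition: consumer sleeps until the producer adds an item ---
items = deque()
cond = threading.Condition()
consumed = []

def consumer(count: int) -> None:
    for _ in range(count):
        with cond:
            # wait_for() releases the lock while waiting and re-checks the predicate on wake-up
            cond.wait_for(lambda: len(items) > 0)
            consumed.append(items.popleft())

def producer(values: list) -> None:
    for value in values:
        with cond:
            items.append(value)
            cond.notify()  # Wake one waiting consumer

c = threading.Thread(target=consumer, args=(3,))
p = threading.Thread(target=producer, args=([1, 2, 3],))
c.start()
p.start()
p.join()
c.join()
print(consumed)  # [1, 2, 3]
```

Using `wait_for()` with a predicate, rather than a bare `wait()`, avoids missed notifications when the producer runs before the consumer starts waiting.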

When to Use What in Gen AI Applications

| Scenario | Approach | Why |
|---|---|---|
| Calling multiple LLM APIs | `threading` or `asyncio` | I/O-bound; GIL released during network wait |
| Embedding large document batches | `multiprocessing` | CPU-bound preprocessing |
| Shared token counter across threads | `threading.Lock()` | Prevent race condition on counter |
| Rate-limiting API calls | `threading.Semaphore(n)` | Limit concurrent requests to `n` |
| Async RAG pipeline | `asyncio` | Best for high-concurrency I/O patterns |
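
For the `asyncio` rows in the table, a minimal sketch of concurrent, rate-limited calls might look like the following. `fake_llm_call` is a hypothetical stand-in that sleeps instead of contacting a real API, and the limit of 2 in-flight calls is an arbitrary assumption.

```python
import asyncio

async def fake_llm_call(prompt: str, limiter: asyncio.Semaphore) -> str:
    """Hypothetical stand-in for an async LLM client call."""
    async with limiter:  # At most `n` coroutines pass this point at once
        await asyncio.sleep(0.1)  # Simulated network latency
        return f"response to {prompt!r}"

async def main() -> list[str]:
    limiter = asyncio.Semaphore(2)  # Assumed limit: 2 calls in flight
    prompts = ["a", "b", "c", "d"]
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(fake_llm_call(p, limiter) for p in prompts))

responses = asyncio.run(main())
print(responses)
```

Here `asyncio.Semaphore` plays the same role as `threading.Semaphore(n)` in the table, but all coroutines run on a single thread, so no lock is needed for state that is only touched between `await` points.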

Best Practice: For modern Gen AI applications, prefer `asyncio` over `threading` for I/O-bound concurrency, and `multiprocessing` (or libraries like `Ray`) for CPU-bound parallelism. Use locks only when threads must share mutable state.
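
When threads do need to share mutable state, such as the token counter scenario in the table above, one common approach is to keep the lock next to the data it protects. The `TokenCounter` class and the token counts below are hypothetical, intended only as a sketch of that idea.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TokenCounter:
    """Hypothetical helper: a thread-safe running total of tokens used."""

    def __init__(self) -> None:
        self._total = 0
        self._lock = threading.Lock()

    def add(self, tokens: int) -> None:
        with self._lock:  # Protects the read-modify-write on _total
            self._total += tokens

    @property
    def total(self) -> int:
        with self._lock:
            return self._total

usage = TokenCounter()
per_call_tokens = [120, 80, 200, 50]  # Made-up token counts for illustration

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(usage.add, per_call_tokens))

print(usage.total)  # 450
```

Wrapping the lock inside the class means callers cannot forget to acquire it, which keeps the locking discipline in one place.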
