Thread-Safe Two-Sum: Concurrency Bugs Hide in Plain Sight
Two-sum is the “hello world” of coding interviews. Most engineers can write it in under two minutes. So when the interviewer says “now make it work across multiple threads,” it’s tempting to think: just add a lock, right? Slap a mutex around the whole thing and you’re done.
The non-obvious part is that this kills all your parallelism on reads, and reads are almost always the hot path. Let me walk through exactly why the naive approach breaks, why the obvious fix is too slow, and how a read-write lock gets you to something actually correct and fast.
The Problem
You have a shared list of integers and a target. Multiple threads are constantly checking whether any two numbers in the list sum to the target. Meanwhile, other threads are occasionally adding new numbers to the list. You want the reads to run as fast as possible, and you want writes to never corrupt the data.
Here’s what the naive, single-threaded version looks like:
def two_sum(numbers: list[int], target: int) -> tuple[int, int] | None:
seen: dict[int, int] = {}
for i, num in enumerate(numbers):
complement = target - num
if complement in seen:
return (seen[complement], i)
seen[num] = i
return None
Clean. But the moment you share numbers across threads without coordination, things break in ways that are hard to reproduce.
Why the Naive Approach Breaks
Picture two threads. Thread A is halfway through iterating numbers when Thread B appends three new values. In CPython, appending to a list is not atomic from the perspective of the iterator. Thread A’s iterator can now point past the end of the list it started with, or it can pick up partial state from the append. The result is either an IndexError, a missed pair, or a spurious result depending on timing.
The maddening part is that this fails maybe one time in ten thousand. Your tests pass. Your staging environment looks clean. The bug shows up in production under load, and by then you have no stack trace pointing at the race.
Here is a minimal reproduction:
import threading
import time
numbers: list[int] = list(range(100))
target = 197 # 98 + 99
def reader() -> None:
for _ in range(10_000):
# iterating while another thread mutates is undefined behavior
result = two_sum(numbers, target)
def writer() -> None:
for i in range(100, 200):
numbers.append(i)
time.sleep(0.0001)
t1 = threading.Thread(target=reader)
t2 = threading.Thread(target=writer)
t1.start()
t2.start()
t1.join()
t2.join()
Run this enough times and you’ll see it crash or produce wrong answers. The race is real.
The Obvious Fix and Why It’s Too Conservative
The first instinct is to wrap everything in a single threading.Lock:
lock = threading.Lock()
def two_sum_locked(numbers: list[int], target: int, lock: threading.Lock) -> tuple[int, int] | None:
with lock:
return two_sum(numbers, target)
def add_number_locked(numbers: list[int], value: int, lock: threading.Lock) -> None:
with lock:
numbers.append(value)
This is correct. It is also slow in any workload where reads outnumber writes. A plain Lock is exclusive: only one thread holds it at a time, regardless of whether it is reading or writing. If you have ten reader threads and one writer thread, nine readers are always blocked even though concurrent reads are perfectly safe (they do not modify the list at all).
The mutex treats a harmless read the same as a dangerous write. That is too conservative.
The Better Solution: Read-Write Lock
The insight is that reads can coexist freely. Multiple threads reading the same list at the same time is completely safe. Only writes need exclusion, and writes need to exclude both other writers and active readers.
This is exactly the read-write lock pattern: allow any number of concurrent readers, but give writers exclusive access.
Python’s threading module does not ship a native RWLock, but you can build one cleanly with a Lock and a Condition:
import threading
from typing import Optional
class ReadWriteLock:
"""
Allows concurrent reads but exclusive writes.
Writers wait for all readers to finish before acquiring.
New readers block while a writer is waiting or active.
"""
def __init__(self) -> None:
self._read_ready = threading.Condition(threading.Lock())
self._readers: int = 0
def acquire_read(self) -> None:
with self._read_ready:
self._readers += 1
def release_read(self) -> None:
with self._read_ready:
self._readers -= 1
if self._readers == 0:
self._read_ready.notify_all()
def acquire_write(self) -> None:
self._read_ready.acquire()
while self._readers > 0:
self._read_ready.wait()
def release_write(self) -> None:
self._read_ready.release()
class ThreadSafeNumberList:
def __init__(self) -> None:
self._numbers: list[int] = []
self._lock = ReadWriteLock()
def add(self, value: int) -> None:
self._lock.acquire_write()
try:
self._numbers.append(value)
finally:
self._lock.release_write()
def two_sum(self, target: int) -> Optional[tuple[int, int]]:
self._lock.acquire_read()
try:
seen: dict[int, int] = {}
for i, num in enumerate(self._numbers):
complement = target - num
if complement in seen:
return (seen[complement], i)
seen[num] = i
return None
finally:
self._lock.release_read()
A quick smoke test to verify correctness under contention:
import threading
import time
def run_demo() -> None:
store = ThreadSafeNumberList()
for i in range(50):
store.add(i)
results: list[Optional[tuple[int, int]]] = []
results_lock = threading.Lock()
def reader(target: int) -> None:
for _ in range(1_000):
result = store.two_sum(target)
with results_lock:
results.append(result)
def writer(start: int) -> None:
for i in range(start, start + 50):
store.add(i)
time.sleep(0.00005)
threads = [
threading.Thread(target=reader, args=(97,)),
threading.Thread(target=reader, args=(97,)),
threading.Thread(target=reader, args=(97,)),
threading.Thread(target=writer, args=(50,)),
]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"Completed {len(results)} reads without error")
if __name__ == "__main__":
run_demo()
Key Insights and Trade-Offs
A plain Lock is correct but forces readers to serialize. If reads are your bottleneck, that serialization is pure waste. The read-write lock removes that waste by distinguishing between “operations that can coexist” and “operations that cannot.”
There are a few things to watch here. This implementation is write-preferring in the sense that a writer must wait for all current readers to drain, but new readers that arrive while the writer is waiting will not be blocked until the writer finishes. A production implementation would typically track pending writers and block new readers once a writer is queued, to prevent writer starvation. For a write-light workload like two-sum, this simplified version is fine.
Also notice that threading.RLock (the re-entrant lock in Python’s standard library) is not the same as a read-write lock. RLock lets the same thread acquire the lock multiple times without deadlocking, which solves recursive locking problems. It does not allow multiple concurrent readers. If you reach for RLock thinking it unlocks parallelism, it will not.
The real lesson from this problem is not about two-sum at all. It is about asking “what operations are actually incompatible?” before reaching for a mutex. Most shared-data bugs come from over-locking (killing parallelism) or under-locking (creating races), and the right question is always: which combinations of concurrent operations are actually unsafe?
Reads that do not mutate state are safe to overlap. Writes need exclusion. Locking at the right granularity is what separates a correct concurrent system from a fast one.
Tags:
Related Posts
Print Sequentially with Multiple Threads: A Microsoft Interview Problem
Three threads, one ordered sequence. This classic Microsoft SDE-2 problem looks like a mutual exclusion problem but it's actually an ordering problem, and that distinction changes everything.
Parallel Pascal's Triangle: When Parallelism Helps and When It Doesn't
Parallelizing Pascal's Triangle reveals a deeper truth about task granularity: throwing threads at a problem doesn't always make it faster, and knowing when it will is the real skill.
Design a Vending Machine: State Machines in Practice
A first-principles LLD walkthrough of a vending machine using the State pattern. Covers state transitions, inventory management, payment handling, and why State beats if/else chains when behavior must vary by state.