Print Sequentially with Multiple Threads: A Microsoft Interview Problem
Here’s the problem: you have three threads. Thread 1 must print 1, then Thread 2 must print 2, then Thread 3 must print 3, then Thread 1 must print 4, then Thread 2 must print 5, and so on. Each thread prints only its assigned numbers, and the output must always be in order regardless of how the OS schedules the threads.
When I first reasoned through this, my instinct was: this is a mutual exclusion problem. Just put a lock around the print statement and it’ll serialize. But a lock only ensures that two threads don’t print at the same time. It says nothing about who goes first. You’ll get one at a time, but in whatever order the OS feels like scheduling them. That’s not ordering, that’s just mutual exclusion.
The hard part here is not preventing simultaneous access to a shared resource. The hard part is enforcing a specific sequence of execution across threads that have no inherent relationship to each other. Those are fundamentally different problems.
Why the Naive Approach Produces Chaos
Without any coordination, three threads race to print their first number. The OS picks who runs in whatever order it likes based on scheduling policy, CPU load, and factors you can’t control. On most systems this is nondeterministic in practice:
import threading
counter = 0
def print_numbers(thread_id: int, total: int) -> None:
# No coordination: each thread just prints its assigned numbers
for i in range(thread_id, total + 1, 3):
print(i)
threads = [
threading.Thread(target=print_numbers, args=(i, 9))
for i in range(1, 4)
]
for t in threads:
t.start()
for t in threads:
t.join()
Run this and you might see 1, 4, 7, 2, 5, 8, 3, 6, 9 or you might see 2, 1, 3, 4, 5, 6, 7, 8, 9 or something else entirely. The output changes between runs. There is no mechanism here that enforces ordering, so you get none.
The mistake is treating this as a resource-protection problem when it is actually a handoff problem. Thread 1 needs to signal Thread 2 when it is done. Thread 2 needs to signal Thread 3. Thread 3 needs to signal Thread 1. The pattern is a chain of handoffs, not a single lock.
Solution 1: Semaphores
A semaphore is a counter with two operations: acquire (decrement, blocking if the count is zero) and release (increment, unblocking a waiter if any). The key insight is that you can use a semaphore as a one-directional signal: one thread releases it, another thread acquires it.
For ordered execution, give each thread its own semaphore. Thread 1 starts with its semaphore at 1 (it can run immediately). Threads 2 and 3 start at 0 (they block). After Thread 1 prints, it releases Thread 2’s semaphore, which unblocks Thread 2. After Thread 2 prints, it releases Thread 3’s semaphore. After Thread 3 prints, it releases Thread 1’s semaphore. The cycle continues until all numbers are printed.
from __future__ import annotations
import threading
def print_sequentially_semaphore(total: int) -> None:
"""
Three threads print numbers 1..total in order.
Each thread holds a semaphore that acts as its turn signal.
"""
sem1 = threading.Semaphore(1) # Thread 1 goes first
sem2 = threading.Semaphore(0)
sem3 = threading.Semaphore(0)
semaphores = [sem1, sem2, sem3]
def worker(thread_id: int) -> None:
own_sem = semaphores[thread_id]
next_sem = semaphores[(thread_id + 1) % 3]
current = thread_id + 1 # First number this thread prints
while current <= total:
own_sem.acquire()
if current <= total:
print(current)
next_sem.release()
current += 3
threads = [
threading.Thread(target=worker, args=(i,))
for i in range(3)
]
for t in threads:
t.start()
for t in threads:
t.join()
if __name__ == "__main__":
print_sequentially_semaphore(9)
Walk through the first few steps to verify the logic. sem1 starts at 1, sem2 and sem3 at 0. Thread 1 acquires sem1 (succeeds immediately), prints 1, and releases sem2. Now sem2 is 1, so Thread 2 unblocks, prints 2, and releases sem3. Thread 3 unblocks, prints 3, and releases sem1. Thread 1 can now print 4. The cycle enforces the ordering guarantee regardless of OS scheduling.
One thing to note: after Thread 3 prints the last number in its range, it still calls next_sem.release() to unblock Thread 1. Thread 1 acquires sem1 but then checks current <= total before printing, so it exits cleanly without printing past the boundary. This boundary check is subtle and easy to miss, but without it you risk printing extra numbers or deadlocking.
Solution 2: Condition Variables
A condition variable lets a thread wait until some condition becomes true, then wake up and re-check it. This approach uses a single shared counter to track whose turn it is, and each thread waits until the counter matches its own ID.
from __future__ import annotations
import threading
def print_sequentially_condition(total: int) -> None:
"""
Single condition variable + shared turn counter.
Each thread waits until turn == its own id, then prints and advances the turn.
"""
lock = threading.Lock()
condition = threading.Condition(lock)
turn = 1 # Whose turn it is (1, 2, or 3)
def worker(thread_id: int) -> None:
nonlocal turn
current = thread_id # First number this thread prints
while current <= total:
with condition:
while turn != thread_id:
condition.wait()
if current <= total:
print(current)
current += 3
turn = (turn % 3) + 1 # Rotate: 1->2->3->1
condition.notify_all()
threads = [
threading.Thread(target=worker, args=(i,))
for i in range(1, 4)
]
for t in threads:
t.start()
for t in threads:
t.join()
if __name__ == "__main__":
print_sequentially_condition(9)
The condition variable approach reads more like the problem statement: “wait until it’s your turn, do your work, advance the turn, wake everyone up.” The notify_all() wakes all three threads after every print. Two of them check turn != thread_id, find it isn’t their turn yet, and go back to sleep. Only the right thread proceeds. This is slightly less efficient than the semaphore approach because all three threads wake on every notification, but for three threads it makes no practical difference.
Comparing the Two Approaches
The semaphore approach creates a directed handoff: Thread 1 unblocks only Thread 2, never Thread 3. No unnecessary wakeups. The condition variable approach broadcasts to everyone and lets each thread self-select. For small N the difference is noise, but semaphores are generally the more precise tool when you know exactly which thread should run next.
Semaphores are also easier to reason about for this specific pattern. Each semaphore is a binary permit: you either have your turn or you don’t. The condition variable approach requires you to hold the invariant that turn is always accurate and that every path through the loop correctly advances it, which is slightly more to think about.
Both approaches correctly handle the ordering problem. Neither requires a database, a message queue, or any coordination outside the process. The coordination lives entirely in the synchronization primitives.
The Real Lesson
When I reason through this problem carefully, the thing that makes it tricky is the framing. “Multiple threads, shared output” sounds like mutual exclusion. Reach for a lock, done. But that framing is wrong. The constraint is not “only one thread at a time” (mutual exclusion), it is “threads in a specific order” (ordering). Those require different tools.
Semaphores are one-directional signals. Mutexes protect shared regions. Condition variables let threads wait on predicates. Knowing which tool fits which constraint is the core skill in concurrent programming. Most concurrency bugs come from applying the mutual exclusion tool to an ordering problem, or applying no synchronization at all and hoping the scheduler cooperates. It does not.
If you want to talk through the semaphore implementation or debate whether condition variables are cleaner here, reach out on Twitter or LinkedIn.
Tags:
Related Posts
Thread-Safe Two-Sum: Concurrency Bugs Hide in Plain Sight
A simple shared-list problem reveals how race conditions sneak past you, and why a read-write lock beats a plain mutex when reads dominate writes.
Parallel Pascal's Triangle: When Parallelism Helps and When It Doesn't
Parallelizing Pascal's Triangle reveals a deeper truth about task granularity: throwing threads at a problem doesn't always make it faster, and knowing when it will is the real skill.
Design a Vending Machine: State Machines in Practice
A first-principles LLD walkthrough of a vending machine using the State pattern. Covers state transitions, inventory management, payment handling, and why State beats if/else chains when behavior must vary by state.