Design a Load Balancer: Algorithms, Health Checks, and Session Persistence

A load balancer sounds like one of those problems where the answer is obvious. You have a pool of servers and a stream of incoming requests, so you spread the requests across the servers. Done. And then you start asking follow-up questions and it unravels quickly.

What happens if a server crashes in between health checks? What if one server is twice as powerful as the others? What about a user mid-checkout whose session lives in one server’s memory? What happens to their cart if their next request lands on a different machine? Each of these questions introduces a constraint that pulls against the others. The interesting design work is not in writing the routing logic. It’s in deciding what guarantees the system actually needs to provide and then picking the simplest mechanism that delivers them.

When I reason through this problem from scratch, the first instinct is to treat load balancing as a single algorithm choice. But that conflates two separate concerns: how you pick the next server (the balancing strategy) and what invariants you must maintain across picks (health state, connection tracking, session affinity). Getting the object boundaries right here matters a lot, because the strategy can and should change at runtime without any other part of the system noticing.

Requirements

Functional

Route incoming requests to one of a pool of healthy backend servers
Support multiple balancing algorithms: round-robin, weighted round-robin, least connections, and IP-hash
Periodically health-check each server and remove unhealthy ones from rotation
Support sticky sessions so a client always lands on the same server when needed

Non-functional

The routing decision must be fast enough to add negligible latency (sub-millisecond)
The system should be thread-safe since multiple requests arrive concurrently
Swapping the balancing algorithm should not require restarting or rewiring other components

Core Entities

Server is the fundamental unit. It carries the address, the operator-assigned weight, and runtime state: whether it is healthy and how many active connections it holds. Weight matters for weighted round-robin and is meaningless for least-connections, but keeping it on Server avoids awkward special-casing in strategies.

BalancingStrategy is an abstract base class with a single method: given a list of healthy servers, pick one. Every algorithm implements this interface. The caller never knows which algorithm is running.

HealthChecker runs in the background, probing each server and flipping its is_healthy flag. It holds a reference to the server pool and mutates server state directly. Mutation here is intentional and expected.

LoadBalancer is the orchestrator. It owns the server pool, holds a reference to the current strategy, and exposes get_server(), the one method the rest of the system calls to pick a backend (paired with release_server() when the request finishes).

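ConnectionPool tracks active connections per server. Least-connections needs this count to make routing decisions. Rather than embedding connection tracking inside the strategy (where it does not belong), it lives as a separate concern that strategies only read. In the implementation below, this concern is folded into Server as the active_connections counter rather than given its own class.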
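
If you did want ConnectionPool as its own object instead of a field on Server, a minimal sketch might look like the one below. The method names (acquire, release, count, least_loaded) and keying by address are assumptions for illustration. The lock makes each update atomic, and least_loaded() is the only query a least-connections strategy would need.

```python
from __future__ import annotations

import threading
from collections import defaultdict
from typing import Optional


class ConnectionPool:
    """Hypothetical per-server connection counter, keyed by server address."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def acquire(self, address: str) -> None:
        with self._lock:
            self._counts[address] += 1

    def release(self, address: str) -> None:
        with self._lock:
            # Never go negative, even if release() is called twice.
            self._counts[address] = max(0, self._counts[address] - 1)

    def count(self, address: str) -> int:
        with self._lock:
            return self._counts[address]

    def least_loaded(self, addresses: list[str]) -> Optional[str]:
        """What a least-connections strategy would ask: the idlest address."""
        if not addresses:
            return None
        with self._lock:
            return min(addresses, key=lambda a: self._counts[a])
```

Usage: call acquire() when a request is routed and release() when it completes; the strategy never writes to the pool.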

Class Design

+------------------+         +----------------------+
|  LoadBalancer    |-------->|  BalancingStrategy   |
|------------------|         |----------------------|
| - servers        |         | + select(servers)    |
| - strategy       |    +----+----------------------+
| - health_checker |    |         |            |
| + get_server()   |    |         |            |
+------------------+    |    RoundRobin  WeightedRoundRobin
                         |         |
                         |  LeastConnections   IPHash
                         |
+------------------+    |    +------------------+
|  Server          |<---+    |  HealthChecker   |
|------------------|         |------------------|
| - address        |         | - interval_sec   |
| - weight         |         | - servers        |
| - is_healthy     |         | + start()        |
| - active_conns   |         | - _probe(server) |
+------------------+         +------------------+

The key relationship to notice: BalancingStrategy takes a plain list of Server objects. It does not call back into LoadBalancer. This keeps strategies self-contained (the only state they hold is private bookkeeping, such as the round-robin index) and trivially testable in isolation. The HealthChecker writes to Server.is_healthy but never calls into the strategy: LoadBalancer filters on that flag, and the strategy only sees the result. The shared Server objects, not direct method calls, are what connect these three subsystems.
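
Because a strategy only sees a list of Server objects, it can be unit-tested with hand-built servers: no LoadBalancer, no health-check thread, no network. A minimal sketch, repeating just enough of Server and LeastConnections to run on its own; the test function names are illustrative.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    address: str
    weight: int = 1
    is_healthy: bool = True
    active_connections: int = 0


class BalancingStrategy(ABC):
    @abstractmethod
    def select(self, servers: list[Server]) -> Optional[Server]: ...


class LeastConnections(BalancingStrategy):
    def select(self, servers: list[Server]) -> Optional[Server]:
        if not servers:
            return None
        return min(servers, key=lambda s: s.active_connections)


def test_picks_server_with_fewest_connections() -> None:
    # Plain data in, one Server out: nothing else needs to exist.
    servers = [
        Server("10.0.0.1:8080", active_connections=5),
        Server("10.0.0.2:8080", active_connections=1),
        Server("10.0.0.3:8080", active_connections=3),
    ]
    assert LeastConnections().select(servers).address == "10.0.0.2:8080"


def test_empty_pool_returns_none() -> None:
    assert LeastConnections().select([]) is None
```

With pytest, these test_ functions are discovered and run as-is.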

Implementation

from __future__ import annotations

import hashlib
import threading
import time
import urllib.request
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    address: str
    weight: int = 1
    is_healthy: bool = True
    active_connections: int = 0
    # "+=" on an attribute is not atomic across threads, so the counter
    # gets its own lock. Kept out of __init__, repr and equality.
    _lock: threading.Lock = field(
        default_factory=threading.Lock, init=False, repr=False, compare=False
    )

    def increment_connections(self) -> None:
        with self._lock:
            self.active_connections += 1

    def decrement_connections(self) -> None:
        with self._lock:
            self.active_connections = max(0, self.active_connections - 1)


class BalancingStrategy(ABC):
    @abstractmethod
    def select(
        self, servers: list[Server], client_ip: Optional[str] = None
    ) -> Optional[Server]:
        """Pick one of the healthy servers, or return None if there are none.

        client_ip is only used by affinity strategies such as IPHash.
        """


class RoundRobin(BalancingStrategy):
    def __init__(self) -> None:
        self._index = 0
        self._lock = threading.Lock()

    def select(
        self, servers: list[Server], client_ip: Optional[str] = None
    ) -> Optional[Server]:
        if not servers:
            return None
        with self._lock:
            server = servers[self._index % len(servers)]
            self._index += 1
        return server


class WeightedRoundRobin(BalancingStrategy):
    """
    Expands the server list by weight so a server with weight=3
    appears three times in the rotation. Simple and correct, but the
    expanded list is rebuilt on every call and heavy servers get their
    share in bursts (A, A, A, B). Smooth weighted round-robin, the
    variant nginx uses, interleaves the picks instead.
    """

    def __init__(self) -> None:
        self._index = 0
        self._lock = threading.Lock()

    def select(
        self, servers: list[Server], client_ip: Optional[str] = None
    ) -> Optional[Server]:
        weighted: list[Server] = []
        for s in servers:
            weighted.extend([s] * s.weight)
        if not weighted:
            # Empty pool, or every server has weight 0.
            return None
        with self._lock:
            server = weighted[self._index % len(weighted)]
            self._index += 1
        return server


class LeastConnections(BalancingStrategy):
    def select(
        self, servers: list[Server], client_ip: Optional[str] = None
    ) -> Optional[Server]:
        if not servers:
            return None
        # Unlocked read: two concurrent requests may both pick the same
        # server before either count is incremented. That is acceptable
        # for a heuristic that only needs to be approximately fair.
        return min(servers, key=lambda s: s.active_connections)


class IPHash(BalancingStrategy):
    """
    Deterministically maps a client IP to a server index.
    The same IP always lands on the same server as long as
    the healthy pool does not change. This is the simplest form
    of session affinity that requires no server-side state.
    The IP is passed per request, so one instance serves all clients.
    """

    def select(
        self, servers: list[Server], client_ip: Optional[str] = None
    ) -> Optional[Server]:
        if not servers:
            return None
        if client_ip is None:
            raise ValueError("IPHash needs the client IP to route")
        digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return servers[digest % len(servers)]


class HealthChecker:
    def __init__(self, servers: list[Server], interval_sec: float = 10.0) -> None:
        self._servers = servers
        self._interval = interval_sec
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        while True:
            for server in self._servers:
                self._probe(server)
            time.sleep(self._interval)

    def _probe(self, server: Server) -> None:
        try:
            # urlopen raises on connection errors, timeouts and 4xx/5xx
            # statuses; "with" closes the response so sockets are not leaked.
            with urllib.request.urlopen(
                f"http://{server.address}/health", timeout=2
            ):
                pass
            server.is_healthy = True
        except Exception:
            server.is_healthy = False


class LoadBalancer:
    def __init__(
        self,
        servers: list[Server],
        strategy: BalancingStrategy,
        health_check_interval: float = 10.0,
    ) -> None:
        self._servers = servers
        self._strategy = strategy
        self._health_checker = HealthChecker(servers, health_check_interval)
        self._health_checker.start()

    def set_strategy(self, strategy: BalancingStrategy) -> None:
        # A single reference assignment: in-flight calls finish with the
        # strategy they already read, new calls pick up the new one.
        self._strategy = strategy

    def get_server(self, client_ip: Optional[str] = None) -> Optional[Server]:
        healthy = [s for s in self._servers if s.is_healthy]
        server = self._strategy.select(healthy, client_ip)
        if server is not None:
            # Count the connection so LeastConnections sees real load.
            # Every successful get_server() must be paired with release_server().
            server.increment_connections()
        return server

    def release_server(self, server: Server) -> None:
        server.decrement_connections()
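
The WeightedRoundRobin docstring points to smooth weighted round-robin. A minimal sketch of that scheme (the one nginx's upstream module uses, minus its failure handling): on every pick, each server's running score grows by its weight, the highest score wins, and the winner's score drops by the total weight. The class name and the address-keyed score table are assumptions for illustration; Server is repeated so the block runs on its own.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    address: str
    weight: int = 1


class SmoothWeightedRoundRobin:
    """Interleaves picks in proportion to weight instead of sending bursts."""

    def __init__(self) -> None:
        # Running score per server address; survives pool changes.
        self._current: dict[str, int] = {}
        self._lock = threading.Lock()

    def select(self, servers: list[Server]) -> Optional[Server]:
        live = [s for s in servers if s.weight > 0]
        if not live:
            return None
        total = sum(s.weight for s in live)
        with self._lock:
            best: Optional[Server] = None
            for s in live:
                score = self._current.get(s.address, 0) + s.weight
                self._current[s.address] = score
                # Strict ">" keeps the earliest server on ties.
                if best is None or score > self._current[best.address]:
                    best = s
            self._current[best.address] -= total
        return best
```

With weights {A: 3, B: 1} the sequence is A, A, B, A rather than A, A, A, B. Each server still gets exactly its weight's share per cycle; only the order changes.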

Design Decisions and Trade-offs

Strategy pattern over a mode flag. The alternative to Strategy here is a single LoadBalancer class with a mode flag and a big if/elif block. That gets messy fast and violates the open-closed principle, because every new algorithm means editing the same function. With Strategy, adding a new algorithm means adding a new class and nothing else. Switching at runtime (say, from round-robin to least-connections during a traffic spike) becomes a single set_strategy() call. The cost is one extra layer of indirection, which is worth it.

Sticky sessions vs. stateless balancing. This is the sharpest trade-off in the whole design. IP-hash gives you sticky sessions cheaply: no external state, no coordination, and no session storage. It breaks down in two ways. First, affinity only holds while the healthy pool stays the same: if a server goes down, the sessions pinned to it are lost, and because the hash is taken modulo the pool size, most other clients are remapped as well. Second, the client IP is a poor identity: users behind a corporate NAT share one address and all land on the same server, while a mobile client whose IP changes bounces between servers. Cookie-based affinity fixes the identity problem: the load balancer sets a cookie naming the chosen server and routes on it for later requests, although a session still dies with its server. The more robust fix is to stop needing affinity at all. With a shared session store (Redis is the canonical choice), servers keep session data in Redis rather than local memory and the session cookie is just a key into that store, so any server can handle any request without the user noticing. IP-hash is a reasonable approximation when you cannot afford the shared store, but it is only an approximation.
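
Routing on a cookie can be sketched in a few lines: if the request carries a cookie naming a backend that is still healthy, route there; otherwise pick a server with the normal strategy and tell the caller which cookie value to set. The function name, a cookie whose value is the server address, and the pluggable fallback are assumptions for this sketch; a real deployment would usually store an opaque server ID rather than a raw address.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Server:
    address: str
    is_healthy: bool = True


def route_with_cookie(
    cookie: Optional[str],
    servers: list[Server],
    fallback: Callable[[list[Server]], Optional[Server]],
) -> tuple[Optional[Server], Optional[str]]:
    """Return (server, cookie_to_set); cookie_to_set is None if unchanged."""
    healthy = [s for s in servers if s.is_healthy]
    if cookie is not None:
        for s in healthy:
            if s.address == cookie:
                return s, None  # sticky hit: keep the existing cookie
    # No cookie, or its server is gone: re-pin the client to a new server.
    chosen = fallback(healthy)
    return chosen, (chosen.address if chosen is not None else None)
```

When the pinned server fails, the client is re-pinned on its next request, but any session data that lived only in the failed server's memory is gone. A shared session store is what avoids that loss.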

Health check granularity. The HealthChecker here probes a /health endpoint. That is better than a raw TCP connection check (which only tells you the process is accepting connections, not that it can serve requests) but less precise than a deep, application-level check that queries the database or checks queue depth. The right granularity depends on what “healthy” actually means for your backend. A server that responds to /health but is stuck in a deadlock waiting for a database lock is not truly healthy. Deep checks buy that accuracy with coupling: if every server’s /health queries the same database, one database hiccup takes the entire pool out of rotation at once.
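
A deep check is only useful if a stuck dependency cannot hang the check itself. A minimal sketch under that assumption: each dependency check is a callable returning True when healthy, and all of them share one time budget, so a query stuck on a lock counts as a failure instead of blocking. The names deep_health and timeout_sec and the pool size are hypothetical; wiring the returned status code into an actual /health HTTP handler is left out.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Checks run on a small shared pool so a hung dependency cannot hang the
# caller; if workers stay stuck, later checks simply time out as well.
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def deep_health(
    checks: dict[str, Callable[[], bool]], timeout_sec: float = 1.0
) -> tuple[int, dict[str, bool]]:
    """Run dependency checks under one time budget; 200 if all pass, else 503."""
    futures = {name: _EXECUTOR.submit(check) for name, check in checks.items()}
    deadline = time.monotonic() + timeout_sec
    results: dict[str, bool] = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = bool(future.result(timeout=remaining))
        except Exception:
            # Covers a check that raised and one that blew the budget
            # (future.result raises TimeoutError in that case).
            results[name] = False
    return (200 if all(results.values()) else 503), results
```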

Consistent hashing as an extension. IP-hash as shown above is vulnerable to pool size changes, and because it hashes over the healthy list, a server failing counts as a change too. Add one server to a pool of four and about 80% of clients reroute, because a client keeps its server only when its hash gives the same result mod 4 and mod 5. Consistent hashing on a ring solves this: adding a server moves only about 1/n of the keys (n being the new pool size), and every moved key goes to the new server. This matters most in cache-adjacent scenarios where you want to minimize cache misses after a topology change. The Strategy interface is already the right place to add this, provided select() receives the key to hash (the client IP or a session ID).

Thread safety in RoundRobin. The index increment is a read-modify-write, so it needs a lock. An alternative is to keep an itertools.cycle in a threading.local, but that produces independent cycles per thread rather than a single global cycle. Every thread starts its cycle at the first server, so when many threads each handle only a few requests, the servers at the front of the list are over-served. A per-thread cycle would also keep replaying a stale copy of the server list and miss health changes. The threading.Lock around the index is the correct approach.
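
A small experiment makes the difference concrete. Both classes below are stripped-down sketches written for this demonstration: LockedRoundRobin mirrors the RoundRobin above, and ThreadLocalCycle is the rejected per-thread alternative. The run() helper is hypothetical; it has several threads pick concurrently and tallies where the picks land.

```python
from __future__ import annotations

import itertools
import threading
from collections import Counter

SERVERS = ["a", "b", "c", "d"]


class LockedRoundRobin:
    """One shared index guarded by a lock: a single global rotation."""

    def __init__(self) -> None:
        self._index = 0
        self._lock = threading.Lock()

    def select(self, servers: list[str]) -> str:
        with self._lock:
            server = servers[self._index % len(servers)]
            self._index += 1
        return server


class ThreadLocalCycle:
    """The rejected alternative: every thread gets its own itertools.cycle."""

    def __init__(self) -> None:
        self._local = threading.local()

    def select(self, servers: list[str]) -> str:
        if not hasattr(self._local, "cycle"):
            # After its first pass the cycle replays a saved copy, so later
            # pool changes are never seen by this thread.
            self._local.cycle = itertools.cycle(servers)
        return next(self._local.cycle)


def run(strategy, threads: int, picks_per_thread: int) -> Counter:
    counts: Counter = Counter()
    counts_lock = threading.Lock()

    def worker() -> None:
        for _ in range(picks_per_thread):
            server = strategy.select(SERVERS)
            with counts_lock:
                counts[server] += 1

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counts
```

With eight threads making one pick each, the locked version spreads the picks two per server however the threads interleave; the per-thread version sends all eight to "a".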

The Honest Failure Mode

The part that trips people up in interviews is not the routing logic. It is the window between the last passing health check and the next one. If a server crashes one second after a health check passes, it will keep receiving requests for the next nine seconds (assuming a ten-second interval), and those requests will fail at the network layer. The options are: shorten the interval (more probe traffic), use circuit breakers that trip immediately on connection failures, or do both. Production load balancers typically do both, pairing periodic active probes with passive detection of failed requests (Envoy calls the latter outlier detection). The HealthChecker here is the periodic probe. Adding a circuit breaker means the request path also counts consecutive failures per server and marks a server unhealthy as soon as the count crosses a threshold, without waiting for the next probe cycle.
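
A minimal sketch of that passive check, assuming the caller reports each request's outcome once it completes. FailureTracker, record_success, record_failure, and the threshold of three are hypothetical names and values. The tracker only ever pulls servers out; bringing one back is left to the periodic HealthChecker, which sets is_healthy to True on its next successful probe, so the probe doubles as the breaker's recovery test.

```python
from __future__ import annotations

import threading
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Server:
    address: str
    is_healthy: bool = True


class FailureTracker:
    """Marks a server unhealthy after N consecutive failed requests."""

    def __init__(self, threshold: int = 3) -> None:
        self._threshold = threshold
        self._failures: dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def record_success(self, server: Server) -> None:
        with self._lock:
            self._failures[server.address] = 0  # any success breaks the streak

    def record_failure(self, server: Server) -> None:
        with self._lock:
            self._failures[server.address] += 1
            if self._failures[server.address] >= self._threshold:
                # Trip now instead of waiting for the next probe. Reset the
                # streak so a recovered server starts again from zero.
                server.is_healthy = False
                self._failures[server.address] = 0
```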

That extension is worth keeping in mind even if you do not implement it in an interview. Naming it explicitly shows you are thinking about failure modes rather than just happy paths.

If any of this sparked a question or you would approach the session affinity problem differently, I’d genuinely like to hear it. Reach out on Twitter or LinkedIn.