Design WhatsApp: Messages, Delivery Status, and Group Chats
The two grey ticks turning blue is one of the most recognizable UI signals in software. Everyone who uses WhatsApp understands exactly what it means. But when I reason through how to implement it, a deceptively simple question comes up: where does that state live, and who is responsible for changing it?
Your first instinct might be to put a status field on each Message and update it as the message progresses. Single-recipient chat is fine with that. Then you think through group messages. A message sent to a group of 10 people needs to be delivered to all 10 before you can show double ticks. Read receipts require that all 10 have opened the chat. If you store a single status on the message, you lose the per-recipient detail you need to compute the aggregate. You need a different model: one delivery record per recipient per message. That one insight reshapes most of the design.
The second interesting problem is offline delivery. A user who has not opened WhatsApp in three days still gets your messages the moment they reconnect. The server has to hold messages until the recipient is available, push them in order, and then update the sender’s UI with the delivery confirmation. This is the store-and-forward pattern, and building it correctly requires you to think about what “delivered” actually means: delivered to the server, or delivered to the device? WhatsApp tracks both. You see a single grey tick the moment the server accepts your message, and a second grey tick once the recipient’s device has received it.
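A minimal sketch of how the two acknowledgments map onto what the sender sees, assuming the three states listed in the requirements below. The `TICKS` table and the `ticks_for` helper are illustrative names, not part of the later implementation.

```python
from enum import Enum, auto


class DeliveryState(Enum):
    SENT = auto()       # server accepted and stored the message
    DELIVERED = auto()  # recipient's device acknowledged receipt
    READ = auto()       # recipient opened the chat


# Hypothetical rendering table for the sender's UI.
TICKS = {
    DeliveryState.SENT: "one grey tick",
    DeliveryState.DELIVERED: "two grey ticks",
    DeliveryState.READ: "two blue ticks",
}


def ticks_for(state: DeliveryState) -> str:
    """Return the tick indicator the sender sees for a single recipient."""
    return TICKS[state]


print(ticks_for(DeliveryState.SENT))       # one grey tick
print(ticks_for(DeliveryState.DELIVERED))  # two grey ticks
```

The server's acknowledgment alone is enough for the first tick; only the device's acknowledgment unlocks the second.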
Requirements
Functional
- Send text and media messages between users in one-to-one chats
- Support group chats with multiple recipients
- Track delivery state per message: SENT (server received), DELIVERED (device received), READ (user opened it)
- Deliver messages to offline users when they reconnect
- Notify the sender when delivery or read state changes
Non-functional
- Delivery state updates must propagate in near-real-time when both parties are online
- The design must handle the group message case where different recipients reach READ at different times
- The design must preserve message ordering per chat
Core Entities
User holds identity and online status. Whether a user is currently connected determines whether messages can be pushed directly or must be queued for later delivery.
Message is an immutable record once created. It carries content, sender, timestamp, and a reference to the chat it belongs to. It does not carry a single status field. That job belongs to MessageDeliveryRecord.
MessageDeliveryRecord is the key entity most people miss in early designs. One record per (message, recipient) pair. It holds the individual delivery state for that recipient. Aggregating across all records for a message gives you the sender’s visible tick state.
TextMessage and MediaMessage extend Message through a class hierarchy. A TextMessage holds a string body. A MediaMessage holds a media URL and MIME type. The Factory pattern creates the right type based on what the sender submitted.
Chat represents a one-to-one conversation between two users. It owns an ordered list of messages and knows both participants.
Group represents a multi-party conversation. It owns its member list and a list of messages. When a sender sends a message to a group, the system creates one MessageDeliveryRecord for each member except the sender.
MessageQueue stores outbound messages for offline users. When a user comes online, the queue drains in order.
NotificationService pushes delivery state changes back to the original sender. It observes state changes on MessageDeliveryRecord and triggers sender notifications.
Class Design
```
+-------------------+         +-------------------------+
| User              |         | Message                 |
|-------------------|         |-------------------------|
| user_id: str      |         | message_id: str         |
| display_name: str |         | sender: User            |
| is_online: bool   |         | chat_id: str            |
|                   |         | timestamp: datetime     |
+-------------------+         | content_type: str       |
                              +-------------------------+
                                           ^
                               +-----------+-----------+
                               |                       |
                     +------------------+   +---------------------+
                     | TextMessage      |   | MediaMessage        |
                     |------------------|   |---------------------|
                     | body: str        |   | media_url: str      |
                     +------------------+   | mime_type: str      |
                                            +---------------------+

+-----------------------------+
| MessageDeliveryRecord       |
|-----------------------------|
| record_id: str              |
| message_id: str             |
| recipient: User             |
| state: DeliveryState        |
| delivered_at: datetime|None |
| read_at: datetime|None      |
|-----------------------------|
| mark_delivered()            |
| mark_read()                 |
+-----------------------------+

+-------------------+         +-------------------+
| Chat              |         | Group             |
|-------------------|         |-------------------|
| chat_id: str      |         | group_id: str     |
| participants[2]   |         | name: str         |
| messages: list    |         | members: list     |
|                   |         | messages: list    |
+-------------------+         +-------------------+

+----------------------+     +------------------------+
| MessageQueue         |     | NotificationService    |
|----------------------|     |------------------------|
| _queue: dict         |     | + notify_sender(...)   |
| enqueue(user, msg)   |     | + on_state_change(...) |
| drain(user)          |     +------------------------+
+----------------------+
```
The relationship to understand: MessageDeliveryRecord is the bridge between Message and User. A message sent to a group with five other members produces five records, one per recipient; the sender gets none. The sender-visible state comes from aggregating all records for that message and picking the minimum, least-advanced state: you only show blue ticks once every recipient has reached READ.
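Picking the minimum only works if states are ordered. The implementation below uses a plain `Enum`, whose members do not support `<`, so this sketch swaps in `IntEnum` to make the SENT < DELIVERED < READ progression comparable. `aggregate_state` is a hypothetical helper, not part of the design's classes.

```python
from __future__ import annotations

from enum import IntEnum


class DeliveryState(IntEnum):
    # IntEnum values encode the progression, so min() picks the least-advanced state.
    SENT = 1
    DELIVERED = 2
    READ = 3


def aggregate_state(states: list[DeliveryState]) -> DeliveryState:
    """Sender-visible state for one message: the minimum across its recipients."""
    if not states:
        raise ValueError("a message needs at least one delivery record")
    return min(states)


# Group with five other members: four have read it, one has only received it.
print(aggregate_state([DeliveryState.READ] * 4 + [DeliveryState.DELIVERED]).name)
# DELIVERED, so the sender still sees two grey ticks
```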
Key Implementation
```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum, auto
from typing import Callable, Optional


def utc_now() -> datetime:
    # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated as of Python 3.12.
    return datetime.now(timezone.utc)


class DeliveryState(Enum):
    SENT = auto()       # server accepted the message
    DELIVERED = auto()  # device received it
    READ = auto()       # user opened the chat


class ContentType(Enum):
    TEXT = "text"
    MEDIA = "media"


@dataclass
class User:
    user_id: str
    display_name: str
    is_online: bool = False


@dataclass
class Message:
    message_id: str
    sender: User
    chat_id: str
    timestamp: datetime
    content_type: ContentType

    @staticmethod
    def new_id() -> str:
        return str(uuid.uuid4())


@dataclass
class TextMessage(Message):
    body: str = ""


@dataclass
class MediaMessage(Message):
    media_url: str = ""
    mime_type: str = ""


class MessageFactory:
    """Single place where message IDs and timestamps are generated."""

    @staticmethod
    def create_text(sender: User, chat_id: str, body: str) -> TextMessage:
        return TextMessage(
            message_id=Message.new_id(),
            sender=sender,
            chat_id=chat_id,
            timestamp=utc_now(),
            content_type=ContentType.TEXT,
            body=body,
        )

    @staticmethod
    def create_media(
        sender: User, chat_id: str, media_url: str, mime_type: str
    ) -> MediaMessage:
        return MediaMessage(
            message_id=Message.new_id(),
            sender=sender,
            chat_id=chat_id,
            timestamp=utc_now(),
            content_type=ContentType.MEDIA,
            media_url=media_url,
            mime_type=mime_type,
        )


@dataclass
class MessageDeliveryRecord:
    record_id: str
    message: Message
    recipient: User
    state: DeliveryState = DeliveryState.SENT
    delivered_at: Optional[datetime] = None
    read_at: Optional[datetime] = None
    # Observers are runtime wiring, not data: keep them out of __init__, repr and ==.
    _on_change: list[Callable[[MessageDeliveryRecord], None]] = field(
        default_factory=list, init=False, repr=False, compare=False
    )

    def register_observer(
        self, callback: Callable[[MessageDeliveryRecord], None]
    ) -> None:
        self._on_change.append(callback)

    def mark_delivered(self) -> None:
        # Only SENT -> DELIVERED is legal; any other starting state is a no-op.
        if self.state == DeliveryState.SENT:
            self.state = DeliveryState.DELIVERED
            self.delivered_at = utc_now()
            self._notify()

    def mark_read(self) -> None:
        # Reading implies delivery, so SENT may jump straight to READ.
        if self.state != DeliveryState.READ:
            now = utc_now()
            if self.delivered_at is None:
                self.delivered_at = now
            self.state = DeliveryState.READ
            self.read_at = now
            self._notify()

    def _notify(self) -> None:
        for callback in self._on_change:
            callback(self)

    @staticmethod
    def new(message: Message, recipient: User) -> MessageDeliveryRecord:
        return MessageDeliveryRecord(
            record_id=str(uuid.uuid4()),
            message=message,
            recipient=recipient,
        )


class NotificationService:
    """
    Observes delivery record state changes and pushes
    updates back to the original sender.
    In a real system this fires a WebSocket push or APNS/FCM notification.
    """

    def on_state_change(self, record: MessageDeliveryRecord) -> None:
        sender = record.message.sender
        aggregate = self._aggregate_state(record.message)
        print(
            f"[notify] {sender.display_name}'s message {record.message.message_id[:8]} "
            f"is now {aggregate.name} for all recipients"
        )

    def _aggregate_state(self, message: Message) -> DeliveryState:
        # Stub: a real implementation queries a repository for every record
        # of this message and returns the minimum state across them.
        return DeliveryState.DELIVERED


class MessageQueue:
    """Store-and-forward queue for offline users."""

    def __init__(self) -> None:
        self._queue: dict[str, list[Message]] = {}

    def enqueue(self, recipient: User, message: Message) -> None:
        self._queue.setdefault(recipient.user_id, []).append(message)

    def drain(self, recipient: User) -> list[Message]:
        return self._queue.pop(recipient.user_id, [])


@dataclass
class Chat:
    chat_id: str
    participants: list[User]
    _messages: list[Message] = field(default_factory=list, repr=False)

    def add_message(self, message: Message) -> None:
        self._messages.append(message)

    def messages(self) -> list[Message]:
        return list(self._messages)

    def other_participant(self, sender: User) -> User:
        for p in self.participants:
            if p.user_id != sender.user_id:
                return p
        raise ValueError("Sender is not a participant in this chat")


@dataclass
class Group:
    group_id: str
    name: str
    members: list[User]
    _messages: list[Message] = field(default_factory=list, repr=False)

    def add_message(self, message: Message) -> None:
        self._messages.append(message)

    def recipients_for(self, sender: User) -> list[User]:
        return [m for m in self.members if m.user_id != sender.user_id]


class MessagingService:
    """
    Orchestrates sending a message: creates delivery records,
    handles offline queuing, and wires up observer notifications.
    """

    def __init__(
        self,
        queue: MessageQueue,
        notifications: NotificationService,
    ) -> None:
        self._queue = queue
        self._notifications = notifications
        # In a real system, records are persisted to a database.
        self._records: list[MessageDeliveryRecord] = []

    def send_to_chat(self, chat: Chat, message: Message) -> list[MessageDeliveryRecord]:
        recipient = chat.other_participant(message.sender)
        chat.add_message(message)
        return self._deliver([recipient], message)

    def send_to_group(
        self, group: Group, message: Message
    ) -> list[MessageDeliveryRecord]:
        recipients = group.recipients_for(message.sender)
        group.add_message(message)
        return self._deliver(recipients, message)

    def _deliver(
        self, recipients: list[User], message: Message
    ) -> list[MessageDeliveryRecord]:
        records = []
        for recipient in recipients:
            record = MessageDeliveryRecord.new(message, recipient)
            record.register_observer(self._notifications.on_state_change)
            self._records.append(record)
            records.append(record)
            if recipient.is_online:
                record.mark_delivered()
            else:
                self._queue.enqueue(recipient, message)
        return records

    def user_came_online(self, user: User) -> None:
        """Called when a user reconnects. Drains the queue and marks messages delivered."""
        user.is_online = True  # later sends are pushed directly instead of queued
        pending = self._queue.drain(user)
        # Index this user's records once instead of rescanning every record per message.
        records_by_message = {
            record.message.message_id: record
            for record in self._records
            if record.recipient.user_id == user.user_id
        }
        for message in pending:  # the queue preserves send order
            record = records_by_message.get(message.message_id)
            if record is not None:
                record.mark_delivered()
```
Design Decisions and Trade-offs
State as a progression, not a flag. DeliveryState is an enum that moves in one direction: SENT to DELIVERED to READ. There is no going backwards. Modeling it as a plain string field that any code can overwrite would allow bugs like a message moving from READ back to DELIVERED. The enum, combined with the guard checks in mark_delivered and mark_read, turns any backwards transition into a no-op, so as long as every state change goes through those two methods, a late or duplicate delivery acknowledgment can never undo a read receipt.
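The guards above silently ignore backwards moves. If you would rather surface them, one option is an explicit transition table, sketched here; `ALLOWED`, `can_transition`, and `IllegalTransition` are hypothetical names. Re-applying the current state stays a no-op so that duplicate acknowledgments from the network remain harmless.

```python
from enum import Enum, auto


class DeliveryState(Enum):
    SENT = auto()
    DELIVERED = auto()
    READ = auto()


# Forward-only moves. SENT -> READ is allowed because reading implies delivery.
ALLOWED = {
    DeliveryState.SENT: {DeliveryState.DELIVERED, DeliveryState.READ},
    DeliveryState.DELIVERED: {DeliveryState.READ},
    DeliveryState.READ: set(),
}


class IllegalTransition(Exception):
    """Raised when a caller tries to move a record backwards."""


def can_transition(current: DeliveryState, target: DeliveryState) -> bool:
    # Same-state "transitions" are duplicate acks: accept them as no-ops.
    return target is current or target in ALLOWED[current]


def transition(current: DeliveryState, target: DeliveryState) -> DeliveryState:
    if not can_transition(current, target):
        raise IllegalTransition(f"{current.name} -> {target.name}")
    return target


state = transition(DeliveryState.SENT, DeliveryState.DELIVERED)
state = transition(state, DeliveryState.READ)
try:
    transition(state, DeliveryState.DELIVERED)
except IllegalTransition as exc:
    print(f"rejected: {exc}")  # rejected: READ -> DELIVERED
```

Raising instead of ignoring trades robustness for visibility: it makes buggy callers obvious, but every caller now has to handle the exception.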
Per-recipient records for group messages. The non-obvious insight here is that a group message has no single delivery state. It has a collection of per-recipient states. You compute the aggregate state you show the sender (one tick, two ticks, blue ticks) from that collection. For one-to-one chats, there is exactly one record per message, so both cases share the same interface. This uniformity is a nice property: send_to_chat and send_to_group both return lists of MessageDeliveryRecord, and nothing else in the system needs to know which kind of chat it was.
Observer for delivery notifications. Rather than having MessagingService directly call the sender after each state change, delivery records fire callbacks when their state changes. This decouples MessageDeliveryRecord from the notification mechanism entirely. You can add logging, analytics, or push notification adapters by registering additional observers without touching the record’s logic. The cost is that the callback wiring happens at record creation time: any code path that creates or reloads a record, such as rehydrating it from the database after a restart, must register the observers again, or that record’s state changes will go unnoticed.
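To show what registering additional observers buys you, here is a pared-down record with two independent observers, an audit log and a per-state counter, attached without changing the record's own logic. `Record`, `log_change`, and `count_state` are illustrative stand-ins, not the article's classes, and states are plain strings to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    """Pared-down stand-in for MessageDeliveryRecord."""
    message_id: str
    state: str = "SENT"
    _observers: list[Callable[[Record], None]] = field(
        default_factory=list, init=False, repr=False
    )

    def register_observer(self, callback: Callable[[Record], None]) -> None:
        self._observers.append(callback)

    def mark_delivered(self) -> None:
        if self.state == "SENT":
            self.state = "DELIVERED"
            for callback in self._observers:
                callback(self)


audit_log: list[str] = []
state_counts: dict[str, int] = {}


def log_change(record: Record) -> None:
    audit_log.append(f"{record.message_id}:{record.state}")


def count_state(record: Record) -> None:
    state_counts[record.state] = state_counts.get(record.state, 0) + 1


record = Record("m1")
record.register_observer(log_change)
record.register_observer(count_state)
record.mark_delivered()
record.mark_delivered()  # guard makes this a no-op, so observers fire only once
print(audit_log)  # ['m1:DELIVERED']
```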
Store-and-forward at the service layer. The MessageQueue is intentionally simple: a dictionary keyed by user ID, holding an ordered list of pending messages. In a real system this would be a durable queue (think Redis lists with persistence, or a dedicated message broker). The interface is the same though: enqueue and drain. Keeping it behind an interface means you can swap the backing store without changing MessagingService.
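A sketch of that swap, assuming a simplified queue interface keyed by plain user and message IDs (the article's MessageQueue takes `User` and `Message` objects instead). `OfflineQueue` is a `typing.Protocol`; the durable variant uses the standard library's `sqlite3` purely as a stand-in for a Redis or broker backend, and all class names here are hypothetical.

```python
from __future__ import annotations

import sqlite3
from collections import defaultdict, deque
from typing import Protocol


class OfflineQueue(Protocol):
    def enqueue(self, user_id: str, message_id: str) -> None: ...
    def drain(self, user_id: str) -> list[str]: ...


class InMemoryQueue:
    """Fast but volatile: pending messages vanish if the process restarts."""

    def __init__(self) -> None:
        self._pending: defaultdict[str, deque[str]] = defaultdict(deque)

    def enqueue(self, user_id: str, message_id: str) -> None:
        self._pending[user_id].append(message_id)

    def drain(self, user_id: str) -> list[str]:
        return list(self._pending.pop(user_id, deque()))


class SqliteQueue:
    """Durable when given a file path; ":memory:" is for demos only."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " user_id TEXT NOT NULL,"
            " message_id TEXT NOT NULL)"
        )

    def enqueue(self, user_id: str, message_id: str) -> None:
        with self._db:  # commits the transaction on success
            self._db.execute(
                "INSERT INTO pending (user_id, message_id) VALUES (?, ?)",
                (user_id, message_id),
            )

    def drain(self, user_id: str) -> list[str]:
        with self._db:
            rows = self._db.execute(
                "SELECT message_id FROM pending WHERE user_id = ? ORDER BY seq",
                (user_id,),
            ).fetchall()
            self._db.execute("DELETE FROM pending WHERE user_id = ?", (user_id,))
        return [message_id for (message_id,) in rows]


def deliver_backlog(queue: OfflineQueue, user_id: str) -> list[str]:
    # The caller depends only on the protocol, not on the backing store.
    return queue.drain(user_id)


for queue in (InMemoryQueue(), SqliteQueue()):
    queue.enqueue("bob", "m1")
    queue.enqueue("bob", "m2")
    print(type(queue).__name__, deliver_backlog(queue, "bob"))
```

Both drains delete pending rows as soon as they are read, which keeps the sketch short; a production queue would more likely remove a message only after the device acknowledges receipt, so a crash mid-push does not lose it.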
Factory for message types. The alternative is letting callers construct TextMessage and MediaMessage directly. That works, but it scatters the ID generation and timestamp logic across every call site. The factory centralizes that so message creation is always consistent.
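The factory is also the natural place to decide which type to build from what the sender submitted. A minimal sketch, assuming a hypothetical dict payload in which a `media_url` key means media and a `body` key means text; the sender is reduced to an ID to keep it short.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass
class Message:
    message_id: str
    sender_id: str
    chat_id: str
    timestamp: datetime


@dataclass
class TextMessage(Message):
    body: str = ""


@dataclass
class MediaMessage(Message):
    media_url: str = ""
    mime_type: str = ""


class MessageFactory:
    @staticmethod
    def from_payload(sender_id: str, chat_id: str, payload: dict[str, Any]) -> Message:
        # ID and timestamp are generated here and nowhere else.
        common: dict[str, Any] = {
            "message_id": str(uuid.uuid4()),
            "sender_id": sender_id,
            "chat_id": chat_id,
            "timestamp": datetime.now(timezone.utc),
        }
        if "media_url" in payload:
            return MediaMessage(
                **common,
                media_url=payload["media_url"],
                mime_type=payload.get("mime_type", "application/octet-stream"),
            )
        if "body" in payload:
            return TextMessage(**common, body=payload["body"])
        raise ValueError("payload needs either 'body' or 'media_url'")


text = MessageFactory.from_payload("alice", "chat-1", {"body": "on my way"})
photo = MessageFactory.from_payload(
    "alice", "chat-1", {"media_url": "https://example.com/p.jpg", "mime_type": "image/jpeg"}
)
print(type(text).__name__, type(photo).__name__)  # TextMessage MediaMessage
```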
End-to-end encryption as an extension point. WhatsApp’s actual encryption happens at the message content layer before the message object is ever serialized. In this design, the right place to add it is in MessageFactory.create_text and create_media, where you encrypt the content before setting it on the object, and in a corresponding decrypt step when the recipient reads it. The rest of the design does not need to know encryption exists. That isolation is deliberate.
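One way to keep that isolation is to inject the cipher into the factory. The sketch assumes a hypothetical `Cipher` protocol and, to stay runnable, plugs in `Base64PlaceholderCipher`, which is a reversible encoding and provides no secrecy at all; a real client would put an end-to-end scheme (WhatsApp uses the Signal protocol) behind the same two methods. Making the factory instance-based rather than a set of static methods is a change from the design above, needed so it can hold the cipher.

```python
from __future__ import annotations

import base64
from dataclasses import dataclass
from typing import Protocol


class Cipher(Protocol):
    def encrypt(self, plaintext: bytes) -> bytes: ...
    def decrypt(self, ciphertext: bytes) -> bytes: ...


class Base64PlaceholderCipher:
    """NOT encryption: a reversible stand-in so the wiring can be exercised."""

    def encrypt(self, plaintext: bytes) -> bytes:
        return base64.b64encode(plaintext)

    def decrypt(self, ciphertext: bytes) -> bytes:
        return base64.b64decode(ciphertext)


@dataclass
class EncryptedTextMessage:
    message_id: str
    ciphertext: bytes  # plaintext never lands on the message object


class MessageFactory:
    def __init__(self, cipher: Cipher) -> None:
        self._cipher = cipher

    def create_text(self, message_id: str, body: str) -> EncryptedTextMessage:
        # Encrypt before the content is set on the object.
        return EncryptedTextMessage(message_id, self._cipher.encrypt(body.encode("utf-8")))

    def read_text(self, message: EncryptedTextMessage) -> str:
        # The matching decrypt step, run when the recipient reads the message.
        return self._cipher.decrypt(message.ciphertext).decode("utf-8")


factory = MessageFactory(Base64PlaceholderCipher())
msg = factory.create_text("m1", "hello")
print(msg.ciphertext)          # b'aGVsbG8='
print(factory.read_text(msg))  # hello
```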
The group tick problem. When should the sender see blue ticks for a group message? The strictest definition is “when all recipients have read it,” and that is what WhatsApp does. The aggregate logic stubbed out in NotificationService._aggregate_state should take the minimum state across all records: if nine of ten people have READ the message but one has only DELIVERED, the sender sees two grey ticks. The moment the last recipient hits READ, the ticks turn blue. The engineering implication is that you need an efficient query: given a message ID, what is the minimum delivery state across all its records? Indexing the records store by message ID makes this a fast lookup.
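A sketch of that query against a relational store, using the standard library's `sqlite3` as a stand-in; the `delivery_records` table and the `mark` and `aggregate_state` helpers are hypothetical. States are stored as the integers 1 to 3 so that SQL `MIN()` follows the SENT, DELIVERED, READ progression, and the composite primary key, which leads with `message_id`, doubles as the per-message index.

```python
import sqlite3

SENT, DELIVERED, READ = 1, 2, 3  # integer codes so MIN()/MAX() follow the progression

db = sqlite3.connect(":memory:")
# The primary key's index starts with message_id, so lookups by message use it.
db.execute(
    """
    CREATE TABLE delivery_records (
        message_id   TEXT    NOT NULL,
        recipient_id TEXT    NOT NULL,
        state        INTEGER NOT NULL,
        PRIMARY KEY (message_id, recipient_id)
    )
    """
)


def mark(message_id: str, recipient_id: str, state: int) -> None:
    # Two-argument MAX() is SQLite's scalar max: a late DELIVERED ack cannot undo READ.
    with db:
        db.execute(
            "UPDATE delivery_records SET state = MAX(state, ?) "
            "WHERE message_id = ? AND recipient_id = ?",
            (state, message_id, recipient_id),
        )


def aggregate_state(message_id: str) -> int:
    (lowest,) = db.execute(
        "SELECT MIN(state) FROM delivery_records WHERE message_id = ?", (message_id,)
    ).fetchone()
    if lowest is None:
        raise KeyError(message_id)
    return lowest


with db:
    db.executemany(
        "INSERT INTO delivery_records VALUES (?, ?, ?)",
        [("m1", f"user{i}", SENT) for i in range(10)],
    )
for i in range(9):
    mark("m1", f"user{i}", READ)
mark("m1", "user9", DELIVERED)
print(aggregate_state("m1"))  # 2: nine READ, one DELIVERED -> two grey ticks
mark("m1", "user9", READ)
print(aggregate_state("m1"))  # 3: everyone has READ -> blue ticks
```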
The Offline Delivery Gap
The scenario worth reasoning through carefully: Alice sends a message to Bob at 2pm. Bob’s phone is off. The message lands in the queue. Bob turns his phone on at 6pm and reconnects. user_came_online drains the queue, marks the record delivered, and fires the observer. Alice’s UI updates from one grey tick to two. Then Bob opens the chat, mark_read fires, and Alice’s ticks turn blue.
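The same timeline, replayed as a small self-contained simulation. It is a sketch with simplifying assumptions: the queue holds records rather than messages to keep it short, the enum values double as the tick labels, and `alice_ui` stands in for the push notifications that would reach Alice's phone.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class DeliveryState(Enum):
    # Values double as the tick label Alice sees.
    SENT = "one grey tick"
    DELIVERED = "two grey ticks"
    READ = "two blue ticks"


ORDER = list(DeliveryState)  # definition order: SENT, DELIVERED, READ


@dataclass
class Record:
    message_id: str
    recipient: str
    state: DeliveryState = DeliveryState.SENT
    observers: list[Callable[[Record], None]] = field(default_factory=list)

    def _advance(self, target: DeliveryState) -> None:
        # Forward-only: ignore anything that is not strictly later.
        if ORDER.index(target) > ORDER.index(self.state):
            self.state = target
            for callback in self.observers:
                callback(self)

    def mark_delivered(self) -> None:
        self._advance(DeliveryState.DELIVERED)

    def mark_read(self) -> None:
        self._advance(DeliveryState.READ)


alice_ui: list[str] = []
bob_online = False
queue: dict[str, list[Record]] = {}

# 2pm: the server accepts Alice's message; Bob's phone is off.
record = Record("m1", recipient="bob")
record.observers.append(lambda r: alice_ui.append(r.state.value))
alice_ui.append(record.state.value)
if bob_online:
    record.mark_delivered()
else:
    queue.setdefault("bob", []).append(record)

# 6pm: Bob reconnects, the queue drains, and each record is marked delivered.
bob_online = True
for pending in queue.pop("bob", []):
    pending.mark_delivered()

# Bob opens the chat.
record.mark_read()
print(alice_ui)  # ['one grey tick', 'two grey ticks', 'two blue ticks']
```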
The tricky part is ordering. If Alice sent five messages while Bob was offline, they need to drain and be marked delivered in timestamp order, not in whatever order the queue happened to store them. The _queue structure here preserves insertion order because it is a list, but you want to be explicit about sort order if you ever switch to an unordered backing store.
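If the backing store does not guarantee order, one option is to sort on drain. This sketch keeps a per-user min-heap keyed on `(timestamp, sequence)`, with a sequence number from `itertools.count` breaking ties between equal timestamps. `OrderedOfflineQueue` and `Pending` are hypothetical names, and the sketch assumes timestamps are assigned by the server, so that skewed client clocks cannot reorder messages.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Pending:
    message_id: str
    chat_id: str
    timestamp: datetime  # server-assigned, timezone-aware


class OrderedOfflineQueue:
    """Drains in timestamp order regardless of the order items were enqueued."""

    def __init__(self) -> None:
        self._heaps: dict[str, list[tuple[datetime, int, Pending]]] = {}
        self._seq = itertools.count()  # unique tie-breaker, so Pending is never compared

    def enqueue(self, user_id: str, item: Pending) -> None:
        heap = self._heaps.setdefault(user_id, [])
        heapq.heappush(heap, (item.timestamp, next(self._seq), item))

    def drain(self, user_id: str) -> list[Pending]:
        heap = self._heaps.pop(user_id, [])
        return [heapq.heappop(heap)[2] for _ in range(len(heap))]


base = datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc)
queue = OrderedOfflineQueue()
# Enqueued out of order, as an unordered backing store might hand them back.
queue.enqueue("bob", Pending("m3", "chat-1", base + timedelta(minutes=2)))
queue.enqueue("bob", Pending("m1", "chat-1", base))
queue.enqueue("bob", Pending("m2", "chat-1", base + timedelta(minutes=1)))
drained = queue.drain("bob")
print([p.message_id for p in drained])  # ['m1', 'm2', 'm3']
```

Ordering across the whole backlog by timestamp also preserves ordering within each chat, which is what the non-functional requirement asks for; a per-chat sequence number would be an alternative that avoids relying on clocks at all.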
If you found this useful or want to argue about where delivery state should live, I’d genuinely enjoy the conversation. Reach out on Twitter or LinkedIn.