Design Collaborative Document Editing: Cursors, Conflicts, and Operational Transforms

The gap between a single-user text editor and a collaborative one looks small from the outside. You have a document, multiple people are editing it at the same time, and you show everyone’s cursor. But when I reason through what it actually takes to make that work correctly, the gap is enormous. The naive approach fails in a way that is not immediately obvious until you trace through a concrete example.

Imagine Alice and Bob both have the same document open: “Hello”. Alice inserts “World ” at position 0, making it “World Hello”. At the same moment, Bob deletes the ‘H’ at position 0, intending to make it “ello”. Both operations are submitted to the server concurrently. If the server applies Alice’s insert first, the document becomes “World Hello”. Now Bob’s delete of position 0 runs and removes ‘W’, giving “orld Hello”. That is clearly wrong. Bob never intended to delete ‘W’. He intended to delete ‘H’, which Alice’s insert shifted to position 6. The server applied Bob’s operation with the wrong intent.

This is the fundamental problem that Operational Transforms (OT) solve. The core insight is: before applying a concurrent operation, you must transform it to account for the operations that were applied before it. Bob’s delete(position=0) must be transformed against Alice’s insert(position=0, length=6) to produce delete(position=6), which correctly targets the ‘H’ Bob intended to remove. The transformation is a function that takes two operations and returns a new operation with adjusted parameters.

What makes this hard is that OT must be associative: if you have three concurrent operations from three users, transforming them pairwise in any order must produce the same final document. This property, called convergence, is straightforward to state and surprisingly difficult to guarantee for all cases, especially when you add rich formatting, nested structures, or undo/redo into the mix.

Requirements

Functional

  • Multiple users can edit the same document concurrently
  • All clients converge to the same final document state regardless of network order
  • Each user sees a cursor indicator for every other active user
  • The document maintains a full history of operations for undo/redo
  • Operations are immutable once created

Non-functional

  • Clients should see their own edits applied instantly (optimistic local application)
  • The transformation logic must guarantee convergence for insert and delete operations
  • The design should be extensible to format operations without restructuring the core

Core Entities

Operation is the fundamental building block. Every change to a document is an operation: insert a string at a position, delete N characters from a position, or format a range. Operations are immutable. Once created, they do not change. Their parameters may be transformed to produce new operations, but the originals are preserved.

InsertOperation and DeleteOperation are concrete operation types. InsertOperation holds the position and the text to insert. DeleteOperation holds the position and the length. FormatOperation would hold a range and a set of style attributes, but I’ll keep the implementation focused on insert and delete since those are where the transformation logic lives.

OperationalTransform is the engine. It exposes a single static method: given two concurrent operations A and B (where A was applied first), transform B to produce B-prime, a version of B that produces the correct result when applied to a document that already has A applied.

Document holds the current text content and the ordered history of all applied operations. It applies operations and delegates transformation to OperationalTransform.

DocumentHistory is the append-only log of operations. It gives you undo/redo by replaying operations in reverse.

CollaborationSession manages the set of active users, their cursors, and the coordination between clients and the server. When a client submits an operation, the session applies pending server operations first (transforming the client’s operation against them), then broadcasts the result.

Cursor tracks each user’s current position in the document. When operations are applied, all cursors must be adjusted to remain valid.

Class Design

+-------------------------+
|        Operation        |  <-- abstract, immutable
|-------------------------|
| op_id: str              |
| author_id: str          |
| revision: int           |
| timestamp: datetime     |
|-------------------------|
| apply(doc: str) -> str  |
+-------------------------+
          ^         ^
          |         |
+---------+--+  +---+---------+
| InsertOp   |  |  DeleteOp   |
|------------|  |-------------|
| position   |  | position    |
| text: str  |  | length: int |
+------------+  +-------------+

+----------------------------+
|   OperationalTransform     |
|----------------------------|
| + transform(a, b) -> Op    |  <-- b' = transform(a, b)
| + _xform_ii(a, b)          |  insert vs insert
| + _xform_id(a, b)          |  insert vs delete
| + _xform_di(a, b)          |  delete vs insert
| + _xform_dd(a, b)          |  delete vs delete
+----------------------------+

+----------------------------+     +------------------+
|         Document           |     | DocumentHistory  |
|----------------------------|     |------------------|
| doc_id: str                |     | operations: list |
| content: str               |     |                  |
| revision: int              |     | undo() -> Op     |
| history: DocumentHistory   |     | redo() -> Op     |
|----------------------------|     +------------------+
| apply(op: Operation)       |
| apply_remote(op, pending)  |
+----------------------------+

+----------------------------+
|    CollaborationSession    |
|----------------------------|
| session_id: str            |
| document: Document         |
| cursors: dict[str, Cursor] |
|----------------------------|
| join(user)                 |
| submit_operation(op)       |
| _broadcast(op)             |
+----------------------------+

+------------------+
|      Cursor      |
|------------------|
| user_id: str     |
| position: int    |
|------------------|
| adjust(op)       |
+------------------+

Key Implementation

from __future__ import annotations

import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# ---------------------------------------------------------------------------
# Operations (immutable)
# ---------------------------------------------------------------------------

@dataclass(frozen=True)
class Operation(ABC):
    op_id: str
    author_id: str
    revision: int
    timestamp: datetime

    @abstractmethod
    def apply(self, content: str) -> str: ...

    @staticmethod
    def new_id() -> str:
        return str(uuid.uuid4())


@dataclass(frozen=True)
class InsertOperation(Operation):
    position: int
    text: str

    def apply(self, content: str) -> str:
        pos = min(self.position, len(content))
        return content[:pos] + self.text + content[pos:]


@dataclass(frozen=True)
class DeleteOperation(Operation):
    position: int
    length: int

    def apply(self, content: str) -> str:
        pos = min(self.position, len(content))
        end = min(pos + self.length, len(content))
        return content[:pos] + content[end:]


# ---------------------------------------------------------------------------
# Operational Transform engine
# ---------------------------------------------------------------------------

class OperationalTransform:
    """
    transform(a, b) returns b', a version of b that produces
    the correct result when applied after a.

    Notation: a was applied first; b needs to be adjusted.
    """

    @staticmethod
    def transform(applied: Operation, incoming: Operation) -> Operation:
        if isinstance(applied, InsertOperation) and isinstance(incoming, InsertOperation):
            return OperationalTransform._xform_ii(applied, incoming)
        if isinstance(applied, InsertOperation) and isinstance(incoming, DeleteOperation):
            return OperationalTransform._xform_id(applied, incoming)
        if isinstance(applied, DeleteOperation) and isinstance(incoming, InsertOperation):
            return OperationalTransform._xform_di(applied, incoming)
        if isinstance(applied, DeleteOperation) and isinstance(incoming, DeleteOperation):
            return OperationalTransform._xform_dd(applied, incoming)
        raise TypeError(f"Cannot transform {type(applied)} against {type(incoming)}")

    @staticmethod
    def _xform_ii(
        applied: InsertOperation, incoming: InsertOperation
    ) -> InsertOperation:
        """
        Two concurrent inserts.
        If the applied insert is at or before the incoming position,
        the incoming position shifts right by the applied text length.
        Tie-break: lower author_id wins the earlier position.
        """
        if applied.position < incoming.position:
            new_pos = incoming.position + len(applied.text)
        elif applied.position == incoming.position:
            # Deterministic tie-break so all nodes agree on order
            if applied.author_id <= incoming.author_id:
                new_pos = incoming.position + len(applied.text)
            else:
                new_pos = incoming.position
        else:
            new_pos = incoming.position

        return InsertOperation(
            op_id=incoming.op_id,
            author_id=incoming.author_id,
            revision=incoming.revision,
            timestamp=incoming.timestamp,
            position=new_pos,
            text=incoming.text,
        )

    @staticmethod
    def _xform_id(
        applied: InsertOperation, incoming: DeleteOperation
    ) -> DeleteOperation:
        """
        Applied insert, incoming delete.
        If the insert landed before the delete start, shift the delete right.
        If the insert landed inside the delete range, expand the delete length.
        """
        insert_pos = applied.position
        insert_len = len(applied.text)

        if insert_pos <= incoming.position:
            # Insert is entirely before delete: shift delete right
            new_pos = incoming.position + insert_len
            new_len = incoming.length
        elif insert_pos < incoming.position + incoming.length:
            # Insert landed inside the deleted range: the delete must cover more
            new_pos = incoming.position
            new_len = incoming.length + insert_len
        else:
            # Insert is after delete: no change needed
            new_pos = incoming.position
            new_len = incoming.length

        return DeleteOperation(
            op_id=incoming.op_id,
            author_id=incoming.author_id,
            revision=incoming.revision,
            timestamp=incoming.timestamp,
            position=new_pos,
            length=new_len,
        )

    @staticmethod
    def _xform_di(
        applied: DeleteOperation, incoming: InsertOperation
    ) -> InsertOperation:
        """
        Applied delete, incoming insert.
        If the delete removed characters before the insert position, shift left.
        If the insert is inside the deleted range, clamp it to the delete start.
        """
        del_end = applied.position + applied.length

        if incoming.position <= applied.position:
            new_pos = incoming.position
        elif incoming.position >= del_end:
            new_pos = incoming.position - applied.length
        else:
            # Insert was inside deleted range: move to where deletion started
            new_pos = applied.position

        return InsertOperation(
            op_id=incoming.op_id,
            author_id=incoming.author_id,
            revision=incoming.revision,
            timestamp=incoming.timestamp,
            position=new_pos,
            text=incoming.text,
        )

    @staticmethod
    def _xform_dd(
        applied: DeleteOperation, incoming: DeleteOperation
    ) -> DeleteOperation:
        """
        Two concurrent deletes targeting potentially overlapping ranges.
        Shrink or eliminate the incoming delete to avoid double-deleting.
        """
        a_start = applied.position
        a_end = applied.position + applied.length
        b_start = incoming.position
        b_end = incoming.position + incoming.length

        if b_end <= a_start:
            # Incoming delete is entirely before applied: no change
            return incoming

        if b_start >= a_end:
            # Incoming delete is entirely after applied: shift left
            return DeleteOperation(
                op_id=incoming.op_id,
                author_id=incoming.author_id,
                revision=incoming.revision,
                timestamp=incoming.timestamp,
                position=incoming.position - applied.length,
                length=incoming.length,
            )

        # Ranges overlap: compute the surviving delete range
        new_start = min(b_start, a_start)
        # Characters in incoming range that were NOT already deleted by applied
        surviving_before = max(0, a_start - b_start)
        surviving_after = max(0, b_end - a_end)
        new_len = surviving_before + surviving_after

        return DeleteOperation(
            op_id=incoming.op_id,
            author_id=incoming.author_id,
            revision=incoming.revision,
            timestamp=incoming.timestamp,
            position=new_start,
            length=new_len,
        )


# ---------------------------------------------------------------------------
# Document and history
# ---------------------------------------------------------------------------

class DocumentHistory:
    def __init__(self) -> None:
        self._applied: list[Operation] = []
        self._undone: list[Operation] = []

    def record(self, op: Operation) -> None:
        self._applied.append(op)
        self._undone.clear()

    def undo(self) -> Optional[Operation]:
        if not self._applied:
            return None
        op = self._applied.pop()
        self._undone.append(op)
        return op

    def redo(self) -> Optional[Operation]:
        if not self._undone:
            return None
        op = self._undone.pop()
        self._applied.append(op)
        return op

    def ops_since(self, revision: int) -> list[Operation]:
        return [op for op in self._applied if op.revision > revision]


class Document:
    def __init__(self, doc_id: str, initial_content: str = "") -> None:
        self.doc_id = doc_id
        self._content = initial_content
        self._revision = 0
        self.history = DocumentHistory()

    @property
    def content(self) -> str:
        return self._content

    @property
    def revision(self) -> int:
        return self._revision

    def apply_local(self, op: Operation) -> None:
        """Apply an operation originating from this client optimistically."""
        self._content = op.apply(self._content)
        self._revision += 1
        self.history.record(op)

    def apply_remote(
        self, incoming: Operation, pending_local: list[Operation]
    ) -> Operation:
        """
        Apply a remote operation, transforming it against any pending
        local operations that the server has not yet seen.

        Returns the transformed operation (useful for cursor adjustments).
        """
        transformed = incoming
        for local_op in pending_local:
            transformed = OperationalTransform.transform(local_op, transformed)
        self._content = transformed.apply(self._content)
        self._revision += 1
        self.history.record(transformed)
        return transformed


# ---------------------------------------------------------------------------
# Cursor
# ---------------------------------------------------------------------------

@dataclass
class Cursor:
    user_id: str
    position: int

    def adjust(self, op: Operation) -> None:
        """Update cursor position to remain valid after an operation is applied."""
        if isinstance(op, InsertOperation):
            if op.position <= self.position:
                self.position += len(op.text)
        elif isinstance(op, DeleteOperation):
            if op.position + op.length <= self.position:
                self.position -= op.length
            elif op.position <= self.position:
                self.position = op.position


# ---------------------------------------------------------------------------
# Collaboration session
# ---------------------------------------------------------------------------

class CollaborationSession:
    def __init__(self, document: Document) -> None:
        self.session_id = str(uuid.uuid4())
        self._document = document
        self._cursors: dict[str, Cursor] = {}
        self._pending_local: list[Operation] = []
        self._server_revision = document.revision

    def join(self, user_id: str) -> None:
        self._cursors[user_id] = Cursor(user_id=user_id, position=0)

    def submit_local(self, op: Operation) -> None:
        """
        Optimistically apply a local operation and queue it for server acknowledgement.
        """
        self._document.apply_local(op)
        self._pending_local.append(op)
        self._adjust_cursors(op, exclude_user=op.author_id)

    def receive_remote(self, op: Operation) -> None:
        """
        Apply a confirmed server operation from another user.
        Transform it against any pending local operations first.
        """
        transformed = self._document.apply_remote(op, self._pending_local)
        self._server_revision = transformed.revision
        self._adjust_cursors(transformed, exclude_user=None)

    def acknowledge_local(self, op_id: str) -> None:
        """Server confirmed a local operation. Remove it from pending."""
        self._pending_local = [
            o for o in self._pending_local if o.op_id != op_id
        ]

    def _adjust_cursors(
        self, op: Operation, exclude_user: Optional[str]
    ) -> None:
        for user_id, cursor in self._cursors.items():
            if user_id != exclude_user:
                cursor.adjust(op)

Design Decisions and Trade-offs

Immutable operations make everything easier. The decision to use frozen=True on Operation dataclasses is deliberate. Transforming an operation produces a new operation, it never mutates the original. This means you can always look at the original operation to understand the author’s intent. The history log is also trustworthy because nothing in it can be silently modified after recording. Mutable operations would mean a transformation bug could corrupt the history log in ways that are extremely difficult to debug.

Why OT instead of last-write-wins. Last-write-wins is simple but wrong for collaborative text editing. In the Alice-and-Bob example from the opening, last-write-wins deletes the wrong character. OT preserves intent: Bob’s delete targets the ‘H’ he saw, and the transform adjusts the index to find the ‘H’ in the modified document. The cost of OT is the transformation logic, which has edge cases (overlapping deletes especially) that must be worked through carefully.

The convergence requirement. For OT to work correctly, the server must be the single sequencer. Clients submit operations to the server, and the server assigns them a revision number. All clients receive operations in the same server-assigned order and transform accordingly. This is the Jupiter Collaboration System model that Google Docs originally used. The tricky property is that if two clients each transform incoming operations against their local pending operations, they must both arrive at the same final document. The transformation functions here handle the critical cases: two concurrent inserts at the same position use a deterministic author ID tie-break so every client makes the same choice.

Undo/redo with OT is surprisingly hard. A naive undo applies the inverse of the last operation. But if other users have made edits since your operation, the inverse does not target the right location anymore. The correct approach is to transform the inverse operation against all subsequent operations before applying it, which is the same OT machinery applied recursively. This design keeps DocumentHistory simple and leaves proper undo as a known extension point. Implementing it fully requires a few more transformation cases.

CRDT as an alternative approach. CRDTs (Conflict-free Replicated Data Types) solve the same convergence problem with a different mechanism. Instead of transforming operations, you design data types whose merge operation is always commutative and associative. A CRDT-based text type assigns a unique immutable ID to each character, so “delete character X” always means the same thing regardless of what other operations have happened. No transformation needed. The trade-off is that CRDTs tend to carry more metadata per character and have higher baseline memory usage. They also make character ordering under concurrent inserts a more complex topic. Figma and many newer collaborative systems use CRDTs. Google Docs still uses OT. Both approaches converge correctly when implemented well.

Cursor adjustment as a first-class operation. When a remote operation is applied, every other user’s cursor needs to shift. An insert before your cursor pushes it right. A delete before your cursor pulls it left. A delete that overlaps your cursor position clamps you to the deletion start point. The Cursor.adjust method handles all three cases. The key is that cursor adjustment runs after the document transformation, using the transformed (server-resolved) operation, not the raw one the remote user sent.

The gap between this and a production system. A real collaborative editor needs a few more things this design omits. Network partition handling: what happens when a client goes offline and accumulates local operations for minutes before reconnecting? The client needs to transform all its pending operations against everything the server applied while it was gone, which can be a long chain. Rich text: formatting operations that affect ranges interact with insert/delete operations in non-trivial ways, and the transformation matrix grows. Presence indicators beyond cursors: selection ranges, annotations, comments. Each adds state that must be adjusted when operations land. The OT engine and the immutable operation model are the correct foundations for all of these, and each is a straightforward extension from the same base.


If the OT convergence proof or the CRDT comparison sparked a question, I’d genuinely enjoy thinking through it together. Reach out on Twitter or LinkedIn.

Tags:

#lld #machine-coding #object-oriented-design #python #real-time #crdt

Related Posts

Let's Connect! 💬

Whether you're looking to hire, want to collaborate on a project, or just want to chat about tech—I'd love to hear from you!