URL Shortener System Design (Part 1): High-Level Architecture

The URL shortener is a classic system design interview question that tests your understanding of distributed systems, caching strategies, and production trade-offs. It looks deceptively simple (“just generate a short code and redirect”), but the real challenge is handling scale, ensuring reliability, and making defensible architectural choices.

This is Part 1 of a two-part series. In this post, I’ll cover the high-level design (HLD): requirements gathering, capacity planning, ID generation approaches, database design, architecture, and caching strategy. These are the fundamentals you need to nail in any system design interview.

Part 2 (coming next) dives into low-level implementation details: analytics pipelines, security layers, observability, and production operational concerns with complete code implementations.

Start with Requirements (Don’t Skip This)

Many candidates jump straight to architecture in interviews. This is a mistake. Spend 5 minutes clarifying requirements, because it shows you think before coding and prevents you from solving the wrong problem.

Here’s what you need to nail down:

Functional Scope

Core flow: User submits long URL → get short URL → short URL redirects to original
Custom aliases: Can users pick their own short codes? (Nice to have, adds complexity)
Analytics: Track clicks, geography, devices? (Almost always yes, because this drives architecture)
Expiration: Do links expire? (TTL support affects storage and caching)
User accounts: Anonymous vs authenticated users? (Affects rate limiting and quotas)

Scale Assumptions

Ask for numbers. If the interviewer is vague, propose something reasonable:

100 million URLs created per month (realistic for a mid-sized service)
100:1 read-to-write ratio (redirects vastly outnumber creation)
5-year data retention (affects storage planning)

This gives you:

Writes: 100M/month ≈ 40 QPS average, ~120 QPS peak
Reads: 10B/month ≈ 4,000 QPS average, ~20,000 QPS peak

The 100:1 ratio is crucial. It tells you this is a read-heavy system where caching matters more than write optimization.

Non-Functional Requirements

Availability: 99.99% (4 nines = 52 minutes downtime/year)
Latency: <100ms redirect globally (p99 - this is aggressive)
Durability: Zero data loss for URL mappings (eventual consistency for analytics is OK)

Write these down. They’ll guide every decision you make.

Do the Math (Back-of-the-Envelope Calculations)

Always do capacity planning on the whiteboard. It shows you think about scale and helps you make informed technology choices. Here’s how I break it down:

Traffic Analysis

Start with what you know:

100M URLs/month ÷ 30 days ÷ 86,400 sec ≈ 40 writes/sec average (38.6 before rounding)
Peak traffic (assume 3x): 120 writes/sec
100:1 read ratio: 4,000 redirects/sec average, 20,000 redirects/sec peak (assume 5x, since redirects spike harder than creations)

The key insight: writes are trivial (any RDBMS handles 120 QPS easily), but reads are the bottleneck. At 20k QPS, you can’t hit the database for every redirect. Caching is mandatory, not optional.
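The arithmetic is easy to reproduce. A minimal sketch, assuming a 30-day month as in the estimate; the function and variable names are illustrative only:

```python
SECONDS_PER_MONTH = 30 * 86_400  # 30-day month, matching the estimate


def average_qps(events_per_month: float) -> float:
    """Average events per second over a 30-day month."""
    return events_per_month / SECONDS_PER_MONTH


writes_per_month = 100_000_000
read_write_ratio = 100

write_qps = average_qps(writes_per_month)
read_qps = average_qps(writes_per_month * read_write_ratio)

print(f"writes: {write_qps:.1f}/s average")
print(f"reads:  {read_qps:.0f}/s average")
```

The raw results are slightly under 40 and 4,000. Rounding up is fine on a whiteboard because the peak multiplier dominates the error.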

Storage Planning

Calculate per-record size:

URL record:
- short_code: 8 bytes
- long_url: 2KB (plan for max, not average)
- user_id: 8 bytes
- created_at: 8 bytes
- expires_at: 8 bytes
- metadata: ~100 bytes
Total: ~2.2KB (round to 3KB with DB overhead)

5-year capacity:
- 100M/month × 12 × 5 = 6 billion URLs
- 6B × 3KB = 18TB raw
- With 3x replication: 54TB
- Add indexes (2x data size): ~100TB total

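This fits comfortably on modern hardware. At 18TB raw, the full 5-year dataset sits at the upper end of what a single well-configured Postgres instance should hold, so plan for sharding as you approach 10-20TB per shard. Remember that the 2KB URL size is a worst case, so real usage will likely be well below this.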
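The same estimate as a small script, so the inputs can be varied. This is a sketch using decimal units (1KB = 1,000 bytes); the helper name is illustrative:

```python
KB = 1_000
TB = 1_000_000_000_000


def storage_tb(records: int, bytes_per_record: int, replication: int = 1) -> float:
    """Total storage in terabytes for a number of fixed-size records."""
    return records * bytes_per_record * replication / TB


urls_5y = 100_000_000 * 12 * 5           # 100M per month for 5 years
raw_tb = storage_tb(urls_5y, 3 * KB)     # one copy
replicated_tb = storage_tb(urls_5y, 3 * KB, replication=3)

print(f"{urls_5y:,} URLs -> {raw_tb:.0f}TB raw, {replicated_tb:.0f}TB replicated")
```

Changing the per-record size from the 3KB worst case to a measured average is the quickest way to see how conservative the plan is.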

Analytics Storage: The Hidden Iceberg

Here’s where candidates often stumble: analytics data grows much faster than URL data.

Click event:
- short_code: 8 bytes
- timestamp: 8 bytes
- ip_address: 16 bytes (IPv6)
- user_agent: 200 bytes
- referrer: 200 bytes
- geo_data: 50 bytes
Total: ~500 bytes per click

5-year volume:
- 10B clicks/month × 60 months = 600 billion events
- 600B × 500 bytes = 300TB raw

With compression (time-series DB gives 5:1 ratio): ~60TB

Critical decision point: Analytics is 300TB vs 18TB for URLs. You need separate storage strategies:

URL mappings: Strong consistency, ACID, fast point lookups → Postgres
Analytics: Eventual consistency, columnar storage, aggregation queries → ClickHouse/TimescaleDB

Cache Sizing

Apply the Pareto principle: 80% of traffic hits 20% of URLs.

Hot URLs: 20% × 6B = 1.2B URLs
Cache size: 1.2B × 500 bytes (compressed) = 600GB
Plan for: 1TB Redis cluster (room for growth)

Bandwidth (Usually Not the Problem)

Peak reads: 20k QPS × 3KB = 60 MB/sec
Peak writes: 120 QPS × 3KB = 360 KB/sec

Bandwidth is negligible. Don’t overthink it.

What This Math Tells You

Read optimization is critical - ~99% of traffic is redirects (100 of every 101 requests)
Caching is mandatory - Can’t sustain 20k DB queries/sec
Storage strategy must differ - URLs vs analytics need different databases
Scale is manageable - You don’t need exotic NoSQL solutions yet

These numbers guide every architectural decision that follows.

The Hard Part: Generating Short Codes

This is where most interviews focus, and for good reason: it tests your understanding of distributed systems. You need codes that are:

Unique (no collisions - data integrity)
Short (6-7 characters - user experience)
Unpredictable (security - prevent enumeration)
Fast to generate (120 QPS - no coordination bottlenecks)

Understanding the Keyspace

With 7 characters using base62 encoding [a-zA-Z0-9]:

Keyspace: 62^7 = 3.5 trillion possible codes
Our need: 6 billion URLs over 5 years
Utilization: 0.17% of keyspace

Space is not the constraint. The challenge is generating unique IDs across multiple servers without coordination overhead.
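The snippets in this section call a base62_encode helper without defining it. One possible version is sketched below; the alphabet order (digits, then lowercase, then uppercase) is an assumption, and any fixed ordering of the 62 symbols works as long as encode and decode agree:

```python
import string

# Assumed symbol order: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
CODE_LENGTH = 7


def base62_encode(num: int) -> str:
    """Encode a non-negative integer as a base62 string (no padding)."""
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num:
        num, rem = divmod(num, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    num = 0
    for ch in code:
        num = num * BASE + ALPHABET.index(ch)
    return num


def to_fixed_code(num: int) -> str:
    """Pad with the zero digit so every code has exactly CODE_LENGTH chars."""
    return base62_encode(num).rjust(CODE_LENGTH, ALPHABET[0])


keyspace = BASE ** CODE_LENGTH
```

Every integer below keyspace maps to exactly one 7-character code, which is why the later approaches work on integers and only encode at the very end.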

Approach 1: Hash-Based Generation

The first approach most people think of: hash the URL and use the first N characters.

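import hashlib

KEYSPACE = 62 ** 7  # number of distinct 7-character base62 codes

def generate_short_code_hash(long_url: str, attempt: int = 0) -> str:
    """Hash-based generation; the caller retries with attempt + 1 on collision"""
    input_str = f"{long_url}#{attempt}" if attempt > 0 else long_url
    hash_bytes = hashlib.sha256(input_str.encode()).digest()

    # Take first 8 bytes and reduce them into the 7-character keyspace.
    # Cutting the encoded string to 7 chars instead would keep only the
    # leading digits, which are not uniformly distributed.
    num = int.from_bytes(hash_bytes[:8], 'big') % KEYSPACE

    # base62_encode is assumed to be defined elsewhere, with '0' as its zero digit
    return base62_encode(num).rjust(7, '0')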

Collision Analysis

With 6 billion URLs in a 3.5 trillion keyspace:

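Using the birthday bound:
P(at least one collision) = 1 - e^(-n²/2N)
n²/2N = (6×10⁹)² / (2×3.5×10¹²) ≈ 5.1×10⁶
P(at least one collision) ≈ 100%

A collision somewhere is certain, so the useful numbers are the per-insert odds and the expected count:
Per-insert collision probability: rises to n/N ≈ 0.17% by year 5
Expected collisions: n²/2N ≈ 5 million over 5 years
Rate: ~85,000 collisions/month ≈ 0.03 QPS spent on collision retries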

Trade-offs:

✅ Automatic deduplication: Same URL always gets same code
✅ No coordination: Each server generates independently
❌ Collision handling required: Need DB query before every insert (40 QPS)
❌ Retry logic complexity: Must handle retry storms during high load

The collision rate is low, but checking existence for every insert adds 40 QPS of database load just for validation. This is acceptable but not ideal.
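The birthday approximation is easy to evaluate directly. A minimal sketch for the figures used in this section (n = 6 billion codes, N = 62^7); the function names are illustrative:

```python
import math


def expected_colliding_pairs(n: int, keyspace: int) -> float:
    """Birthday approximation: expected number of pairs sharing a code."""
    return n * n / (2 * keyspace)


def probability_any_collision(n: int, keyspace: int) -> float:
    """P(at least one collision) = 1 - e^(-n^2 / 2N)."""
    return -math.expm1(-expected_colliding_pairs(n, keyspace))


n = 6_000_000_000
keyspace = 62 ** 7

pairs = expected_colliding_pairs(n, keyspace)
p_any = probability_any_collision(n, keyspace)
p_last_insert = n / keyspace  # chance the final insert hits an occupied code

print(f"expected colliding pairs: {pairs:,.0f}")
print(f"P(at least one collision): {p_any:.6f}")
print(f"per-insert probability at full load: {p_last_insert:.4%}")
```

The probability of at least one collision saturates at 1 long before 6 billion codes exist. What matters operationally is the per-insert probability, which stays well under 1% and translates into a small, steady stream of retries.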

Approach 2: Counter-Based Generation (Naive)

A simpler approach: maintain a global counter and increment atomically.

class CounterService:
    def __init__(self, redis_client):
        self.redis = redis_client

    def get_next_id(self) -> int:
        # INCR is atomic in Redis
        return self.redis.incr("global_url_counter")

def generate_short_code() -> str:
    counter = counter_service.get_next_id()
    return base62_encode(counter).zfill(7)  # "0000001", "0000002", ...

Trade-offs:

✅ Zero collisions: Counter guarantees uniqueness
✅ Fast: Simple atomic increment
✅ Simple: Easy to implement and reason about
❌ SPOF: Redis down = can’t create URLs (violates 99.99% availability)
❌ Hot spot: Every write hits Redis (120 QPS to one service)
❌ Sequential IDs: Predictable codes (“0000001”, “0000002”) enable enumeration attacks

The SPOF is the critical issue. If Redis is unavailable for 5 minutes, you’ve already blown your availability budget for the month.
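The availability budget behind that claim can be worked out in a few lines. A sketch assuming a 365-day year split into 12 equal months:

```python
def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Minutes of allowed downtime in a period for a given availability target."""
    return (1.0 - availability) * period_minutes


MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12

yearly_budget = downtime_budget_minutes(0.9999, MINUTES_PER_YEAR)
monthly_budget = downtime_budget_minutes(0.9999, MINUTES_PER_MONTH)

print(f"99.99% allows {yearly_budget:.1f} min/year, {monthly_budget:.2f} min/month")
```

A single 5-minute Redis outage exceeds the monthly figure on its own, which is what makes the naive counter unacceptable on the write path.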

Approach 3: Range-Based Counter (Solving SPOF)

To eliminate the single point of failure, pre-allocate ID ranges to each app server. Each server requests a range once, then generates IDs locally without coordination.

import logging

logger = logging.getLogger(__name__)

class RangeAllocator:
    """Each app server allocates a range of 100k IDs at a time"""
    def __init__(self, redis_client, server_id: int):
        self.redis = redis_client
        self.server_id = server_id
        self.range_size = 100_000  # 100k IDs per allocation
        self.current = None  # last ID handed out
        self.max = None      # last ID in the allocated range

    def get_next_id(self) -> int:
        # Not thread-safe: guard with a lock if called from multiple threads
        if self.current is None or self.current >= self.max:
            self._allocate_range()

        self.current += 1
        return self.current

    def _allocate_range(self):
        """
        Atomically allocate a range from Redis
        Runs ~once per 42 minutes at 40 QPS (100k / 40 = 2,500 seconds),
        less often per server when writes are spread across several servers
        """
        # INCRBY is atomic and returns the new counter value (end of our range)
        range_end = self.redis.incrby("global_counter", self.range_size)
        self.current = range_end - self.range_size  # one before the first ID
        self.max = range_end

        # Optional: Log allocation for debugging
        logger.info(f"Server {self.server_id} allocated range {self.current + 1}-{self.max}")

Trade-offs:

✅ No SPOF: Brief Redis downtime doesn’t block ID generation (servers keep using their allocated range until it runs out)
✅ Lower load: Redis hit once per 100k URLs (~42 minutes at 40 QPS)
✅ Zero collisions: Ranges don’t overlap
❌ ID waste on crashes: Server crash loses up to 100k IDs (about 0.000003% of keyspace)
❌ Still sequential: Codes remain predictable within a range

The ID waste is negligible. With 3.5 trillion possible codes and 6 billion needed, losing 100k per crash doesn’t matter.
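To see that ranges handed to different servers never overlap, the allocation logic can be exercised without Redis. The sketch below uses a hypothetical in-memory counter in place of Redis INCRBY and a tiny range size; the class names are illustrative:

```python
import threading


class InMemoryCounter:
    """Stand-in for Redis INCRBY so the sketch runs without a server."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def incrby(self, key: str, amount: int) -> int:
        with self._lock:
            self._value += amount
            return self._value


class LocalRange:
    """Hands out IDs from a pre-allocated block and refills when it runs out."""

    def __init__(self, counter: InMemoryCounter, range_size: int) -> None:
        self._counter = counter
        self._range_size = range_size
        self._next = 1
        self._last = 0  # empty range forces an allocation on first use
        self._lock = threading.Lock()

    def get_next_id(self) -> int:
        with self._lock:
            if self._next > self._last:
                end = self._counter.incrby("global_counter", self._range_size)
                self._next = end - self._range_size + 1
                self._last = end
            value = self._next
            self._next += 1
            return value


shared = InMemoryCounter()
server_a = LocalRange(shared, range_size=5)
server_b = LocalRange(shared, range_size=5)

ids_a = [server_a.get_next_id() for _ in range(7)]
ids_b = [server_b.get_next_id() for _ in range(7)]
print(ids_a, ids_b)
```

Server A still holds the unused tail of its second block when server B starts allocating. If A crashed at that point, those IDs would simply never be issued, which is the waste described above.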

Approach 4: Range-Based Counter + Bit-Shuffling (Recommended)

Combine range allocation with bit manipulation to solve both availability and security.

def generate_short_code() -> str:
    """
    ID generation sketch:
    1. Get ID from local range (no per-request coordination)
    2. Pack server ID + counter, then scramble for unpredictability
    3. Encode base62 (URL-safe), padded to exactly 7 characters
    """
    counter_id = range_allocator.get_next_id()
    if counter_id >= (1 << 33):
        raise OverflowError("counter no longer fits in 33 bits")

    # 7 base62 chars hold 62^7 ≈ 2^41.7 values, so the packed ID must fit in
    # 41 bits. A 64-bit layout with a timestamp would need 11 characters, and
    # truncating it to 7 would discard counter bits and cause collisions.
    server_bits = range_allocator.server_id & 0xFF  # 8 bits

    # Combine: [8b server | 33b counter] = 41 bits
    unique_id = (server_bits << 33) | counter_id

    # Scramble with two bijections on 41-bit values (uniqueness is preserved):
    # multiply by a secret odd constant mod 2^41, then XOR with a secret key.
    # SECRET_ODD_MULTIPLIER and SECRET_XOR_KEY are deployment secrets.
    mask = (1 << 41) - 1
    unique_id = (unique_id * SECRET_ODD_MULTIPLIER) & mask
    unique_id ^= SECRET_XOR_KEY & mask

    # Pad instead of truncating (assumes '0' is the zero digit of the alphabet)
    return base62_encode(unique_id).rjust(7, '0')

Trade-offs:

✅ Zero collisions: Counter guarantees uniqueness, and every scrambling step is reversible
✅ No SPOF: Local range allocation, Redis is only contacted to refill a range
✅ Unpredictable: Scrambling makes sequential IDs look random (this is obfuscation, not encryption)
✅ Fast: Pure in-memory computation, no network calls
✅ Debuggable: Server ID embedded (recoverable by reversing the scramble)
❌ Not time-sortable: A timestamp does not fit in the 41 bits that 7 characters provide, so use created_at for time-range queries

Why not UUIDs? UUIDs are 128 bits, which is overkill for this use case. Truncated to 7 characters, they lose their uniqueness guarantee and collide like any other random code. The counter-based approach gives deterministic uniqueness with better performance and built-in debugging metadata.

Recommendation

Use Approach 4 (range-based counter + bit shuffling) for production:

Solves availability (no SPOF)
Solves security (unpredictable codes)
Solves collisions (deterministic uniqueness)
Enables debugging (server ID embedded)

This approach satisfies all four requirements without exotic infrastructure.
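Uniqueness survives obfuscation only if the scrambling step is a bijection on the code space. The sketch below shows one simple option: multiply by an odd constant modulo 2^41, then XOR with a key. Both steps are reversible, and every result still fits in 7 base62 characters. The constants are made-up placeholders, and this is obfuscation rather than encryption; a keyed format-preserving cipher would be the stronger choice where enumeration resistance really matters.

```python
BITS = 41                                  # 2**41 < 62**7, so results fit in 7 chars
MASK = (1 << BITS) - 1
MULTIPLIER = (0x9E3779B97F5 & MASK) | 1    # placeholder secret; must be odd
XOR_KEY = 0x15A4F3C2B1D & MASK             # placeholder secret
MULTIPLIER_INV = pow(MULTIPLIER, -1, 1 << BITS)  # modular inverse (Python 3.8+)


def scramble(n: int) -> int:
    """Map a sequential ID to a scattered one without losing uniqueness."""
    if not 0 <= n <= MASK:
        raise ValueError("n must fit in 41 bits")
    return ((n * MULTIPLIER) & MASK) ^ XOR_KEY


def unscramble(m: int) -> int:
    """Recover the original sequential ID."""
    return ((m ^ XOR_KEY) * MULTIPLIER_INV) & MASK


sample = [scramble(i) for i in range(1, 6)]
print(sample)
```

Because an odd multiplier is invertible modulo a power of two, two different counters can never produce the same scrambled value, so no existence check is needed before inserting.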

Database Design: Choosing the Right Storage

Storage is where many candidates stumble. You need different databases for different access patterns.

Analyze Access Patterns First

URL Mappings:

Writes: 40 QPS, need ACID transactions
Reads: 20k QPS, but 90% cached → 2k QPS hits DB
Query patterns: Point lookups by short_code, list by user_id + pagination
Consistency: Strong (can’t serve wrong URLs)
Relations: Users → URLs (need JOINs)

Analytics:

Writes: 20k QPS, append-only, can be batched
Reads: Aggregations (count by day, GROUP BY country)
Consistency: Eventual (5-minute lag acceptable)
Relations: Minimal (mostly time-series queries)

These patterns are fundamentally different and need different storage engines.

SQL vs NoSQL for URL Mappings

The NoSQL argument: “Simple key-value lookups, billions of records, so use Cassandra or DynamoDB.”

Why PostgreSQL is better here:

Transactions are essential: Creating a URL requires atomic operations:
- Insert into urls table
- Increment user’s quota counter
- Check rate limits
NoSQL databases give up ACID for scale, but we don’t need that scale yet.
Secondary indexes matter: Need to query by:
- short_code (primary lookup)
- user_id + created_at (user’s URL list, paginated)
- expires_at (TTL cleanup)
Postgres handles these with indexes. Cassandra requires duplicate tables per query pattern.
Scale is manageable: 2k QPS reads on a single Postgres instance is trivial with:
- NVMe SSDs (100k+ IOPS)
- Proper indexes
- Connection pooling
We’re nowhere near needing NoSQL’s write scalability (50k+ QPS).
Operational simplicity: Postgres replication is mature and well-understood. Tuning Cassandra’s consistency levels and handling eventual consistency bugs adds operational overhead.

When to consider NoSQL: When writes exceed 50k QPS or a single Postgres shard can’t handle the load. Even then, shard Postgres first before switching to NoSQL.

Schema Design

-- Users table (created first so urls.user_id can reference it)
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    api_key VARCHAR(64) UNIQUE NOT NULL,
    tier VARCHAR(20) DEFAULT 'free',
    url_quota INT DEFAULT 100,
    urls_created INT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- URLs table: Strong consistency, relational
CREATE TABLE urls (
    id BIGINT PRIMARY KEY,  -- Pre-generated from range allocator
    short_code VARCHAR(7) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    user_id BIGINT REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ,  -- set when a URL is edited
    expires_at TIMESTAMPTZ,
    is_active BOOLEAN DEFAULT true,

    -- Generated codes are exactly 7 base62 characters
    -- (custom aliases would need a looser pattern)
    CONSTRAINT short_code_valid CHECK (short_code ~ '^[a-zA-Z0-9]{7}$')
);

-- Indexes for different access patterns
CREATE UNIQUE INDEX idx_short_code ON urls(short_code) WHERE is_active = true;
CREATE INDEX idx_user_created ON urls(user_id, created_at DESC);
CREATE INDEX idx_expires ON urls(expires_at) WHERE expires_at IS NOT NULL;

Key optimizations:

Partial index on is_active - only index active URLs
Composite index on (user_id, created_at DESC) - efficient pagination
Partial index on expires_at - only index URLs with TTL

Analytics Storage: Time-Series Database

For analytics, use ClickHouse (or TimescaleDB) instead of Postgres:

-- ClickHouse schema (columnar storage)
CREATE TABLE clicks (
    short_code FixedString(7),
    clicked_at DateTime,
    ip_address IPv6,
    country_code FixedString(2),
    city String,
    device_type LowCardinality(String),  -- Enum-like compression
    referrer String,
    user_agent String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(clicked_at)  -- Monthly partitions
ORDER BY (short_code, clicked_at);

ClickHouse advantages:

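Compression: Up to 10:1 vs row storage in Postgres (300TB → 30TB); the 5:1 used in the capacity estimate is the conservative figure
Query speed: Columnar storage can make queries like GROUP BY country_code around 50x faster
Partitioning: Old partitions can be moved to S3 through TTL rules and a tiered storage policy (cost savings)
Write throughput: Handles 100k inserts/sec with batching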

Cost comparison:

Postgres (row storage):
- 300TB × $0.10/GB = $30k/month

ClickHouse (columnar + S3):
- 30TB SSD × $0.10/GB = $3k/month
- 270TB S3 × $0.023/GB = $6.2k/month
- Total: $9.2k/month
- Savings: $20.8k/month

When to Shard

Shard Postgres when:

Single instance hits 20k write QPS, or
Storage exceeds 20TB per instance

For our scale (40 QPS writes, 6B URLs), a single instance handles everything comfortably. But here’s how sharding works when you need it:

import hashlib

SHARD_COUNT = 16  # Fixed shard count; changing it remaps most keys

def get_shard_id(short_code: str) -> int:
    """
    Hash-modulo sharding by short_code (not consistent hashing)
    - Even distribution (hash randomness)
    - Deterministic routing (same code → same shard)
    - No hotspots (codes are randomly distributed)
    - Resharding moves most keys; use a hash ring if shards will be added often
    """
    return int(hashlib.sha256(short_code.encode()).hexdigest()[:8], 16) % SHARD_COUNT

# Shard routing
shard_id = get_shard_id("aB3xY9z")  # Always routes to same shard
db_connection = db_pool.get_connection(f"postgres-shard-{shard_id}")

Each shard holds ~375M URLs (6B ÷ 16). Run leader-follower replication per shard for read scaling and high availability.
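Plain modulo routing and a consistent-hash ring behave very differently when the shard count changes. The sketch below compares how many keys move when going from 16 to 17 shards; the ring implementation (100 virtual nodes per shard) and all names are illustrative assumptions, not part of the design above:

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """64-bit hash that is stable across processes (unlike built-in hash())."""
    return int(hashlib.sha256(value.encode()).hexdigest()[:16], 16)


def modulo_shard(key: str, shard_count: int) -> int:
    return stable_hash(key) % shard_count


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, shard_count: int, vnodes: int = 100) -> None:
        points = sorted(
            (stable_hash(f"shard-{shard}#{v}"), shard)
            for shard in range(shard_count)
            for v in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._shards = [s for _, s in points]

    def shard_for(self, key: str) -> int:
        # First ring point clockwise from the key's hash, wrapping around
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._shards[idx]


keys = [f"code{i}" for i in range(5000)]

moved_modulo = sum(modulo_shard(k, 16) != modulo_shard(k, 17) for k in keys) / len(keys)

ring16, ring17 = HashRing(16), HashRing(17)
moved_ring = sum(ring16.shard_for(k) != ring17.shard_for(k) for k in keys) / len(keys)

print(f"modulo: {moved_modulo:.0%} of keys move, ring: {moved_ring:.0%} move")
```

With modulo routing nearly every key changes shard. With the ring only the keys claimed by the new shard move. For a system that reshards rarely, modulo routing plus a planned migration is often simpler and good enough.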

High-Level Architecture

The architecture separates read and write paths since they have very different characteristics (writes: 40 QPS, reads: 20k QPS).

                              ┌─────────────────────────┐
                              │   CloudFlare CDN/WAF    │
                              │  - DDoS protection      │
                              │  - Rate limiting        │
                              │  - Static edge cache    │
                              └────────┬────────────────┘
                                       │
                              ┌────────▼─────────┐
                              │  Global Load     │
                              │  Balancer (AWS   │
                              │  ALB/CloudFlare) │
                              └────────┬─────────┘
                                       │
              ┌────────────────────────┼────────────────────────┐
              │                        │                        │
         ┌────▼────┐            ┌─────▼─────┐          ┌──────▼──────┐
         │ App Pod │            │  App Pod  │          │   App Pod   │
         │ (US-E)  │            │  (US-W)   │          │    (EU)     │
         │ Stateless│           │ Stateless │          │  Stateless  │
         └────┬────┘            └─────┬─────┘          └──────┬──────┘
              │                       │                        │
         ┌────▼─────────────┐   ┌────▼───────────┐      ┌────▼─────────┐
         │ Redis Cluster    │   │ Redis Cluster  │      │Redis Cluster │
         │ (Regional cache) │   │ (Regional)     │      │ (Regional)   │
         │ 1TB, LRU evict   │   │ 1TB            │      │ 1TB          │
         └──────────────────┘   └────────────────┘      └──────────────┘
                          │            │             │
                          └────────────┼─────────────┘
                                       │
                          ┌────────────▼────────────┐
                          │  PostgreSQL Primary     │
                          │  Multi-Region replicas  │
                          │  (Sharded: 16 shards)   │
                          └────────────┬────────────┘
                                       │
           ┌───────────────────────────┼────────────────────┐
           │                           │                    │
      ┌────▼────┐               ┌──────▼──────┐      ┌─────▼──────┐
      │ Kafka   │               │  ClickHouse │      │  S3 (Cold  │
      │ Stream  │──────────────▶│  Analytics  │◀─────│  Storage)  │
      │ (Async) │               │   Cluster   │      │            │
      └─────────┘               └─────────────┘      └────────────┘

Write Path (URL Creation)

The write path prioritizes durability over speed. At 40 QPS, we can afford synchronous database writes.

POST /api/shorten
{
  "long_url": "https://example.com/very/long/url",
  "custom_alias": "mylink"  // optional
}

Flow:
1. [App Server] Authenticate user (JWT/API key)
   - Check rate limit: 100 URLs/hour for free tier
   - Fail fast if over quota

2. [App Server] Validate URL
   - Check format (valid scheme and host, sane length)
   - Query Google Safe Browsing API (async, 50ms timeout)
   - Block if flagged as malicious

3. [App Server] Generate short code
   - range_allocator.get_next_id() → local, no network call
   - Bit shuffle + base62 encode → "aB3xY9z"

4. [Database] Transactional write to Postgres
   BEGIN TRANSACTION;
     INSERT INTO urls (id, short_code, long_url, user_id, created_at)
       VALUES (...);
     UPDATE users SET urls_created = urls_created + 1 WHERE id = ?;
   COMMIT;

   Latency: ~10ms same-region primary, ~50ms cross-region

5. [Cache] Proactive cache write (fire-and-forget)
   - redis.setex(f"url:{short_code}", 86400, long_url)
   - If Redis fails, log error but don't fail request
   - Cache populated on first read anyway

6. [Response] Return short URL immediately
   {
     "short_url": "https://short.ly/aB3xY9z",
     "created_at": "2026-01-07T10:30:00Z"
   }

Total latency: ~80ms (p50), ~150ms (p99)

Why database before cache? Durability over speed. If we cache first and the database write fails, users get 404s on redirect. At 40 QPS, the database handles writes easily, so caching doesn’t help on the write path.
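The transactional step is the part worth getting exactly right. The sketch below shows the pattern with Python's built-in sqlite3 standing in for Postgres, so it runs anywhere. The trimmed-down tables and the create_url helper are assumptions for illustration, but the idea carries over: the quota update and the insert commit together or not at all.

```python
import sqlite3


def create_url(conn: sqlite3.Connection, user_id: int, short_code: str, long_url: str) -> bool:
    """Insert the mapping and bump the quota counter atomically."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            cur = conn.execute(
                "UPDATE users SET urls_created = urls_created + 1 "
                "WHERE id = ? AND urls_created < url_quota",
                (user_id,),
            )
            if cur.rowcount != 1:
                raise PermissionError("quota exceeded or unknown user")
            conn.execute(
                "INSERT INTO urls (short_code, long_url, user_id) VALUES (?, ?, ?)",
                (short_code, long_url, user_id),
            )
        return True
    except (PermissionError, sqlite3.IntegrityError):
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, url_quota INTEGER, urls_created INTEGER DEFAULT 0);
    CREATE TABLE urls (short_code TEXT PRIMARY KEY, long_url TEXT NOT NULL, user_id INTEGER);
    INSERT INTO users (id, url_quota) VALUES (1, 2);
    """
)
```

If the insert fails on a duplicate short code, the quota increment is rolled back with it, so a failed request never consumes quota.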

Read Path (Redirect)

The read path is where the system is tested. At 20k QPS, it needs aggressive optimization. The goal is <100ms latency globally.

GET /aB3xY9z

Flow:
1. [CDN] CloudFlare edge cache (200+ PoPs globally)
   - Cache-Control: public, max-age=300 (5 minutes)
   - 80% of requests served from edge
   - Latency: 10-20ms (CDN → user)

   If CDN miss (cold start or cache expiry):

2. [Load Balancer] Route to nearest region (US-E, US-W, EU)
   - Geo-routing: EU users → EU app servers
   - Latency: +15ms network hop

3. [App Server] Check Redis cache (regional cluster)
   GET url:aB3xY9z
   - Cache hit (90% after CDN misses): ~1ms
   - Cache miss (10%): proceed to database

4. [Database] Query Postgres replica (read-only)
   - Route to nearest read replica
   - SELECT long_url, expires_at FROM urls WHERE short_code = ? AND is_active = true;
   - Latency: ~5ms local, ~100ms cross-region

5. [Cache] Populate Redis (cache-aside, async)
   - redis.setex(f"url:{short_code}", 86400, long_url)
   - Don't block response

6. [Response] 302 Redirect immediately
   HTTP/1.1 302 Found
   Location: https://example.com/very/long/url
   Cache-Control: public, max-age=300

7. [Analytics] Async event to Kafka (fire-and-forget)
   - kafka.produce("clicks", {short_code, timestamp, ip, user_agent, ...})
   - Does NOT block redirect
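   - If Kafka down, log to local buffer
   - Requests answered by the CDN edge never reach this step, so those clicks must be counted from CDN logs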

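Latency breakdown:
- CDN hit (80%): 10-20ms
- Redis hit (18%): 30-50ms
- DB hit (2%): 80-120ms
- p99: lands in the DB-hit band (2% of requests), so staying under 100ms requires same-region replicas or a DB-hit share below 1%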

Multi-layer caching: CDN (edge) → Redis (regional) → Database. Each layer reduces load on the next by 80-90%.
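A quick way to check which tier determines a given percentile is to walk the tiers from fastest to slowest and accumulate their traffic shares. The sketch below is deliberately coarse: it treats each tier's latency as its worst-case figure, using the shares and upper bounds from the breakdown; the function name is illustrative.

```python
from typing import List, Tuple


def latency_at_percentile(bands: List[Tuple[float, float]], percentile: float) -> float:
    """bands: (share_of_requests, worst_case_ms) pairs, fastest tier first."""
    cumulative = 0.0
    for share, worst_case_ms in bands:
        cumulative += share
        if percentile <= cumulative + 1e-12:  # tolerance for float rounding
            return worst_case_ms
    return bands[-1][1]


bands = [
    (0.80, 20),    # CDN hit
    (0.18, 50),    # Redis hit
    (0.02, 120),   # database hit
]

p50 = latency_at_percentile(bands, 0.50)
p95 = latency_at_percentile(bands, 0.95)
p99 = latency_at_percentile(bands, 0.99)
print(p50, p95, p99)
```

With 2% of requests reaching the database, p99 is set by the database tier. Push the database share below 1% and p99 drops into the Redis band, which is the real argument for investing in cache hit rate.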

CAP Theorem and Caching Strategy

Choosing Availability Over Consistency

In CAP theorem, network partitions will happen, forcing you to choose between Consistency and Availability. For URL shorteners, choose Availability (AP system).

Why favor availability:

Stale reads are tolerable: If a user updates a URL and it takes 5 minutes to propagate to all caches, that’s annoying but not catastrophic. Blocking all redirects during database unavailability is worse.
URLs are mostly immutable: 99% of URLs never change after creation. The rare edit cases can tolerate eventual consistency.
Expiration isn’t critical: If an expired URL is cached for 5 extra minutes, it’s a UX annoyance, not a security breach.

Cache Invalidation Strategy

Despite choosing availability, aggressively invalidate caches on writes:

import asyncio
import logging

logger = logging.getLogger(__name__)

async def update_url(short_code: str, new_long_url: str):
    """Update URL and aggressively invalidate caches"""
    # 1. Update database (source of truth)
    await db.execute(
        "UPDATE urls SET long_url = ?, updated_at = NOW() WHERE short_code = ?",
        new_long_url, short_code
    )

    # 2. Invalidate all cache layers (best effort)
    results = await asyncio.gather(
        redis.delete(f"url:{short_code}"),  # Redis
        cdn.purge_cache(f"/{short_code}"),  # CloudFlare edge
        return_exceptions=True  # Don't fail if purge fails
    )
    for result in results:
        if isinstance(result, Exception):
            logger.warning(f"Cache invalidation failed for {short_code}: {result}")

    # 3. Return without sleeping: a fixed wait cannot guarantee propagation,
    #    and stale entries are bounded by the cache TTLs (5 minutes at the CDN)

For deletes/deactivations, use cache tombstones instead of simple deletion:

async def deactivate_url(short_code: str):
    """Deactivate URL with cache tombstone"""
    # 1. Soft delete in DB
    await db.execute(
        "UPDATE urls SET is_active = false WHERE short_code = ?",
        short_code
    )

    # 2. Write tombstone to cache (instead of deleting)
    # TTL = 7 days (longer than cache's normal 24h TTL)
    await redis.setex(f"url:{short_code}", 604800, "__TOMBSTONE__")

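    # Regional Redis now answers with the tombstone, so the app returns 404.
    # CDN edges may keep redirecting until their 5-minute TTL expires or the path is purged.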
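Tombstones only help if the read path checks for them. A minimal cache-aside lookup that does so is sketched below, with plain dictionaries standing in for Redis and Postgres and TTLs ignored; the resolve and deactivate names are illustrative:

```python
from typing import Dict, Optional

TOMBSTONE = "__TOMBSTONE__"


def resolve(short_code: str, cache: Dict[str, str], db: Dict[str, dict]) -> Optional[str]:
    """Return the long URL, or None if the code is unknown or deactivated."""
    key = f"url:{short_code}"
    cached = cache.get(key)
    if cached == TOMBSTONE:
        return None                      # known-dead link, skip the database
    if cached is not None:
        return cached                    # cache hit

    row = db.get(short_code)             # cache miss: fall back to the database
    if row is None or not row["is_active"]:
        cache[key] = TOMBSTONE           # negative caching for repeat lookups
        return None
    cache[key] = row["long_url"]
    return row["long_url"]


def deactivate(short_code: str, cache: Dict[str, str], db: Dict[str, dict]) -> None:
    db[short_code]["is_active"] = False
    cache[f"url:{short_code}"] = TOMBSTONE


cache: Dict[str, str] = {}
db = {"aB3xY9z": {"long_url": "https://example.com/very/long/url", "is_active": True}}
```

Writing the tombstone instead of deleting the key means a burst of requests for a dead link is absorbed by the cache rather than falling through to the database.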

Multi-Layer Caching Strategy

Use three cache layers with different TTLs:

Layer 1: CDN Edge Cache (CloudFlare)
- TTL: 5 minutes
- Coverage: 80% of requests
- Invalidation: Purge API (eventual, ~30 seconds)
- Cost: ~$0.10 per million requests

Layer 2: Regional Redis (US-E, US-W, EU)
- TTL: 24 hours
- Coverage: 18% of requests (CDN misses)
- Invalidation: Immediate (delete key)
- Cost: ~$0.50/GB/month × 3 regions

Layer 3: Database Read Replicas
- TTL: Infinite (source of truth)
- Coverage: 2% of requests (cache misses)
- Latency: 5-100ms depending on region

Handling Viral Links (High Fan-Out Traffic)

Viral links create sudden traffic spikes (10k+ requests/sec for a single URL). Handle them with adaptive caching:

import time

class AdaptiveCaching:
    """Increase cache TTL for hot links"""

    async def get_with_adaptive_ttl(self, short_code: str):
        # Resolve the target first (cache_or_db_fetch is the lookup helper, assumed to exist)
        url_data = await cache_or_db_fetch(short_code)
        long_url = url_data.long_url

        # Check hit count in last minute (Redis sorted set scored by timestamp;
        # assumes each redirect also records itself there with ZADD)
        now = time.time()
        hits_per_minute = await redis.zcount(f"hits:{short_code}", now - 60, now)

        # Adaptive TTL based on traffic
        if hits_per_minute > 1000:  # Viral threshold
            ttl = 3600  # 1 hour
            cdn_ttl = 600  # 10 minutes CDN
        elif hits_per_minute > 100:
            ttl = 1800  # 30 minutes
            cdn_ttl = 300  # 5 minutes CDN
        else:
            ttl = 600  # 10 minutes (default)
            cdn_ttl = 60  # 1 minute CDN

        # Cache with adaptive TTL
        await redis.setex(f"url:{short_code}", ttl, long_url)

        return long_url, cdn_ttl

With adaptive caching, a viral link getting 10k requests/sec is served almost entirely from CDN edge. The 30-second CDN cache propagation delay is acceptable for viral content.
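The two moving parts, a sliding one-minute hit counter and the TTL tiers, can be tried out without Redis. The sketch below keeps the counter in process memory with a deque, which is an assumption made purely for illustration (a shared counter would live in Redis). The thresholds are the ones used above.

```python
from collections import deque
from typing import Deque, Tuple


class HitWindow:
    """Counts hits within a sliding time window."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._hits: Deque[float] = deque()

    def record(self, now: float) -> None:
        self._hits.append(now)

    def count(self, now: float) -> int:
        cutoff = now - self.window_seconds
        while self._hits and self._hits[0] < cutoff:
            self._hits.popleft()         # drop hits that fell out of the window
        return len(self._hits)


def choose_ttls(hits_per_minute: int) -> Tuple[int, int]:
    """Return (redis_ttl_seconds, cdn_ttl_seconds) for a traffic level."""
    if hits_per_minute > 1000:           # viral
        return 3600, 600
    if hits_per_minute > 100:            # warm
        return 1800, 300
    return 600, 60                       # default


window = HitWindow()
for i in range(1500):
    window.record(i * 0.04)              # 1,500 hits spread over one minute

viral_ttls = choose_ttls(window.count(now=60.0))
quiet_ttls = choose_ttls(window.count(now=200.0))
print(viral_ttls, quiet_ttls)
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the logic deterministic and easy to test.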

TTL and Data Lifecycle Management

Link expiration requires two strategies: lazy expiration at read time, and background cleanup for storage reclamation.

from datetime import datetime, timezone

# Strategy 1: Lazy expiration at read time
async def get_url(short_code: str):
    """Check expiration on every read"""
    url_data = await cache_or_db_fetch(short_code)
    if url_data is None:
        return None

    # expires_at is a timezone-aware datetime (TIMESTAMPTZ), so compare it
    # with an aware "now" rather than the float from time.time()
    if url_data.expires_at and url_data.expires_at < datetime.now(timezone.utc):
        # Expired: return 404 and write tombstone
        await redis.setex(f"url:{short_code}", 86400, "__EXPIRED__")
        return None

    return url_data.long_url

# Strategy 2: Background cleanup (monthly) for storage reclamation
async def cleanup_expired_urls():
    """
    Runs monthly via cron
    Deletes expired URLs to reclaim storage
    """
    # No is_active filter: lazy expiration never flips is_active, so requiring
    # is_active = false would skip every link that simply expired
    deleted = await db.execute("""
        DELETE FROM urls
        WHERE expires_at < NOW() - INTERVAL '7 days'
    """)

    logger.info(f"Cleaned up {deleted} expired URLs")

Cost impact of TTL:

Scenario: 50% of URLs have 30-day TTL (temporary campaign links)

Without TTL:
- Storage: 6B URLs × 3KB = 18TB after 5 years
- Cost: 18TB × $0.10/GB = $1,800/month

With TTL and cleanup:
- Active URLs after 5 years: 3B permanent + ~50M unexpired temporary ≈ 3.05B
- Storage: 3.05B × 3KB ≈ 9.2TB
- Cost: 9.2TB × $0.10/GB ≈ $920/month
- Savings: ≈ $880/month by year 5 (roughly $26k over 5 years, since storage grows linearly)

TTL is not just a feature. It is a cost optimization strategy.

Wrapping Up Part 1

We’ve covered the high-level design fundamentals for a production URL shortener:

Requirements & Capacity Planning - Understanding scale drives every decision (read-heavy system → caching is mandatory)
Distributed ID Generation - Range-based counters + bit-shuffling solve availability, security, and uniqueness
Database Design - Separate storage strategies for URLs (Postgres) vs analytics (ClickHouse)
Architecture & Request Flows - Multi-region deployment with clear read/write path separation
Caching Strategy - Three-layer caching (CDN → Redis → DB) with adaptive TTLs for viral links

This architecture targets 10 billion redirects/month with <100ms p99 latency globally at an estimated ~$29k/month. It scales horizontally and avoids single points of failure on the redirect path.

In Part 2, we’ll dive into the implementation details that make this production-ready:

Analytics Pipeline: Kafka streaming, batching strategies, handling 20k events/sec without blocking redirects
Security Layers: Rate limiting, URL validation, abuse detection, incident response
Observability: Metrics, dashboards, alerting, distributed tracing, cost monitoring

Stay tuned for Part 2, where we go from architecture to code.

Want to discuss system design? Reach out on Twitter or LinkedIn. I love talking about distributed systems, caching strategies, and building for scale.
