URL Shortener System Design (Part 1): High-Level Architecture
The URL shortener is a classic system design interview question that tests your understanding of distributed systems, caching strategies, and production trade-offs. It looks deceptively simple (“just generate a short code and redirect”), but the real challenge is handling scale, ensuring reliability, and making defensible architectural choices.
This is Part 1 of a two-part series. In this post, I’ll cover the high-level design (HLD): requirements gathering, capacity planning, ID generation approaches, database design, architecture, and caching strategy. These are the fundamentals you need to nail in any system design interview.
Part 2 (coming next) dives into low-level implementation details: analytics pipelines, security layers, observability, and production operational concerns with complete code implementations.
Start with Requirements (Don’t Skip This)
Many candidates jump straight to architecture in interviews. This is a mistake. Spend 5 minutes clarifying requirements, because it shows you think before coding and prevents you from solving the wrong problem.
Here’s what you need to nail down:
Functional Scope
- Core flow: User submits long URL → get short URL → short URL redirects to original
- Custom aliases: Can users pick their own short codes? (Nice to have, adds complexity)
- Analytics: Track clicks, geography, devices? (Almost always yes, because this drives architecture)
- Expiration: Do links expire? (TTL support affects storage and caching)
- User accounts: Anonymous vs authenticated users? (Affects rate limiting and quotas)
Scale Assumptions
Ask for numbers. If the interviewer is vague, propose something reasonable:
- 100 million URLs created per month (realistic for a mid-sized service)
- 100:1 read-to-write ratio (redirects vastly outnumber creation)
- 5-year data retention (affects storage planning)
This gives you:
- Writes: 100M/month ≈ 40 QPS average, ~120 QPS peak
- Reads: 10B/month ≈ 4,000 QPS average, ~20,000 QPS peak (reads are burstier than writes, so assume 5x)
The 100:1 ratio is crucial. It tells you this is a read-heavy system where caching matters more than write optimization.
Non-Functional Requirements
- Availability: 99.99% (4 nines = 52 minutes downtime/year)
- Latency: <100ms redirect globally (p99 - this is aggressive)
- Durability: Zero data loss for URL mappings (eventual consistency for analytics is OK)
Write these down. They’ll guide every decision you make.
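The availability target translates directly into a downtime budget, which is useful to have on hand when judging later trade-offs. A minimal sketch of that arithmetic (the function name is illustrative, not from any library):

```python
def downtime_budget_minutes(availability: float, days: float) -> float:
    """Allowed downtime in minutes for a given availability over a period."""
    return (1.0 - availability) * days * 24 * 60

yearly = downtime_budget_minutes(0.9999, 365)   # roughly 52.6 minutes
monthly = downtime_budget_minutes(0.9999, 30)   # roughly 4.3 minutes
print(f"99.99% allows {yearly:.1f} min/year, {monthly:.1f} min/month")
```

A budget of a few minutes per month is why any component that can be down for five minutes at a time deserves scrutiny.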
Do the Math (Back-of-the-Envelope Calculations)
Always do capacity planning on the whiteboard. It shows you think about scale and helps you make informed technology choices. Here’s how I break it down:
Traffic Analysis
Start with what you know:
- 100M URLs/month ÷ 30 days ÷ 86,400 sec = 40 writes/sec average
- Peak writes (assume 3x): 120 writes/sec
- 100:1 read ratio: 4,000 redirects/sec average, 20,000 redirects/sec peak (assume 5x, because a single popular link can spike reads far more than writes)
The key insight: writes are trivial (any RDBMS handles 120 QPS easily), but reads are the bottleneck. At 20k QPS, you can’t hit the database for every redirect. Caching is mandatory, not optional.
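The traffic figures above can be reproduced in a few lines. This is a rough sketch: the 30-day month and the peak multipliers (3x for writes, 5x for reads) are assumptions chosen to match the rounded numbers in the text.

```python
SECONDS_PER_MONTH = 30 * 86_400

def average_qps(events_per_month: float) -> float:
    """Average events per second over a 30-day month."""
    return events_per_month / SECONDS_PER_MONTH

write_avg = average_qps(100e6)   # ~38.6, rounded to 40 in the text
read_avg = write_avg * 100       # 100:1 read-to-write ratio
write_peak = write_avg * 3       # assumed 3x peak factor for writes
read_peak = read_avg * 5         # assumed 5x peak factor, close to the 20k figure

print(f"writes: {write_avg:.0f} avg / {write_peak:.0f} peak QPS")
print(f"reads:  {read_avg:.0f} avg / {read_peak:.0f} peak QPS")
```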
Storage Planning
Calculate per-record size:
URL record:
- short_code: 8 bytes
- long_url: 2KB (plan for max, not average)
- user_id: 8 bytes
- created_at: 8 bytes
- expires_at: 8 bytes
- metadata: ~100 bytes
Total: ~2.2KB (round to 3KB with DB overhead)
5-year capacity:
- 100M/month × 12 × 5 = 6 billion URLs
- 6B × 3KB = 18TB raw
- With 3x replication: 54TB
- Add indexes (2x data size): ~100TB total
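This fits on modern hardware. The 18TB of raw URL data can start on a single well-provisioned Postgres instance (the ~100TB total is spread across replicas), but that is already at the upper end of what one node handles comfortably, so plan for sharding as you approach 10-20TB per shard.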
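The same storage estimate as a quick script, using decimal units so the results line up with the rounded figures above. The replication and index multipliers are the assumptions stated in the list, not measured values.

```python
KB = 1_000
TB = 1_000 ** 4

urls_total = 100_000_000 * 12 * 5   # 6 billion URLs over 5 years
raw_bytes = urls_total * 3 * KB     # 3 KB per record including DB overhead
replicated = raw_bytes * 3          # assumed 3x replication
with_indexes = replicated * 2       # indexes assumed to double the footprint

print(f"raw: {raw_bytes / TB:.0f} TB")
print(f"replicated: {replicated / TB:.0f} TB")
print(f"with indexes: {with_indexes / TB:.0f} TB")
```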
Analytics Storage: The Hidden Iceberg
Here’s where candidates often stumble: analytics data grows much faster than URL data.
Click event:
- short_code: 8 bytes
- timestamp: 8 bytes
- ip_address: 16 bytes (IPv6)
- user_agent: 200 bytes
- referrer: 200 bytes
- geo_data: 50 bytes
Total: ~500 bytes per click
5-year volume:
- 10B clicks/month × 60 months = 600 billion events
- 600B × 500 bytes = 300TB raw
With compression (assuming a conservative 5:1 ratio in a time-series or columnar store): ~60TB
Critical decision point: Analytics is 300TB vs 18TB for URLs. You need separate storage strategies:
- URL mappings: Strong consistency, ACID, fast point lookups → Postgres
- Analytics: Eventual consistency, columnar storage, aggregation queries → ClickHouse/TimescaleDB
Cache Sizing
Apply the Pareto principle: 80% of traffic hits 20% of URLs.
Hot URLs: 20% × 6B = 1.2B URLs
Cache size: 1.2B × 500 bytes (compressed) = 600GB
Plan for: 1TB Redis cluster (room for growth)
Bandwidth (Usually Not the Problem)
Peak reads: 20k QPS × 3KB = 60 MB/sec
Peak writes: 120 QPS × 3KB = 360 KB/sec
Bandwidth is negligible. Don’t overthink it.
What This Math Tells You
- Read optimization is critical - ~99% of traffic is redirects (100 reads for every write)
- Caching is mandatory - Can’t sustain 20k DB queries/sec
- Storage strategy must differ - URLs vs analytics need different databases
- Scale is manageable - You don’t need exotic NoSQL solutions yet
These numbers guide every architectural decision that follows.
The Hard Part: Generating Short Codes
This is where most interviews focus, and for good reason: it tests your understanding of distributed systems. You need codes that are:
- Unique (no collisions - data integrity)
- Short (6-7 characters - user experience)
- Unpredictable (security - prevent enumeration)
- Fast to generate (120 QPS - no coordination bottlenecks)
Understanding the Keyspace
With 7 characters using base62 encoding [a-zA-Z0-9]:
- Keyspace: 62^7 = 3.5 trillion possible codes
- Our need: 6 billion URLs over 5 years
- Utilization: 0.17% of keyspace
Space is not the constraint. The challenge is generating unique IDs across multiple servers without coordination overhead.
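The code samples in this post call a base62_encode helper without defining it. A minimal sketch of one possible implementation follows; the alphabet order (digits, then lowercase, then uppercase) is an assumption, and any fixed ordering of the 62 symbols works as long as encode and decode agree.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols
BASE = len(ALPHABET)
CODE_LENGTH = 7
KEYSPACE = BASE ** CODE_LENGTH  # number of distinct 7-character codes

def base62_encode(num: int) -> str:
    """Encode a non-negative integer as a base62 string (no padding)."""
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num:
        num, rem = divmod(num, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def base62_decode(code: str) -> int:
    """Inverse of base62_encode (leading zeros are ignored)."""
    num = 0
    for ch in code:
        num = num * BASE + ALPHABET.index(ch)
    return num

print(KEYSPACE)                              # 3521614606208
print(base62_encode(KEYSPACE - 1))           # largest 7-character code
print(base62_decode(base62_encode(123456)))  # round-trips to 123456
```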
Approach 1: Hash-Based Generation
The first approach most people think of: hash the URL and use the first N characters.
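import hashlib

KEYSPACE = 62 ** 7  # number of distinct 7-character base62 codes

def generate_short_code_hash(long_url: str, attempt: int = 0) -> str:
    """Hash-based generation with retry on collision"""
    # On a collision the caller retries with attempt + 1 to get a different hash
    input_str = f"{long_url}#{attempt}" if attempt > 0 else long_url
    hash_bytes = hashlib.sha256(input_str.encode()).digest()
    # Reduce the first 8 bytes modulo 62^7 so the result always fits in 7 characters
    # (slicing the encoded string instead would skew the distribution)
    num = int.from_bytes(hash_bytes[:8], 'big') % KEYSPACE
    # base62_encode is a helper (not shown) that maps an int to [0-9a-zA-Z]
    return base62_encode(num).rjust(7, '0')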
Collision Analysis
With 6 billion URLs in a 3.5 trillion keyspace:
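Using the birthday bound:
P(at least one collision) = 1 - e^(-n²/2N)
n²/2N = (6×10⁹)² / (2×3.5×10¹²) ≈ 5.1×10⁶, so P ≈ 1
At this scale a collision somewhere is certain, so the useful numbers are the per-insert probability and the expected total:
P(a new code collides) ≈ n/N, rising to 6×10⁹ / 3.5×10¹² ≈ 0.17% by year 5
Expected collisions ≈ n²/2N ≈ 5 million over 5 years
Rate: ~85,000 collisions/month ≈ 0.03 QPS spent on collision retries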
Trade-offs:
- ✅ Automatic deduplication: Same URL always gets same code
- ✅ No coordination: Each server generates independently
- ❌ Collision handling required: Need DB query before every insert (40 QPS)
- ❌ Retry logic complexity: Must handle retry storms during high load
The collision rate is low, but checking existence for every insert adds 40 QPS of database load just for validation. This is acceptable but not ideal.
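A quick numeric check of the birthday estimate can help keep the three different quantities apart: the chance of any collision at all, the chance that one particular insert collides, and the expected number of collisions. This sketch assumes uniformly distributed 7-character codes.

```python
import math

N = 62 ** 7          # keyspace of 7-character base62 codes
n = 6_000_000_000    # URLs created over 5 years

expected_colliding_pairs = n * (n - 1) / (2 * N)
p_any_collision = 1 - math.exp(-expected_colliding_pairs)
p_last_insert_collides = n / N   # chance a new code is taken once the table is full

print(f"expected colliding pairs: {expected_colliding_pairs:,.0f}")
print(f"P(at least one collision): {p_any_collision:.6f}")
print(f"P(an insert collides at year 5): {p_last_insert_collides:.4%}")
```

The per-insert probability stays well under 1%, which is why retry-on-conflict is cheap even though collisions overall are guaranteed.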
Approach 2: Counter-Based Generation (Naive)
A simpler approach: maintain a global counter and increment atomically.
class CounterService:
    def __init__(self, redis_client):
        self.redis = redis_client

    def get_next_id(self) -> int:
        # INCR is atomic in Redis
        return self.redis.incr("global_url_counter")

# counter_service is a shared CounterService instance created at startup
def generate_short_code() -> str:
    counter = counter_service.get_next_id()
    return base62_encode(counter).zfill(7)  # "0000001", "0000002", ...
Trade-offs:
- ✅ Zero collisions: Counter guarantees uniqueness
- ✅ Fast: Simple atomic increment
- ✅ Simple: Easy to implement and reason about
- ❌ SPOF: Redis down = can’t create URLs (violates 99.99% availability)
- ❌ Hot spot: Every write hits Redis (120 QPS to one service)
- ❌ Sequential IDs: Predictable codes (“0000001”, “0000002”) enable enumeration attacks
The SPOF is the critical issue. If Redis is unavailable for 5 minutes, you’ve already blown your availability budget for the month.
Approach 3: Range-Based Counter (Solving SPOF)
To eliminate the single point of failure, pre-allocate ID ranges to each app server. Each server requests a range once, then generates IDs locally without coordination.
import logging

logger = logging.getLogger(__name__)

class RangeAllocator:
    """Each app server allocates a range of 100k IDs at a time"""

    def __init__(self, redis_client, server_id: int):
        self.redis = redis_client
        self.server_id = server_id
        self.range_size = 100_000  # 100k IDs per allocation
        self.current = None  # last ID handed out
        self.max = None      # last ID in the allocated range

    def get_next_id(self) -> int:
        # Not thread-safe as written: guard with a lock if threads share one allocator
        if self.current is None or self.current >= self.max:
            self._allocate_range()
        self.current += 1
        return self.current

    def _allocate_range(self):
        """
        Atomically allocate a range from Redis
        Runs ~once per 42 minutes if one server takes all 40 QPS (100k / 40 = 2,500 seconds)
        """
        # INCRBY is atomic and returns the new counter value, i.e. the end of our range
        range_end = self.redis.incrby("global_counter", self.range_size)
        # Start one below the first ID so get_next_id() hands out the whole range
        self.current = range_end - self.range_size
        self.max = range_end
        # Optional: Log allocation for debugging
        logger.info(f"Server {self.server_id} allocated range {self.current + 1}-{self.max}")
Trade-offs:
- ✅ No SPOF on the hot path: A short Redis outage doesn’t block ID generation (servers keep drawing from their allocated range until it runs out)
- ✅ Lower load: Redis hit once per 100k URLs (~42 minutes at 40 QPS)
- ✅ Zero collisions: Ranges don’t overlap
- ❌ ID waste on crashes: Server crash loses up to 100k IDs (about 0.000003% of keyspace)
- ❌ Still sequential: Codes remain predictable within a range
The ID waste is negligible. With 3.5 trillion possible codes and 6 billion needed, losing 100k per crash doesn’t matter.
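To see that ranges handed to different servers never overlap, the allocation logic can be exercised against an in-memory stand-in for Redis. Everything here is illustrative: FakeCounterStore and LocalRange are hypothetical names, and the range size is shrunk to 1,000 to keep the demo small.

```python
import threading

class FakeCounterStore:
    """In-memory stand-in for Redis INCRBY (for illustration only)."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def incrby(self, key: str, amount: int) -> int:
        with self._lock:
            self._value += amount
            return self._value

class LocalRange:
    """Hands out IDs from a locally held range, refilling from the store when empty."""
    def __init__(self, store, range_size: int = 1000):
        self.store = store
        self.range_size = range_size
        self.next_id = 0
        self.last_id = -1  # empty range forces an allocation on first use

    def get_next_id(self) -> int:
        if self.next_id > self.last_id:
            end = self.store.incrby("global_counter", self.range_size)
            self.next_id = end - self.range_size + 1
            self.last_id = end
        value = self.next_id
        self.next_id += 1
        return value

store = FakeCounterStore()
server_a, server_b = LocalRange(store), LocalRange(store)
ids_a = [server_a.get_next_id() for _ in range(1500)]
ids_b = [server_b.get_next_id() for _ in range(1500)]
print(ids_a[0], ids_a[-1], ids_b[0], ids_b[-1])
print("overlap:", len(set(ids_a) & set(ids_b)))
```

Server A leaves half of its second range unused in this run, which is the same kind of gap a crash would leave behind.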
Approach 4: Range-Based Counter + Bit-Shuffling (Recommended)
Combine range allocation with bit manipulation to solve both availability and security.
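ID_BITS = 41  # 2^41 ≈ 2.2 trillion < 62^7, so every value fits in 7 characters
ID_MASK = (1 << ID_BITS) - 1
# SECRET_MULTIPLIER (any odd integer) and SECRET_XOR_KEY come from private config

def generate_short_code() -> str:
    """
    ID generation for production use:
    1. Get ID from local range (no coordination on the hot path)
    2. Shuffle bits for unpredictability (security)
    3. Encode base62, zero-padded to 7 characters (URL-safe)
    """
    counter_id = range_allocator.get_next_id()
    if counter_id > ID_MASK:
        raise OverflowError("counter exceeds the 41-bit ID space")
    # Bit shuffling: multiplying by an odd constant modulo 2^41 spreads low-order
    # changes across all bits, so sequential IDs appear random. The step is a
    # bijection, so distinct counters can never produce the same value.
    mixed = (counter_id * SECRET_MULTIPLIER) & ID_MASK
    # XOR with a secret key (also a bijection) for additional unpredictability
    mixed ^= SECRET_XOR_KEY & ID_MASK
    # Never truncate the encoded string: packing extra fields into 64 bits and
    # keeping only 7 characters would discard counter bits and reintroduce collisions
    return base62_encode(mixed).rjust(7, '0')
Trade-offs:
- ✅ Zero collisions: The shuffle is a bijection over the 41-bit ID space, so unique counters give unique codes
- ✅ No SPOF on the hot path: Local range allocation, no per-request coordination
- ✅ Unpredictable: Multiply + XOR makes consecutive codes appear unrelated
- ✅ Fast: Pure in-memory computation, no network calls
- ✅ Debuggable: Reversing the shuffle recovers the counter, which the range-allocation log ties back to a server
- ✅ Roughly time-ordered: The underlying counter (the id column) grows over time, though ranges from different servers interleave
- ❌ Obfuscation, not encryption: Anyone who learns the constants can reverse the mapping; use a keyed format-preserving permutation if enumeration resistance is a hard requirement
Why not UUIDs? UUIDs are 128 bits, which is overkill for this use case. Truncated to 7 characters, they can collide. The counter-based approach gives deterministic uniqueness with better performance, and every code can be traced back to the counter range it came from.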
Recommendation
Use Approach 4 (range-based counter + bit shuffling) for production:
- Solves availability (no SPOF)
- Solves security (unpredictable codes)
- Solves collisions (deterministic uniqueness)
- Enables debugging (each code traces back to a counter range and the server that allocated it)
This approach satisfies all four requirements without exotic infrastructure.
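Whatever scrambling step is used, uniqueness only survives if the step is a bijection over the ID space. One way to get that property is to multiply by an odd constant modulo a power of two and then XOR with a key. The sketch below shows the idea and its inverse; the constants are arbitrary placeholders, not values from a real deployment, and the result is obfuscation rather than cryptographic protection.

```python
ID_BITS = 41                     # 2**41 < 62**7, so results fit in 7 base62 characters
MASK = (1 << ID_BITS) - 1
MULTIPLIER = 0x1F3D5B79A1        # placeholder; must be odd to be invertible mod 2**41
XOR_KEY = 0xA5C3E19F7D           # placeholder secret, below 2**41
MULTIPLIER_INV = pow(MULTIPLIER, -1, 1 << ID_BITS)  # modular inverse (Python 3.8+)

def scramble(counter_id: int) -> int:
    """Map a sequential counter to a scattered value in the same 41-bit space."""
    if not 0 <= counter_id <= MASK:
        raise ValueError("counter_id outside the 41-bit ID space")
    return ((counter_id * MULTIPLIER) & MASK) ^ XOR_KEY

def unscramble(value: int) -> int:
    """Recover the original counter, e.g. when debugging a short code."""
    return ((value ^ XOR_KEY) * MULTIPLIER_INV) & MASK

sample = [scramble(i) for i in range(1, 6)]
print(sample)                             # consecutive counters, scattered outputs
print([unscramble(v) for v in sample])    # [1, 2, 3, 4, 5]
```

Because every step is reversible, two different counters can never map to the same value, and support tooling can turn a short code back into the counter it came from.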
Database Design: Choosing the Right Storage
Storage is where many candidates stumble. You need different databases for different access patterns.
Analyze Access Patterns First
URL Mappings:
- Writes: 40 QPS, need ACID transactions
- Reads: 20k QPS peak, but ~90% cached in Redis → ~2k QPS hits DB (less once the CDN absorbs its share)
- Query patterns: Point lookups by short_code, list by user_id + pagination
- Consistency: Strong (can’t serve wrong URLs)
- Relations: Users → URLs (need JOINs)
Analytics:
- Writes: 20k QPS, append-only, can be batched
- Reads: Aggregations (count by day, GROUP BY country)
- Consistency: Eventual (5-minute lag acceptable)
- Relations: Minimal (mostly time-series queries)
These patterns are fundamentally different and need different storage engines.
SQL vs NoSQL for URL Mappings
The NoSQL argument: “Simple key-value lookups, billions of records, so use Cassandra or DynamoDB.”
Why PostgreSQL is better here:
- Transactions are essential: Creating a URL requires atomic operations:
  - Insert into urls table
  - Increment user’s quota counter
  - Check rate limits
  Many NoSQL stores trade multi-row ACID transactions for scale, and we don’t need that scale yet.
- Secondary indexes matter: Need to query by:
  - short_code (primary lookup)
  - user_id + created_at (user’s URL list, paginated)
  - expires_at (TTL cleanup)
  Postgres handles these with indexes. Cassandra typically needs a separate table per query pattern.
- Scale is manageable: 2k QPS of indexed point reads on a single Postgres instance is trivial with:
  - NVMe SSDs (100k+ IOPS)
  - Proper indexes
  - Connection pooling
  We’re nowhere near needing NoSQL’s write scalability (50k+ QPS).
- Operational simplicity: Postgres replication is mature and well-understood. Tuning Cassandra’s consistency levels and handling eventual consistency bugs adds operational overhead.
When to consider NoSQL: When writes exceed 50k QPS or a single Postgres shard can’t handle the load. Even then, shard Postgres first before switching to NoSQL.
Schema Design
-- Users table (created first because urls references it)
CREATE TABLE users (
    id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    api_key VARCHAR(64) UNIQUE NOT NULL,
    tier VARCHAR(20) DEFAULT 'free',
    url_quota INT DEFAULT 100,
    urls_created INT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- URLs table: Strong consistency, relational
CREATE TABLE urls (
    id BIGINT PRIMARY KEY,  -- Pre-generated from range allocator
    short_code VARCHAR(7) UNIQUE NOT NULL,
    long_url TEXT NOT NULL,
    user_id BIGINT REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ,  -- Set when the target URL is edited
    expires_at TIMESTAMPTZ,
    is_active BOOLEAN DEFAULT true,
    -- Generated codes are always 7 characters; custom aliases may be shorter
    CONSTRAINT short_code_valid CHECK (short_code ~ '^[a-zA-Z0-9]{1,7}$')
);
-- Indexes for different access patterns
CREATE UNIQUE INDEX idx_short_code ON urls(short_code) WHERE is_active = true;
CREATE INDEX idx_user_created ON urls(user_id, created_at DESC);
CREATE INDEX idx_expires ON urls(expires_at) WHERE expires_at IS NOT NULL;
Key optimizations:
- Partial index on is_active - only index active URLs
- Composite index on (user_id, created_at DESC) - efficient pagination
- Partial index on expires_at - only index URLs with TTL
Analytics Storage: Time-Series Database
For analytics, use ClickHouse (or TimescaleDB) instead of Postgres:
-- ClickHouse schema (columnar storage)
CREATE TABLE clicks (
short_code FixedString(7),
clicked_at DateTime,
ip_address IPv6,
country_code FixedString(2),
city String,
device_type LowCardinality(String), -- Enum-like compression
referrer String,
user_agent String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(clicked_at) -- Monthly partitions
ORDER BY (short_code, clicked_at);
ClickHouse advantages:
- Compression: Up to 10:1 vs Postgres row storage (300TB → 30TB; the capacity estimate earlier used a more conservative 5:1)
- Query speed: Columnar storage makes GROUP BY country_code 50x faster
- Partitioning: Old partitions moved to S3 automatically (cost savings)
- Write throughput: Handles 100k inserts/sec with batching
Cost comparison:
Postgres (row storage):
- 300TB × $0.10/GB = $30k/month
ClickHouse (columnar + S3):
- 30TB SSD × $0.10/GB = $3k/month
- 270TB S3 × $0.023/GB = $6.2k/month
- Total: $9.2k/month
- Savings: $20.8k/month
When to Shard
Shard Postgres when:
- Single instance hits 20k write QPS, or
- Storage exceeds 20TB per instance
For our scale (40 QPS writes, 6B URLs), a single instance handles everything comfortably. But here’s how sharding works when you need it:
import hashlib

SHARD_COUNT = 16  # Fixed up front: changing it remaps most keys, so leave headroom

def get_shard_id(short_code: str) -> int:
    """
    Hash-modulo sharding by short_code
    - Even distribution (hash randomness)
    - Deterministic routing (same code → same shard)
    - No hotspots (codes are randomly distributed)
    Note: this is not consistent hashing. Adding shards later means
    migrating data, or switching to a hash ring or lookup table.
    """
    return int(hashlib.sha256(short_code.encode()).hexdigest()[:8], 16) % SHARD_COUNT

# Shard routing
shard_id = get_shard_id("aB3xY9z")  # Always routes to same shard
db_connection = db_pool.get_connection(f"postgres-shard-{shard_id}")
Each shard holds ~375M URLs (6B ÷ 16). Run leader-follower replication per shard for read scaling and high availability.
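A small experiment makes two properties of hash-modulo routing concrete: keys spread evenly across shards, and changing the shard count moves a large share of them. The synthetic codes below are made up for the test; the routing function mirrors the one above.

```python
import hashlib
from collections import Counter

def shard_for(short_code: str, shard_count: int) -> int:
    """Hash-modulo routing: same code and shard count always give the same shard."""
    digest = hashlib.sha256(short_code.encode()).hexdigest()
    return int(digest[:8], 16) % shard_count

codes = [f"code{i:07d}" for i in range(16_000)]   # synthetic short codes

# Even distribution across 16 shards (about 1,000 codes each)
counts = Counter(shard_for(code, 16) for code in codes)
print(sorted(counts.values()))

# Doubling to 32 shards relocates roughly half of the keys
moved = sum(1 for code in codes if shard_for(code, 16) != shard_for(code, 32))
moved_fraction = moved / len(codes)
print(f"moved when resharding 16 -> 32: {moved_fraction:.1%}")
```

Doubling is the gentlest case for modulo routing. Growing from 16 to 17 shards would move nearly every key, which is the problem a hash ring or a directory table is meant to avoid.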
High-Level Architecture
The architecture separates read and write paths since they have very different characteristics (writes: 40 QPS, reads: 20k QPS).
                         ┌─────────────────────────┐
                         │   CloudFlare CDN/WAF    │
                         │   - DDoS protection     │
                         │   - Rate limiting       │
                         │   - Static edge cache   │
                         └────────────┬────────────┘
                                      │
                             ┌────────▼─────────┐
                             │   Global Load    │
                             │  Balancer (AWS   │
                             │  ALB/CloudFlare) │
                             └────────┬─────────┘
                                      │
          ┌───────────────────────────┼───────────────────────────┐
          │                           │                           │
    ┌─────▼─────┐               ┌─────▼─────┐               ┌─────▼─────┐
    │  App Pod  │               │  App Pod  │               │  App Pod  │
    │  (US-E)   │               │  (US-W)   │               │   (EU)    │
    │ Stateless │               │ Stateless │               │ Stateless │
    └─────┬─────┘               └─────┬─────┘               └─────┬─────┘
          │                           │                           │
 ┌────────▼─────────┐        ┌────────▼─────────┐        ┌────────▼─────────┐
 │  Redis Cluster   │        │  Redis Cluster   │        │  Redis Cluster   │
 │ (Regional cache) │        │    (Regional)    │        │    (Regional)    │
 │  1TB, LRU evict  │        │       1TB        │        │       1TB        │
 └────────┬─────────┘        └────────┬─────────┘        └────────┬─────────┘
          │                           │                           │
          └───────────────────────────┼───────────────────────────┘
                                      │
                         ┌────────────▼────────────┐
                         │   PostgreSQL Primary    │
                         │  Multi-Region replicas  │
                         │  (Sharded: 16 shards)   │
                         └────────────┬────────────┘
                                      │
          ┌───────────────────────────┼───────────────────────────┐
          │                           │                           │
     ┌────▼────┐               ┌──────▼──────┐              ┌─────▼──────┐
     │  Kafka  │               │ ClickHouse  │              │  S3 (Cold  │
     │ Stream  │──────────────▶│  Analytics  │◀─────────────│  Storage)  │
     │ (Async) │               │   Cluster   │              │            │
     └─────────┘               └─────────────┘              └────────────┘
Write Path (URL Creation)
The write path prioritizes durability over speed. At 40 QPS, we can afford synchronous database writes.
POST /api/shorten
{
"long_url": "https://example.com/very/long/url",
"custom_alias": "mylink" // optional
}
Flow:
1. [App Server] Authenticate user (JWT/API key)
- Check rate limit: 100 URLs/hour for free tier
- Fail fast if over quota
2. [App Server] Validate URL
- Check format (not malware/phishing)
- Query Google Safe Browsing API (async, 50ms timeout)
- Block if flagged as malicious
3. [App Server] Generate short code
- range_allocator.get_next_id() → local, no network call
- Bit shuffle + base62 encode → "aB3xY9z"
4. [Database] Transactional write to Postgres
BEGIN TRANSACTION;
INSERT INTO urls (id, short_code, long_url, user_id, created_at)
VALUES (...);
UPDATE users SET urls_created = urls_created + 1 WHERE id = ?;
COMMIT;
Latency: ~10ms to a same-region primary, ~50ms cross-region
5. [Cache] Proactive cache write (fire-and-forget)
- redis.setex(f"url:{short_code}", 86400, long_url)
- If Redis fails, log error but don't fail request
- Cache populated on first read anyway
6. [Response] Return short URL immediately
{
"short_url": "https://short.ly/aB3xY9z",
"created_at": "2026-01-07T10:30:00Z"
}
Total latency: ~80ms (p50), ~150ms (p99)
Why database before cache? Durability over speed. If we cache first and the database write fails, users get 404s on redirect. At 40 QPS, the database handles writes easily, so caching doesn’t help on the write path.
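The transactional step can be tried locally. This sketch uses the standard-library sqlite3 module as a stand-in for Postgres, with a cut-down schema, so the table layout and the create_url helper are illustrative assumptions rather than the production code. It shows the property that matters: the quota update and the insert succeed or fail together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, url_quota INTEGER, urls_created INTEGER DEFAULT 0);
    CREATE TABLE urls (id INTEGER PRIMARY KEY, short_code TEXT UNIQUE NOT NULL,
                       long_url TEXT NOT NULL, user_id INTEGER REFERENCES users(id));
    INSERT INTO users (id, url_quota) VALUES (1, 2);
""")

def create_url(url_id: int, short_code: str, long_url: str, user_id: int) -> bool:
    """Insert the mapping and bump the quota counter atomically. False if rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            updated = conn.execute(
                "UPDATE users SET urls_created = urls_created + 1 "
                "WHERE id = ? AND urls_created < url_quota",
                (user_id,),
            ).rowcount
            if updated == 0:
                return False  # over quota (or unknown user): nothing was changed
            conn.execute(
                "INSERT INTO urls (id, short_code, long_url, user_id) VALUES (?, ?, ?, ?)",
                (url_id, short_code, long_url, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate short_code: the quota increment was rolled back

def urls_created(user_id: int) -> int:
    return conn.execute("SELECT urls_created FROM users WHERE id = ?", (user_id,)).fetchone()[0]
```

Calling create_url twice with the same short code leaves urls_created at 1, because the failed insert rolls back its own quota increment.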
Read Path (Redirect)
The read path is where the system is tested. At 20k QPS, it needs aggressive optimization. The goal is <100ms latency globally.
GET /aB3xY9z
Flow:
1. [CDN] CloudFlare edge cache (200+ PoPs globally)
- Cache-Control: public, max-age=300 (5 minutes)
- 80% of requests served from edge
- Latency: 10-20ms (CDN → user)
If CDN miss (cold start or cache expiry):
2. [Load Balancer] Route to nearest region (US-E, US-W, EU)
- Geo-routing: EU users → EU app servers
- Latency: +15ms network hop
3. [App Server] Check Redis cache (regional cluster)
GET url:aB3xY9z
- Cache hit (90% after CDN misses): ~1ms
- Cache miss (10%): proceed to database
4. [Database] Query Postgres replica (read-only)
- Route to nearest read replica
- SELECT long_url, expires_at FROM urls WHERE short_code = ? AND is_active = true;
- Latency: ~5ms local, ~100ms cross-region
5. [Cache] Populate Redis after the miss (cache-aside, async)
- redis.setex(f"url:{short_code}", 86400, long_url)
- Don't block response
6. [Response] 302 Redirect immediately
HTTP/1.1 302 Found
Location: https://example.com/very/long/url
Cache-Control: public, max-age=300
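7. [Analytics] Async event to Kafka (fire-and-forget)
- kafka.produce("clicks", {short_code, timestamp, ip, user_agent, ...})
- Does NOT block redirect
- If Kafka down, log to local buffer
- Redirects served from the CDN edge never reach this step, so counting them requires CDN logs or a shorter edge TTL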
Latency breakdown:
- CDN hit (80%): 10-20ms
- Redis hit (18%): 30-50ms
- DB hit (2%): 80-120ms
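- p99: ~100ms, because the slowest 2% are DB hits; staying under the 100ms target depends on keeping those lookups in-region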
Multi-layer caching: CDN (edge) → Redis (regional) → Database. Each layer reduces load on the next by 80-90%.
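Steps 3 to 5 of the read path follow the cache-aside pattern: look in the cache, fall back to the database on a miss, then populate the cache for the next request. A minimal in-memory sketch, where plain dictionaries stand in for Redis and the Postgres replica and RedirectResolver is a hypothetical name:

```python
from typing import Dict, Optional

class RedirectResolver:
    """Cache-aside lookup: try the cache, fall back to the database, then fill the cache."""

    def __init__(self, database: Dict[str, str]):
        self.cache: Dict[str, str] = {}   # stands in for the regional Redis cluster
        self.database = database          # stands in for a Postgres read replica
        self.db_reads = 0                 # how often the database was consulted

    def resolve(self, short_code: str) -> Optional[str]:
        cached = self.cache.get(short_code)
        if cached is not None:
            return cached
        self.db_reads += 1
        long_url = self.database.get(short_code)
        if long_url is not None:
            self.cache[short_code] = long_url
        return long_url

resolver = RedirectResolver({"aB3xY9z": "https://example.com/very/long/url"})
first = resolver.resolve("aB3xY9z")    # miss: reads the database, fills the cache
second = resolver.resolve("aB3xY9z")   # hit: served from the cache
print(first == second, resolver.db_reads)
```

Unknown codes are not cached in this sketch, so repeated lookups of a missing code keep hitting the database. Caching a short-lived negative entry is a common remedy.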
CAP Theorem and Caching Strategy
Choosing Availability Over Consistency
In CAP theorem, network partitions will happen, forcing you to choose between Consistency and Availability. For URL shorteners, choose Availability (AP system).
Why favor availability:
-
Stale reads are tolerable: If a user updates a URL and it takes 5 minutes to propagate to all caches, that’s annoying but not catastrophic. Blocking all redirects during database unavailability is worse.
-
URLs are mostly immutable: 99% of URLs never change after creation. The rare edit cases can tolerate eventual consistency.
-
Expiration isn’t critical: If an expired URL is cached for 5 extra minutes, it’s a UX annoyance, not a security breach.
Cache Invalidation Strategy
Despite choosing availability, aggressively invalidate caches on writes:
import asyncio

async def update_url(short_code: str, new_long_url: str):
    """Update URL and aggressively invalidate caches"""
    # 1. Update database (source of truth)
    await db.execute(
        "UPDATE urls SET long_url = ?, updated_at = NOW() WHERE short_code = ?",
        new_long_url, short_code
    )
    # 2. Invalidate all cache layers (best effort)
    await asyncio.gather(
        redis.delete(f"url:{short_code}"),   # Redis
        cdn.purge_cache(f"/{short_code}"),   # CloudFlare edge
        return_exceptions=True               # Don't fail if purge fails
    )
    # 3. Wait 10 seconds for cache propagation before returning success
    # (this holds the request open; edge purges can take longer, so brief stale reads remain possible)
    await asyncio.sleep(10)
For deletes/deactivations, use cache tombstones instead of simple deletion:
async def deactivate_url(short_code: str):
    """Deactivate URL with cache tombstone"""
    # 1. Soft delete in DB
    await db.execute(
        "UPDATE urls SET is_active = false WHERE short_code = ?",
        short_code
    )
    # 2. Write tombstone to cache (instead of deleting)
    # TTL = 7 days (longer than cache's normal 24h TTL)
    await redis.setex(f"url:{short_code}", 604800, "__TOMBSTONE__")
    # App servers that see the tombstone return 404 without touching the database;
    # copies already at the CDN edge still need a purge or will age out within their TTL
Multi-Layer Caching Strategy
Use three cache layers with different TTLs:
Layer 1: CDN Edge Cache (CloudFlare)
- TTL: 5 minutes
- Coverage: 80% of requests
- Invalidation: Purge API (eventual, ~30 seconds)
- Cost: ~$0.10 per million requests
Layer 2: Regional Redis (US-E, US-W, EU)
- TTL: 24 hours
- Coverage: 18% of requests (CDN misses)
- Invalidation: Immediate (delete key)
- Cost: ~$0.50/GB/month × 3 regions
Layer 3: Database Read Replicas
- TTL: Infinite (source of truth)
- Coverage: 2% of requests (cache misses)
- Latency: 5-100ms depending on region
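The coverage figures for the three layers also determine where each latency percentile lands. The sketch below takes the 80/18/2 split and pairs it with assumed worst-case latencies per layer (20ms, 50ms, 120ms, taken from the upper ends of the ranges quoted for the read path). It is a back-of-the-envelope model, not a measurement.

```python
# (layer, share of requests, assumed worst-case latency in ms), fastest layer first
LAYERS = [("cdn", 0.80, 20), ("redis", 0.18, 50), ("db", 0.02, 120)]

def percentile_latency_ms(p: float) -> int:
    """Worst-case latency of the layer that serves the p-th percentile request."""
    cumulative = 0.0
    for _, share, worst_ms in LAYERS:
        cumulative += share
        if p <= cumulative + 1e-9:   # small tolerance for float rounding
            return worst_ms
    return LAYERS[-1][2]

mean_ms = sum(share * worst_ms for _, share, worst_ms in LAYERS)

print(f"p50: {percentile_latency_ms(0.50)} ms")
print(f"p95: {percentile_latency_ms(0.95)} ms")
print(f"p99: {percentile_latency_ms(0.99)} ms")
print(f"mean (upper bound): {mean_ms:.1f} ms")
```

With only 98% of requests served by the two cache layers, the 99th percentile request is a database hit, so the p99 target is decided by replica latency rather than by cache speed.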
Handling Viral Links (High Fan-Out Traffic)
Viral links create sudden traffic spikes (10k+ requests/sec for a single URL). Handle them with adaptive caching:
import time

class AdaptiveCaching:
    """Increase cache TTL for hot links"""

    async def get_with_adaptive_ttl(self, short_code: str):
        now = time.time()
        # Check hit count in last minute (stored in a Redis sorted set that is
        # assumed to receive one timestamp-scored entry per redirect)
        hits_per_minute = await redis.zcount(f"hits:{short_code}", now - 60, now)
        # Adaptive TTL based on traffic
        if hits_per_minute > 1000:  # Viral threshold
            ttl = 3600       # 1 hour
            cdn_ttl = 600    # 10 minutes CDN
        elif hits_per_minute > 100:
            ttl = 1800       # 30 minutes
            cdn_ttl = 300    # 5 minutes CDN
        else:
            ttl = 600        # 10 minutes (default)
            cdn_ttl = 60     # 1 minute CDN
        # Resolve the target (cache_or_db_fetch is the same lookup helper get_url uses)
        url_data = await cache_or_db_fetch(short_code)
        long_url = url_data.long_url
        # Cache with adaptive TTL
        await redis.setex(f"url:{short_code}", ttl, long_url)
        return long_url, cdn_ttl
With adaptive caching, a viral link getting 10k requests/sec is served almost entirely from CDN edge. The 30-second CDN cache propagation delay is acceptable for viral content.
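The hits-per-minute figure that drives the TTL choice has to come from somewhere. A minimal in-process sketch of a sliding-window counter, plus the TTL tiers as a pure function, is shown below. SlidingWindowCounter and pick_ttls are hypothetical names; a real deployment would keep the window in shared storage such as the Redis sorted set mentioned above.

```python
import time
from collections import deque
from typing import Optional, Tuple

class SlidingWindowCounter:
    """Counts events that fall within the last `window_seconds`."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.hits = deque()

    def record(self, now: Optional[float] = None) -> None:
        self.hits.append(time.time() if now is None else now)

    def count(self, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        # Drop timestamps that have aged out of the window
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        return len(self.hits)

def pick_ttls(hits_per_minute: int) -> Tuple[int, int]:
    """Return (redis_ttl_seconds, cdn_ttl_seconds) for the observed traffic level."""
    if hits_per_minute > 1000:   # viral
        return 3600, 600
    if hits_per_minute > 100:    # warm
        return 1800, 300
    return 600, 60               # default

counter = SlidingWindowCounter()
for second in range(100):
    counter.record(now=float(second))    # one hit per second for 100 seconds
recent = counter.count(now=100.0)        # only hits newer than t=40 remain
print(recent, pick_ttls(recent))
```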
TTL and Data Lifecycle Management
Link expiration requires two strategies: lazy expiration at read time, and background cleanup for storage reclamation.
from datetime import datetime, timezone

# Strategy 1: Lazy expiration at read time
async def get_url(short_code: str):
    """Check expiration on every read"""
    url_data = await cache_or_db_fetch(short_code)
    # expires_at is a timezone-aware datetime (TIMESTAMPTZ), so compare against an aware "now"
    if url_data.expires_at and url_data.expires_at < datetime.now(timezone.utc):
        # Expired: return 404 and write tombstone
        await redis.setex(f"url:{short_code}", 86400, "__EXPIRED__")
        return None
    return url_data.long_url

# Strategy 2: Background cleanup (monthly) for storage reclamation
async def cleanup_expired_urls():
    """
    Runs monthly via cron
    Deletes expired URLs to reclaim storage
    """
    # No is_active filter: lazy expiration never flips that flag, so filtering
    # on it would skip most expired rows
    deleted = await db.execute("""
        DELETE FROM urls
        WHERE expires_at < NOW() - INTERVAL '7 days'
    """)
    logger.info(f"Cleaned up {deleted} expired URLs")
Cost impact of TTL:
Scenario: 50% of URLs have 30-day TTL (temporary campaign links)
Without TTL:
- Storage: 6B URLs × 3KB = 18TB over 5 years
- Cost: 18TB × $0.10/GB = $1,800/month
With TTL and cleanup:
- Active URLs: ~3.05B (3B permanent + ~50M temporary links that have not expired yet)
- Storage: 3.05B × 3KB ≈ 9.2TB
- Cost: 9.2TB × $0.10/GB ≈ $920/month
- Savings: ~$880/month at year-5 volume, roughly half the storage bill
TTL is not just a feature. It is a cost optimization strategy.
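How much TTL saves depends heavily on what share of links actually expire. A small model makes that sensitivity visible. It assumes steady creation of 100M links per month, a fixed TTL for the temporary share, 3KB per record, and $0.10/GB-month; the function names are illustrative.

```python
def active_urls(total_urls: float, monthly_new: float,
                ttl_share: float, ttl_months: float = 1.0) -> float:
    """URLs still stored after cleanup: permanent links plus unexpired temporary ones."""
    permanent = total_urls * (1 - ttl_share)
    temporary_live = monthly_new * ttl_share * ttl_months
    return permanent + temporary_live

def monthly_cost_usd(url_count: float, bytes_per_url: int = 3_000,
                     usd_per_gb: float = 0.10) -> float:
    return url_count * bytes_per_url / 1e9 * usd_per_gb

TOTAL, MONTHLY = 6e9, 100e6
baseline = monthly_cost_usd(TOTAL)
for share in (0.5, 0.9):
    remaining = active_urls(TOTAL, MONTHLY, share)
    cost = monthly_cost_usd(remaining)
    print(f"{share:.0%} with TTL: {remaining / 1e9:.2f}B URLs, "
          f"${cost:,.0f}/month (saves ${baseline - cost:,.0f})")
```

The savings scale almost linearly with the expiring share, so it is worth asking early how many links are expected to be temporary.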
Wrapping Up Part 1
We’ve covered the high-level design fundamentals for a production URL shortener:
- Requirements & Capacity Planning - Understanding scale drives every decision (read-heavy system → caching is mandatory)
- Distributed ID Generation - Range-based counters + bit-shuffling solve availability, security, and uniqueness
- Database Design - Separate storage strategies for URLs (Postgres) vs analytics (ClickHouse)
- Architecture & Request Flows - Multi-region deployment with clear read/write path separation
- Caching Strategy - Three-layer caching (CDN → Redis → DB) with adaptive TTLs for viral links
This architecture handles 10 billion redirects/month, targets <100ms p99 latency globally, costs ~$29k/month, scales horizontally, and avoids single points of failure on the redirect path.
In Part 2, we’ll dive into the implementation details that make this production-ready:
- Analytics Pipeline: Kafka streaming, batching strategies, handling 20k events/sec without blocking redirects
- Security Layers: Rate limiting, URL validation, abuse detection, incident response
- Observability: Metrics, dashboards, alerting, distributed tracing, cost monitoring
Stay tuned for Part 2, where we go from architecture to code.
Want to discuss system design? Reach out on Twitter or LinkedIn. I love talking about distributed systems, caching strategies, and building for scale.