DDSA Solutions

Design a URL Shortener — Complete Interview Walkthrough

Step-by-step system design for a URL shortening service like bit.ly: API design, short-code generation, database schema, caching, redirects, and scaling reads to millions per second.

The URL shortener is the "Two Sum" of system design interviews. It is simple enough to finish in forty-five minutes, yet rich enough to discuss hashing, databases, caching, and rate limiting. Interviewers use it to see whether you can move from requirements to a working architecture without over-engineering. If you have not read our interview framework yet, skim that first — this walkthrough follows the same steps.

This walkthrough assumes a realistic product: users paste a long URL, get a short link, and anyone who clicks is redirected. We are not building enterprise SSO or link analytics dashboards unless the interviewer asks.

Requirements

Functional

  • Given a long URL, return a shorter URL on our domain (e.g. dsas.ly/abc123).
  • Visiting the short URL redirects (HTTP 301 or 302) to the original long URL.
  • Optional: custom alias (dsas.ly/my-resume) if not taken.
  • Optional: links expire after a TTL.

Non-functional

  • Highly available redirects — a broken short link erodes trust immediately.
  • Low latency on redirect — target under 100ms p99.
  • Read-heavy: assume 100:1 read-to-write ratio or higher.
  • Scale: 100M new URLs/month, 10B redirects/month (adjust with interviewer).

Capacity estimation

100M writes/month ÷ ~2.6M seconds/month ≈ 38 writes/sec on average; assume ~200/sec at peak. 10B reads/month ≈ 3,800 reads/sec on average, ~20,000/sec at peak. Each row holds a short code (7 chars), a long URL (~200 bytes on average), and ~100 bytes of metadata: about 310 bytes, rounded up to ~350 bytes/row. 100M/month × 12 × 350 bytes ≈ 420 GB/year before indexes and replication. That is manageable on one cluster initially; the read load is what needs caching.
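
The napkin math above can be checked with a few lines of Python. This is only a sketch: the 30-day month and the helper names (`per_second`, `yearly_storage_gb`) are assumptions made for illustration, and peak figures are left out because the peak multiplier is a judgment call.

```python
# Assumption: a 30-day month, which gives 2,592,000 seconds.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def per_second(events_per_month: int) -> float:
    """Average events per second for a monthly volume."""
    return events_per_month / SECONDS_PER_MONTH


def yearly_storage_gb(rows_per_month: int, bytes_per_row: int) -> float:
    """Raw storage per year in GB (10^9 bytes), before indexes and replication."""
    return rows_per_month * 12 * bytes_per_row / 1e9


avg_writes = per_second(100_000_000)        # new URLs per second
avg_reads = per_second(10_000_000_000)      # redirects per second
storage_gb = yearly_storage_gb(100_000_000, 350)

print(f"average writes/sec: {avg_writes:.1f}")
print(f"average reads/sec:  {avg_reads:.0f}")
print(f"storage per year:   {storage_gb:.0f} GB")
```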

API design

| Endpoint | Purpose | Notes |
| --- | --- | --- |
| POST /v1/urls | Create short URL | Body: { url, customAlias?, expiresInDays? } → { shortUrl, shortCode } |
| GET /{shortCode} | Redirect | 302 Found + Location header (302 allows changing destination later) |
| DELETE /v1/urls/{shortCode} | Remove link | Requires auth in real product; optional in interview |

301 vs 302

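Use 302 (temporary redirect) if you might update the destination or want accurate click counts: a 302 is not cached by default, so every click reaches your servers. Use 301 (permanent) if the mapping never changes and you want browsers to cache aggressively, at the cost of no longer seeing repeat clicks from the same browser. Most shorteners use 302 for flexibility.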

High-level architecture

Split the design into two paths: **create** (the write path, comparatively rare) and **redirect** (the read path, constant and heavy). Interviewers care more about the redirect path. Draw both, but spend 70% of your time on GET /{code}.

Write path: create a short URL

  1. Client POSTs { url, customAlias?, expiresInDays? } to POST /v1/urls.
  2. API validates URL scheme, length, blocklist, and SSRF rules.
  3. Rate limiter checks IP / API key — reject 429 if abusive.
  4. Generate short code (base62 ID) or validate custom alias uniqueness.
  5. INSERT into PostgreSQL primary — short_code, long_url, created_at, expires_at.
  6. Return 201 { shortUrl: "https://dsas.ly/abc123" } — no need to warm cache yet; link is cold.

Read path: redirect (the critical path)

  1. Browser or bot GETs https://dsas.ly/abc123.
  2. Optional CDN edge — cache 302 Location for viral links only.
  3. Load balancer routes to any stateless API server.
  4. Redis GET url:abc123 — on hit, return long URL immediately.
  5. On miss: SELECT long_url FROM read replica WHERE short_code = $1.
  6. Populate Redis (TTL 24h), respond HTTP 302 with Location: long_url.

| Component | Role | Write path | Read path |
| --- | --- | --- | --- |
| Load balancer | Distributes traffic | Yes | Yes |
| API servers | Validation, business logic | Insert mapping | Lookup + redirect |
| Redis | Hot redirect cache | Optional invalidate | Primary lookup |
| PostgreSQL primary | Source of truth | INSERT new links | Not on hot path |
| Read replicas | Scale reads | No | Fallback on cache miss |
| CDN | Edge cache for 302 | No | Optional for viral URLs |

Generating short codes

Option 1: Base62 encode a unique ID

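Use a 64-bit auto-increment ID (from a DB sequence or a dedicated ID generator) and encode it in base62 [a-zA-Z0-9]. Seven characters cover 62^7 ≈ 3.5 trillion URLs. Pros: no collisions, and codes stay at 7 characters or fewer until that space is exhausted. Cons: sequential codes expose creation volume and are easy to enumerate (a minor concern).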
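
A minimal sketch of the base62 step, assuming the numeric ID has already been allocated elsewhere. The alphabet order (digits, then lowercase, then uppercase) and the function names are choices made for this example; any fixed ordering of the 62 characters works as long as encoder and decoder agree.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Turn a non-negative integer ID into a base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Codes produced this way grow with the ID, so they are shorter than 7 characters for small IDs and reach 8 characters only once the ID passes 62^7.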

Option 2: Hash the long URL

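Truncate an MD5 or SHA digest of the long URL to the code length. Hashing is fast and deterministic, so the same long URL always maps to the same short code, which makes shortening naturally idempotent. (With Option 1, by contrast, the same long URL gets a new code on every request unless you deduplicate with a separate lookup on long_url.) The cost is that truncated hashes collide: two different URLs can produce the same code, so every insert needs collision detection and a retry. That makes Option 2 trickier to explain under time pressure.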

In interviews, Option 1 is the safer default. Mention Option 2 if the interviewer asks about deduplication.
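
If Option 2 does come up, the collision handling can be sketched as follows. This is an illustration under assumptions, not a reference design: SHA-256 is used as the hash, the retry salt is simply the attempt number, and a plain dict named `store` stands in for the database table.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)
CODE_LENGTH = 7


def hash_code(long_url: str, attempt: int = 0) -> str:
    """Derive a fixed-length base62 code from a salted SHA-256 digest."""
    digest = hashlib.sha256(f"{attempt}:{long_url}".encode("utf-8")).digest()
    # Truncate: keep 8 bytes, then reduce into the 62^7 code space.
    n = int.from_bytes(digest[:8], "big") % (BASE ** CODE_LENGTH)
    chars = []
    for _ in range(CODE_LENGTH):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def shorten(long_url: str, store: dict, max_attempts: int = 5) -> str:
    """Return a code for long_url, retrying with a new salt on collision."""
    for attempt in range(max_attempts):
        code = hash_code(long_url, attempt)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url      # free slot: claim it
            return code
        if existing == long_url:
            return code                 # same URL seen before: idempotent
        # Otherwise a different URL owns this code; try the next salt.
    raise RuntimeError("could not find a free short code")
```

In a real service the check-then-insert would need to be atomic, for example by relying on the unique constraint on short_code rather than a separate read.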

Database schema

| Column | Type | Purpose |
| --- | --- | --- |
| short_code | VARCHAR(10) PK | Public identifier in the URL; indexed for redirect lookup |
| long_url | TEXT NOT NULL | Original destination; can be long |
| user_id | UUID NULL | Owner if authenticated; NULL for anonymous |
| created_at | TIMESTAMPTZ | Audit, analytics, TTL calculation |
| expires_at | TIMESTAMPTZ NULL | Optional link expiry; cron deletes expired rows |

Key queries

  • Redirect: SELECT long_url FROM url_mappings WHERE short_code = $1 — must use unique index on short_code.
  • Create: INSERT … ON CONFLICT (short_code) DO NOTHING — return 409 for taken custom aliases.
  • Cleanup: DELETE FROM url_mappings WHERE expires_at < NOW() — batch nightly job.
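
The three queries above can be exercised end to end with a small script. To keep it runnable without a server, this sketch uses Python's built-in sqlite3 as a stand-in for PostgreSQL, so it assumes `?` placeholders instead of `$1`, integer epoch seconds instead of TIMESTAMPTZ, and a SQLite version that supports ON CONFLICT (3.24 or later). The function names are illustrative.

```python
import sqlite3
import time
from typing import Optional


def open_db() -> sqlite3.Connection:
    """In-memory table mirroring the url_mappings schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE url_mappings (
            short_code TEXT PRIMARY KEY,
            long_url   TEXT NOT NULL,
            user_id    TEXT,
            created_at INTEGER NOT NULL,
            expires_at INTEGER
        )
        """
    )
    return conn


def create_link(conn, short_code: str, long_url: str,
                ttl_days: Optional[int] = None) -> bool:
    """Insert a mapping; False means the code was taken (map to HTTP 409)."""
    now = int(time.time())
    expires = now + ttl_days * 86400 if ttl_days is not None else None
    cur = conn.execute(
        "INSERT INTO url_mappings (short_code, long_url, created_at, expires_at) "
        "VALUES (?, ?, ?, ?) ON CONFLICT (short_code) DO NOTHING",
        (short_code, long_url, now, expires),
    )
    conn.commit()
    return cur.rowcount == 1


def find_long_url(conn, short_code: str) -> Optional[str]:
    """Redirect lookup by primary key."""
    row = conn.execute(
        "SELECT long_url FROM url_mappings WHERE short_code = ?", (short_code,)
    ).fetchone()
    return row[0] if row else None


def delete_expired(conn) -> int:
    """Cleanup job; rows with NULL expires_at never match and are kept."""
    cur = conn.execute(
        "DELETE FROM url_mappings WHERE expires_at < ?", (int(time.time()),)
    )
    conn.commit()
    return cur.rowcount
```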

Do not store click counts in this table if redirects are hot — updates on every read would kill write throughput. Use async analytics instead.

Caching strategy

Use cache-aside on the redirect path. Key format: url:{shortCode} → long URL string.

  1. GET redirect: Redis GET url:abc123.
  2. Hit → build 302 Location header, return immediately.
  3. Miss → query read replica, SET Redis with 24h TTL, then redirect.
  4. POST create: write to DB primary; cache optional (link not yet clicked).
  5. DELETE / update: DEL url:abc123 in Redis + purge CDN edge if used.
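
The cache-aside flow in the steps above can be sketched in a few lines. To stay self-contained, this example assumes a tiny in-process `TTLCache` class in place of Redis and a `db_lookup` callable in place of the read replica query; both are hypothetical stand-ins, and only the `url:{shortCode}` key format and the 24h TTL come from the text.

```python
import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24h, as in the text


class TTLCache:
    """Minimal stand-in for Redis GET / SET with expiry / DEL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def resolve(short_code: str, cache: TTLCache,
            db_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    key = f"url:{short_code}"
    long_url = cache.get(key)
    if long_url is not None:
        return long_url                      # hit
    long_url = db_lookup(short_code)         # miss: go to the replica
    if long_url is not None:
        cache.set(key, long_url, CACHE_TTL_SECONDS)
    return long_url


def invalidate(short_code: str, cache: TTLCache) -> None:
    """Call after a delete or update so stale redirects are not served."""
    cache.delete(f"url:{short_code}")
```

Unknown codes are not cached in this sketch, so repeated lookups for a missing code always reach the database; whether to cache negative results is a separate design choice.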

TTL choice: 24 hours balances freshness vs hit ratio. Viral links get millions of hits within hours — one cache fill serves them all. Long-tail links may expire from Redis unused; that is fine — occasional DB miss is cheap.

Cache invalidation

If a URL is deleted or updated, purge Redis key and CDN edge entry. Stale redirects are a classic bug in student designs — mention it proactively.

Scaling writes and reads

  • API servers: horizontal scale behind load balancer; stateless.
  • Database: primary for writes, multiple read replicas for redirect lookups on cache miss.
  • Sharding: partition by short_code hash when single DB exhausts write IOPS or storage.
  • Rate limiting: token bucket per IP on POST to prevent abuse — see Design a Rate Limiter.
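
For the rate-limiting bullet, a token bucket per IP can be sketched as below. This is a single-process illustration with assumed class names (`TokenBucket`, `PerIpLimiter`) and an injectable clock for testing; a real deployment would keep the counters in shared storage so that every stateless API server sees the same state.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity          # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False                     # caller should respond with 429


class PerIpLimiter:
    """One bucket per client IP, created on first use."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity
        self._refill = refill_per_second
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, ip: str) -> bool:
        bucket = self._buckets.get(ip)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[ip] = bucket
        return bucket.allow()
```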

Security and abuse

  • Validate URLs — block javascript: and internal IP ranges (SSRF prevention).
  • Scan or blocklist known phishing domains.
  • CAPTCHA or auth for anonymous bulk creation.
  • Monitor redirect targets for malware reports.
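
The first bullet can be illustrated with the standard library alone. This is a partial sketch under stated assumptions: the length limit and the `BLOCKED_HOSTS` set are made up for the example, and hostnames are not resolved, so a DNS name that points at an internal address would still pass. A production check would also validate the resolved IP.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
MAX_URL_LENGTH = 2048            # assumed limit, not from the article
BLOCKED_HOSTS = {"localhost"}    # assumed minimal blocklist


def is_safe_url(url: str) -> bool:
    """Reject non-HTTP schemes and literal internal or non-public IP addresses."""
    if len(url) > MAX_URL_LENGTH:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False                     # malformed, e.g. bad IPv6 brackets
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        return False                     # blocks javascript:, ftp:, file:, ...
    host = parts.hostname
    if not host or host in BLOCKED_HOSTS:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True                      # a DNS name; resolution is not checked here
    # is_global is False for private, loopback, link-local and reserved ranges.
    return ip.is_global
```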

Failure modes

| Failure | Create path | Redirect path | Mitigation |
| --- | --- | --- | --- |
| Redis down | Works; writes go to DB | Slower; fall through to replica | Auto-failover Redis; replicas handle miss load |
| DB primary down | Fail writes (503) | Reads OK via replica + cache | Promote replica or queue writes |
| Replica lag | N/A | Rare stale redirect after edit | Invalidate cache on write; accept brief staleness |
| Region outage | Writes fail in region | DNS failover to secondary region | Multi-region replicas + cache (advanced) |

Latency budget for the redirect path

Redirects must feel instant. Break down a 100ms p99 budget: DNS + TLS ~20ms (CDN helps), load balancer ~5ms, Redis GET ~1ms, application logic ~2ms, HTTP 302 response ~1ms. That is roughly 29ms, which leaves about 70ms of headroom. On a cache miss, add a read replica query of ~5–20ms, which is still acceptable if misses are rare. This is why cache-aside is non-negotiable at scale: it keeps the common path fast and keeps peak read traffic off the replicas.

| Step | Component | Typical latency |
| --- | --- | --- |
| 1 | CDN edge (optional) | 5–15ms |
| 2 | Load balancer | 1–5ms |
| 3 | Redis cache hit | 0.5–2ms |
| 4 | PostgreSQL replica (miss) | 5–20ms |
| 5 | 302 redirect to client | 1ms |
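
The budget arithmetic can be written down as a quick check. The sketch below simply adds the typical per-step figures from the prose and takes the upper end of the replica estimate for a miss; the 95% hit ratio in the usage line is an assumed input. Percentiles do not add this way in practice, so treat the totals as a rough sanity check, not a true p99.

```python
BUDGET_MS = 100

# Typical per-step latencies on a cache hit, in milliseconds.
HIT_PATH_MS = {
    "DNS + TLS": 20,
    "load balancer": 5,
    "Redis GET": 1,
    "application logic": 2,
    "302 response": 1,
}
REPLICA_QUERY_MS = 20  # upper end of the cache-miss estimate

hit_total = sum(HIT_PATH_MS.values())
miss_total = hit_total + REPLICA_QUERY_MS


def expected_latency_ms(hit_ratio: float) -> float:
    """Average latency for a given cache hit ratio between 0 and 1."""
    return hit_ratio * hit_total + (1 - hit_ratio) * miss_total


print(f"hit path:  {hit_total}ms, headroom {BUDGET_MS - hit_total}ms")
print(f"miss path: {miss_total}ms, headroom {BUDGET_MS - miss_total}ms")
print(f"average at 95% hits: {expected_latency_ms(0.95):.1f}ms")
```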

Custom alias and reserved words

Custom aliases (dsas.ly/my-portfolio) require a uniqueness check before insert. Reserve paths like /api, /admin, /health so they never collide with short codes. Reject profanity and aliases that impersonate well-known brands. For interview depth: store reserved words in a small in-memory set loaded at startup, which is faster than a DB check on every create.
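
A minimal sketch of the in-memory reserved-word check, run before the database uniqueness check. The three reserved words come from the text; the allowed character set and the 3 to 32 character length limits are assumptions chosen for the example.

```python
import re

# Loaded once at startup; a real list would be longer.
RESERVED_WORDS = frozenset({"api", "admin", "health"})

# Assumed alias rules: letters, digits, hyphen, underscore, 3 to 32 characters.
ALIAS_PATTERN = re.compile(r"[A-Za-z0-9_-]{3,32}")


def is_alias_allowed(alias: str) -> bool:
    """Cheap syntactic and reserved-word check; uniqueness is still up to the DB."""
    if ALIAS_PATTERN.fullmatch(alias) is None:
        return False
    # Compare case-insensitively so "Admin" cannot shadow "/admin".
    return alias.lower() not in RESERVED_WORDS
```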

Optional: click analytics without slowing redirects

If the interviewer asks for analytics, never block the redirect on a write. Return 302 immediately and publish a click event to a message queue (Kafka, SQS, RabbitMQ). A consumer batch-inserts into an analytics store (ClickHouse, BigQuery). Users see no added redirect latency. Mention this pattern: it shows you can apply the async decoupling idea from the system design framework.
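
The decoupling can be illustrated with the standard library, assuming an in-process `queue.Queue` in place of Kafka or SQS and a plain list in place of the analytics store. All names here are hypothetical; the point is only that the redirect handler never waits on the analytics write.

```python
import queue
import threading
import time

click_events: queue.Queue = queue.Queue(maxsize=10_000)  # stand-in for the message queue
analytics_store: list = []                                # stand-in for ClickHouse/BigQuery


def handle_redirect(short_code: str, long_url: str):
    """Build the 302 response; enqueueing the click is best-effort and non-blocking."""
    try:
        click_events.put_nowait({"short_code": short_code, "ts": time.time()})
    except queue.Full:
        pass  # drop the event so a full queue never slows the redirect
    return 302, {"Location": long_url}


def drain_batch(max_items: int = 100) -> int:
    """Move up to max_items queued events into the store as one batch."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(click_events.get_nowait())
        except queue.Empty:
            break
    analytics_store.extend(batch)
    return len(batch)


def consumer_loop(stop: threading.Event, interval: float = 0.05) -> None:
    """Background consumer: drain periodically, then flush what is left on shutdown."""
    while not stop.is_set():
        drain_batch()
        stop.wait(interval)
    while drain_batch():
        pass
```

Dropping events when the queue is full trades analytics completeness for redirect latency, which matches the priority stated above.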

Multi-region considerations

For global users, deploy read replicas and Redis clusters in multiple regions. Writes go to one primary region and replicate asynchronously to the others. Redirects are served locally from the regional cache. Custom aliases must be globally unique, so check them against the primary DB or a dedicated coordination service. Only discuss this if the interviewer raises global scale; otherwise it is scope creep.

What to say in the last five minutes

Summarise: "We optimised for read latency with Redis and replicas, kept writes simple with base62 IDs in PostgreSQL, and added rate limiting plus URL validation for abuse. With more time I would add click analytics via an async queue so redirects stay fast." That closing shows product sense, not just diagram drawing.

Database indexing detail interviewers love

The redirect query is SELECT long_url FROM url_mappings WHERE short_code = $1. B-tree index on short_code makes this O(log n) disk lookups — effectively constant for interview purposes. Mention covering indexes only if asked: if the index includes long_url, the query is index-only without heap fetch. PostgreSQL EXPLAIN ANALYZE is what you use in production to verify; in an interview, stating "unique index on short_code" is sufficient.

Comparison to bit.ly and TinyURL

Real products add link preview crawlers, malware scanning, and logged-in user dashboards. In an interview, acknowledge these as v2 features unless the prompt includes them. TinyURL launched on a single server; bit.ly scaled with caching and sharding. You are designing the core path that every shortener shares — do not apologise for skipping login unless required.

Mock interview checklist

  1. Clarified functional + non-functional requirements (5 min)
  2. Did napkin math for writes, reads, storage (5 min)
  3. Drew API + components: LB, app, Redis, PostgreSQL (10 min)
  4. Explained short code generation and DB schema (10 min)
  5. Deep dive on caching and redirect latency (10 min)
  6. Mentioned abuse prevention, failure modes, analytics extension (5 min)
