Fundamentals6 min read·June 26, 2026

Design a Distributed Cache (Redis)

How to design Redis-like distributed caching: consistent hashing, replication, eviction, cache stampede, and when to use cache-aside in system design interviews.

Almost every case study on this site mentions Redis — URL shortener redirects, rate limiter counters, news feed timelines. Interviewers sometimes zoom in: "How would you build Redis itself?" You are not implementing every command; you are explaining sharding, replication, eviction, and failure behaviour. Read caching fundamentals first for cache-aside vs write-through; this article goes one level deeper into the cache cluster.

What interviewers want

Why a distributed cache exists (RAM latency, offload hot reads from DB).
How keys map to nodes (consistent hashing).
What happens when a node dies (replication, failover).
Eviction when memory is full (LRU, TTL).
Stampede and thundering herd mitigation.

Not the same as CDN

CDN caches static HTTP responses at the edge. Distributed cache (Redis) stores application objects (sessions, counters, serialized rows) close to app servers in the same region. Both are caches; different layers.

High-level architecture

Component	Role
Cache nodes	In-memory key-value store; partition of keyspace
Consistent hash ring	Map key → primary node; virtual nodes for balance
Replicas	Async copy of primary shard for read scaling and failover
Client library / proxy	Smart client or Twemproxy/Envoy routes to correct shard
Sentinel / control plane	Health checks, promote replica on primary failure

Consistent hashing

Naive `hash(key) % N` breaks when N changes — almost all keys remap. Consistent hashing places nodes and keys on a ring; key belongs to first node clockwise. Adding a node steals only adjacent key ranges. Virtual nodes (many points per physical server) smooth hot spots. Redis Cluster specifically uses 16,384 hash slots (CRC16 of key → slot → node) — same interview idea, slightly different implementation. Same idea as database sharding — interviewers expect you to draw the ring once.

Replication and failover

Each primary has one or more replicas. Writes go to primary; replicate asynchronously to replica (AP — brief loss window if primary dies before replicate). On primary failure, sentinel promotes replica; clients refresh ring metadata. Split-brain risk if two primaries — use quorum-based failover (majority of sentinels agree). For CAP discussions: Redis cluster trades perfect consistency for speed unless you use WAIT command (rare in interviews).

Eviction policies

Policy	Behaviour	When to mention
TTL expiry	Key vanishes after set time	Sessions, rate limit windows
LRU (approximate)	Evict least recently used when maxmemory hit	General object cache
LFU	Evict least frequently used	Hot key skew, repeated reads
noeviction	Return error on write when full	Critical counters — fail loud

LeetCode LRU Cache (146) is the same eviction idea in one machine — say that connection out loud.

Cache-aside in production

App reads cache; on miss, read DB, populate cache, return.
On write, update DB first, then delete cache key (not update — avoids race).
Set TTL to bound staleness even if invalidation misses.

This pattern appears in URL shortener, ride hailing surge multipliers, and ticket booking browse paths — not in seat holds.

Cache stampede

Hot key expires; 10K threads miss simultaneously and hammer DB. Fixes: (1) probabilistic early expiration — jitter TTL per key. (2) Mutex / "single flight" — one thread rebuilds, others wait. (3) Never expire ultra-hot keys; background refresh. (4) Pre-warm before known events (product launch). Mention at least one fix when discussing flash sales.

Hot keys and skew

One viral post’s like counter on a single Redis key saturates one CPU core. Mitigations: local in-process counter flushed periodically; split key into `likes:post:123:shard_{0..9}` and sum on read; read replicas for read-heavy hot keys. Load balancer cannot fix hot keys inside one shard — you need application-level splitting.

Capacity estimation

1B keys × 100 bytes key + 500 bytes value ≈ 600 GB — needs cluster of ~10 nodes at 64 GB RAM each (with overhead). 100K ops/sec cluster: single-threaded Redis ~100K simple GETs/sec per core — scale shards horizontally. Network: 100K × 1 KB = 100 MB/sec — usually not the bottleneck.

When not to use Redis

Source of truth for money or seat inventory — use SQL with transactions.
Large blobs (video) — use object storage + CDN.
Complex queries (joins, range scans) — use SQL.
Durability-first ledger — use WAL-backed database.

Redis vs Memcached (quick compare)

Feature	Redis	Memcached
Data structures	Strings, hashes, lists, sorted sets	Strings only
Persistence	Optional RDB/AOF	None (pure cache)
Replication	Built-in	Client-side sharding only
Typical use	Cache + session + rate limit + pub/sub	Simple object cache

Sample interview dialogue

Interviewer: "How does your cache scale?" You: "Keys shard via consistent hashing across Redis nodes with replicas for failover. App uses cache-aside with TTL and delete-on-write invalidation. I watch for hot keys and cache stampede on expiry — single-flight or jittered TTL. Redis is not durable source of truth; PostgreSQL is."

Pub/sub and streams (bonus)

Redis pub/sub delivers fire-and-forget messages — no persistence, subscribers offline miss events. Redis Streams add append-only log with consumer groups — closer to Kafka for lightweight event fan-out. Mention when interviewer asks "what else can Redis do?" — not required for cache-only deep dives.

Persistence trade-off

RDB snapshots every N minutes — fast recovery, may lose last writes. AOF logs every write — durable but slower. Most cache use cases disable persistence entirely: rebuild from DB on cold start. Session store might use AOF — clarify durability requirements before recommending.

What to say in the last five minutes

Close with: "Consistent hashing, primary-replica failover, LRU eviction with TTL, cache-aside with delete-on-write, single-flight against stampede." State clearly that money and seat inventory stay in SQL.

Mock interview checklist

Explained consistent hashing vs modulo sharding.
Described primary-replica failover.
Named eviction policy and TTL use cases.
Gave stampede mitigation strategy.
Stated when cache is wrong tool (strong consistency writes).

Closing summary

Distributed cache is fast RAM with a hash ring and replication — the infrastructure behind every "add Redis" box in other designs. Master this fundamental and you will answer deep-dive questions on half your case studies without pausing.

More in this series

How to Approach System Design Interviews (Without Panicking)Caching Fundamentals Every Interview Candidate Should Know SQL vs NoSQL — How to Choose in System Design Interviews Load Balancing and Horizontal Scaling for Interviews