Caching Fundamentals Every Interview Candidate Should Know
When to add a cache, cache-aside vs read-through vs write-through, TTL and invalidation, Redis vs CDN, and the consistency traps that catch candidates in system design interviews.
Caching is the first optimisation interviewers expect you to reach for on read-heavy systems. It is also where designs fall apart — stale data, thundering herds, and caches that never warm up. This article explains caching the way you would use it on a real .NET or Java backend, not as an abstract buzzword. You will apply these ideas directly in our URL shortener and rate limiter walkthroughs.
If you have implemented LRU Cache on LeetCode (problem 146), you already understand eviction. Production caching adds distribution, TTLs, and the hard question: what happens when the database and cache disagree?
Why cache at all?
Caches trade memory for latency and throughput. A PostgreSQL query that takes 15ms might become 1ms in Redis. More importantly, you protect the database from repetitive identical reads — the URL shortener redirect path, product catalog pages, user profile lookups. Without caching, viral traffic becomes a database outage.
- Read-heavy workloads with repeated keys (80/20 rule: 20% of keys serve 80% of traffic).
- Expensive computation (aggregations, ML inference results).
- Session data and rate limit counters.
- Static assets — images, JS bundles — via CDN at the edge.
When not to cache
Do not cache everything. Highly personalised, rarely repeated queries waste memory. Strongly consistent financial balances should not be served from a stale cache without careful invalidation. Ask: "If this value is 30 seconds old, is that acceptable?"
Cache placement in the stack
- Client/browser cache — HTTP Cache-Control headers.
- CDN — geographically close to users for static and some API responses.
- Application in-memory cache — per-server, ultra fast, not shared (IMemoryCache in .NET).
- Distributed cache — Redis, Memcached, shared across all app instances.
- Database buffer pool — internal to PostgreSQL/SQL Server, not your design choice but worth acknowledging.
The main caching patterns
Cache-aside (lazy loading) — default in interviews
The application owns cache logic. This is what you use in URL shortener redirects and most REST APIs.
Cache-aside read path
- App checks cache for key (e.g. url:abc123).
- Hit → return cached value immediately.
- Miss → query database.
- Write result to cache with TTL.
- Return value to client.
Cache-aside write path
- App writes to database first (source of truth).
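- On success: delete the cache key (DEL in Redis), or overwrite it with the new value. Deleting is the safer default, because two concurrent writers can otherwise leave the older value in the cache.
- Never update the cache before the DB commit: if the process crashes or the transaction rolls back between the two steps, the cache holds a value the database never stored, and it stays wrong until the key expires or is overwritten.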
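The read and write paths above can be sketched in a few lines of Java. This is a minimal illustration, not a production client: two in-memory maps stand in for Redis and PostgreSQL, the class and field names are hypothetical, and TTL handling is left out to keep the control flow visible.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of cache-aside. Two in-memory maps stand in for Redis and PostgreSQL;
// every name here is hypothetical and TTL handling is omitted for brevity.
class CacheAsideStore {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new ConcurrentHashMap<>();
    int databaseReads = 0; // plain counter for illustration: reads that fell through to the database

    // Read path: check the cache, fall back to the database, then populate the cache.
    Optional<String> get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);        // hit
        }
        databaseReads++;
        String fromDb = database.get(key);     // miss: query the source of truth
        if (fromDb != null) {
            cache.put(key, fromDb);            // a real cache would set a TTL here
        }
        return Optional.ofNullable(fromDb);
    }

    // Write path: commit to the database first, then invalidate the cache key.
    void put(String key, String value) {
        database.put(key, value);
        cache.remove(key);                     // the next read reloads the fresh value
    }
}
```

Calling `put("url:abc123", ...)` and then `get("url:abc123")` twice reaches the database only on the first read; a second `put` removes the cached entry, so the following read goes back to the database.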
Read-through
Cache sits in front of DB; cache library loads on miss transparently. Less application code, but you need a cache product that supports it. ORM second-level caches often work this way.
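To make the difference from cache-aside concrete, here is a small Java sketch in which the cache owns the loading logic. The `ReadThroughCache` class and its loader function are hypothetical stand-ins for a cache product and a database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of read-through: callers only talk to the cache, and the cache itself
// invokes a loader on a miss. The loader is a stand-in for a database query.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // computeIfAbsent runs the loader only when the key is missing;
    // a null result from the loader is returned but not stored.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Called by the application after it writes to the database.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

The calling code never mentions the database: it asks the cache for a key and gets a value back, which is the "less application code" trade-off described above.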
Write-through
Writes go to cache and DB synchronously. Stronger consistency, higher write latency. Used when reads must never see stale data after a write.
Write-behind (write-back)
Writes are acknowledged as soon as they reach the cache; the flush to the DB happens asynchronously. High write throughput, but unflushed writes are lost if the cache node crashes. Mention it for analytics buffers, not for bank transfers.
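A minimal Java sketch of the write-behind idea, assuming in-memory maps for both the cache and the database and an explicit `flush()` call in place of a background worker. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Sketch of write-behind: writes land in the cache and mark the key dirty;
// a separate flush step persists them later. A real system would flush from
// a background worker and handle failures and retries.
class WriteBehindBuffer {
    private final Map<String, Long> cache = new HashMap<>();
    private final Map<String, Long> database = new HashMap<>();
    private final Set<String> dirtyKeys = new LinkedHashSet<>();

    // Acknowledge immediately: only the cache and the dirty set are touched.
    void write(String key, long value) {
        cache.put(key, value);
        dirtyKeys.add(key);
    }

    // Persist the latest value of every dirty key; returns how many keys were flushed.
    // Anything still dirty when the process crashes is lost.
    int flush() {
        int flushed = dirtyKeys.size();
        for (String key : dirtyKeys) {
            database.put(key, cache.get(key));
        }
        dirtyKeys.clear();
        return flushed;
    }

    Long readFromCache(String key) { return cache.get(key); }
    Long readFromDatabase(String key) { return database.get(key); }
}
```

Two writes to the same counter before a flush reach the database as one write, which is where the throughput gain comes from. The window before `flush()` runs is also the data-loss window.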
| Pattern | Read path | Write path | Typical use |
|---|---|---|---|
| Cache-aside | App checks cache first | App invalidates or updates cache | General purpose APIs |
| Read-through | Cache loads from DB | App writes DB, invalidates | ORM-heavy apps |
| Write-through | From cache; written keys are already warm | Sync to cache + DB | Inventory with strict reads |
| Write-behind | From cache | Async DB persist | Metrics, logs, counters |
TTL and invalidation
Time-to-live (TTL) is a safety net. Even if invalidation fails, data expires. Short TTL (30–60s) for semi-dynamic content; long TTL (hours) for static catalog data with explicit purge on update.
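The expiry rule is simple enough to sketch directly. The Java class below is illustrative only: `TtlCache` is a hypothetical name, and the clock is injected so the behaviour can be checked without sleeping (in production it would typically be `System::currentTimeMillis`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Sketch of TTL expiry with an injectable clock.
class TtlCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final LongSupplier clock;

    TtlCache(LongSupplier clock) {
        this.clock = clock;
    }

    void put(String key, String value, long ttlMillis) {
        entries.put(key, new Entry(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null on a miss or when the entry has outlived its TTL.
    String get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key);   // lazily drop the expired entry
            return null;
        }
        return entry.value;
    }
}
```

Even if an invalidation message is lost, a value stored with a 30-second TTL can be wrong for at most 30 seconds, which is the safety-net property described above.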
Invalidation strategies:
- Delete key on write — simplest cache-aside approach.
- Versioned keys — cache user:123:v5; bump version on update instead of hunting all derived keys.
- Pub/sub broadcast — on update, all app servers evict local caches (useful with in-memory L1 + Redis L2).
- Event-driven — change data capture from DB streams invalidates downstream caches.
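Of the strategies above, versioned keys are the least obvious, so here is a hedged Java sketch. The `VersionedKeys` helper and the `profile` suffix are hypothetical; the key layout follows the `user:123:v5` shape mentioned in the list.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of versioned keys: derived entries embed the current version, so
// bumping the version makes every old entry unreachable in one step.
class VersionedKeys {
    private final Map<String, Integer> versions = new HashMap<>();

    // Builds a key such as "user:123:v5:profile" for the entity's current version.
    String keyFor(String entity, String suffix) {
        int version = versions.getOrDefault(entity, 1);
        return entity + ":v" + version + ":" + suffix;
    }

    // Called after a successful database write for the entity.
    // An entity that was never bumped is implicitly at version 1.
    void bump(String entity) {
        versions.merge(entity, 2, (current, ignored) -> current + 1);
    }
}
```

Old entries are never deleted explicitly in this scheme, so it relies on TTLs or eviction to reclaim their memory. In a multi-instance deployment the version counter itself would have to live somewhere shared, such as the distributed cache.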
Thundering herd
Cache expires on a hot key; 10,000 requests simultaneously miss and hammer the database. Mitigations: probabilistic early expiry, request coalescing (only one thread reloads), or a short-lived mutex lock in Redis during refresh.
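Request coalescing can be sketched in-process with a map of in-flight futures. This is one possible shape, assuming a single JVM; the `SingleFlight` class is hypothetical, and coordinating across several app servers would need the Redis lock mentioned above instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Sketch of request coalescing: concurrent callers asking for the same key
// share one in-flight load instead of each hitting the database.
class SingleFlight<V> {
    private final ConcurrentMap<String, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    CompletableFuture<V> load(String key, Supplier<V> loader) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, created);
        if (existing != null) {
            return existing;                   // someone else is already loading this key
        }
        try {
            created.complete(loader.get());    // the winner runs the load on its own thread
        } catch (RuntimeException e) {
            created.completeExceptionally(e);  // waiters see the same failure
        } finally {
            inFlight.remove(key, created);     // later callers trigger a fresh load
        }
        return created;
    }
}
```

Callers that lose the race receive the winner's future and block on `join()` until the single reload finishes, so 10,000 simultaneous misses on one key turn into one database query per app instance.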
Redis vs CDN vs in-process
| Layer | Latency | Shared? | Best for |
|---|---|---|---|
| In-process (IMemoryCache) | Nanoseconds | No — per pod only | Config, reference data, L1 hot keys |
| Redis / Memcached | Sub-ms to 1ms | Yes — all app servers | Sessions, API cache, rate limits |
| CDN edge | 5–30ms globally | Yes — per PoP | Static assets, cacheable GET responses |
In .NET: register IMemoryCache for L1 and an IDistributedCache backed by StackExchange.Redis for L2. A CDN honours the Cache-Control headers on static files, so if you want JSON responses cached at the edge, your API must set those headers (for example max-age) itself.
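The L1 + L2 arrangement follows the same lookup order in any language. The Java sketch below uses plain maps for both tiers and hypothetical names; in the .NET setup described above, the two tiers would be IMemoryCache and a Redis-backed IDistributedCache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an L1 (in-process) cache in front of an L2 (shared) cache.
class TwoTierCache {
    private final Map<String, String> l1 = new ConcurrentHashMap<>();
    private final Map<String, String> l2;      // shared between app instances

    TwoTierCache(Map<String, String> sharedL2) {
        this.l2 = sharedL2;
    }

    String get(String key) {
        String value = l1.get(key);
        if (value != null) {
            return value;                      // fastest path: no network hop
        }
        value = l2.get(key);
        if (value != null) {
            l1.put(key, value);                // promote to L1 for the next read
        }
        return value;
    }

    void put(String key, String value) {
        l2.put(key, value);
        l1.put(key, value);
    }

    // Called when another instance announces a change, for example via pub/sub.
    void evictLocal(String key) {
        l1.remove(key);
    }
}
```

Two instances sharing one L2 show the catch: after another instance changes a key, a pod keeps serving its old L1 copy until `evictLocal` runs or the L1 entry expires, which is why the L1 tier suits config and reference data.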
Consistency vocabulary
- Cache hit ratio — % of reads served from cache; monitor this in production.
- Stale read — user sees old value after someone else updated; acceptable for social likes, not for seat reservations.
- Cold start — empty cache after deploy; warm critical keys or use gradual rollout.
- Cache penetration — queries for non-existent keys bypass cache every time; use short TTL on negative cache entries.
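Negative caching, the fix for cache penetration, can be sketched by storing an empty `Optional` for keys the database does not have. The class and its loader function are hypothetical, and the sketch omits the short TTL that a real negative entry needs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Sketch of negative caching: a "not found" result is cached too, so repeated
// lookups for a missing key stop reaching the database.
class NegativeCachingLookup {
    private final Map<String, Optional<String>> cache = new HashMap<>();
    private final Function<String, Optional<String>> database;
    int databaseCalls = 0; // plain counter for illustration

    NegativeCachingLookup(Function<String, Optional<String>> database) {
        this.database = database;
    }

    Optional<String> find(String key) {
        Optional<String> cached = cache.get(key);
        if (cached != null) {
            return cached;                     // hit, possibly a cached "absent"
        }
        databaseCalls++;
        Optional<String> loaded = database.apply(key);
        cache.put(key, loaded);                // real code: short TTL when loaded is empty
        return loaded;
    }
}
```

Without the short TTL, a key that is created later would stay hidden behind its cached "absent" marker, so the expiry is part of the technique rather than an optional extra.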
Worked example: URL shortener redirect
Follow the redirect path from our URL shortener case study. User requests GET /abc123. App checks Redis key url:abc123. Hit → return long URL in 1ms. Miss → SELECT from PostgreSQL read replica → SET Redis with 24h TTL → redirect. On link delete, DEL url:abc123. On update, DEL or SET new value. Draw this flow in an interview before mentioning CDN — it shows you understand the core loop.
.NET implementation sketch
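In ASP.NET Core, register IDistributedCache with the Redis provider built on StackExchange.Redis (AddStackExchangeRedisCache). Cache-aside sketch: var cached = await cache.GetStringAsync(key); if (cached != null) return cached; var fromDb = await repo.GetAsync(key); if (fromDb == null) return null; await cache.SetStringAsync(key, fromDb, new DistributedCacheEntryOptions { AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5) }); return fromDb; The null check matters because SetStringAsync rejects a null value. Here repo is a placeholder for your data-access layer and is assumed to return a string. For L1+L2, put IMemoryCache in front of Redis for config and reference data within a single pod. These are the real ASP.NET Core caching APIs, not invented pseudocode.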
Monitoring what matters
- Hit ratio — below 80% on a read-heavy path means TTL too short, wrong keys, or insufficient memory.
- Eviction rate — Redis evicting keys means memory maxed; scale cluster or tighten TTLs.
- Miss latency p99 — spikes indicate thundering herd or cold start after deploy.
- Stale read incidents — track version mismatches after writes; alert if user-facing.
Interviewers at product companies appreciate when you mention observability without being asked. Caching is not fire-and-forget — bad cache config causes subtle production bugs for weeks.
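Hit ratio is plain arithmetic, hits divided by total lookups, and a counter pair is enough to track it. The `CacheStats` class below is a hypothetical sketch; a real service would more likely export these counters through its metrics library.

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch of hit-ratio tracking with two thread-safe counters.
class CacheStats {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() { hits.increment(); }
    void recordMiss() { misses.increment(); }

    // Fraction of lookups served from cache, in [0, 1]; 0 when nothing was recorded.
    double hitRatio() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

With three hits and one miss, `hitRatio()` returns 0.75, which falls under the 80% guideline above and would be worth investigating on a read-heavy path.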
Interview sound bite
When the interviewer asks "how would you speed this up?", say: "This path is read-heavy with repeated keys. I would add a Redis cache-aside layer with a 5-minute TTL, invalidate on write, and put static assets on a CDN. I would monitor hit ratio and watch for thundering herd on hot keys." That single paragraph covers pattern, technology, invalidation, and operational awareness.
Cache stampede — step-by-step mitigation
- Step 1: hot key expires.
- Step 2: 10,000 concurrent requests miss.
- Step 3: all 10,000 hit PostgreSQL.
- Step 4: database latency spikes, timeouts cascade.
- Mitigation A: single-flight. The first miss acquires a lock; the others wait for the reload.
- Mitigation B: stale-while-revalidate. Serve the stale value while one worker refreshes.
- Mitigation C: jittered TTL, so keys do not expire simultaneously.
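Jittered TTL is the cheapest of these mitigations to implement. The Java sketch below spreads each TTL around a base value; the `JitteredTtl` class is hypothetical, and the 10% spread used in the usage note is an arbitrary illustrative choice, not a recommendation from any library.

```java
import java.util.Random;

// Sketch of TTL jitter: spread expiry times so keys written together
// do not all expire together.
class JitteredTtl {
    private final Random random;

    JitteredTtl(Random random) {
        this.random = random;
    }

    // Returns a TTL in [base - spread, base + spread] seconds,
    // where spread = base * fraction, rounded down.
    long seconds(long baseSeconds, double fraction) {
        long spread = (long) (baseSeconds * fraction);
        if (spread == 0) {
            return baseSeconds;                // base too small to jitter
        }
        long offset = (long) (random.nextDouble() * (2 * spread + 1)) - spread;
        return baseSeconds + offset;
    }
}
```

Calling `seconds(300, 0.1)` yields values between 270 and 330, so a batch of keys cached in the same second expires over a one-minute window instead of all at once.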
CAP theorem in one interview sentence
During a network partition, a distributed system has to give up either consistency or availability; it cannot guarantee both. Caches usually choose availability and accept eventual consistency. Bank ledgers choose consistency over availability during a partition. Saying which letter you sacrifice, and why, is often enough at mid-level without a full lecture.
When the interviewer says "design a news feed"
Feeds are a caching and fan-out problem. Precompute timelines (fan-out on write) and cache per user in Redis — fast reads, heavy writes when celebrities post. Or assemble on read (fan-out on read) — lighter writes, slower reads for users following thousands. Most answers blend both: cache hot users, assemble cold ones. Read our full news feed walkthrough and the LeetCode-to-systems bridge (BFS fan-out).
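The two fan-out strategies can be contrasted in one small Java sketch. Everything here is hypothetical and in-memory: `FeedService` keeps follower lists, posts, and cached timelines in maps, and it ignores ordering by timestamp, pagination, and timeline size limits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch contrasting fan-out on write with fan-out on read.
class FeedService {
    private final Map<String, List<String>> followersOf = new HashMap<>();
    private final Map<String, List<String>> followingOf = new HashMap<>();
    private final Map<String, List<String>> postsBy = new HashMap<>();
    private final Map<String, List<String>> cachedTimeline = new HashMap<>();

    void follow(String follower, String author) {
        followersOf.computeIfAbsent(author, k -> new ArrayList<>()).add(follower);
        followingOf.computeIfAbsent(follower, k -> new ArrayList<>()).add(author);
    }

    // Fan-out on write: one post triggers one timeline write per follower.
    // Returns the number of timeline writes, which is what hurts for celebrities.
    int post(String author, String text) {
        postsBy.computeIfAbsent(author, k -> new ArrayList<>()).add(text);
        List<String> followers = followersOf.getOrDefault(author, List.of());
        for (String follower : followers) {
            cachedTimeline.computeIfAbsent(follower, k -> new ArrayList<>()).add(text);
        }
        return followers.size();
    }

    // Read side of fan-out on write: a single cache lookup.
    List<String> timelineFromCache(String user) {
        return cachedTimeline.getOrDefault(user, List.of());
    }

    // Fan-out on read: gather posts from every followed author at request time.
    List<String> timelineOnRead(String user) {
        List<String> result = new ArrayList<>();
        for (String author : followingOf.getOrDefault(user, List.of())) {
            result.addAll(postsBy.getOrDefault(author, List.of()));
        }
        return result;
    }
}
```

The cost shows up in different places: `post` does work proportional to the author's follower count, while `timelineOnRead` does work proportional to how many accounts the reader follows. A blended design would call the first for ordinary authors and fall back to the second for accounts with very large followings.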