DDSA Solutions
Case Study · 8 min read

Design a News Feed (Twitter / Instagram Home Timeline)

How to design a social media news feed for interviews: fan-out on write vs fan-out on read, ranking, caching celebrity users, and storage trade-offs.

The news feed is one of the most common system design prompts at product companies. It sounds simple — "show me recent posts from people I follow" — but the moment you mention celebrities with 50 million followers, the naive design breaks. This walkthrough follows the same framework as our interview guide and leans heavily on caching.

Requirements

Functional

  • Users publish posts (text, image, video metadata).
  • Users follow other users.
  • Home feed shows recent posts from followed users, ranked by recency or engagement.
  • Pagination: load older posts on scroll.

Non-functional

  • Feed read latency under 200ms p99.
  • Write path should not block on fan-out to millions of followers.
  • Scale: 500M users, 200M DAU, average 200 follows per user.
  • Celebrity accounts may have 10M+ followers.

Clarify ranking

Is chronological order enough, or do you need ML ranking (likes, comments, affinity)? For most interviews, start with reverse-chronological, then mention ranking as a v2 layer on top of candidate post IDs.

Capacity estimation

Assume 200M DAU, each views feed 5 times/day → 1B feed reads/day ≈ 12,000 reads/sec average, ~60,000/sec peak. Posts: 100M new posts/day ≈ 1,200 writes/sec. Storage: if average post is 500 bytes metadata + media in object storage, posts DB grows ~50GB/day before replication. Feed cache per user might hold 500 post IDs × 8 bytes = 4KB — for 200M active users that is 800GB if everyone is cached (you will not cache everyone).
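
The napkin math above can be checked in a few lines. This is a sketch using only the numbers stated in the paragraph; the 5x peak factor is an assumption inferred from the ~60,000/sec figure, and the rounded values in the text (12,000 and 1,200) are slightly above the exact results.

```python
# Napkin math for the feed, using the assumptions stated in the text.
SECONDS_PER_DAY = 86_400

dau = 200_000_000
feed_views_per_user = 5
posts_per_day = 100_000_000
post_metadata_bytes = 500
cached_ids_per_user = 500
bytes_per_post_id = 8
peak_factor = 5  # assumption: peak is roughly 5x the daily average

reads_per_day = dau * feed_views_per_user
avg_reads_per_sec = reads_per_day / SECONDS_PER_DAY
peak_reads_per_sec = avg_reads_per_sec * peak_factor
writes_per_sec = posts_per_day / SECONDS_PER_DAY

# Decimal gigabytes (1 GB = 1e9 bytes), as in the text.
post_storage_gb_per_day = posts_per_day * post_metadata_bytes / 1e9
timeline_bytes_per_user = cached_ids_per_user * bytes_per_post_id
timeline_cache_gb = dau * timeline_bytes_per_user / 1e9

print(f"avg feed reads/sec:   {avg_reads_per_sec:,.0f}")
print(f"peak feed reads/sec:  {peak_reads_per_sec:,.0f}")
print(f"post writes/sec:      {writes_per_sec:,.0f}")
print(f"post metadata GB/day: {post_storage_gb_per_day:,.0f}")
print(f"timeline cache GB:    {timeline_cache_gb:,.0f}")
```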

High-level architecture

Split write path (publish post) from read path (load feed). Both paths share user graph, post storage, and media CDN.

Component | Role
Post service | Accept new posts, store metadata, enqueue fan-out job
User graph service | Follow / unfollow relationships
Feed service | Assemble timeline for a user on read
Timeline cache (Redis) | Precomputed list of post IDs per user
Post DB (SQL or NoSQL) | Post content, author, timestamp
Object storage + CDN | Images and video
Fan-out workers | Push post IDs into follower timelines asynchronously

Fan-out on write vs fan-out on read

Fan-out on write (push model)

When user A posts, a worker inserts the post ID into every follower's timeline cache. Reads are fast: fetch prebuilt list from Redis, hydrate post details, return. Problem: if A has 10M followers, one post triggers 10M Redis writes.

Fan-out on read (pull model)

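On feed load, query recent posts from every account the user follows, merge by timestamp, and return the top N. Writes are simple, but reads get expensive when a user follows thousands of accounts. In a hybrid design, pull is reserved for the small set of celebrity authors a user follows, which keeps the read-time merge cheap.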

Strategy | Write cost | Read cost | Best for
Fan-out on write | High for celebrities | Low | Normal users (< 10K followers)
Fan-out on read | Low | High | Celebrity authors with millions of followers
Hybrid | Bounded writes + selective pull | Moderate | Production systems at scale

The hybrid approach (say this in interviews)

Fan-out on write for users with fewer than, say, 10,000 followers. For celebrities, store the post once and merge their recent posts at read time from a celebrity feed shard. This is how Twitter historically handled the "Justin Bieber problem."

Write path step by step

  1. Client sends POST /v1/posts with text and the media_id of any media it has already uploaded.
  2. Post service validates, writes row to posts table, returns post_id.
  3. Publish event to message queue: { post_id, author_id, timestamp }.
  4. Fan-out worker consumes event: load follower list from graph service.
  5. If the author is under the celebrity threshold: for each follower, LPUSH post_id onto Redis key feed:{follower_id}, then LTRIM the list to the newest 1000 entries.
  6. For celebrity authors: skip fan-out; post lives in celebrity timeline only.
  7. Media bytes go to S3 outside the request path (see Media upload flow); the CDN serves them on read.
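
The fan-out step can be sketched as a single event handler. This is a minimal illustration, not a production worker: the in-memory `timelines` and `celebrity_posts` dictionaries are hypothetical stand-ins for Redis lists and the celebrity timeline store, `handle_post_event` is an invented name, and the follower list is passed in rather than fetched from a graph service. The 10,000 cutoff is the example threshold from the hybrid section.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff, taken from the hybrid section
MAX_TIMELINE_LEN = 1_000      # matches the "trim to max 1000" step

# Hypothetical in-memory stand-ins for Redis and the celebrity timeline store.
timelines = defaultdict(lambda: deque(maxlen=MAX_TIMELINE_LEN))
celebrity_posts = defaultdict(list)


def handle_post_event(event, followers):
    """Consume one {post_id, author_id, timestamp} event from the queue.

    Returns "pull" if the author is treated as a celebrity, else "push".
    """
    if len(followers) >= CELEBRITY_THRESHOLD:
        # Celebrity: store the post once; readers merge it at read time.
        celebrity_posts[event["author_id"]].append(
            (event["timestamp"], event["post_id"])
        )
        return "pull"
    for follower_id in followers:
        # appendleft on a bounded deque drops the oldest entry when full,
        # which mirrors LPUSH followed by LTRIM 0 999.
        timelines[follower_id].appendleft(event["post_id"])
    return "push"


if __name__ == "__main__":
    event = {"post_id": 1, "author_id": 7, "timestamp": 100}
    print(handle_post_event(event, followers=[1, 2, 3]))
    print(list(timelines[2]))
```

In a real deployment the loop would typically batch writes (for example with Redis pipelining) instead of issuing one command per follower.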

Read path step by step

  1. Client GET /feed?cursor=...
  2. Fetch post ID list from Redis feed:{user_id}.
  3. Merge with celebrity posts (pull recent from followed celebrities).
  4. Deduplicate, sort by timestamp, take page size (e.g. 20).
  5. Batch GET post details from DB or post cache by IDs.
  6. Optional: ranking layer reorders the 20 candidates.
  7. Return JSON with author info and media URLs.
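
Steps 2 to 4 amount to a k-way merge of already-sorted lists. The sketch below assumes each entry carries a timestamp next to the post ID (the Redis list described here stores only IDs, so a real system would need time-sortable IDs or a stored timestamp); `build_feed_page` is a hypothetical helper name.

```python
import heapq


def build_feed_page(cached, celebrity_lists, page_size=20):
    """Merge the cached timeline with celebrity timelines.

    `cached` and every list in `celebrity_lists` hold (timestamp, post_id)
    tuples sorted newest first. Returns up to `page_size` unique post IDs.
    """
    # heapq.merge streams the inputs lazily, so we stop after one page
    # instead of sorting every candidate.
    merged = heapq.merge(
        cached, *celebrity_lists, key=lambda entry: entry[0], reverse=True
    )
    seen = set()
    page = []
    for _timestamp, post_id in merged:
        if post_id in seen:
            continue  # deduplicate posts that appear in more than one source
        seen.add(post_id)
        page.append(post_id)
        if len(page) == page_size:
            break
    return page


if __name__ == "__main__":
    cached = [(50, "a"), (30, "b"), (10, "c")]
    celebrities = [[(40, "x"), (30, "b")], [(60, "y")]]
    print(build_feed_page(cached, celebrities, page_size=4))
```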

Data model sketch

  • posts: post_id, user_id, content, created_at, media_url
  • follows: follower_id, followee_id, created_at (index both directions)
  • feed cache: Redis list feed:{user_id} → [post_id, ...]
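
As a rough illustration of the two relational tables, the sketch below creates them in an in-memory SQLite database. SQLite is only a stand-in so the example runs anywhere; the column types and index names are assumptions, not part of the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        post_id    INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        content    TEXT,
        created_at INTEGER NOT NULL,
        media_url  TEXT
    );
    -- Serves "recent posts by this author" (profile page, celebrity pull).
    CREATE INDEX idx_posts_user_time ON posts (user_id, created_at DESC);

    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,
        followee_id INTEGER NOT NULL,
        created_at  INTEGER NOT NULL,
        PRIMARY KEY (follower_id, followee_id)   -- "who does X follow?"
    );
    -- Reverse direction: "who follows X?" for fan-out.
    CREATE INDEX idx_follows_followee ON follows (followee_id, follower_id);
    """
)

conn.executemany(
    "INSERT INTO follows VALUES (?, ?, ?)", [(1, 9, 0), (2, 9, 0), (1, 8, 0)]
)
followers_of_9 = [
    row[0]
    for row in conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ? ORDER BY follower_id",
        (9,),
    )
]
following_of_1 = [
    row[0]
    for row in conn.execute(
        "SELECT followee_id FROM follows WHERE follower_id = ? ORDER BY followee_id",
        (1,),
    )
]
print(followers_of_9, following_of_1)
```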

Failure modes

Failure | Behaviour
Fan-out worker lag | User sees own post immediately; followers see delay of seconds
Redis miss | Fall back to fan-out on read from DB; slower but correct
Celebrity post | Never fan-out; always merged at read

Unfollow, block, and deleted posts

When user B unfollows A, stop fanning A's future posts into B's timeline — but you do not need to purge historical IDs immediately; they age out as the Redis list is trimmed. Blocks are stronger: filter A's post IDs at read time even if they remain in cache. Deleted posts should publish a tombstone event so fan-out workers and read path can remove or hide the post_id. Mentioning this shows you think about graph changes, not just the happy path.
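
The read-time filtering described above can be sketched as one pass over the cached IDs. The inputs here (`author_of`, `blocked_authors`, `deleted_post_ids`) are hypothetical lookups; in practice they would come from the post cache, the block list, and the tombstone events.

```python
def filter_timeline(post_ids, author_of, blocked_authors, deleted_post_ids):
    """Drop tombstoned posts and posts by blocked authors, keeping order.

    post_ids:         cached timeline, newest first
    author_of:        dict mapping post_id -> author_id
    blocked_authors:  set of author IDs the reader has blocked
    deleted_post_ids: set of post IDs with a tombstone
    """
    return [
        post_id
        for post_id in post_ids
        if post_id not in deleted_post_ids
        and author_of.get(post_id) not in blocked_authors
    ]


if __name__ == "__main__":
    timeline = [104, 103, 102, 101]
    authors = {104: "a", 103: "b", 102: "a", 101: "c"}
    print(filter_timeline(timeline, authors, {"b"}, {102}))
```

Filtering at read time keeps blocks effective immediately, even while stale IDs are still sitting in the cached list.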

Hot keys and sharding timelines

A celebrity does not fan-out on write, but millions of users may still read the same hot post metadata. Cache post bodies by post_id in Redis with TTL — classic cache-aside. Timeline lists themselves can shard across Redis Cluster by hash of user_id so no single node holds every feed. If one influencer triggers read spikes, CDN + post cache absorbs it; timeline list size stays bounded by LTRIM.
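
Cache-aside for post bodies follows a simple pattern: check the cache, fall back to the database on a miss, then populate the cache with a TTL. The class below is an in-memory sketch of that pattern; `PostCache` and its loader callback are hypothetical, and a real deployment would use Redis with an expiry instead of a Python dict.

```python
import time


class PostCache:
    """Cache-aside lookup for post bodies, keyed by post_id, with a TTL."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db  # called only on a miss or expiry
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # post_id -> (expires_at, post)

    def get(self, post_id):
        now = self._clock()
        entry = self._entries.get(post_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        # Miss or expired: read the source of truth, then repopulate.
        post = self._load(post_id)
        self._entries[post_id] = (now + self._ttl, post)
        return post


if __name__ == "__main__":
    cache = PostCache(lambda post_id: {"post_id": post_id, "content": "hello"})
    print(cache.get(42))  # loaded from the "database"
    print(cache.get(42))  # served from the cache
```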

Ranking layer (v2)

Reverse-chronological is v1. Engagement ranking scores each candidate post: affinity (how often you interact with author), recency decay, and engagement velocity (likes in last hour). Fetch 200 recent post IDs from cache, score in the app tier or a ranking service, return top 20. Never rank the entire database — narrow candidates first, then rank. ML models are optional depth; describing the candidate → score → sort pipeline is enough for most interviews.
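
The candidate → score → sort pipeline might look like the sketch below. The scoring formula, the 6-hour half-life, and the 0.5 weight on velocity are all illustrative assumptions; the text only names the three signals, not how to combine them.

```python
import math


def score(post, now, affinity, half_life_hours=6.0):
    """Combine affinity, recency decay, and engagement velocity.

    The weights are hypothetical; a real system would tune or learn them.
    """
    age_hours = max(0.0, (now - post["created_at"]) / 3600)
    recency = 0.5 ** (age_hours / half_life_hours)  # halves every half-life
    velocity = math.log1p(post["likes_last_hour"])  # dampen viral outliers
    return affinity.get(post["author_id"], 0.0) + recency + 0.5 * velocity


def rank(candidates, now, affinity, top_n=20):
    """Score a bounded candidate set and return the top_n posts."""
    ranked = sorted(candidates, key=lambda p: score(p, now, affinity), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    now = 1_000_000
    candidates = [
        {"id": "a", "author_id": 1, "created_at": now, "likes_last_hour": 0},
        {"id": "b", "author_id": 2, "created_at": now - 6 * 3600, "likes_last_hour": 0},
        {"id": "c", "author_id": 3, "created_at": now - 12 * 3600, "likes_last_hour": 100},
    ]
    print([p["id"] for p in rank(candidates, now, affinity={2: 1.0})])
```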

Database choice per component

Posts and follows fit PostgreSQL with indexes on user_id and created_at. Timelines belong in Redis, not SQL — see our SQL vs NoSQL guide for why polyglot persistence fits here. Media blobs live in S3; only URLs in the posts table.

Scaling the read path

Feed API servers sit behind an L7 load balancer with a stateless app tier. Each request: Redis timeline fetch → optional celebrity merge → batch post hydrate. Use connection pooling to PostgreSQL read replicas for cache misses. Target sub-200ms p99 by keeping the critical path to one Redis round-trip plus one batched DB query.

API design

Endpoint | Purpose | Response
POST /v1/posts | Create post | 201 { post_id, created_at }
GET /v1/feed?cursor=&limit=20 | Home timeline | 200 { posts[], next_cursor }
GET /v1/users/{id}/posts?cursor= | Profile posts | 200 paginated
POST /v1/users/{id}/follow | Follow | 204
DELETE /v1/users/{id}/follow | Unfollow | 204
POST /v1/media/upload | Pre-signed S3 URL | 201 { upload_url, media_id }

Use cursor pagination on feed and profile — see API design. Return 429 on post spam via rate limiter.
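
One common way to implement the cursor is to encode the sort key of the last returned item as an opaque token. The sketch below assumes the feed is ordered by (created_at, post_id); the function names and the JSON-in-base64 encoding are illustrative choices, not a prescribed format.

```python
import base64
import json


def encode_cursor(created_at, post_id):
    """Pack the last item's sort key into an opaque, URL-safe token."""
    raw = json.dumps({"t": created_at, "id": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["t"], data["id"]


def page_after(posts, cursor=None, limit=20):
    """Return (page, next_cursor).

    posts: list of (created_at, post_id) tuples sorted newest first.
    """
    if cursor is not None:
        boundary = decode_cursor(cursor)
        # Keep only items strictly older than the last one already returned.
        posts = [p for p in posts if p < boundary]
    page = posts[:limit]
    has_more = len(posts) > limit
    next_cursor = encode_cursor(*page[-1]) if has_more else None
    return page, next_cursor


if __name__ == "__main__":
    posts = [(5, "e"), (4, "d"), (3, "c"), (2, "b"), (1, "a")]
    page, cursor = page_after(posts, limit=2)
    print(page)
    print(page_after(posts, cursor, limit=2)[0])
```

Unlike offset pagination, a cursor stays stable when new posts arrive at the head of the feed between requests.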

Media upload flow

  1. Client requests POST /v1/media/upload with content_type.
  2. Server returns pre-signed S3 PUT URL and media_id.
  3. Client uploads bytes directly to S3 (offloads bandwidth from app tier).
  4. Client POST /v1/posts with { content, media_id }.
  5. CDN serves media_url on feed read — same pattern as URL shortener redirect offload.

Follow graph at scale

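A follows table with (follower_id, followee_id) indexed both ways answers two questions: "who follows A?" drives fan-out and follower counts, and "who does B follow?" drives the read-time celebrity merge. At billions of edges, shard the fan-out lookup by followee_id so one author's followers can be read from a single shard, and keep the reverse index keyed by follower_id for the read path. Celebrity accounts are few, but each has millions of follower rows, so any job that walks their followers should read them in batches, separately from the normal fan-out path. A graph DB is optional depth unless the prompt is social recommendations.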

Latency budget for feed read

Step | Target p99
LB + auth | 5ms
Redis LRANGE feed:{user_id} | 2ms
Merge celebrity posts (small set) | 5ms
Batch hydrate 20 posts from cache/DB | 15ms
Serialize JSON response | 3ms

Total ~30ms server-side leaves headroom for network on a 200ms p99 SLA. Cache post bodies by post_id to make hydrate a Redis MGET, not 20 SQL round-trips.

What to say in the last five minutes

Close with: "Hybrid fan-out avoids celebrity write explosions, async workers keep post creation fast, Redis timelines make reads cheap per user, and we merge celebrity content at read. Post metadata in Postgres, timelines in Redis, media on S3/CDN. I would add engagement ranking on a bounded candidate set and rate-limit creates per user." That hits the trade-offs interviewers score highest.

Mock interview checklist

  1. Clarified functional vs non-functional requirements and ranking scope.
  2. Did napkin math for reads/sec, writes/sec, and cache size.
  3. Explained fan-out on write vs read and the celebrity hybrid.
  4. Walked write path and read path separately.
  5. Named Redis for timelines and SQL for posts/follows.
  6. Mentioned failure modes (worker lag, cache miss fallback).

Closing summary

Propose hybrid fan-out, async workers, Redis timelines, and celebrity exception. Mention rate limiting on post creation and caching for hot post metadata. That answer covers the hard part interviewers care about.
