DDSA Solutions

Design a Video Streaming Platform (Netflix / YouTube)

System design for video streaming at scale: upload pipeline, transcoding, adaptive bitrate, CDN delivery, and playback APIs for interview prep.

Video streaming interviews test whether you understand the difference between metadata and multi-gigabyte blobs, and whether you can explain why playback must never hit your origin database. Netflix and YouTube are read-heavy systems with an expensive write path (transcoding) and a cheap read path (CDN). Start with the interview framework: clarify live vs on-demand, mobile vs TV, and whether uploads are in scope.

Requirements

Functional

  • Creators upload video; viewers browse catalog and play with seek/pause/resume.
  • Adaptive quality: switch bitrate based on bandwidth (360p → 4K).
  • Resume playback from last position across devices.
  • Search and recommendations (optional v2 — mention but defer depth).

Non-functional

  • 100M DAU, peak evening traffic in each timezone.
  • Playback start under 2 seconds on good network.
  • Uploads can take minutes; playback is latency-sensitive.
  • High availability for reads; eventual consistency OK for view counts.

Clarify live vs VOD

Live streaming (Twitch, sports) needs low-latency ingest and segment buffers measured in seconds. On-demand (Netflix) tolerates minutes of transcoding before publish. Mixing both in one interview usually means you pick one and say what you would add later.

High-level architecture

| Component | Role |
| --- | --- |
| Upload API + resumable chunks | Accept large files; store raw in object storage (S3) |
| Transcoding workers | FFmpeg jobs: H.264/H.265 at multiple bitrates and resolutions |
| Manifest service | HLS/DASH playlist linking segment URLs per quality level |
| Metadata DB (PostgreSQL) | Title, owner, status, duration, poster URL |
| Blob store + CDN | Segments at edge; origin shield reduces S3 egress |
| Playback API | Return signed manifest URL + resume offset |
| Progress service (Redis) | Last watched position per user per video |

Upload and transcoding pipeline

  1. Client requests presigned multipart upload URL for video_id.
  2. Chunks land in S3 `raw/{video_id}/`; completion event on Kafka.
  3. Transcoder fleet pulls job: produce 360p, 720p, 1080p segments (2–6 sec each).
  4. Write `transcoded/{video_id}/{quality}/segment_N.ts` to S3.
  5. Generate master HLS manifest listing variant streams, then update metadata status = ready so the video is never published before it is playable.
  6. Invalidate CDN cache for poster thumbnail only — segments are immutable URLs.
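
The master manifest step can be sketched as below. This is a minimal illustration, not a production packager: the `Variant` type, the bitrate values, and the per-quality `playlist.m3u8` file name are assumptions made for the example. Only the HLS tags themselves (`#EXTM3U`, `#EXT-X-STREAM-INF`) come from the HLS format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    quality: str        # path component, e.g. "720p" (hypothetical layout)
    width: int
    height: int
    bandwidth_bps: int  # peak bitrate advertised to the player (illustrative)


# Illustrative ladder; real bitrates depend on codec and content.
LADDER = [
    Variant("1080p", 1920, 1080, 5_000_000),
    Variant("360p", 640, 360, 800_000),
    Variant("720p", 1280, 720, 2_800_000),
]


def build_master_manifest(variants: list[Variant]) -> str:
    """Return a master HLS playlist with one entry per quality level."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    # List rungs from lowest to highest bandwidth for readability.
    for v in sorted(variants, key=lambda v: v.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},"
            f"RESOLUTION={v.width}x{v.height}"
        )
        # Relative URI of the per-quality media playlist (assumed name).
        lines.append(f"{v.quality}/playlist.m3u8")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_manifest(LADDER))
```

Because the variant URIs are relative, the same manifest works behind any CDN hostname, which keeps segment URLs immutable.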

Transcoding is CPU-heavy and embarrassingly parallel, so scale the worker pool independently from the API servers. Failed jobs retry with backoff; poison videos go to a dead-letter queue for manual review. Never block the upload HTTP response on transcoding: return `202 Accepted` with `video_id`, then let the client poll status or receive a status push over WebSocket (the same pattern as chat).
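
A rough sketch of the retry-then-dead-letter behaviour is shown here. The function name, the attempt limit, and the in-memory list standing in for a dead-letter queue are all assumptions for illustration; a real worker would consume from Kafka or a job queue.

```python
import time
from typing import Callable


def run_with_retries(
    job_id: str,
    transcode: Callable[[str], None],
    dead_letter: list[str],
    max_attempts: int = 4,
    base_delay_sec: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run a transcode job; return True on success, False if dead-lettered."""
    for attempt in range(max_attempts):
        try:
            transcode(job_id)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                break  # out of attempts, fall through to the dead-letter queue
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_sec * 2 ** attempt)
    dead_letter.append(job_id)  # poison video: park it for manual review
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; production code would usually add jitter to the delay as well.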

Adaptive bitrate playback

HLS/DASH clients download a manifest, then pick a quality rung based on measured throughput. Each segment is a separate HTTP GET — perfect for CDN caching at edge PoPs worldwide. Player buffers 3–5 segments ahead; on bandwidth drop it switches to a lower manifest variant without rebuffering if possible. Interview tip: say "segments are immutable" — cache TTL can be weeks.
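
To make the rung selection concrete, here is a simplified throughput-based picker. Real players use more elaborate heuristics (buffer level, smoothing), so treat this as a sketch: the ladder bitrates and the 0.8 safety factor are assumed values, not taken from any specific player.

```python
# Illustrative ladder: quality label -> required bitrate in bits per second.
LADDER_BPS = {"360p": 800_000, "720p": 2_800_000, "1080p": 5_000_000}


def pick_rung(
    measured_bps: float,
    ladder: dict[str, int] = LADDER_BPS,
    safety: float = 0.8,
) -> str:
    """Pick the highest quality whose bitrate fits within a safety margin."""
    budget = measured_bps * safety  # leave headroom for throughput variance
    affordable = [q for q, bps in ladder.items() if bps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rung rather than stalling.
        return min(ladder, key=ladder.get)
    return max(affordable, key=ladder.get)
```

For example, a measured 4 Mbps gives a 3.2 Mbps budget, which selects 720p rather than 1080p.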

CDN and origin shield

Without a CDN, 10M concurrent viewers × 5 Mbps average = 50 Tbps, which no single data centre can serve. CloudFront/Akamai/Fastly cache `.ts` segments at the edge. Origin shield (a mid-tier cache) collapses duplicate misses into one request to S3. Signed URLs or signed cookies prevent hot-linking; use a short TTL (hours) on the manifest and a long one on segments. Geographic load balancing sends users to the nearest PoP via geo-aware DNS or anycast routing.
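
The idea behind signed URLs can be illustrated with a generic HMAC scheme. This is not the CloudFront, Akamai, or Fastly format, each of which defines its own; the `expires` and `sig` query parameter names and the message layout are assumptions for the sketch.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    message = f"{path}:{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes) -> str:
    """Append an expiry timestamp and an HMAC signature to a path."""
    query = urlencode(
        {"expires": expires_at, "sig": _signature(path, expires_at, secret)}
    )
    return f"{path}?{query}"


def verify_url(url: str, secret: bytes, now: int) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires_at = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _signature(parts.path, expires_at, secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(expected, sig)
```

The expiry is part of the signed message, so a client cannot extend a link by editing the `expires` parameter.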

Capacity estimation

Assume 1M videos, average 500 MB raw each → ~500 TB raw storage. A transcoded bitrate ladder (360p/720p/1080p) often totals roughly 1–2× raw size depending on codec and segment count — budget ~0.5–1 PB transcoded, not “raw × number of qualities.” 50M views/day, average watch 20 min: at ~3 Mbps effective throughput that is ~450 MB per full session from the CDN (not origin). Peak 5M concurrent × 3 Mbps ≈ 15 Tbps CDN egress — you buy bandwidth at the edge, not serve from one data centre. Metadata DB: 1M titles × 2 KB = 2 GB — trivial in PostgreSQL with read replicas.
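
The arithmetic above can be reproduced in a few lines, which is a handy way to sanity-check the estimate. Decimal units are assumed throughout (1 MB = 10^6 bytes), and the daily egress figure is derived here under the simplifying assumption that every view is a full 20-minute session.

```python
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15


def estimate() -> dict[str, float]:
    """Recompute the back-of-envelope numbers from the stated assumptions."""
    videos = 1_000_000
    raw_bytes = videos * 500 * MB               # 500 MB raw per video
    bitrate_bps = 3_000_000                     # ~3 Mbps effective throughput
    session_bytes = bitrate_bps * 20 * 60 // 8  # 20-minute watch, bits -> bytes
    daily_egress = 50_000_000 * session_bytes   # 50M views/day, all full sessions
    peak_bps = 5_000_000 * bitrate_bps          # 5M concurrent viewers
    metadata_bytes = videos * 2_000             # 2 KB of metadata per title
    return {
        "raw_tb": raw_bytes / TB,
        "transcoded_pb_low": raw_bytes * 1 / PB,   # ladder totals ~1x raw
        "transcoded_pb_high": raw_bytes * 2 / PB,  # ladder totals ~2x raw
        "session_mb": session_bytes / MB,
        "daily_egress_pb": daily_egress / PB,
        "peak_tbps": peak_bps / 10**12,
        "metadata_gb": metadata_bytes / GB,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```

The derived daily egress of roughly 22.5 PB reinforces the point that delivery cost sits at the CDN, not in storage or metadata.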

Resume playback and progress

Store `user_id, video_id → offset_seconds` in Redis with TTL 90 days. On play start, the playback API returns the stored offset alongside the signed manifest URL. The client sends heartbeats every 30 sec (async, fire-and-forget) and never blocks playback on them. Conflict between two devices: last-write-wins is fine for Netflix; live co-watch is out of scope. Progress writes favour availability over consistency (AP); losing a heartbeat loses at most 30 sec of position.
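
The last-write-wins behaviour and the 90-day expiry can be sketched with an in-memory stand-in for Redis. The `ProgressStore` class and its method names are hypothetical; in Redis the same effect would come from a plain key write with an expiry.

```python
from dataclasses import dataclass

TTL_SEC = 90 * 24 * 3600  # 90 days, as in the text


@dataclass
class Entry:
    offset_sec: int
    written_at: float


class ProgressStore:
    """In-memory stand-in for the Redis progress service (illustrative only)."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], Entry] = {}

    def heartbeat(self, user_id: str, video_id: str, offset_sec: int, now: float) -> None:
        # Last write wins: the most recently received heartbeat overwrites
        # whatever another device stored earlier.
        self._data[(user_id, video_id)] = Entry(offset_sec, now)

    def resume_offset(self, user_id: str, video_id: str, now: float) -> int:
        entry = self._data.get((user_id, video_id))
        if entry is None or now - entry.written_at > TTL_SEC:
            return 0  # no progress, or expired: start from the beginning
        return entry.offset_sec
```

Note that the later write wins even if its offset is smaller, which matches the "whichever device reported last" semantics rather than "furthest position watched".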

Search and catalog

Full-text search on title, description, tags via Elasticsearch — index updated when metadata status = ready. Trending and home feed can reuse news feed fan-out patterns or precomputed rails per region. Keep search off the playback hot path.

API sketch

  • POST /videos — initiate upload, return video_id + presigned URLs
  • GET /videos/{id}/playback — signed manifest URL + resume_offset
  • PUT /videos/{id}/progress — { position_sec } (async)
  • GET /videos/search?q= — Elasticsearch proxy

Failure modes

| Failure | Mitigation |
| --- | --- |
| Transcoder crash mid-job | Idempotent job id; resume from last completed segment |
| CDN miss storm on new viral video | Pre-warm top N PoPs; origin shield |
| S3 outage in one region | Multi-region replication for popular catalog |
| Stale manifest after transcode | Version manifest URL; CDN short TTL on .m3u8 |
| Upload interrupted | Multipart resume with completed part ETags |
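
As one example of these mitigations, multipart resume can be sketched as follows. The helper names and the tuple layout are assumptions for illustration; with S3 the set of completed parts would come from listing the parts of the in-progress upload.

```python
def plan_parts(file_size: int, part_size: int) -> list[tuple[int, int, int]]:
    """Split a file into (part_number, first_byte, last_byte), numbered from 1."""
    if file_size <= 0 or part_size <= 0:
        raise ValueError("file_size and part_size must be positive")
    parts = []
    for number, start in enumerate(range(0, file_size, part_size), start=1):
        end = min(start + part_size, file_size) - 1  # inclusive last byte
        parts.append((number, start, end))
    return parts


def parts_to_resume(
    plan: list[tuple[int, int, int]],
    completed_etags: dict[int, str],
) -> list[tuple[int, int, int]]:
    """Return only the parts that have no recorded ETag yet."""
    return [part for part in plan if part[0] not in completed_etags]
```

After an interruption the client re-uploads only the byte ranges returned by `parts_to_resume`, then completes the upload with the full list of ETags.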

Security and DRM (brief)

Premium content uses Widevine/FairPlay encryption: the license server validates the subscription before issuing the decryption key. Interviewers at Netflix may go deep; for general loops, mention "encrypted segments + license endpoint" and move on unless prompted. The playback API is gated on an active paid subscription.

Live streaming extension

Ingest RTMP/WebRTC to media server; segment live into 2 sec HLS chunks with 10–30 sec total latency. No full transcode pipeline — single bitrate or limited ladder. Chat and reactions use separate WebSocket channel. Different beast from VOD — say so explicitly.

Sample opening (first three minutes)

Interviewer: "Design YouTube." You: "I will focus on on-demand upload and playback at global scale. Upload goes to object storage, async transcode to HLS segments, metadata in SQL, delivery via CDN with adaptive bitrate. Playback API returns a signed manifest and resume position from Redis. I will estimate CDN bandwidth and keep transcoding off the critical read path."

View count and analytics

Increment view counter async on play start — Kafka event, batch aggregate to data warehouse. Do not synchronously write every view to SQL on the playback path. Trending list computed offline hourly. Same pattern as URL shortener click analytics.
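
The batch side of this pipeline can be sketched in a few lines. The event shape (a dict with a `video_id` key) is an assumption; in practice the events would be read from a Kafka topic and the result written to the warehouse.

```python
from collections import Counter
from typing import Iterable


def aggregate_views(events: Iterable[dict]) -> Counter:
    """Fold a batch of play-start events into per-video view counts."""
    counts: Counter = Counter()
    for event in events:
        counts[event["video_id"]] += 1
    return counts


def top_trending(counts: Counter, n: int) -> list[tuple[str, int]]:
    """Return the n most-viewed videos in the batch, highest first."""
    return counts.most_common(n)
```

Because the aggregation runs offline on batches, the playback path only has to emit an event and never waits on a counter write.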

What to say in the last five minutes

Close with: "S3 raw upload, async FFmpeg transcode to HLS, CDN serves segments, signed manifest API, resume in Redis." Mention adaptive bitrate and that transcoding is never on the playback hot path.

Mock interview checklist

  1. Separated upload/transcode (write) from CDN playback (read).
  2. Explained HLS segments and adaptive bitrate.
  3. Named object storage + CDN + metadata DB.
  4. Discussed resume progress and signed URLs.
  5. Gave rough capacity numbers for storage and peak bandwidth.

Closing summary

Video streaming is a blob delivery problem with a heavy offline pipeline. Transcode once, serve millions of times from the edge. Pair with file storage chunking intuition and caching for CDN — that trio covers most streaming interviews.
