Design a Video Streaming Platform (Netflix / YouTube)
System design for video streaming at scale: upload pipeline, transcoding, adaptive bitrate, CDN delivery, and playback APIs for interview prep.
Video streaming interviews test whether you understand the difference between metadata and multi-gigabyte blobs, and whether you can explain why playback must never hit your origin database. Netflix and YouTube are read-heavy systems with an expensive write path (transcoding) and a cheap read path (CDN). Start with the interview framework: clarify live vs on-demand, mobile vs TV, and whether uploads are in scope.
Requirements
Functional
- Creators upload video; viewers browse catalog and play with seek/pause/resume.
- Adaptive quality: switch bitrate based on bandwidth (360p → 4K).
- Resume playback from last position across devices.
- Search and recommendations (optional v2 — mention but defer depth).
Non-functional
- 100M DAU, peak evening traffic in each timezone.
- Playback start under 2 seconds on good network.
- Uploads can take minutes; playback is latency-sensitive.
- High availability for reads; eventual consistency OK for view counts.
Clarify live vs VOD
Live streaming (Twitch, sports) needs low-latency ingest and segment buffers measured in seconds. On-demand (Netflix) tolerates minutes of transcoding before publish. Mixing both in one interview usually means you pick one and say what you would add later.
High-level architecture
| Component | Role |
|---|---|
| Upload API + resumable chunks | Accept large files; store raw in object storage (S3) |
| Transcoding workers | FFmpeg jobs: H.264/H.265 at multiple bitrates and resolutions |
| Manifest service | HLS/DASH playlist linking segment URLs per quality level |
| Metadata DB (PostgreSQL) | Title, owner, status, duration, poster URL |
| Blob store + CDN | Segments at edge; origin shield reduces S3 egress |
| Playback API | Return signed manifest URL + resume offset |
| Progress service (Redis) | Last watched position per user per video |
Upload and transcoding pipeline
- Client requests presigned multipart upload URLs (one per part) for a new video_id.
- Chunks land in S3 `raw/{video_id}/`; completion event on Kafka.
- Transcoder fleet pulls job: produce 360p, 720p, 1080p segments (2–6 sec each).
- Write `transcoded/{video_id}/{quality}/segment_N.ts` to S3; update metadata status = ready.
- Generate master HLS manifest listing variant streams.
- Invalidate CDN cache for poster thumbnail only — segments are immutable URLs.
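The master-manifest step above can be sketched as follows. This is a minimal illustration rather than a production packager: the bitrates in `LADDER`, the per-quality playlist name `index.m3u8`, and the `Variant` type are assumptions made for the example. Only the `#EXTM3U`, `#EXT-X-VERSION` and `#EXT-X-STREAM-INF` tags come from the HLS format itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    name: str           # quality folder, e.g. "720p"
    width: int
    height: int
    bandwidth_bps: int  # peak bitrate advertised to the player


# Assumed ladder; real bitrates depend on codec and content.
LADDER = [
    Variant("1080p", 1920, 1080, 5_000_000),
    Variant("360p", 640, 360, 800_000),
    Variant("720p", 1280, 720, 2_800_000),
]


def build_master_manifest(variants: List[Variant]) -> str:
    """Return a master playlist that lists one variant stream per quality."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    # Sort from lowest to highest bitrate so the output is deterministic.
    for v in sorted(variants, key=lambda v: v.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},"
            f"RESOLUTION={v.width}x{v.height}"
        )
        # Relative path, resolved against transcoded/{video_id}/ in this sketch.
        lines.append(f"{v.name}/index.m3u8")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_manifest(LADDER))
```

Because variant paths are relative, the same manifest text works behind any CDN hostname.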
Transcoding is CPU-heavy and embarrassingly parallel, so scale the worker pool independently from the API servers. Failed jobs retry with backoff; poison videos go to a dead-letter queue for manual review. Never block the upload HTTP response on transcoding: return `202 Accepted` with the `video_id`, then let the client poll for status or receive a status push over a WebSocket (the same pattern as chat).
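The retry and dead-letter behaviour can be sketched like this. It is a simplified, single-process illustration: `MAX_ATTEMPTS`, the delay constants, and the `run_job` helper are assumptions, and a real fleet would let the queue redeliver the job rather than loop inside one worker.

```python
import random
import time
from typing import Callable

MAX_ATTEMPTS = 5     # assumed retry budget before dead-lettering
BASE_DELAY_S = 2.0   # assumed first backoff ceiling
MAX_DELAY_S = 300.0  # assumed upper bound on any single wait


def backoff_delay(attempt: int) -> float:
    """Exponential backoff with full jitter for a 1-based attempt number."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))
    return random.uniform(0, ceiling)


def run_job(
    job_id: str,
    transcode: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run a transcode job; return "ready" or "dead_letter"."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            transcode(job_id)
            return "ready"
        except Exception:
            if attempt == MAX_ATTEMPTS:
                break  # retries exhausted: treat as a poison video
            sleep(backoff_delay(attempt))
    return "dead_letter"
```

Passing `sleep` as a parameter keeps the function testable without real waiting.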
Adaptive bitrate playback
HLS/DASH clients download a manifest, then pick a quality rung based on measured throughput. Each segment is a separate HTTP GET — perfect for CDN caching at edge PoPs worldwide. Player buffers 3–5 segments ahead; on bandwidth drop it switches to a lower manifest variant without rebuffering if possible. Interview tip: say "segments are immutable" — cache TTL can be weeks.
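A throughput-only version of the rung choice might look like the sketch below. The ladder bitrates and the 0.8 safety factor are assumptions, and real players typically also weigh buffer level, so treat this as the simplest form of the idea rather than how any particular player works.

```python
# Assumed ladder: quality name -> bitrate in bits per second.
LADDER_BPS = {"360p": 800_000, "720p": 2_800_000, "1080p": 5_000_000}


def pick_rung(throughput_bps: float, safety: float = 0.8) -> str:
    """Pick the highest rung whose bitrate fits within a fraction of throughput."""
    budget = throughput_bps * safety
    affordable = [name for name, bps in LADDER_BPS.items() if bps <= budget]
    if not affordable:
        # Nothing fits: fall back to the lowest rung rather than stall.
        return min(LADDER_BPS, key=LADDER_BPS.get)
    return max(affordable, key=LADDER_BPS.get)


if __name__ == "__main__":
    for mbps in (0.5, 3, 4, 10):
        print(mbps, "Mbps ->", pick_rung(mbps * 1_000_000))
```

The safety factor leaves headroom so a small dip in bandwidth does not immediately drain the buffer.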
CDN and origin shield
Without a CDN, 10M concurrent viewers × 5 Mbps average = 50 Tbps, which is impossible to serve from one data centre. CloudFront/Akamai/Fastly cache `.ts` segments at the edge. An origin shield (mid-tier cache) collapses duplicate misses into a single fetch from S3. Signed URLs or signed cookies prevent hot-linking; use a short TTL (hours) on the manifest and a long one on segments. Geographic load balancing sends users to the nearest PoP via DNS-based geo-routing or anycast.
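A signed URL boils down to an expiry timestamp plus a keyed hash that the edge can verify without calling the origin. The sketch below is a generic HMAC scheme for illustration only: CloudFront, Akamai and Fastly each define their own signing formats, and `SECRET`, `sign_url`, `verify_url` and the `expires`/`sig` parameter names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by the API and the edge


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    msg = f"{path}:{expires_at}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Append an expiry time and an HMAC signature to a path."""
    sig = _signature(path, expires_at, secret)
    return f"{path}?{urlencode({'expires': expires_at, 'sig': sig})}"


def verify_url(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept only unexpired URLs whose signature matches the path."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _signature(parsed.path, expires_at, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig, expected)
```

The signature covers the path, so a link issued for one video cannot be reused for another.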
Capacity estimation
Assume 1M videos, average 500 MB raw each → ~500 TB raw storage. A transcoded bitrate ladder (360p/720p/1080p) often totals roughly 1–2× raw size depending on codec and segment count — budget ~0.5–1 PB transcoded, not “raw × number of qualities.” 50M views/day, average watch 20 min: at ~3 Mbps effective throughput that is ~450 MB per full session from the CDN (not origin). Peak 5M concurrent × 3 Mbps ≈ 15 Tbps CDN egress — you buy bandwidth at the edge, not serve from one data centre. Metadata DB: 1M titles × 2 KB = 2 GB — trivial in PostgreSQL with read replicas.
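The arithmetic above is easy to reproduce, which helps when an interviewer changes one input. The sketch uses decimal units (1 TB = 1,000,000 MB) and the same assumed inputs as the paragraph. The daily egress figure is derived here and additionally assumes every view is a full 20-minute session.

```python
def estimate() -> dict:
    """Back-of-envelope storage and bandwidth numbers from the stated inputs."""
    videos = 1_000_000
    raw_mb_per_video = 500
    raw_tb = videos * raw_mb_per_video / 1_000_000            # MB -> TB

    # The transcoded ladder is budgeted at 1-2x raw size.
    transcoded_pb = (raw_tb * 1 / 1000, raw_tb * 2 / 1000)    # TB -> PB

    bitrate_mbps = 3
    watch_minutes = 20
    session_mb = bitrate_mbps * watch_minutes * 60 / 8        # megabits -> MB

    views_per_day = 50_000_000
    daily_egress_pb = views_per_day * session_mb / 1_000_000_000  # MB -> PB

    peak_concurrent = 5_000_000
    peak_tbps = peak_concurrent * bitrate_mbps / 1_000_000    # Mbps -> Tbps

    metadata_gb = videos * 2 / 1_000_000                      # 2 KB rows -> GB

    return {
        "raw_tb": raw_tb,
        "transcoded_pb": transcoded_pb,
        "session_mb": session_mb,
        "daily_egress_pb": daily_egress_pb,
        "peak_tbps": peak_tbps,
        "metadata_gb": metadata_gb,
    }


if __name__ == "__main__":
    for key, value in estimate().items():
        print(f"{key}: {value}")
```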
Resume playback and progress
Store `user_id, video_id → offset_seconds` in Redis with a 90-day TTL. On play start, the playback API returns the saved offset alongside the signed manifest URL. The client sends a heartbeat every 30 sec (async, fire-and-forget) so progress writes never block playback. If two devices write concurrently, last-write-wins is fine for Netflix-style VOD; live co-watch is out of scope. Progress writes favor availability over consistency (AP): losing a heartbeat loses at most 30 sec of position.
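The progress semantics can be shown with an in-memory stand-in for Redis. `ProgressStore` and its method names are hypothetical; in Redis the write would roughly correspond to a `SET` with an expiry. Time is passed in explicitly so the expiry logic is easy to follow.

```python
from typing import Dict, Tuple

TTL_S = 90 * 24 * 3600  # 90 days, as in the text


class ProgressStore:
    """Last-write-wins resume positions keyed by (user_id, video_id)."""

    def __init__(self) -> None:
        # value = (offset_seconds, expires_at)
        self._data: Dict[Tuple[str, str], Tuple[int, int]] = {}

    def put(self, user_id: str, video_id: str, offset_s: int, now: int) -> None:
        # Overwrite unconditionally: whichever heartbeat arrives last wins.
        self._data[(user_id, video_id)] = (offset_s, now + TTL_S)

    def get(self, user_id: str, video_id: str, now: int) -> int:
        entry = self._data.get((user_id, video_id))
        if entry is None or now >= entry[1]:
            return 0  # no progress, or it expired: start from the beginning
        return entry[0]


if __name__ == "__main__":
    store = ProgressStore()
    store.put("u1", "v1", 120, now=0)   # phone heartbeat
    store.put("u1", "v1", 95, now=10)   # TV heartbeat arrives later and wins
    print(store.get("u1", "v1", now=20))
```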
Search and catalog
Full-text search on title, description, tags via Elasticsearch — index updated when metadata status = ready. Trending and home feed can reuse news feed fan-out patterns or precomputed rails per region. Keep search off the playback hot path.
API sketch
- POST /videos — initiate upload, return video_id + presigned URLs
- GET /videos/{id}/playback — signed manifest URL + resume_offset
- PUT /videos/{id}/progress — { position_sec } (async)
- GET /videos/search?q= — Elasticsearch proxy
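As one possible shape for the playback endpoint, the sketch below builds the response body from a metadata row. The `Video` type, the field names, and the choice of `409` for a video that is still processing are assumptions for illustration; the source only specifies that the response carries a signed manifest URL and a resume offset.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Video:
    video_id: str
    status: str  # "processing" or "ready" in this sketch


def playback_response(
    video: Video, signed_manifest_url: str, resume_offset_s: int
) -> Tuple[int, Dict[str, Any]]:
    """Return (http_status, body) for GET /videos/{id}/playback."""
    if video.status != "ready":
        # Assumed behaviour: refuse playback until transcoding has finished.
        return 409, {"video_id": video.video_id, "status": video.status}
    return 200, {
        "video_id": video.video_id,
        "manifest_url": signed_manifest_url,
        "resume_offset": resume_offset_s,
    }
```

Note that nothing here touches segments: the API hands back a URL and the player talks to the CDN from then on.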
Failure modes
| Failure | Mitigation |
|---|---|
| Transcoder crash mid-job | Idempotent job id; resume from last completed segment |
| CDN miss storm on new viral video | Pre-warm top N PoPs; origin shield |
| S3 outage in one region | Multi-region replication for popular catalog |
| Stale manifest after transcode | Version manifest URL; CDN short TTL on .m3u8 |
| Upload interrupted | Multipart resume with completed part ETags |
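The first row of the table can be made concrete with a small sketch. It assumes the source has already been split into independently encodable segments and that completed outputs can be listed from object storage; `job_id` and `remaining_segments` are hypothetical helper names.

```python
import hashlib
from typing import List, Set


def job_id(video_id: str, quality: str) -> str:
    """Deterministic id: redelivering the same work yields the same job."""
    return hashlib.sha256(f"{video_id}:{quality}".encode()).hexdigest()[:16]


def remaining_segments(total_segments: int, completed: Set[int]) -> List[int]:
    """Segments still to encode after a crash, in playback order."""
    return [i for i in range(total_segments) if i not in completed]


if __name__ == "__main__":
    # A worker crashed after writing segments 0, 1 and 3.
    print(job_id("abc123", "720p"))
    print(remaining_segments(5, {0, 1, 3}))
```

Because segment outputs are written to fixed keys, re-encoding a segment that was half-written simply overwrites it.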
Security and DRM (brief)
Premium content uses Widevine/FairPlay encryption: the license server validates the subscription before issuing the decryption key. Interviewers at Netflix may go deep; for general loops, mention "encrypted segments + license endpoint" and move on unless prompted. An active paid subscription gates the playback API.
Live streaming extension
Ingest RTMP/WebRTC to media server; segment live into 2 sec HLS chunks with 10–30 sec total latency. No full transcode pipeline — single bitrate or limited ladder. Chat and reactions use separate WebSocket channel. Different beast from VOD — say so explicitly.
Sample opening (first three minutes)
Interviewer: "Design YouTube." You: "I will focus on on-demand upload and playback at global scale. Upload goes to object storage, async transcode to HLS segments, metadata in SQL, delivery via CDN with adaptive bitrate. Playback API returns a signed manifest and resume position from Redis. I will estimate CDN bandwidth and keep transcoding off the critical read path."
View count and analytics
Increment view counter async on play start — Kafka event, batch aggregate to data warehouse. Do not synchronously write every view to SQL on the playback path. Trending list computed offline hourly. Same pattern as URL shortener click analytics.
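The batch side of this pipeline is a simple fold over play-start events. The sketch below stands in for the offline job: the event shape (`{"video_id": ...}`) and the function names are assumptions, and in practice the events would be read from Kafka or the warehouse rather than a Python list.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def aggregate_views(events: Iterable[Dict[str, str]]) -> Counter:
    """Count play-start events per video for one batch window."""
    counts: Counter = Counter()
    for event in events:
        counts[event["video_id"]] += 1
    return counts


def top_trending(counts: Counter, n: int) -> List[Tuple[str, int]]:
    """Return the n most-viewed videos in the window."""
    return counts.most_common(n)


if __name__ == "__main__":
    batch = [{"video_id": v} for v in ["a", "b", "a", "c", "a", "b"]]
    print(top_trending(aggregate_views(batch), 2))
```

Counts produced this way lag real time by one batch window, which matches the eventual-consistency requirement for view counts.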
What to say in the last five minutes
Close with: "S3 raw upload, async FFmpeg transcode to HLS, CDN serves segments, signed manifest API, resume in Redis." Mention adaptive bitrate and that transcoding is never on the playback hot path.
Mock interview checklist
- Separated upload/transcode (write) from CDN playback (read).
- Explained HLS segments and adaptive bitrate.
- Named object storage + CDN + metadata DB.
- Discussed resume progress and signed URLs.
- Gave rough capacity numbers for storage and peak bandwidth.
Closing summary
Video streaming is a blob delivery problem with a heavy offline pipeline. Transcode once, serve millions of times from the edge. Pair this design with file storage chunking intuition and CDN caching; together, those three topics cover most streaming interviews.