Design a File Storage System (Dropbox / Google Drive)
System design for cloud file storage: upload chunking, metadata vs blob storage, sync, conflict resolution, and CDN delivery for interviews.
File storage interviews test whether you separate metadata from bytes, handle large uploads, and reason about sync across devices. The problem combines API design, object storage (S3), and caching. Use the interview framework to clarify scope first: personal files vs shared folders, maximum file size, and real-time sync vs eventual consistency.
Requirements
Functional
- Upload, download, delete files and folders.
- Sync across multiple devices for the same user.
- Share files with other users (read/write permissions).
- Support large files (multi-GB) with resumable upload.
- Deduplication optional — same content stored once (content-addressable).
Non-functional
- Metadata operations under 100ms; blob throughput limited by client bandwidth.
- 99.999999999% (11 nines) durability for blobs, which is the design target of S3 Standard.
- Upload resume after network drop without re-sending completed chunks.
- ACL enforced on every metadata and download path.
High-level architecture
| Component | Role |
|---|---|
| Upload/Download API | REST + pre-signed URLs for blob transfer |
| Metadata DB (PostgreSQL) | file_id, user_id, path, version, blob_id, updated_at |
| Blob store (S3) | Actual file bytes, keyed by content hash or blob_id |
| Sync service | Long polling or WebSocket for change notifications |
| Block/chunk service | Split large files into fixed-size chunks |
| CDN | Serve popular downloads at edge |
Upload flow (large file)
- Client POST /v1/files/init { name, size, parent_folder_id } → file_id, upload_id.
- Client splits file into 4MB chunks and computes a hash per chunk.
- For each chunk: POST /v1/files/{file_id}/chunks { index, hash } → pre-signed S3 PUT URL if the chunk is new.
- Skip upload if server already has chunk hash (dedup).
- POST /v1/files/{file_id}/complete { upload_id, chunk_list } → metadata commit.
- Metadata row points to ordered list of blob chunk IDs.
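The client side of this flow can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the chunk hash (the flow above does not name a hash function); `iter_chunks` and `plan_upload` are hypothetical helper names, and `known_hashes` stands in for the server's answer to "do you already have this chunk?".

```python
import hashlib
from typing import Iterator

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB, as in the upload flow above


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[tuple[int, str, bytes]]:
    """Yield (index, SHA-256 hex digest, chunk bytes) for each fixed-size chunk."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        yield index, hashlib.sha256(chunk).hexdigest(), chunk


def plan_upload(data: bytes, known_hashes: set[str], chunk_size: int = CHUNK_SIZE):
    """Return (ordered chunk hash list, indices that actually need a PUT)."""
    chunk_list, to_upload = [], []
    seen = set(known_hashes)  # copy so the caller's set is not mutated
    for index, digest, _ in iter_chunks(data, chunk_size):
        chunk_list.append(digest)
        if digest not in seen:
            to_upload.append(index)
            seen.add(digest)  # a chunk repeated inside the file is uploaded once
    return chunk_list, to_upload
```

The ordered `chunk_list` is what the complete call would send; `to_upload` is the subset of indices that still needs a pre-signed PUT.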
Why direct-to-S3
Bytes never stream through your API servers — only metadata and signed URLs. This is how you scale uploads without melting the app tier. Same pattern as news feed media upload.
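To make the idea of a signed URL concrete, here is a simplified stand-in built on HMAC. It is not S3's real signing scheme (S3 uses AWS Signature Version 4, and with boto3 you would normally call the S3 client's `generate_presigned_url`); the host `blobs.example.com`, the `SECRET` key, and both function names are assumptions for illustration only.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical; real S3 signing uses AWS credentials


def sign_url(method: str, path: str, ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """API tier: authorize one HTTP method on one path until the expiry time."""
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    message = f"{method}\n{path}\n{expires}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return f"https://blobs.example.com{path}?" + urlencode({"expires": expires, "sig": signature})


def verify_url(method: str, url: str, now: Optional[float] = None) -> bool:
    """Blob tier: recompute the signature and reject expired or tampered URLs."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if (now if now is not None else time.time()) > expires:
        return False
    message = f"{method}\n{parsed.path}\n{expires}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The point of the sketch is that the API server only signs a small string; the bytes travel directly between the client and the blob store, which can verify the grant without calling back to the API tier.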
Download flow
- GET /v1/files/{file_id}/download → check ACL.
- Resolve chunk list from metadata; generate pre-signed GET URLs (or single URL if small file).
- Client downloads chunks in parallel; reassemble locally.
- Hot public files: serve via CDN with cache key = content hash.
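A rough sketch of the client-side reassembly step, assuming SHA-256 chunk hashes. The in-memory `BLOB_STORE` dict is a hypothetical stand-in for GETs against pre-signed URLs; the part worth noticing is that chunks are fetched concurrently but joined in metadata order.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory blob store: chunk hash -> bytes (stands in for S3 GETs).
BLOB_STORE: dict[str, bytes] = {}


def put_chunk(data: bytes) -> str:
    """Store a chunk under its content hash and return that hash."""
    digest = hashlib.sha256(data).hexdigest()
    BLOB_STORE[digest] = data
    return digest


def fetch_chunk(digest: str) -> bytes:
    """Fetch one chunk and verify it against the hash it was requested by."""
    data = BLOB_STORE[digest]
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"chunk {digest} failed integrity check")
    return data


def download_file(chunk_list: list[str], max_workers: int = 8) -> bytes:
    """Fetch chunks concurrently; Executor.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return b"".join(pool.map(fetch_chunk, chunk_list))
```

Because chunks are keyed by content hash, the same check that enables dedup also gives the client an integrity check on every download.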
Sync across devices
Each file carries a monotonically increasing version or an updated_at timestamp. The client stores last_sync_cursor. On app open it calls GET /v1/sync?since=cursor and receives the list of changed files (metadata only), then pulls new blobs as needed. For near-real-time sync, a WebSocket notifies the client that "file X changed"; this is lighter than a full chat system but uses the same push idea.
Multi-device edge cases
- Device offline for days: cursor may expire; fall back to full metadata snapshot for that user.
- Same file edited on two laptops offline: conflict copies or LWW — state clearly in interview.
- Partial upload on phone: resume with same upload_id until TTL; other devices see file only after complete.
- Delete on web while mobile is offline: tombstone in sync delta; mobile removes local copy on next sync.
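The cursor, tombstone, and expired-cursor cases can be sketched with a per-user change log. This is one possible shape, not the only one: the `ChangeLog` class, its sequence-number cursor, and the `compact` method are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    seq: int               # monotonically increasing per user
    file_id: str
    version: int
    deleted: bool = False  # True marks a tombstone


@dataclass
class ChangeLog:
    changes: list = field(default_factory=list)
    oldest_retained_seq: int = 1  # entries below this were compacted away
    next_seq: int = 1

    def record(self, file_id: str, version: int, deleted: bool = False) -> None:
        self.changes.append(Change(self.next_seq, file_id, version, deleted))
        self.next_seq += 1

    def compact(self, keep_from_seq: int) -> None:
        """Drop old entries; cursors older than this can no longer get a delta."""
        self.changes = [c for c in self.changes if c.seq >= keep_from_seq]
        self.oldest_retained_seq = keep_from_seq

    def sync(self, cursor: Optional[int]) -> dict:
        """cursor is the last seq the client applied; None means first sync."""
        if cursor is None or cursor + 1 < self.oldest_retained_seq:
            # Cursor expired: the client must reload a full metadata snapshot.
            return {"mode": "snapshot", "cursor": self.next_seq - 1}
        delta = [c for c in self.changes if c.seq > cursor]
        return {"mode": "delta", "changes": delta, "cursor": self.next_seq - 1}
```

A client applies each change in order, deleting its local copy when it sees a tombstone, and then stores the returned cursor for the next call.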
Conflict resolution
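Two devices edit offline: last-write-wins (LWW) on the metadata timestamp is simplest, but it silently discards the losing edit. Better UX: keep both versions and save the losing edit as a separate file (a conflict copy). Interview answer: "I would start with LWW and mention conflict copies as v2." Give each file version a unique ID so concurrent edits never overwrite each other's stored version.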
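Both strategies fit in a few lines. The sketch below assumes each edit carries the version it was based on (`base_version`), which is one common way to detect a conflict; the `Edit` type and the "conflicted copy" naming pattern are illustrative choices, not something the design above prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    device: str
    base_version: int  # version the device had synced before editing
    updated_at: float  # client-reported time; clock skew is a known LWW weakness
    name: str


def resolve_lww(a: Edit, b: Edit) -> Edit:
    """Last-write-wins: later timestamp wins; device name breaks ties."""
    return max(a, b, key=lambda e: (e.updated_at, e.device))


def resolve_conflict_copy(server_version: int, incoming: Edit) -> str:
    """Return the name to save under: unchanged if no conflict, else a copy name."""
    if incoming.base_version == server_version:
        return incoming.name  # edit was based on the current version
    stem, dot, ext = incoming.name.rpartition(".")
    suffix = f" (conflicted copy from {incoming.device})"
    if not dot or not stem:  # no extension, or a dotfile such as ".bashrc"
        return incoming.name + suffix
    return f"{stem}{suffix}.{ext}"
```

With LWW the losing edit disappears; with conflict copies the second writer's file is kept under a new name and the user merges by hand.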
Sharing and ACL
- shares table: file_id, grantee_user_id, permission (read/write).
- Check permission on every metadata and download request.
- Shared folder = tree of file_ids with inherited ACL (cache expanded ACL in Redis).
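Inherited ACLs can be resolved by walking up the parent chain. The sketch below uses in-memory dicts as hypothetical stand-ins for the files and shares tables, and it assumes that the nearest explicit grant wins and that owners always have write access; a real system would need to decide both rules explicitly.

```python
from typing import Optional

# Hypothetical in-memory tables. PARENTS: file_id -> parent_id (None at the root).
PARENTS: dict = {"root": None, "docs": "root", "taxes": "docs", "w2.pdf": "taxes"}
OWNERS: dict = {"root": "alice", "docs": "alice", "taxes": "alice", "w2.pdf": "alice"}
# SHARES: (file_id, grantee_user_id) -> permission
SHARES: dict = {("docs", "bob"): "read", ("taxes", "bob"): "write"}

RANK = {"read": 1, "write": 2}


def effective_permission(user_id: str, file_id: str) -> Optional[str]:
    """Walk up the folder tree; the nearest explicit grant (or ownership) wins."""
    node = file_id
    while node is not None:
        if OWNERS.get(node) == user_id:
            return "write"
        grant = SHARES.get((node, user_id))
        if grant is not None:
            return grant
        node = PARENTS.get(node)
    return None


def can(user_id: str, file_id: str, needed: str) -> bool:
    """True if the user's effective permission is at least the needed one."""
    granted = effective_permission(user_id, file_id)
    return granted is not None and RANK[granted] >= RANK[needed]
```

The walk costs one lookup per folder level, which is why caching the expanded ACL (for example in Redis, as noted above) pays off for deep trees.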
Data model
- files: file_id, owner_id, parent_id, name, is_folder, latest_version, deleted_at
- file_versions: version_id, file_id, chunk_ids[], size, created_at
- chunks: chunk_hash, s3_key, size (dedup table)
- shares: file_id, grantee_user_id, permission (read/write)
Capacity estimation
50M users, 5GB average stored → 250PB logical; with 30% dedup by chunk hash → ~175PB in S3. Metadata: 500 files/user × 200 bytes ≈ 5TB relational — tiny vs blobs. API: 10M DAU × 20 metadata ops/day ≈ 2,300 RPS average; upload init spikes higher. Blob egress dominates cost — CDN for shared public links, infrequent-access tier for cold archives.
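The arithmetic is easy to check in a few lines. The inputs are the figures quoted above; decimal units (1PB = 1,000,000GB) are an assumption, as is treating load as uniform over 86,400 seconds.

```python
USERS = 50_000_000
AVG_STORED_GB = 5
DEDUP_SAVINGS = 0.30
FILES_PER_USER = 500
METADATA_BYTES_PER_FILE = 200
DAU = 10_000_000
METADATA_OPS_PER_USER_PER_DAY = 20

GB_PER_PB = 1_000_000       # decimal units
BYTES_PER_TB = 10 ** 12
SECONDS_PER_DAY = 86_400

logical_pb = USERS * AVG_STORED_GB / GB_PER_PB
physical_pb = logical_pb * (1 - DEDUP_SAVINGS)
metadata_tb = USERS * FILES_PER_USER * METADATA_BYTES_PER_FILE / BYTES_PER_TB
avg_rps = DAU * METADATA_OPS_PER_USER_PER_DAY / SECONDS_PER_DAY

print(f"logical blob storage:  {logical_pb:.0f} PB")
print(f"after chunk dedup:     {physical_pb:.0f} PB")
print(f"metadata:              {metadata_tb:.0f} TB")
print(f"average metadata load: {avg_rps:.0f} RPS")
```

Peak load is typically a multiple of the average, so the RPS figure is a floor for sizing the API tier rather than a target.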
Worked example: 2GB video upload
- Client calls init → receives file_id and upload_id.
- Splits into 512 × 4MB chunks; hashes each locally.
- For chunk 0: server returns pre-signed PUT URL; client uploads directly to S3.
- Chunk 47 already exists (same hash as another user's file) → server skips PUT, records chunk_hash in upload session.
- Complete commits file_versions row with ordered chunk list; sync pushes metadata delta to other devices.
- Other laptop sees new file in sync delta; downloads only missing chunks in parallel.
Capacity and cost
Storage cost dominates, so plan for S3 plus infrequent-access tiers; metadata is tiny by comparison. At a larger scale than the estimate above, 100M users × 10GB average = 1EB of storage; at that point mention sharding metadata by user_id and placing S3 buckets by geography. The API tier scales with load balancing; the blob tier scales with the object store.
Failure modes
| Failure | Mitigation |
|---|---|
| Chunk upload incomplete | upload_id expires; garbage-collect orphan chunks |
| Duplicate complete request | Idempotency on complete endpoint |
| S3 outage | Retry; multi-region replication for enterprise tier |
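Idempotency on the complete endpoint can be sketched by remembering the committed result on the upload session. The `MetadataStore` and `UploadSession` names and the version ID format are hypothetical; in PostgreSQL the same effect would usually come from doing the check and the insert inside one transaction, or from a unique constraint on upload_id.

```python
from dataclasses import dataclass, field


@dataclass
class UploadSession:
    file_id: str
    expected_chunks: int
    committed_version_id: str = ""  # empty until complete succeeds


@dataclass
class MetadataStore:
    sessions: dict = field(default_factory=dict)  # upload_id -> UploadSession
    versions: dict = field(default_factory=dict)  # version_id -> ordered chunk list

    def complete(self, upload_id: str, chunk_list: list) -> str:
        session = self.sessions.get(upload_id)
        if session is None:
            raise KeyError("unknown or expired upload_id")
        if session.committed_version_id:
            # Retry of an already committed upload: same answer, no new row.
            return session.committed_version_id
        if len(chunk_list) != session.expected_chunks:
            raise ValueError("chunk list is incomplete")
        version_id = f"{session.file_id}-v{len(self.versions) + 1}"
        self.versions[version_id] = list(chunk_list)
        session.committed_version_id = version_id
        return version_id
```

A client that times out on complete can simply retry with the same upload_id and receive the same version without creating a duplicate.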
Small file fast path
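Files under 5MB: single pre-signed PUT, no chunk orchestration. The metadata row is committed in a single call once the PUT succeeds, since the S3 write and the database write cannot share a transaction. This reduces API round-trips for photos and documents, and most user files are small.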
Trash and versioning
Soft-delete: set deleted_at on the metadata row; after 30 days, garbage-collect a blob only if no remaining version references its chunk hash. Version history: insert a new row in file_versions on each save and keep a current-version pointer on the files table. Users restore a previous version by pointing latest_version back at the older row.
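A small in-memory sketch of the pointer-based version model. The `FileRecord` class is a hypothetical stand-in for the files and file_versions tables; versions are list indices here rather than version_id values.

```python
from dataclasses import dataclass, field
from typing import Optional

SECONDS_PER_DAY = 86_400


@dataclass
class FileRecord:
    file_id: str
    versions: list = field(default_factory=list)  # chunk-hash lists, oldest first
    latest_version: int = -1                      # index into versions
    deleted_at: Optional[float] = None

    def save(self, chunk_ids: list) -> int:
        """Append a new version and move the current pointer to it."""
        self.versions.append(list(chunk_ids))
        self.latest_version = len(self.versions) - 1
        return self.latest_version

    def restore(self, version: int) -> None:
        """Point back at an older version; history itself is untouched."""
        if not 0 <= version < len(self.versions):
            raise IndexError("no such version")
        self.latest_version = version

    def soft_delete(self, now: float) -> None:
        self.deleted_at = now

    def purgeable(self, now: float, retention_days: int = 30) -> bool:
        """True once the file has been in the trash for the full retention period."""
        return (self.deleted_at is not None
                and now - self.deleted_at >= retention_days * SECONDS_PER_DAY)
```

Restore is a metadata-only operation: no blob is copied, because the older version's chunks were never removed.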
Latency budget
| Operation | Target |
|---|---|
| List folder metadata | < 100ms |
| Init upload (API only) | < 50ms |
| Chunk PUT (direct S3) | Limited by client bandwidth |
| Sync delta (metadata only) | < 200ms |
Security
- Pre-signed URLs expire in 15 minutes.
- Encrypt blobs at rest (S3 SSE).
- Virus scan optional hook on complete upload.
- ACL check on every metadata and download path.
- Rate limit upload init per user.
Public share links
Optional: share_token (random UUID) maps to file_id with read-only ACL. GET /s/{token} redirects to CDN signed URL — similar to URL shortener opaque links. Revoke by deleting share row.
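A minimal sketch of the token lifecycle. It swaps the UUID mentioned above for Python's `secrets.token_urlsafe`, which draws from the operating system's cryptographic random source; the `SHARE_LINKS` dict is a hypothetical stand-in for the share table.

```python
import secrets
from typing import Optional

SHARE_LINKS: dict = {}  # share_token -> file_id (stand-in for the share table)


def create_share_link(file_id: str) -> str:
    """Create an unguessable token that maps to the file with read-only access."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64 text
    SHARE_LINKS[token] = file_id
    return token


def resolve_share_link(token: str) -> Optional[str]:
    """Return the file_id for a live token, or None if unknown or revoked."""
    return SHARE_LINKS.get(token)


def revoke_share_link(token: str) -> None:
    """Revoke by deleting the row; revoking twice is harmless."""
    SHARE_LINKS.pop(token, None)
```

Revocation only stops new resolutions. Any signed CDN URL already handed out stays valid until it expires, which is one reason to keep those URLs short-lived.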
API summary
| Endpoint | Purpose |
|---|---|
| POST /v1/files/init | Start upload; return file_id + upload_id |
| POST /v1/files/{id}/chunks | Get pre-signed URL per chunk |
| POST /v1/files/{id}/complete | Commit metadata after all chunks |
| GET /v1/files/{id}/download | Pre-signed GET URLs |
| GET /v1/folders/{id}/children | List folder metadata |
| GET /v1/sync?since=cursor | Delta sync for client |
Folder hierarchy
Folders are rows with is_folder=true. Path display is computed from parent chain or materialized path (/user/docs/2024). List children: SELECT * FROM files WHERE parent_id = ? AND deleted_at IS NULL — index on (parent_id, name) for fast folder browsing. Rename = update one metadata row; move = change parent_id with cycle check.
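The listing query and the cycle check on move can be tried against an in-memory SQLite database. SQLite is used here only so the sketch runs without a server; the design above assumes PostgreSQL, and the column types and index name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    file_id    TEXT PRIMARY KEY,
    parent_id  TEXT REFERENCES files(file_id),
    name       TEXT NOT NULL,
    is_folder  INTEGER NOT NULL DEFAULT 0,
    deleted_at TEXT
);
CREATE INDEX idx_files_parent_name ON files (parent_id, name);
""")


def list_children(parent_id: str) -> list:
    """Names of live (not soft-deleted) children, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM files WHERE parent_id = ? AND deleted_at IS NULL ORDER BY name",
        (parent_id,),
    )
    return [name for (name,) in rows]


def move(file_id: str, new_parent_id: str) -> None:
    """Change parent_id, rejecting a move into the item itself or a descendant."""
    node = new_parent_id
    while node is not None:  # walk from the target folder up to the root
        if node == file_id:
            raise ValueError("move would create a cycle")
        row = conn.execute(
            "SELECT parent_id FROM files WHERE file_id = ?", (node,)
        ).fetchone()
        node = row[0] if row else None
    conn.execute("UPDATE files SET parent_id = ? WHERE file_id = ?", (new_parent_id, file_id))
```

The cycle check walks ancestors of the destination, so its cost is proportional to folder depth, not to the size of the subtree being moved.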
Metadata sharding
Shard PostgreSQL by a hash of user_id when metadata QPS grows. Each user's tree lives on one shard, so v1 has no cross-shard folder moves. Blobs stay in global S3; only metadata is sharded. Cross-user shares reference files by a globally unique file_id (for example a UUID or a Snowflake-style ID), so a share row can point at a file on another user's shard.
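Routing a user to a shard can be as simple as a stable hash modulo the shard count. The shard count of 16 and the function name are assumptions; SHA-256 is used only because it is stable across processes, not for any security property.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map user_id to a shard index using a hash that is stable across processes.

    Python's built-in hash() is salted per process for strings, so it would
    route the same user to different shards on different servers.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo routing remaps most users when the shard count changes, so a design that expects to reshard often would more likely use consistent hashing or a lookup table from user to shard.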
Garbage collection
- Orphan chunks: uploaded but no file_version references after upload_id TTL.
- Deleted files: soft-delete metadata; after 30 days remove chunk refs.
- Reference-count chunks table; delete S3 object when refcount hits zero.
- Run GC as nightly async job — never on request path.
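The reference-counting rules above can be sketched in memory. The `ChunkRegistry` class is hypothetical, and `uploaded_at` doubles as the set of objects present in the bucket; a chunk with zero references is only collected once the upload TTL has passed, so chunks from an in-progress upload survive.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRegistry:
    refcount: dict = field(default_factory=dict)     # chunk_hash -> referencing versions
    uploaded_at: dict = field(default_factory=dict)  # chunk_hash -> last registration time

    def register(self, chunk_hash: str, now: float) -> None:
        """Called when an upload session records a chunk (new PUT or dedup hit)."""
        self.uploaded_at[chunk_hash] = now  # refresh so GC leaves it alone for a TTL
        self.refcount.setdefault(chunk_hash, 0)

    def add_version(self, chunk_ids: list) -> None:
        for h in set(chunk_ids):  # count each chunk once per version
            self.refcount[h] = self.refcount.get(h, 0) + 1

    def drop_version(self, chunk_ids: list) -> None:
        for h in set(chunk_ids):
            self.refcount[h] -= 1

    def collect(self, now: float, upload_ttl: float = 86_400) -> list:
        """Batch job: delete chunks with zero references that are older than the TTL."""
        dead = sorted(
            h for h, t in self.uploaded_at.items()
            if self.refcount.get(h, 0) <= 0 and now - t >= upload_ttl
        )
        for h in dead:
            del self.uploaded_at[h]  # stands in for the S3 DELETE
            self.refcount.pop(h, None)
        return dead
```

In a real deployment the refcount update and the version insert would need to happen in the same database transaction, otherwise a crash between them could leak or prematurely delete chunks.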
Metadata vs blob responsibilities
| Concern | Metadata DB | Blob store |
|---|---|---|
| Name, path, ACL | Yes | No |
| File bytes | No | Yes |
| Dedup by hash | Chunk registry | Content-addressed keys |
| CDN cache | No | Yes (GET URLs) |
| Transactional rename | Yes | Unchanged blobs |
Sample opening (first three minutes)
Interviewer: "Design Dropbox." You: "I will separate file metadata in PostgreSQL from blobs in S3. Uploads use pre-signed URLs so bytes never hit our API servers. Large files are chunked with content-hash dedup. Clients sync via metadata cursor and pull only changed blobs. Sharing uses ACL checks on every download."
What to say in the last five minutes
Close with: "Metadata in Postgres, blobs in S3 via pre-signed URLs, chunked upload with content-hash dedup, sync via cursor + optional WebSocket, ACL on every access." That is a complete Dropbox-level answer.
Mock interview checklist
- Separated metadata DB from blob object store.
- Walked chunked upload with content-hash dedup.
- Explained pre-signed S3 URLs — bytes bypass API tier.
- Described sync cursor and conflict strategy.
- Mentioned ACL on every access path.
Closing summary
Never route file bytes through your API at scale. Metadata path and blob path are separate designs — nail both in the interview.