Design a Typeahead / Autocomplete System (Google Search Bar)
System design for search autocomplete: trie vs Elasticsearch, ranking, caching hot prefixes, and handling millions of queries per second in interviews.
Typeahead is "return top 5 suggestions as the user types." It is read-heavy, latency-sensitive, and a natural fit for tries — the same prefix-tree thinking as LeetCode word search problems. Interviewers care about p99 latency under 100ms and how you rank "ap" → apple, app, application. Use the interview framework to clarify personal vs global suggestions.
Requirements
Functional
- Given prefix string, return top K suggestions (e.g. K=5).
- Rank by popularity (search frequency) or personal history.
- Support billions of past queries; millions of DAU.
- Highlight matching substring in UI (client-side; API returns plain text).
Non-functional
- Latency under 100ms p99; debounce handled client-side.
- Data updates as trends change (hourly or daily batch).
- Graceful degradation when trie is stale — return cached popular list.
- Abuse-resistant: rate limits and minimum prefix length.
Clarify scope
Google-scale typeahead vs Netflix title search vs IDE symbol complete — scale and ranking differ. Ask if suggestions are global, per-user, or blended.
High-level architecture
| Component | Role |
|---|---|
| Autocomplete API | GET /suggest?q=app&limit=5 |
| Suggestion service | Lookup + rank + merge personal/global |
| Trie or inverted index | Prefix → candidate terms |
| Popularity store | Term → score (Redis or embedded in trie) |
| Analytics pipeline | Aggregate query logs → update scores offline |
| CDN / edge cache | Cache hot prefixes ("a", "ap", "app") |
Client-side behavior
The browser debounces keystrokes (150–300ms) so "application" does not fire eight API calls. Cancel in-flight fetch when the user types the next character — stale responses must not overwrite newer ones (track request sequence number). Show cached suggestions from the previous prefix while loading ("app" results visible while "appl" loads). Minimum prefix length of 2–3 chars cuts noise and storage for single-letter prefixes that are almost always cache misses at scale.
Trie-based approach
Each trie node stores top K children by score along the path. Query "app": walk a→p→p, return node's precomputed top 5. Build trie offline from aggregated logs; refresh daily. Memory: compress with array trie or DAFSA for production. Fits in RAM for millions of terms if K is small per node.
- Offline job: count query frequencies, keep top N terms globally.
- Build trie; at each node store heap of top K terms in subtree.
- Online: traverse prefix in O(prefix length); return node.topK.
Elasticsearch / prefix index alternative
For richer ranking (freshness, location, user segment), use search index with edge n-grams or completion suggester. Higher latency (~20–50ms) but flexible scoring. Hybrid: trie for ultra-hot prefixes, ES for long tail. Mention SQL vs NoSQL — this is a search index problem, not relational.
Caching strategy
80% of traffic hits short prefixes. Cache Redis key suggest:app → JSON array of top 5. TTL 5–60 minutes. Warm cache after offline trie rebuild. Use CDN for anonymous global suggestions at edge PoPs — same idea as URL shortener redirect caching.
Personalization
Store per-user recent searches in Redis sorted set. On query, merge global trie results with user history (boost matching prefixes). Merge in app tier: take union, re-rank by blended score. Keep personal store small (last 50 queries).
Capacity estimation
10M DAU, 20 keystrokes per search session, 50% trigger API calls after debounce → 100M suggest requests/day ≈ 1,200 RPS average, ~6K peak. Each response ~200 bytes → ~1.2 MB/s average bandwidth. Trie in RAM: 10M terms × avg 10 bytes + scores ≈ hundreds of MB — feasible on few nodes with replication.
API design
- GET /v1/suggest?q={prefix}&limit=5 — 200 { suggestions: [{ text, score }] }
- Debounce 150–300ms on client — do not call API every keypress.
- Return 429 when abusive — rate limiter per IP.
Failure modes
| Failure | Mitigation |
|---|---|
| Trie node missing prefix | Return empty array; fall back to popular global list |
| Stale popularity | Acceptable for hours; critical news may need real-time stream update |
| Cache stampede on hot prefix | Single-flight rebuild; pre-warm top 1000 prefixes |
Trie node structure detail
Each node stores: children map (char → node), and min-heap or sorted array of top K (term, score) pairs in its subtree. When building offline, propagate best candidates up from leaves. Space trade-off: store only top 5 per node vs full term list at leaves — top-K per node is enough for autocomplete.
Fuzzy matching (v2)
Typo tolerance ("appl" → "apple") needs edit-distance search or secondary phonetic index — out of scope for v1. Mention as extension: Elasticsearch fuzzy query or SymSpell on top of prefix trie.
Latency budget
| Step | Target |
|---|---|
| CDN / Redis cache hit | 5–15ms |
| Trie traversal in memory | < 1ms |
| Personal merge + re-rank | 5–10ms |
| Miss → ES fallback | 20–50ms |
Offline analytics pipeline
- Ingest search logs to data warehouse (BigQuery/S3).
- Hourly job: aggregate query_text counts, filter bots.
- Rebuild trie artifact; upload to Redis / app servers.
- Atomic swap: point read path to new version flag.
- Drop queries shorter than 2 chars if noisy.
Worked trie example
Terms: app(100), apple(80), apply(50), apt(40). Node at "ap" stores top-3: app, apple, apply. Query "ap" returns those three without scanning full dictionary. Query "app" returns app, apple, apply from child node. Insert new trending term offline — rebuild affected subtree tops only.
Ranking signals
Popularity (global search count) is v1. v2 blends: recency boost for trending queries, user affinity, locale, and safe-search filters. All scoring happens on a bounded candidate set from the trie — never score the whole dictionary at request time. Precompute scores offline; online merge is arithmetic on ≤50 candidates.
Scaling the read path
Autocomplete API is stateless behind load balancer. Trie artifact loaded in memory per node OR served from local SSD snapshot on boot. Blue-green deploy: new fleet warms trie, flip traffic. For multi-region: replicate read-only trie to each region; analytics pipeline still global but suggest latency drops for international users.
Abuse and safety
- Minimum prefix length 2–3 chars before API call.
- Rate limit per IP and per API key.
- Blocklist offensive terms from trie during offline build.
- Do not log full queries with PII in plain text — hash user_id in analytics.
Memory compression (DAFSA)
A naive trie with hash maps per node uses heavy pointer overhead. Production systems use array-backed tries, double-array trie, or DAFSA (minimal acyclic automaton) built offline — same prefix logic as LeetCode trie problems, but the artifact is a flat byte array loaded mmap-style on boot. Interview line: "We build the structure offline; online path is read-only traversal with no allocation." That signals you understand ops cost, not just algorithm class.
Data freshness trade-off
Hourly trie rebuild means breaking news may lag. Breaking-glass path: stream trending queries into Redis sorted set, merge at read time with offline trie for prefixes matching news keywords. Most interviews accept hourly batch — mention real-time stream as v2.
| Approach | Latency | Freshness |
|---|---|---|
| In-memory trie only | < 5ms | Hours (batch rebuild) |
| Elasticsearch completion | 20–50ms | Seconds (near real-time index) |
| Hybrid trie + Redis trends | < 15ms | Minutes for trending layer |
Sample opening (first three minutes)
Interviewer: "Design search autocomplete." You: "Users type a prefix; we return top 5 suggestions under 100ms. I will assume global popularity ranking with optional personal history, offline trie built from query logs, Redis cache for hot prefixes, and cursor pagination not needed because responses are tiny. Ranking runs on a bounded candidate set, not the full index."
What to say in the last five minutes
Summarise: "Offline aggregate logs → trie with top-K per node, online O(prefix) lookup, Redis cache hot prefixes, optional personal merge from user history." Mention API and latency budget under 100ms.
Mock interview checklist
- Clarified global vs personal suggestions and scale.
- Explained trie with top-K per node and offline rebuild.
- Named Redis/CDN cache for hot prefixes.
- Gave latency budget under 100ms.
- Mentioned cursor-free prefix API and rate limiting.
Closing summary
Typeahead is prefix lookup plus ranking — trie for speed, batch analytics for scores, cache for traffic concentration. Connect to trie and heap patterns from DSA prep.