Case Study7 min read·June 18, 2026

Design a Typeahead / Autocomplete System (Google Search Bar)

System design for search autocomplete: trie vs Elasticsearch, ranking, caching hot prefixes, and handling millions of queries per second in interviews.

Typeahead is "return top 5 suggestions as the user types." It is read-heavy, latency-sensitive, and a natural fit for tries - the same prefix-tree thinking as LeetCode word search problems. Interviewers care about p99 latency under 100ms and how you rank "ap" → apple, app, application. Use the interview framework to clarify personal vs global suggestions.

Requirements

Functional

Given prefix string, return top K suggestions (e.g. K=5).
Rank by popularity (search frequency) or personal history.
Support billions of past queries; millions of DAU.
Highlight matching substring in UI (client-side; API returns plain text).

Non-functional

Latency under 100ms p99; debounce handled client-side.
Data updates as trends change (hourly or daily batch).
Graceful degradation when trie is stale - return cached popular list.
Abuse-resistant: rate limits and minimum prefix length.

Clarify scope

Google-scale typeahead vs Netflix title search vs IDE symbol complete - scale and ranking differ. Ask if suggestions are global, per-user, or blended.

High-level architecture

Component	Role
Autocomplete API	GET /suggest?q=app&limit=5
Suggestion service	Lookup + rank + merge personal/global
Trie or inverted index	Prefix → candidate terms
Popularity store	Term → score (Redis or embedded in trie)
Analytics pipeline	Aggregate query logs → update scores offline
CDN / edge cache	Cache hot prefixes ("a", "ap", "app")

Client-side behavior

The browser debounces keystrokes (150-300ms) so "application" does not fire eight API calls. Cancel in-flight fetch when the user types the next character - stale responses must not overwrite newer ones (track request sequence number). Show cached suggestions from the previous prefix while loading ("app" results visible while "appl" loads). Minimum prefix length of 2-3 chars cuts noise and storage for single-letter prefixes that are almost always cache misses at scale.

Trie-based approach

Each trie node stores top K children by score along the path. Query "app": walk a→p→p, return node's precomputed top 5. Build trie offline from aggregated logs; refresh daily. Memory: compress with array trie or DAFSA for production. Fits in RAM for millions of terms if K is small per node.

Offline job: count query frequencies, keep top N terms globally.
Build trie; at each node store heap of top K terms in subtree.
Online: traverse prefix in O(prefix length); return node.topK.

Elasticsearch / prefix index alternative

For richer ranking (freshness, location, user segment), use search index with edge n-grams or completion suggester. Higher latency (~20-50ms) but flexible scoring. Hybrid: trie for ultra-hot prefixes, ES for long tail. Mention SQL vs NoSQL - this is a search index problem, not relational.

Caching strategy

80% of traffic hits short prefixes. Cache Redis key suggest:app → JSON array of top 5. TTL 5-60 minutes. Warm cache after offline trie rebuild. Use CDN for anonymous global suggestions at edge PoPs - same idea as URL shortener redirect caching.

Personalization

Store per-user recent searches in Redis sorted set. On query, merge global trie results with user history (boost matching prefixes). Merge in app tier: take union, re-rank by blended score. Keep personal store small (last 50 queries).

Capacity estimation

10M DAU, 20 keystrokes per search session, 50% trigger API calls after debounce → 100M suggest requests/day ≈ 1,200 RPS average, ~6K peak. Each response ~200 bytes → ~1.2 MB/s average bandwidth. Trie in RAM: 10M terms × avg 10 bytes + scores ≈ hundreds of MB - feasible on few nodes with replication.

API design

GET /v1/suggest?q={prefix}&limit=5 - 200 { suggestions: [{ text, score }] }
Debounce 150-300ms on client - do not call API every keypress.
Return 429 when abusive - rate limiter per IP.

Failure modes

Failure	Mitigation
Trie node missing prefix	Return empty array; fall back to popular global list
Stale popularity	Acceptable for hours; critical news may need real-time stream update
Cache stampede on hot prefix	Single-flight rebuild; pre-warm top 1000 prefixes

Trie node structure detail

Each node stores: children map (char → node), and min-heap or sorted array of top K (term, score) pairs in its subtree. When building offline, propagate best candidates up from leaves. Space trade-off: store only top 5 per node vs full term list at leaves - top-K per node is enough for autocomplete.

Fuzzy matching (v2)

Typo tolerance ("appl" → "apple") needs edit-distance search or secondary phonetic index - out of scope for v1. Mention as extension: Elasticsearch fuzzy query or SymSpell on top of prefix trie.

Latency budget

Step	Target
CDN / Redis cache hit	5-15ms
Trie traversal in memory	< 1ms
Personal merge + re-rank	5-10ms
Miss → ES fallback	20-50ms

Offline analytics pipeline

Ingest search logs to data warehouse (BigQuery/S3).
Hourly job: aggregate query_text counts, filter bots.
Rebuild trie artifact; upload to Redis / app servers.
Atomic swap: point read path to new version flag.
Drop queries shorter than 2 chars if noisy.

Worked trie example

Terms: app(100), apple(80), apply(50), apt(40). Node at "ap" stores top-3: app, apple, apply. Query "ap" returns those three without scanning full dictionary. Query "app" returns app, apple, apply from child node. Insert new trending term offline - rebuild affected subtree tops only.

Ranking signals

Popularity (global search count) is v1. v2 blends: recency boost for trending queries, user affinity, locale, and safe-search filters. All scoring happens on a bounded candidate set from the trie - never score the whole dictionary at request time. Precompute scores offline; online merge is arithmetic on ≤50 candidates.

Scaling the read path

Autocomplete API is stateless behind load balancer. Trie artifact loaded in memory per node OR served from local SSD snapshot on boot. Blue-green deploy: new fleet warms trie, flip traffic. For multi-region: replicate read-only trie to each region; analytics pipeline still global but suggest latency drops for international users.

Abuse and safety

Minimum prefix length 2-3 chars before API call.
Rate limit per IP and per API key.
Blocklist offensive terms from trie during offline build.
Do not log full queries with PII in plain text - hash user_id in analytics.

Memory compression (DAFSA)

A naive trie with hash maps per node uses heavy pointer overhead. Production systems use array-backed tries, double-array trie, or DAFSA (minimal acyclic automaton) built offline - same prefix logic as LeetCode trie problems, but the artifact is a flat byte array loaded mmap-style on boot. Interview line: "We build the structure offline; online path is read-only traversal with no allocation." That signals you understand ops cost, not just algorithm class.

Data freshness trade-off

Hourly trie rebuild means breaking news may lag. Breaking-glass path: stream trending queries into Redis sorted set, merge at read time with offline trie for prefixes matching news keywords. Most interviews accept hourly batch - mention real-time stream as v2.

Approach	Latency	Freshness
In-memory trie only	< 5ms	Hours (batch rebuild)
Elasticsearch completion	20-50ms	Seconds (near real-time index)
Hybrid trie + Redis trends	< 15ms	Minutes for trending layer

Sample opening (first three minutes)

Interviewer: "Design search autocomplete." You: "Users type a prefix; we return top 5 suggestions under 100ms. I will assume global popularity ranking with optional personal history, offline trie built from query logs, Redis cache for hot prefixes, and cursor pagination not needed because responses are tiny. Ranking runs on a bounded candidate set, not the full index."

What to say in the last five minutes

Summarise: "Offline aggregate logs → trie with top-K per node, online O(prefix) lookup, Redis cache hot prefixes, optional personal merge from user history." Mention API and latency budget under 100ms.

Mock interview checklist

Clarified global vs personal suggestions and scale.
Explained trie with top-K per node and offline rebuild.
Named Redis/CDN cache for hot prefixes.
Gave latency budget under 100ms.
Mentioned cursor-free prefix API and rate limiting.

Closing summary

Typeahead is prefix lookup plus ranking - trie for speed, batch analytics for scores, cache for traffic concentration. Connect to trie and heap patterns from DSA prep.

More in this series

Design a URL Shortener - Complete Interview Walkthrough Design a Distributed Rate Limiter Design a News Feed (Twitter / Instagram Home Timeline)Design a Chat / Messaging System (WhatsApp / Slack DM)