Fundamentals · 7 min read · June 18, 2026

Message Queues and Async Processing (Kafka) for Interviews

When and how to use message queues in system design: Kafka vs RabbitMQ, pub/sub, consumer groups, ordering, and decoupling write paths in interviews.

The best system design answers use async somewhere - analytics on URL shortener clicks, fan-out in news feed, delivery in notifications. This article explains when a queue helps, what Kafka gives you, and how to draw it without name-dropping tools blindly. Pair with load balancing for consumer scaling.

Sync vs async

Sync (HTTP/RPC)	Async (queue)
Caller waits for result	Caller publishes event and returns
Simple mental model	Eventual consistency downstream
Couples availability	Decouples peak load
Good for read-your-writes	Good for fire-and-forget side effects

The sentence interviewers want

"The user-facing path returns 302 immediately; we publish a click event to Kafka and process analytics asynchronously so redirects stay fast."

Core concepts

Producer - service that publishes messages.
Topic - named stream of events (orders.created, clicks).
Partition - ordered sub-stream within a topic; enables parallelism.
Consumer group - workers share partitions; each partition consumed by one worker in group.
Offset - cursor of how far a consumer has read.

Kafka vs RabbitMQ (brief)

Kafka	RabbitMQ
Log-based; retain messages	Queue-based; often delete after ack
Replay by offset	Harder to replay history
High throughput analytics	Task queues, RPC-style work
Partition ordering	Per-queue ordering

In interviews, Kafka fits event streams and analytics; RabbitMQ/SQS fits job queues and simple task dispatch. AWS SQS is fine to mention for managed simplicity.

Ordering guarantees

Kafka orders within one partition, not across partitions. Key by user_id or conversation_id so related events land in the same partition - critical for chat message ordering per conversation. Global order requires a single partition, which caps the topic at one active consumer per group (does not scale).
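A minimal sketch of key-based partitioning, assuming a stable hash of the key modulo the partition count. The function name partition_for and the use of MD5 are illustrative stand-ins; Kafka's Java client uses its own hash (murmur2) for keyed records, but the property that matters is the same: equal keys map to the same partition.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash.

    Assumption: MD5 is only a stand-in for the client's real partitioner hash.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# Every event for one conversation maps to the same partition,
# so a single consumer sees those events in publish order.
events = [("conv_42", "hi"), ("conv_7", "yo"), ("conv_42", "how are you?")]
for conversation_id, text in events:
    print(conversation_id, "-> partition", partition_for(conversation_id, 6))
```

Note that the mapping depends on num_partitions, which is why changing the partition count later can move a key to a different partition.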

Delivery semantics

At-most-once - may lose messages; rare in production.
At-least-once - default; consumers must be idempotent.
Exactly-once - Kafka transactions or dedup with idempotency keys; expensive.

Say "at-least-once with idempotent consumers" for most designs - ties to API idempotency and unique event IDs.

Worked examples on this site

URL shortener analytics

Redirect handler publishes { short_code, timestamp, ip_hash } to clicks topic. Flink or batch consumer writes to ClickHouse. Zero impact on redirect latency.

News feed fan-out

Post service publishes post_created; fan-out workers consume and update Redis timelines. Write path stays milliseconds; followers catch up async.

Notification delivery

Separate topics per channel (push, email) so email slowdown does not block push workers.

Consumer scaling

Consumers in the same group split partitions - max parallel consumers equals partition count, and any extra consumers sit idle. Need more throughput? Add partitions, but plan ahead: changing the partition count changes which partition a key maps to, and every membership change triggers a rebalance. Use separate consumer groups for inventory, email, and analytics: each group tracks its own offsets, so slow email workers do not hold back fast inventory handlers. An HTTP load balancer is not what spreads the work here - Kafka assigns partitions to group members rather than round-robining requests.
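To make the partition ceiling concrete, here is a small sketch that spreads partitions over the members of one group. The round-robin rule and the name assign_partitions are assumptions for illustration; real assignment strategies are configurable, but every strategy gives each partition exactly one owner per group.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition exactly one owner, cycling through the consumers."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = list(range(4))
print(assign_partitions(partitions, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}
print(assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5", "c6"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': [], 'c6': []}
```

With four partitions, the fifth and sixth consumers get nothing to do, which is the scaling ceiling in one picture.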

Retry and exponential backoff

Transient failures (downstream DB timeout) should retry with jittered exponential backoff inside the consumer - not an infinite tight loop that stalls the partition behind one poison message. After N failures, publish to a dead-letter queue (DLQ) with the original payload and error reason; ops replays the DLQ after the fix. Say: "A poison message does not block the whole partition forever." That is more convincing than only naming DLQ.
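A minimal sketch of that retry policy, assuming a plain list stands in for the DLQ topic. The names backoff_delay and process_with_retry, the default limits, and the "full jitter" formula are illustrative choices, not a fixed standard.

```python
import random
import time
from typing import Any, Callable, Dict, List


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def process_with_retry(
    message: Any,
    handler: Callable[[Any], None],
    dead_letters: List[Dict[str, Any]],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the handler up to max_attempts times, then park the message in the DLQ."""
    last_error = ""
    for attempt in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception as exc:  # a real consumer would retry only transient errors
            last_error = repr(exc)
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    # Keep the original payload and the reason so ops can replay after the fix.
    dead_letters.append({"payload": message, "error": last_error, "attempts": max_attempts})
    return False


def always_times_out(message: Any) -> None:
    raise TimeoutError("db timeout")


dlq: List[Dict[str, Any]] = []
ok = process_with_retry({"order_id": "sf_123"}, always_times_out, dlq, sleep=lambda s: None)
print(ok, dlq)
```

Because the function returns after the DLQ write, the consumer can commit the offset and move on to the next message.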

Backpressure and DLQ

If consumers lag, queue depth grows. Scale consumer instances (same group). Failed messages after N retries go to dead-letter queue (DLQ) for manual inspection. Alert on DLQ depth and consumer lag - ops concern interviewers like hearing.

When NOT to use a queue

User needs immediate confirmation of downstream result (payment settled).
Flow is simple CRUD with no peak decoupling need.
Team cannot operate Kafka yet - honest MVP answer: "start sync, queue when analytics lag hurts."

Schema registry and evolution

Events carry a versioned schema (Avro, Protobuf, or JSON Schema) tracked in a schema registry. Consumers tolerate unknown fields. Breaking changes get a new topic or a new schema version - mention briefly for senior loops.

Monitoring

Consumer lag (messages between the group's committed offset and the latest offset in each partition).
DLQ depth.
Publish rate vs consume rate per topic.
P99 processing time per consumer.
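Consumer lag is simple arithmetic once you have two numbers per partition. The sketch below assumes the offsets were already fetched as plain dictionaries keyed by partition; consumer_lag is a hypothetical helper, not a Kafka API.

```python
from typing import Dict


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: latest offset in the log minus the group's committed offset."""
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


lag = consumer_lag(end_offsets={0: 120, 1: 95}, committed={0: 100, 1: 95})
print(lag, "total:", sum(lag.values()))
# {0: 20, 1: 0} total: 20
```

Alerting on the per-partition values rather than only the total helps catch one stuck partition hiding behind healthy ones.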

SQS vs Kafka quick pick

Use SQS when	Use Kafka when
Managed simplicity on AWS	Replay and stream analytics needed
Job queue with visibility timeout	Multiple consumers read same history
Low ops overhead MVP	High throughput event backbone

End-to-end tracing

Propagate correlation_id from HTTP request into message headers. Consumers log same ID - debug "order placed but email missing" across services. Standard in event-driven systems paired with notification workers.

Sample event payload

Topic orders.created: { "order_id": "sf_123", "user_id": "u_9", "total_cents": 4999, "correlation_id": "req_abc", "timestamp": 1710000000 }. Inventory service and email service consume same topic with different consumer groups - parallel independent processing.

Fan-out vs task queue

Fan-out: one message, many consumer groups each process fully (notification + analytics + search index). Task queue: one message, one worker claims job (competing consumers in same group). Kafka supports both patterns via consumer group design.
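Both patterns fall out of one rule: offsets are tracked per consumer group. The sketch below models a single-partition topic in memory; the Topic class and its poll method are illustrative and not a client library API.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Topic:
    """Toy single-partition log with one read cursor per consumer group."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.offsets: Dict[str, int] = defaultdict(int)  # group -> next offset to read

    def publish(self, message: str) -> None:
        self.log.append(message)

    def poll(self, group: str) -> Optional[str]:
        """Return the next unread message for this group, or None if caught up."""
        offset = self.offsets[group]
        if offset >= len(self.log):
            return None
        self.offsets[group] = offset + 1
        return self.log[offset]


topic = Topic()
topic.publish("post_1")
topic.publish("post_2")

# Fan-out: different groups each see every message.
print(topic.poll("analytics"), topic.poll("search"))  # post_1 post_1

# Task queue: two workers in the SAME group split the messages.
print(topic.poll("notifications"), topic.poll("notifications"))  # post_1 post_2
```

In real Kafka, workers in one group split partitions rather than individual messages, so the task-queue pattern needs at least as many partitions as workers.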

Publish and consume flow

Producer serializes event JSON; picks partition key (e.g. user_id hash).
Kafka appends to partition log; acks once the required replicas have the write (configurable; acks=all waits for all in-sync replicas).
Consumer polls batch; processes messages; commits offset on success.
Crash before offset commit → message redelivered (at-least-once).
Idempotent handler: check processed_events table or use natural idempotency key.
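The flow above can be simulated without a broker. This sketch assumes an in-memory list as the partition log and a Python set in place of the processed_events table; the class name and the event_id field are hypothetical.

```python
from typing import Dict, List, Set


class IdempotentConsumer:
    """Simulates poll -> process -> commit, with dedup by event_id."""

    def __init__(self, log: List[Dict]) -> None:
        self.log = log
        self.committed_offset = 0
        self.processed_events: Set[str] = set()  # stand-in for a processed_events table
        self.revenue_cents = 0

    def handle(self, event: Dict) -> None:
        if event["event_id"] in self.processed_events:
            return  # duplicate delivery: skip the side effect
        self.revenue_cents += event["total_cents"]
        self.processed_events.add(event["event_id"])

    def poll(self, crash_before_commit: bool = False) -> None:
        batch = self.log[self.committed_offset:]
        for event in batch:
            self.handle(event)
        if crash_before_commit:
            return  # simulated crash: work is done but the offset never moves
        self.committed_offset += len(batch)


log = [
    {"event_id": "e1", "order_id": "sf_123", "total_cents": 4999},
    {"event_id": "e2", "order_id": "sf_124", "total_cents": 1500},
]
consumer = IdempotentConsumer(log)
consumer.poll(crash_before_commit=True)  # processed, not committed
consumer.poll()                          # same batch is redelivered
print(consumer.revenue_cents, consumer.committed_offset)  # 6499 2
```

In a real service the side effect and the dedup record should be written in the same database transaction; otherwise a crash between the two reopens the duplicate window.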

Capacity back-of-envelope

10K events/sec × 1 KB payload ≈ 10 MB/sec ingress - modest for Kafka. That is roughly 860 GB/day per topic, so 7-day retention holds about 6 TB before replication - tune retention. Partition count ≈ max desired parallel consumers in one group. Too few partitions → consumer scaling ceiling; too many → broker metadata overhead.
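The same arithmetic as a few lines of code, so the inputs are easy to swap during an interview. The replication factor of 3 is an assumption added for illustration, and the units are decimal (1 KB = 1,000 bytes).

```python
SECONDS_PER_DAY = 86_400


def ingress_mb_per_sec(events_per_sec: int, payload_bytes: int) -> float:
    """Write rate into the topic in MB/s (decimal megabytes)."""
    return events_per_sec * payload_bytes / 1e6


def retained_tb(mb_per_sec: float, retention_days: int, replication_factor: int = 1) -> float:
    """Disk needed to hold the retention window, in TB."""
    return mb_per_sec * SECONDS_PER_DAY * retention_days * replication_factor / 1e6


rate = ingress_mb_per_sec(events_per_sec=10_000, payload_bytes=1_000)
print(f"{rate:.0f} MB/s")                                   # 10 MB/s
print(f"{rate * SECONDS_PER_DAY / 1e3:.0f} GB/day")         # 864 GB/day
print(f"{retained_tb(rate, 7):.1f} TB for 7 days")          # 6.0 TB for 7 days
print(f"{retained_tb(rate, 7, 3):.1f} TB with 3 replicas")  # 18.1 TB with 3 replicas
```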

Exactly-once when payments matter

For analytics clicks, at-least-once is fine. For ledger or inventory deduction, duplicate consumption is costly. Options: idempotent consumer with business key (order_id), Kafka transactional producer + read-process-write in one transaction, or database dedup table. Most interviews stop at idempotency - only go deeper if interviewer pushes on money movement.

Transactional outbox pattern

Problem: DB commit succeeds but Kafka publish fails (or vice versa). Solution: insert business row + outbox event in one SQL transaction. Relay worker polls outbox, publishes to topic, deletes or marks sent. Same pattern links notifications to order creation without two-phase commit across Postgres and Kafka.
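A runnable sketch of the outbox idea using SQLite from the standard library, assuming a publish callback in place of a real Kafka producer. Table and column names are hypothetical.

```python
import json
import sqlite3
from typing import Callable

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, total_cents INTEGER);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT, payload TEXT, sent INTEGER DEFAULT 0
    );
    """
)


def create_order(order_id: str, total_cents: int) -> None:
    """Write the business row and the outbox event in ONE transaction."""
    payload = json.dumps({"order_id": order_id, "total_cents": total_cents})
    with conn:  # commits on success, rolls back both inserts on error
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders.created", payload),
        )


def relay_once(publish: Callable[[str, str], None]) -> int:
    """Publish unsent outbox rows in order; mark each sent only after publish returns."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays unsent and is retried
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


create_order("sf_123", 4999)
published = []
relay_once(lambda topic, payload: published.append((topic, payload)))
print(published)
```

A crash after publish but before the UPDATE makes the relay publish the same event again, so this pattern still delivers at-least-once and consumers need to stay idempotent.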

URL shortener click - full async path

User hits GET /abc123 → cache hit → 302 redirect in <20ms.
Handler publishes { short_code, ts, referrer } to clicks topic (non-blocking).
Analytics consumer group writes to ClickHouse.
Fraud consumer group checks referrer against blocklist asynchronously.
If Kafka is briefly down: buffer in a bounded local queue or drop analytics events - never block the redirect.
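The "never block the redirect" rule can be sketched as a bounded in-process buffer that drops when full. BestEffortPublisher and its methods are hypothetical names; a background thread calling drain with a real producer's send function is assumed but not shown.

```python
import queue
from typing import Any, Callable


class BestEffortPublisher:
    """Bounded buffer between the request handler and the broker client."""

    def __init__(self, max_buffered: int = 1000) -> None:
        self.buffer: "queue.Queue[Any]" = queue.Queue(maxsize=max_buffered)
        self.dropped = 0

    def publish(self, event: Any) -> bool:
        """Called on the redirect path: never waits, drops the event if the buffer is full."""
        try:
            self.buffer.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1  # expose this as a metric so drops are visible
            return False

    def drain(self, send: Callable[[Any], None]) -> int:
        """Called off the request path: forward buffered events to the broker."""
        sent = 0
        while True:
            try:
                event = self.buffer.get_nowait()
            except queue.Empty:
                return sent
            send(event)
            sent += 1


publisher = BestEffortPublisher(max_buffered=2)
for code in ("abc123", "def456", "ghi789"):
    publisher.publish({"short_code": code})
print("dropped:", publisher.dropped)  # dropped: 1
```

Dropping is acceptable here only because click analytics is lossy by design; the same choice would be wrong for orders or payments.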

Common mistakes in interviews

Mistake	Better answer
"We use Kafka" with no event named	Name producer, topic, consumer, payload
Queue for synchronous read	Queues for side effects only
Ignore ordering	Partition key when per-user order matters
No idempotency story	At-least-once + dedup table or natural keys
Single consumer for everything	Separate consumer groups per downstream

News feed fan-out - partition detail

Topic post_created, key = author_id so all posts from one author land in one partition (optional ordering for profile page). Fan-out workers in consumer group "timeline" read event, fetch follower list (or celebrity cache), write Redis timeline keys. Celebrity with 10M followers: hybrid - do not fan-out on write; merge on read for that author only. Connects directly to news feed hybrid model.
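A toy version of the hybrid model, assuming in-memory dictionaries in place of Redis timelines and a follower service. The threshold of 3 is deliberately tiny so the example is easy to trace; all names are illustrative, and a real merge would sort by timestamp.

```python
from collections import defaultdict
from typing import Dict, List, Set

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems use a far larger number

timelines: Dict[str, List[str]] = defaultdict(list)        # stand-in for Redis timeline keys
celebrity_posts: Dict[str, List[str]] = defaultdict(list)  # posts merged in at read time


def on_post_created(author_id: str, post_id: str, followers: Dict[str, Set[str]]) -> None:
    """Fan-out worker: push to follower timelines unless the author is a celebrity."""
    fans = followers.get(author_id, set())
    if len(fans) >= CELEBRITY_THRESHOLD:
        celebrity_posts[author_id].append(post_id)  # skip fan-out on write
        return
    for follower_id in fans:
        timelines[follower_id].append(post_id)


def read_timeline(user_id: str, following: Dict[str, Set[str]]) -> List[str]:
    """Read path: precomputed timeline plus posts from followed celebrities."""
    merged = list(timelines.get(user_id, []))
    for author_id in following.get(user_id, set()):
        merged.extend(celebrity_posts.get(author_id, []))
    return merged


followers = {"alice": {"u1", "u2"}, "star": {"u1", "u2", "u3"}}
following = {"u1": {"alice", "star"}}
on_post_created("alice", "p1", followers)  # fanned out to u1 and u2
on_post_created("star", "p2", followers)   # stored once, merged on read
print(read_timeline("u1", following))  # ['p1', 'p2']
```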

Choreography vs orchestration

Choreography: services react to events independently (order_created → inventory, email, analytics). Orchestration: central saga coordinator calls each step. Interviews usually prefer choreography with topics - simpler diagram. Mention saga coordinator only for strict multi-step rollback requirements (payments).

Sample opening (first three minutes)

Interviewer: "Where would you use a message queue in this design?" You: "Any side effect that must not block the user path - click analytics after a redirect, fan-out after a post, email after an order. I publish an event to a topic, return HTTP immediately, and consumers process at-least-once with idempotent handlers. Partition by user_id when order matters."

What to say in the last five minutes

Name the event, the topic, the producer, and the consumer. State at-least-once + idempotent handlers. Partition key for ordering. DLQ for poison messages.

Mock interview checklist

Explained why async (latency, decoupling, peak smoothing).
Drew producer → topic → consumer group.
Mentioned partition key for ordering.
Stated delivery semantics and idempotency.
Named DLQ or retry policy.

Closing summary

Queues turn synchronous bottlenecks into background work - use them for side effects, fan-out, and analytics, not for every read path.
