
Design a Unique ID Generator (Snowflake, UUID, Auto-Increment)

How to generate globally unique IDs in distributed systems: Snowflake, UUID, database sequences, and trade-offs for system design interviews.

"How do you generate IDs in a distributed system?" appears in almost every medium-to-hard interview — URL shorteners, chat messages, and notification logs all need them. The answer is not "use UUID" without trade-offs. This guide compares approaches so you can pick one and defend it.

Functional vs non-functional requirements

Functional

  • Generate a new ID on demand from any service instance.
  • IDs must be unique across all regions and services.
  • Support batch allocation for bulk inserts (imports, backfills).

Non-functional

  • Millions of IDs per second cluster-wide without a central DB round-trip per ID.
  • Rough time order for debugging and index locality.
  • Survive single machine or AZ failure without stopping ID issuance.
  • Low collision probability — effectively zero in production.

What properties matter

  • Uniqueness — no collisions across all services and regions.
  • Roughly sortable by time — helps DB indexing and debugging.
  • High throughput — millions of IDs per second cluster-wide.
  • No single point of failure — no one DB sequence bottleneck.
  • Opaque or short — URL-friendly vs 128-bit UUID string.

Approach comparison

| Approach | Pros | Cons |
| --- | --- | --- |
| Auto-increment (single DB) | Simple, sortable | Single point of failure; sharding breaks global order |
| UUID v4 (random) | No coordination | Not sortable; 36-char string; index fragmentation |
| UUID v7 (time-ordered) | Sortable, decentralized | Still long; newer standard |
| Snowflake (Twitter) | 64-bit, sortable, fast | Needs machine ID assignment; clock skew care |
| DB range per server | Numeric, sortable | Range allocation service; gaps on crash |

Snowflake layout (say this in interviews)

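64-bit integer: timestamp (ms since a custom epoch) | datacenter_id | machine_id | sequence, with the timestamp in the high bits. Same millisecond: increment sequence. New millisecond: reset sequence. Because the timestamp leads, IDs sort by time. With 12 sequence bits a machine can issue 4096 IDs per ms, which is about 4.1 million per second or roughly 350 billion per day per node.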
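A minimal sketch of the bit layout, assuming the common 41/5/5/12 split. The names `pack` and `unpack` are illustrative, not part of any library.

```python
# Field widths for a Snowflake-style ID (assumed 41/5/5/12 split).
TIMESTAMP_BITS, DC_BITS, MACHINE_BITS, SEQ_BITS = 41, 5, 5, 12

MACHINE_SHIFT = SEQ_BITS                    # 12
DC_SHIFT = MACHINE_SHIFT + MACHINE_BITS     # 17
TIMESTAMP_SHIFT = DC_SHIFT + DC_BITS        # 22


def _check(value: int, bits: int, name: str) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def pack(ts_ms: int, dc: int, machine: int, seq: int) -> int:
    """Combine the four fields into one integer; the top (sign) bit stays 0."""
    _check(ts_ms, TIMESTAMP_BITS, "timestamp")
    _check(dc, DC_BITS, "datacenter_id")
    _check(machine, MACHINE_BITS, "machine_id")
    _check(seq, SEQ_BITS, "sequence")
    return (ts_ms << TIMESTAMP_SHIFT) | (dc << DC_SHIFT) | (machine << MACHINE_SHIFT) | seq


def unpack(snowflake_id: int) -> tuple:
    """Split an ID back into (ts_ms, dc, machine, seq)."""
    return (
        snowflake_id >> TIMESTAMP_SHIFT,
        (snowflake_id >> DC_SHIFT) & ((1 << DC_BITS) - 1),
        (snowflake_id >> MACHINE_SHIFT) & ((1 << MACHINE_BITS) - 1),
        snowflake_id & ((1 << SEQ_BITS) - 1),
    )
```

Because the timestamp sits in the highest bits, any ID from a later millisecond compares greater than any ID from an earlier one, whatever the other fields hold.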

Clock skew

If the system clock moves backward, either wait until it catches up or keep issuing from the last timestamp and its sequence. NTP drift is a real ops issue, so mention it briefly.
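One way to express the "wait until caught up" policy is sketched below. The 5 ms tolerance, the `ClockMovedBackwards` exception, and the injectable `now_fn` clock are assumptions made for illustration.

```python
import time


class ClockMovedBackwards(Exception):
    """Raised when the clock is too far behind the last issued timestamp."""


def safe_timestamp(last_ms: int, now_fn, max_backward_ms: int = 5) -> int:
    """Return a timestamp >= last_ms, waiting out small backward jumps."""
    now = now_fn()
    if now >= last_ms:
        return now
    drift = last_ms - now
    if drift > max_backward_ms:
        # Large jump: fail loudly rather than block callers for a long time.
        raise ClockMovedBackwards(f"clock is {drift} ms behind last timestamp")
    while now < last_ms:
        # Small jump: sleep for the remaining gap, then re-read the clock.
        time.sleep((last_ms - now) / 1000)
        now = now_fn()
    return now
```

Refusing to issue IDs on a large jump trades availability for safety; whether that is acceptable depends on the service.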

Assigning machine IDs

  • ZooKeeper / etcd: ephemeral nodes assign worker_id on startup.
  • Kubernetes: hash pod name into a fixed range (collisions become likely even at a few dozen pods when the range is only 1024 values).
  • Config per deploy: works for < 1000 machines, painful at cloud scale.
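To see why hashing pod names is risky, the sketch below maps a name into a 10-bit range and estimates the chance that two pods collide (the birthday problem). The function names and the 10-bit range are assumptions.

```python
import hashlib


def worker_id_from_name(pod_name: str, bits: int = 10) -> int:
    """Deterministically map a pod name into [0, 2**bits)."""
    digest = hashlib.sha256(pod_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


def collision_probability(n_pods: int, bits: int = 10) -> float:
    """Probability that at least two of n_pods hash to the same worker ID."""
    slots = 1 << bits
    if n_pods > slots:
        return 1.0
    p_all_unique = 1.0
    for i in range(n_pods):
        p_all_unique *= (slots - i) / slots
    return 1.0 - p_all_unique
```

With 1024 slots, `collision_probability(40)` is already above one half, so a hashed worker ID needs a uniqueness check (or an exclusive lease) behind it.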

When to use what

URL shortener short codes

Need short, non-guessable strings: base62 of a random value, or of a counter that is obfuscated before encoding, not a raw Snowflake integer. See URL shortener ID generation section.
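A small base62 sketch for short codes follows. The alphabet order (digits, lowercase, uppercase) and the 7-character default length are assumptions; any fixed ordering of the 62 characters works.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 chars


def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


def random_short_code(length: int = 7) -> str:
    """Non-guessable code drawn from a cryptographic random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Seven base62 characters cover about 3.5 trillion codes. A random code still needs a uniqueness check on insert, whereas an encoded counter is unique by construction but guessable unless it is scrambled first.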

Chat message IDs

Snowflake or per-conversation sequence — monotonic within conversation for ordering. Global sort helps debugging across shards.

Internal order IDs

Snowflake, or a DB sequence with read replicas serving display queries. The SQL vs NoSQL storage choice is separate from the ID format.

Implementation sketch

  1. On service start: acquire worker_id from the coordination service.
  2. nextId(): take a lock or synchronized block per JVM/process.
  3. If now < last_timestamp: the clock moved backward; wait or reject.
  4. If now == last_timestamp: sequence++; on sequence overflow, wait for the next ms.
  5. Else (now > last_timestamp): sequence = 0, last_timestamp = now.
  6. Return (timestamp << 22) | (dc << 17) | (worker << 12) | sequence for the 41/5/5/12 layout.
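The steps above can be sketched as a small thread-safe class. This is an illustration under assumptions rather than a production generator: the custom epoch (2020-01-01 UTC), the 41/5/5/12 layout, and the choice to raise when the clock moves backward are all picked for the example.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, an arbitrary custom epoch


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, datacenter_id: int, worker_id: int, clock=_now_ms):
        if not 0 <= datacenter_id < 32 or not 0 <= worker_id < 32:
            raise ValueError("datacenter_id and worker_id must fit in 5 bits")
        self._dc = datacenter_id
        self._worker = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:  # one ID at a time per process
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backward; refusing to issue IDs")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:
                    # 4096 IDs already issued this ms: spin until the next ms.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self._dc << 17) | (self._worker << 12) | self._seq
```

Usage: create one `SnowflakeGenerator(datacenter_id, worker_id)` per process after the worker ID lease is acquired, then call `next_id()` from any thread.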

UUID v4 vs v7 detail

UUID v4 has 122 random bits: no sort order, and painful for B-tree indexes because inserts land on random pages. UUID v7 puts a 48-bit millisecond timestamp in the high bits, which gives better index locality, but it is still 128 bits (36 characters in string form). Snowflake fits in a 64-bit BIGINT, giving compact URLs and fast comparisons. Mention collision probability: for UUID v4 it is negligible; Snowflake is collision-free only if every (datacenter, machine) pair is unique.
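To make the v4 vs v7 difference concrete, here is a hand-rolled sketch of the UUID v7 bit layout (48-bit Unix millisecond timestamp, version nibble, 12 random bits, variant bits, 62 random bits). The function name `uuid7_sketch` is hypothetical; a maintained library is preferable in real code.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None) -> uuid.UUID:
    """Build a UUID whose high 48 bits are a millisecond timestamp."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits, 74 are used
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: random
    value |= 0b10 << 62                            # bits 63..62: RFC variant
    value |= rand_b                                # bits 61..0: random
    return uuid.UUID(int=value)
```

Two values built a millisecond apart compare in timestamp order both as integers and as strings, which is what keeps index inserts near the right edge. `uuid.uuid4()` gives no such ordering.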

Database range allocation

Alternative to Snowflake: an ID service allocates ranges, so server A gets 1–1,000,000 and server B gets 1,000,001–2,000,000. IDs are simple numerics with no clock dependency, sortable within each server's range but not in global time order. On crash, the unused remainder of a range is lost; the resulting gap is acceptable for many apps. The coordination service hands out ranges via atomic compare-and-swap.
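A sketch of range allocation over compare-and-swap is shown below. `CasCounter` is an in-memory stand-in for whatever coordination store is used, and both class names are hypothetical.

```python
import threading


class CasCounter:
    """In-memory stand-in for a store that supports compare-and-swap."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def read(self) -> int:
        with self._lock:
            return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


class RangeAllocator:
    """Hands out IDs from a locally held block; claims a new block when empty."""

    def __init__(self, store: CasCounter, block_size: int = 1_000_000):
        self._store = store
        self._block = block_size
        self._next = 0
        self._end = 0  # exclusive upper bound; empty until the first claim

    def _claim_block(self) -> None:
        while True:  # retry if another server advanced the counter first
            start = self._store.read()
            if self._store.compare_and_swap(start, start + self._block):
                self._next, self._end = start + 1, start + self._block + 1
                return

    def next_id(self) -> int:
        if self._next >= self._end:
            self._claim_block()
        value = self._next
        self._next += 1
        return value
```

Only `_claim_block` touches the shared store, so the network cost is paid once per block. `next_id` itself is not thread-safe here; a real server would guard it with a lock.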


Security and guessability

Sequential IDs leak business metrics (order count). Use random short codes for public resources (URL shortener). Internal Snowflake IDs are fine behind auth. Never expose raw auto-increment in public APIs.

Multi-region IDs

Snowflake datacenter_id bits partition ID space per region — IDs remain globally unique without cross-region coordination on every insert. UUID v4 needs no regional planning. Centralized DB sequence needs a single primary writer per global sequence — bottleneck at scale.

Worked numeric example

41-bit timestamp ms + 5-bit dc + 5-bit machine + 12-bit sequence → 4096 IDs/ms/machine. One machine ≈ 4M IDs/sec theoretical; real limit is generation code and downstream write throughput. Enough for chat and order tables.
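The arithmetic behind those figures can be checked in a few lines. The variable names are illustrative, and the year length of 365.25 days is an approximation.

```python
TIMESTAMP_BITS, DC_BITS, MACHINE_BITS, SEQ_BITS = 41, 5, 5, 12

# 63 bits are used; the 64th (sign) bit stays 0 so the ID fits a signed BIGINT.
used_bits = TIMESTAMP_BITS + DC_BITS + MACHINE_BITS + SEQ_BITS

ids_per_ms = 2 ** SEQ_BITS                         # 4096
ids_per_second = ids_per_ms * 1000                 # 4,096,000 per machine
max_nodes = 2 ** (DC_BITS + MACHINE_BITS)          # 1024 (dc, machine) pairs
cluster_ids_per_second = ids_per_second * max_nodes

MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25
timestamp_years = 2 ** TIMESTAMP_BITS / MS_PER_YEAR  # about 69.7 years
```

The timestamp field therefore covers roughly 69.7 years from whatever custom epoch is chosen, and the theoretical cluster ceiling is about 4.2 billion IDs per second across 1024 nodes.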

Interview comparison table

| Requirement | Best fit |
| --- | --- |
| Public short token | Base62 + random or counter |
| Time-ordered events at scale | Snowflake |
| No infra coordination | UUID v4 |
| Sortable + decentralized | UUID v7 or Snowflake |
| Per-shard numeric only | DB sequence per shard |

Common follow-up questions

  • "What if two machines get the same worker ID?" Worker ID assignment must be exclusive; ZooKeeper ephemeral znodes prevent duplicates.
  • "Can IDs run out?" The 41-bit millisecond timestamp lasts about 69 years from the chosen custom epoch; sequence overflow within a millisecond just waits for the next millisecond.
  • "Why not UUID for messages?" Fine at moderate scale; Snowflake gives a compact numeric key that sorts well in indexes.
  • "How do shards use IDs?" Each shard generates locally; no central coordinator per insert.

Dedicated ID service API

At scale, embed Snowflake logic in a small ID service rather than every microservice. A batch call such as GET /v1/ids/batch?count=100 returns 100 IDs in one round trip, which amortizes network overhead. The service holds the worker_id lease; app servers call it over a low-latency RPC such as gRPC. Alternative: a library (for example a jar) linked into each service with a shared ZooKeeper lease, which means fewer network hops but more deploy coupling.
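On the caller side, batching can be sketched as a small buffer that refills from the service. `fetch_batch` is a hypothetical callable standing in for the real RPC, and `BatchedIdClient` is an illustrative name.

```python
from collections import deque
from typing import Callable, List


class BatchedIdClient:
    """Serves IDs from a local buffer and refills it one batch at a time."""

    def __init__(self, fetch_batch: Callable[[int], List[int]], batch_size: int = 100):
        self._fetch = fetch_batch      # stand-in for the ID service call
        self._batch_size = batch_size
        self._buffer = deque()
        self.rpc_calls = 0             # exposed only to show the amortization

    def next_id(self) -> int:
        if not self._buffer:
            self._buffer.extend(self._fetch(self._batch_size))
            self.rpc_calls += 1
        return self._buffer.popleft()
```

With a batch size of 100, issuing 250 IDs costs three calls to the service instead of 250. IDs left in the buffer when a process exits are simply never used, which leaves gaps.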

Contention and hot spots

A single global auto-increment column in one Postgres primary becomes a write hot spot — every insert contends on the same B-tree page. Sharded sequences (each shard owns a range) fix throughput but lose global sort order. Snowflake avoids both: each JVM generates locally with no network hop. If you centralize in an ID service, batch requests (100 IDs per RPC) amortize latency and keep QPS on the service manageable.

K-sortable and ULID (mention briefly)

ULID (128 bits) and KSUID (160 bits) encode time in the leading bytes, so their text forms sort lexicographically, and they need no coordination beyond randomness in the tail. Good when you want string IDs in logs and URLs without a 64-bit integer. Snowflake is the numeric interview default; ULID is a credible "I know newer alternatives" follow-up.
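For reference, a ULID-style value can be sketched as a 48-bit millisecond timestamp followed by 80 random bits, rendered as 26 Crockford base32 characters. `ulid_sketch` is a hypothetical name, and the sketch omits the same-millisecond monotonicity that ULID libraries may add.

```python
import os
import time

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # base32 without I, L, O, U


def ulid_sketch(unix_ms=None) -> str:
    """Return a 26-character, time-prefixed, lexicographically sortable ID."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    if not 0 <= unix_ms < (1 << 48):
        raise ValueError("timestamp must fit in 48 bits")
    value = (unix_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 * 5 bits = 130 bits, enough for the 128-bit value
        value, rem = divmod(value, 32)
        chars.append(CROCKFORD[rem])
    return "".join(reversed(chars))
```

The first 10 characters carry the timestamp and the last 16 the randomness, so plain string comparison orders IDs by creation millisecond.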

Database indexing impact

Random UUID primary keys cause B-tree page splits and fragmented indexes in PostgreSQL. Time-ordered Snowflake or UUID v7 insert at the right edge of the index — better write throughput on large tables. Say this when interviewer asks "why not UUID?" — it is an indexing argument, not just aesthetics.

| Failure | Effect | Mitigation |
| --- | --- | --- |
| Duplicate worker_id | ID collision risk | Exclusive lease via ZooKeeper/etcd |
| Clock backward jump | Duplicate timestamp bucket | Wait or use last_ts until caught up |
| Sequence overflow in 1ms | Cannot issue more IDs this ms | Spin until next millisecond |
| ID service down | Writers block | Embedded generator fallback with local range cache |

Worked example: order and notification

Order service creates order with Snowflake order_id, publishes order_created event with same id. Notification worker and warehouse consumer both reference order_id. Support can grep logs by time-sortable ID. Contrast with random UUID — harder to debug "what happened around 3pm" without a secondary created_at index.
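The "what happened around 3pm" lookup can be sketched by turning a time window into an ID range. The custom epoch and the 22-bit timestamp shift are assumptions that must match the generator in use; `id_bounds` and `issued_at` are illustrative names.

```python
from datetime import datetime, timedelta, timezone

EPOCH_MS = 1_577_836_800_000   # assumed custom epoch: 2020-01-01T00:00:00Z
TIMESTAMP_SHIFT = 22           # 5 + 5 + 12 bits sit below the timestamp
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _to_ms(moment: datetime) -> int:
    return (moment - UNIX_EPOCH) // timedelta(milliseconds=1)


def id_bounds(start: datetime, end: datetime) -> tuple:
    """Smallest and largest IDs that could have been issued in [start, end)."""
    lo = (_to_ms(start) - EPOCH_MS) << TIMESTAMP_SHIFT
    hi = ((_to_ms(end) - EPOCH_MS) << TIMESTAMP_SHIFT) - 1
    return lo, hi


def issued_at(snowflake_id: int) -> datetime:
    """Recover the creation time (millisecond precision) from an ID."""
    ms = (snowflake_id >> TIMESTAMP_SHIFT) + EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)
```

The resulting pair can drive a primary-key range scan such as `WHERE order_id BETWEEN lo AND hi`, with no secondary created_at index.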

Sample opening (first three minutes)

Interviewer: "Design unique IDs for a distributed order service." You: "I need IDs that are unique cluster-wide, roughly time-ordered for indexing, and generated without a single DB bottleneck. I would use 64-bit Snowflake-style IDs: timestamp, datacenter, machine, sequence. Worker IDs from ZooKeeper. For public-facing short codes I would use a different encoding — not raw integers."

What to say in the last five minutes

"I would use Snowflake-style 64-bit IDs: time-sortable, no central DB bottleneck, thousands per ms per node. Assign worker IDs via ZooKeeper. For public short URLs I would use a separate base62 encoding layer." That shows you know more than one hammer.

Mock interview checklist

  1. Stated requirements: uniqueness, sortability, throughput.
  2. Compared UUID vs Snowflake vs DB auto-increment.
  3. Explained Snowflake bit layout.
  4. Mentioned clock skew and worker ID assignment.
  5. Matched ID choice to use case (URL vs message vs order).

Closing summary

Pick the ID scheme for the access pattern: short opaque codes for URLs, Snowflake for high-volume ordered events, UUID when coordination is unacceptable and sort order does not matter.
