DDSA Solutions
Case Study

Design a Ride Hailing System (Uber / Lyft)

System design for ride matching: geospatial indexing, ETA, surge pricing, trip state machine, and real-time location for interview prep.

Ride hailing connects riders and drivers in real time. The interview tests geospatial queries, state machines, and WebSocket-style location updates, not just CRUD. Clarify scope with the framework first: city-level launch vs global, and whether pooling or food delivery must run on the same platform.

Requirements

Functional

  • Rider requests trip: pickup, dropoff, ride type.
  • Match nearby available drivers; rider accepts ETA and price.
  • Live trip tracking: driver location on map until dropoff.
  • Payment settles at the end of the trip; ratings go both ways.
  • Driver goes online/offline; accept or decline offers.

Non-functional

  • Match driver within 30 seconds in dense cities.
  • Location updates every 3–5 seconds during active trip.
  • 1M daily trips; 100K concurrent drivers in peak city.
  • Strong consistency for trip state and payment — no double charge.

Clarify matching

Greedy nearest driver is v1. Pooling, scheduled rides, and multi-stop are v2. Say which you defer.

High-level architecture

Component | Role
Trip service | Create trip, state machine, pricing quote
Matching service | Find drivers in radius; dispatch offers
Location service | Ingest GPS; index driver positions (Redis GEO / geohash)
WebSocket / push gateway | Bidirectional rider and driver updates
Payment service | Authorize at request, capture at completion
Notification service | Driver offer push, rider ETA (notifications)
Trip DB (PostgreSQL) | Trips, statuses, fare breakdown

Trip state machine

  1. REQUESTED → rider submits; quote returned.
  2. MATCHING → matching service searches drivers.
  3. DRIVER_ASSIGNED → driver accepted; rider notified.
  4. DRIVER_ARRIVED → geofence at pickup.
  5. IN_PROGRESS → rider picked up.
  6. COMPLETED → payment captured; ratings.
  7. CANCELLED → by rider or driver, subject to policy rules.

Store state transitions with timestamps. Use optimistic locking (a row version checked on every update) to prevent double-accept. Payment uses idempotency keys on capture.
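The transitions and the version check can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the `Trip` class, the `transition` helper, and the rule that cancellation is allowed from any state before IN_PROGRESS are assumptions made for the example. In Postgres the same check would be an UPDATE guarded by the expected version.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed transitions, following the numbered states above.
# Assumption: a trip can be cancelled from any state before IN_PROGRESS.
TRANSITIONS = {
    "REQUESTED": {"MATCHING", "CANCELLED"},
    "MATCHING": {"DRIVER_ASSIGNED", "CANCELLED"},
    "DRIVER_ASSIGNED": {"DRIVER_ARRIVED", "CANCELLED"},
    "DRIVER_ARRIVED": {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS": {"COMPLETED"},
    "COMPLETED": set(),
    "CANCELLED": set(),
}


class StaleVersion(Exception):
    """The caller read an older version of the trip than the one stored."""


@dataclass
class Trip:
    trip_id: str
    status: str = "REQUESTED"
    version: int = 0
    history: list = field(default_factory=list)  # (status, timestamp) pairs


def transition(trip: Trip, new_status: str, expected_version: int) -> Trip:
    """Apply one state change if the version matches and the move is legal."""
    if trip.version != expected_version:
        raise StaleVersion(f"expected v{expected_version}, found v{trip.version}")
    if new_status not in TRANSITIONS[trip.status]:
        raise ValueError(f"illegal transition {trip.status} -> {new_status}")
    trip.status = new_status
    trip.version += 1
    trip.history.append((new_status, datetime.now(timezone.utc)))
    return trip
```

Two drivers accepting the same offer both read version 1; only the first write succeeds, and the second raises `StaleVersion` instead of silently overwriting the assignment.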

Geospatial matching

Drivers heartbeat their location to the Location service every few seconds while online. Index positions with Redis GEOADD or a geohash grid, then query around the pickup point (GEOSEARCH, or the older GEORADIUS; both take longitude before latitude) with a 3 km radius and COUNT 20. Filter by vehicle type and exclude drivers already on a trip. Rank by ETA (road network API or distance heuristic). Send the offer to the top 3 drivers, serially or in parallel, with a 15s timeout; the first accept wins.
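To make the filter-and-rank step concrete, here is a small pure-Python stand-in for the radius query. It scans a list instead of calling Redis, and ranks by straight-line distance as the distance heuristic; the `Driver` class and `candidates` function are hypothetical names for this sketch.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Driver:
    driver_id: str
    lat: float
    lng: float
    vehicle_type: str
    on_trip: bool


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def candidates(drivers, lat, lng, vehicle_type, radius_km=3.0, limit=20, top_n=3):
    """Return up to top_n driver ids: in radius, right vehicle, not on a trip."""
    in_range = []
    for d in drivers:
        if d.on_trip or d.vehicle_type != vehicle_type:
            continue
        dist = haversine_km(lat, lng, d.lat, d.lng)
        if dist <= radius_km:
            in_range.append((dist, d.driver_id))
    in_range.sort()  # nearest first; a real system would rank by road ETA
    nearest = in_range[:limit]  # mirrors the COUNT 20 cap on the geo query
    return [driver_id for _, driver_id in nearest[:top_n]]
```

A linear scan is only for illustration; the point of the geospatial index is to avoid touching every driver in the city for each request.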

Pricing and surge

Base fare + per-mile + per-minute. A surge multiplier applies when the demand/supply ratio in a geohash cell exceeds a threshold (computed every minute from open requests vs idle drivers). The quote is locked at request time for rider transparency. The current multiplier for each cell is kept in a cache so the quote path never recomputes it.
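A worked example helps here. The rates, the threshold of 1.5, the cap of 3.0, and the way the multiplier scales with the ratio are all illustrative assumptions; the source only fixes the shape of the formula.

```python
def surge_multiplier(open_requests, idle_drivers, threshold=1.5, cap=3.0):
    """Multiplier for one geohash cell from its demand/supply ratio.

    Assumption: above the threshold the multiplier grows linearly with the
    ratio and is capped; real pricing curves are a product decision.
    """
    if idle_drivers == 0:
        return cap if open_requests > 0 else 1.0
    ratio = open_requests / idle_drivers
    if ratio <= threshold:
        return 1.0
    return min(cap, round(ratio / threshold, 1))


def quote_fare_cents(miles, minutes, surge,
                     base_cents=250, per_mile_cents=150, per_minute_cents=30):
    """Base fare + per-mile + per-minute, scaled by surge, in whole cents."""
    subtotal = base_cents + per_mile_cents * miles + per_minute_cents * minutes
    return round(subtotal * surge)
```

For a 4 mile, 10 minute trip the subtotal is 250 + 600 + 300 = 1150 cents; with 30 open requests against 10 idle drivers the ratio is 3.0, the multiplier is 2.0, and the locked quote is 2300 cents.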

Real-time location path

Driver app → WebSocket to gateway → Location service → Redis GEO update. Active trip: fan-out driver position to rider subscription on same trip_id channel. Do not write every GPS point to Postgres — stream to Redis; archive trip polyline to object storage after completion for disputes.
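The fan-out by trip_id can be pictured as a tiny publish/subscribe registry. The sketch below is in-process and synchronous, standing in for the gateway plus Redis; `TripChannels` and its methods are hypothetical names. It also buffers the polyline so it can be archived once the trip completes.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float]  # (lat, lng)


class TripChannels:
    """One channel per trip_id; driver positions fan out to its subscribers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Position], None]]] = defaultdict(list)
        self._polylines: Dict[str, List[Position]] = defaultdict(list)

    def subscribe(self, trip_id: str, callback: Callable[[Position], None]) -> None:
        self._subscribers[trip_id].append(callback)

    def publish(self, trip_id: str, position: Position) -> None:
        # Buffer the point (kept out of the trip database) and notify riders.
        self._polylines[trip_id].append(position)
        for callback in self._subscribers.get(trip_id, []):
            callback(position)

    def complete(self, trip_id: str) -> List[Position]:
        """Close the channel and hand back the polyline for archiving."""
        self._subscribers.pop(trip_id, None)
        return self._polylines.pop(trip_id, [])
```

In a deployed system each callback would be a WebSocket send on the rider's connection, and the list returned by `complete` would be written to object storage.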

Capacity estimation

100K concurrent drivers × 1 update/4 sec ≈ 25K location writes/sec, a rate a sharded Redis cluster can absorb. 50K active trips × 2 WebSocket clients ≈ 100K connections on the gateway fleet behind a load balancer. Matching: 5K new requests/min in the peak city × 3 driver pings each ≈ 15K offer notifications/min, sent asynchronously via a push queue.
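The same arithmetic, written out so each figure can be swapped during the interview. All inputs are the numbers stated above; only the variable names are new.

```python
# Location writes: every online driver sends one update per interval.
concurrent_drivers = 100_000
update_interval_s = 4
location_writes_per_s = concurrent_drivers // update_interval_s

# Gateway connections: one rider and one driver socket per active trip.
active_trips = 50_000
ws_connections = active_trips * 2

# Offer notifications: each new request pings three drivers.
requests_per_min = 5_000
offers_per_request = 3
offers_per_min = requests_per_min * offers_per_request
offers_per_s = offers_per_min / 60

print(location_writes_per_s, ws_connections, offers_per_min, offers_per_s)
```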

Failure modes

Failure | Mitigation
No drivers in radius | Expand radius; increase surge; retry
Driver ignores offer | Timeout; offer next driver
Payment capture fails | Retry; flag account; support workflow
Location stream gap | Last-known position; interpolate briefly
Split-brain double assign | DB transaction + unique constraint on driver active trip
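Retrying a failed capture is only safe if the retry cannot charge twice, which is where the idempotency key comes in. The sketch below uses a hypothetical in-memory `FakeGateway` that captures the payment but loses the response a configurable number of times; no real payment provider API is implied.

```python
class FakeGateway:
    """Hypothetical gateway that deduplicates captures by idempotency key."""

    def __init__(self, lost_responses=0):
        self.charges = {}          # idempotency_key -> amount_cents
        self.total_captured = 0
        self._lost_responses = lost_responses

    def capture(self, idempotency_key, amount_cents):
        # Charge only the first time this key is seen.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = amount_cents
            self.total_captured += amount_cents
        # Simulate the response being lost after the charge went through.
        if self._lost_responses > 0:
            self._lost_responses -= 1
            raise TimeoutError("response lost after capture")
        return {"key": idempotency_key, "amount_cents": self.charges[idempotency_key]}


def capture_with_retry(gateway, trip_id, amount_cents, attempts=3):
    """Retry capture with one stable key per trip, so retries never double charge."""
    key = f"capture:{trip_id}"
    for _ in range(attempts):
        try:
            return gateway.capture(key, amount_cents)
        except TimeoutError:
            continue
    # Out of retries: hand over to the flag-account / support workflow.
    raise RuntimeError(f"capture failed for trip {trip_id}")
```

The key is derived from the trip rather than generated per attempt; a fresh key on each retry would defeat the deduplication.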

API sketch

Endpoint | Purpose
POST /v1/trips | Request ride; returns quote and trip_id
POST /v1/trips/{id}/cancel | Cancel with policy fee
GET /v1/trips/{id} | Status and driver info
WS /v1/stream?trip_id= | Live location and status events
POST /v1/drivers/location | Heartbeat lat/lng when online

Matching flow step by step

  1. Rider POST /trips with pickup, dropoff, payment method.
  2. Trip service validates; payment service auth hold for estimate.
  3. Matching service queries Redis GEO within 3 km; filters online + correct vehicle.
  4. Push offer to driver app with 15s TTL.
  5. Driver accept → atomic update trip + driver busy flag.
  6. WebSocket streams driver ETA and location to rider.
  7. Trip completes → capture payment (payment system pattern).
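Step 5 is the one that must be atomic. The sketch below uses SQLite from the standard library as a stand-in for Postgres; the minimal schema and the `accept_offer` function are assumptions for illustration. Both updates are conditional, and if either affects no row the whole transaction rolls back.

```python
import sqlite3

SCHEMA = """
CREATE TABLE trips (id TEXT PRIMARY KEY, status TEXT NOT NULL, driver_id TEXT);
CREATE TABLE drivers (id TEXT PRIMARY KEY, online INTEGER NOT NULL,
                      current_trip_id TEXT);
"""


def accept_offer(conn: sqlite3.Connection, trip_id: str, driver_id: str) -> bool:
    """Assign the driver only if the trip is still open and the driver is free."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "UPDATE trips SET driver_id = ?, status = 'DRIVER_ASSIGNED' "
                "WHERE id = ? AND status = 'MATCHING'",
                (driver_id, trip_id),
            )
            if cur.rowcount != 1:
                raise LookupError("trip is no longer open")
            cur = conn.execute(
                "UPDATE drivers SET current_trip_id = ? "
                "WHERE id = ? AND online = 1 AND current_trip_id IS NULL",
                (trip_id, driver_id),
            )
            if cur.rowcount != 1:
                raise LookupError("driver is not free")
        return True
    except LookupError:
        return False
```

When offers go out in parallel, the second driver to accept hits the `status = 'MATCHING'` guard and gets `False`, which the app can surface as "ride already taken".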

Data model sketch

  • trips: id, rider_id, driver_id, status, pickup, dropoff, fare, surge_multiplier
  • drivers: id, vehicle_type, online, current_trip_id
  • driver_locations: Redis GEO — not long-term Postgres rows
  • trip_events: append-only status transitions for audit
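The tables above might look like this as a schema. SQLite stands in for Postgres so the example runs anywhere; the column types, the text encoding of pickup and dropoff, and the index name are assumptions. The partial unique index is one way to express "a driver has at most one active trip" in the database itself.

```python
import sqlite3

SCHEMA = """
CREATE TABLE drivers (
    id TEXT PRIMARY KEY,
    vehicle_type TEXT NOT NULL,
    online INTEGER NOT NULL DEFAULT 0,
    current_trip_id TEXT
);
CREATE TABLE trips (
    id TEXT PRIMARY KEY,
    rider_id TEXT NOT NULL,
    driver_id TEXT REFERENCES drivers(id),
    status TEXT NOT NULL,
    pickup TEXT NOT NULL,          -- assumption: "lat,lng" text for brevity
    dropoff TEXT NOT NULL,
    fare_cents INTEGER,
    surge_multiplier REAL NOT NULL DEFAULT 1.0
);
-- At most one active trip per driver, enforced by the database.
CREATE UNIQUE INDEX one_active_trip_per_driver ON trips(driver_id)
    WHERE status = 'DRIVER_ASSIGNED'
       OR status = 'DRIVER_ARRIVED'
       OR status = 'IN_PROGRESS';
-- Append-only audit log of status transitions.
CREATE TABLE trip_events (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    trip_id TEXT NOT NULL REFERENCES trips(id),
    status TEXT NOT NULL,
    at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Inserting a second active trip for the same driver raises `sqlite3.IntegrityError`, while any number of COMPLETED or CANCELLED trips per driver are allowed.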

Cancellation and disputes

Rider cancels before a driver is assigned: no fee. Rider cancels after the driver is en route: a cancellation fee is paid to the driver. Driver no-show vs rider no-show is decided from geofence timestamps. The trip polyline in S3 supports fare disputes. Keep all policy rules in a config service, not hardcoded in the app.
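A small sketch of how such a rule could read when the amounts come from configuration. The fee of 500 cents, the `DEFAULT_POLICY` dictionary, and the choice that a driver-initiated cancel never charges the rider are assumptions; the source only states the two rider cases.

```python
# Assumption: in production this dictionary is loaded from the config service.
DEFAULT_POLICY = {"rider_fee_after_assign_cents": 500}

CANCELLABLE = {"REQUESTED", "MATCHING", "DRIVER_ASSIGNED", "DRIVER_ARRIVED"}


def cancellation_fee_cents(status, cancelled_by, policy=DEFAULT_POLICY):
    """Fee charged to the rider for cancelling a trip in the given state."""
    if status not in CANCELLABLE:
        raise ValueError(f"trip in state {status} cannot be cancelled")
    if cancelled_by == "driver":
        return 0  # assumption: the rider is never charged for a driver cancel
    if cancelled_by != "rider":
        raise ValueError(f"unknown actor {cancelled_by!r}")
    if status in ("REQUESTED", "MATCHING"):
        return 0  # no driver assigned yet
    return policy["rider_fee_after_assign_cents"]
```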

Latency budget

Path | Target
Request trip + quote | < 300ms
Match + first offer sent | < 5s p99
Location update to rider map | < 2s end-to-end
Payment capture on complete | < 3s

What to say in the last five minutes

Summarise: Redis GEO matching, trip state machine in Postgres, WebSocket location fan-out, surge in cache, payment auth/capture with idempotency. Defer pooling and multi-region to v2.

Sample opening (first three minutes)

Interviewer: "Design Uber." You: "Rider requests a trip; I match an available driver nearby using a geospatial index, track both parties over WebSockets, and run a trip state machine through payment. I will assume one city at first, greedy matching, and separate services for location, matching, trips, and payments with idempotent capture at the end."

Geohash grid detail

Geohash encodes lat/lng into a short string; nearby locations usually share a prefix, but two points on either side of a cell boundary may not, which is why a lookup covers more than one cell. Store drivers in cells; query the pickup cell and its 8 neighbors. Use finer precision for dense downtown areas and coarser for suburbs (configurable). Alternative: the Redis GEO commands are simpler in interviews; same concept, less math on the whiteboard.
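For readers who want to see the encoding, here is a compact version of the standard geohash bit-interleaving, plus a neighbor lookup built by re-encoding the centers of the surrounding cells. The function names are hypothetical, and the neighbor helper ignores the poles and the antimeridian for brevity.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lng, precision=6):
    """Interleave longitude and latitude bisection bits, 5 bits per character."""
    lat_lo, lat_hi, lng_lo, lng_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, bit_count, even = [], 0, 0, True  # even bits encode longitude
    while len(chars) < precision:
        if even:
            mid = (lng_lo + lng_hi) / 2
            bit = lng >= mid
            lng_lo, lng_hi = (mid, lng_hi) if bit else (lng_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        even = not even
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_bbox(gh):
    """Decode a geohash to (lat_lo, lat_hi, lng_lo, lng_hi)."""
    lat_lo, lat_hi, lng_lo, lng_hi = -90.0, 90.0, -180.0, 180.0
    even = True
    for ch in gh:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (idx >> shift) & 1
            if even:
                mid = (lng_lo + lng_hi) / 2
                lng_lo, lng_hi = (mid, lng_hi) if bit else (lng_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return lat_lo, lat_hi, lng_lo, lng_hi


def cell_and_neighbors(gh):
    """The cell plus its 8 neighbors (no pole or antimeridian handling)."""
    lat_lo, lat_hi, lng_lo, lng_hi = geohash_bbox(gh)
    dlat, dlng = lat_hi - lat_lo, lng_hi - lng_lo
    clat, clng = (lat_lo + lat_hi) / 2, (lng_lo + lng_hi) / 2
    return {
        geohash_encode(clat + i * dlat, clng + j * dlng, len(gh))
        for i in (-1, 0, 1)
        for j in (-1, 0, 1)
    }
```

A matching query would encode the pickup point, call `cell_and_neighbors`, and read the driver set stored under each of the nine keys.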

Multi-region and maps

City-level deployment first: all matching runs in one region for low latency. ETA uses a routing API (Google/OSRM) cached by (origin_cell, dest_cell) pairs. Map tiles are served from a CDN and are out of scope unless the interviewer asks. Shard the database by city_id if trip history grows large.

Driver supply incentives

Surge is demand-side pricing; heatmaps push drivers toward high-demand zones (optional v2). Mention as product layer on top of location index — not required for core design pass.

Pooling and scheduled rides (v2)

Pooling matches multiple riders on overlapping routes. The matching problem becomes much harder (NP-hard in general), so it is solved with heuristics. Scheduled rides pre-assign drivers 30 min ahead. Defer both unless the interviewer insists; mention them as future complexity on top of the core state machine.

Observability

  • Metrics: match time p99, cancel rate, active WebSocket connections.
  • Traces: trip_id across matching, payment, notification.
  • Alerts: payment capture failure spike, Redis GEO lag.

ETA and routing cache

Road network ETA is expensive — cache by rounded pickup/dropoff geohash pairs for 5 minutes. Fallback: haversine distance ÷ average city speed for matching rank only. Refresh ETA every 30s during trip from live driver position. Rider sees "3 min away" from cached route polyline progress.
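The cache-then-fallback logic might look like the sketch below. Rounding coordinates to three decimals stands in for the geohash cell pair, `routing_fn` is a hypothetical callable wrapping whatever routing API is in use, and the 25 km/h average speed is an assumed default.

```python
import math
import time


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dphi, dlmb = p2 - p1, math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


class EtaCache:
    """Cache routing ETAs per (origin cell, destination cell) with a TTL."""

    def __init__(self, routing_fn, ttl_s=300, avg_speed_kmh=25.0, clock=time.monotonic):
        self._routing_fn = routing_fn    # hypothetical: (origin, dest) -> minutes
        self._ttl_s = ttl_s              # 5 minutes, as in the text
        self._avg_speed_kmh = avg_speed_kmh
        self._clock = clock
        self._cache = {}                 # key -> (eta_minutes, stored_at)

    @staticmethod
    def _cell(point):
        # Assumption: rounding to 3 decimals approximates a geohash cell.
        return (round(point[0], 3), round(point[1], 3))

    def eta_minutes(self, origin, dest):
        key = (self._cell(origin), self._cell(dest))
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self._ttl_s:
            return hit[0]
        try:
            eta = self._routing_fn(origin, dest)
        except Exception:
            # Routing unavailable: distance / average speed, not cached.
            return haversine_km(origin, dest) / self._avg_speed_kmh * 60
        self._cache[key] = (eta, now)
        return eta
```

The fallback value is deliberately not cached, so the next request tries the routing API again instead of serving a rough estimate for five minutes.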

Trust and safety (brief)

Share trip with contacts, masked phone numbers between rider and driver, and an SOS button that routes to ops. The trip_id links all events for support. This is outside the core matching design but worth one sentence in senior interviews.

Mock interview checklist

  1. Drew trip state machine with clear transitions.
  2. Explained geospatial index and driver search radius.
  3. Separated location hot path from trip metadata DB.
  4. Mentioned surge pricing and quote at request time.
  5. Named WebSocket fan-out and payment idempotency.

Closing summary

Ride hailing is real-time matching plus a strict trip state machine. Nail geospatial search, location streaming, and payment capture — the rest is product features on top.
