Technologies referenced in this playbook: PostgreSQL · Cassandra · DynamoDB · ZooKeeper & etcd
How to Use This Playbook
This playbook supports three reading modes:
| Mode | Time | What to Read |
|---|---|---|
| Quick Review | 15 min | Executive Summary → Interview Walkthrough → Fault Lines (§3) → Drills (§7) |
| Targeted Study | 1-2 hrs | Executive Summary → Interview Walkthrough → Core Flow, expand appendices where you're weak |
| Deep Dive | 3+ hrs | Everything, including all appendices |
What is Database Sharding? — Why interviewers pick this topic
The Problem
When a single database instance can no longer handle the write throughput, storage volume, or query load your application demands, you distribute the data horizontally across multiple instances. Sharding splits a single database into multiple smaller databases (shards), each holding a subset of the data. Each shard is an independent database that owns a portion of the keyspace.
Common Use Cases
- Write-heavy workloads: Social feeds, messaging, IoT telemetry where a single master can't keep up with insert volume
- Storage ceiling: Tables exceeding single-node storage limits (multi-TB datasets)
- Latency isolation: Keeping per-shard working sets in memory for consistent query performance
- Tenant isolation: SaaS platforms where each customer's data lives on dedicated shards for compliance or performance guarantees
Why Interviewers Ask About This
Sharding is the canonical "simple concept, brutal execution" problem. Everyone knows you split data across nodes. The Staff question is: how do you choose the shard key, what happens when you choose wrong, and who owns the rebalancing when traffic patterns shift? Interviewers want to see you reason about the organizational cost of a sharding decision that's nearly impossible to reverse.
Mechanics Refresher: Partitioning Strategies — Hash vs range vs directory-based routing
| Strategy | How It Works | Sweet Spot | Failure Mode |
|---|---|---|---|
| Hash partitioning | shard = hash(key) % N | Uniform distribution, no range queries needed | Adding shards requires rehashing (consistent hashing mitigates) |
| Range partitioning | Contiguous key ranges per shard (e.g., A-M, N-Z) | Range scans, time-series data, locality | Hot ranges (recent data, popular prefixes) |
| Directory-based | Lookup table maps key → shard | Flexible, supports rebalancing | Directory is SPOF, adds latency |
| Consistent hashing | Hash ring with virtual nodes | Adding/removing shards moves minimal data | Uneven load without enough virtual nodes |
Why this matters for fault lines: Hash partitioning eliminates hot ranges but kills range queries. Range partitioning preserves locality but creates hot shards. Every shard key decision forces this tradeoff — and the wrong choice is a 6-month migration to fix.
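The rehashing cost in the table above is easy to make concrete. A minimal Python sketch (the names `stable_hash` and `mod_shard` are illustrative, not a real library API) comparing shard assignments before and after growing a modular-hash cluster from 4 to 5 shards:

```python
import hashlib

def stable_hash(key: str) -> int:
    # md5 rather than built-in hash() so results are stable across runs
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def mod_shard(key: str, n_shards: int) -> int:
    # Modular hash partitioning: shard = hash(key) % N
    return stable_hash(key) % n_shards

keys = [f"user-{i}" for i in range(10_000)]
before = {k: mod_shard(k, 4) for k in keys}
after = {k: mod_shard(k, 5) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys change shards going 4 -> 5")
# roughly 80% of keys move — only keys whose hash agrees mod 4 and mod 5 stay put
```

This is exactly the failure mode consistent hashing mitigates: a hash ring remaps only ~1/N of the key space when a shard is added.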
What This Interview Actually Tests
Sharding is not a scaling question. Everyone knows "split the data across machines."
This is a data architecture ownership question that tests:
- Whether you reason about shard key selection as a business decision, not a technical one
- Whether you understand that resharding is the most dangerous operation in your system
- Whether you can articulate who pays when cross-shard queries are needed
- Whether you design for the rebalancing that will inevitably happen
The key insight: Sharding is the most expensive decision to reverse in a database architecture. Staff engineers treat the shard key as a contract with the organization, not a config parameter.
The L5 vs L6 Contrast (Memorize This)
| Behavior | L5 (Senior) | L6 (Staff) |
|---|---|---|
| First move | "We'll shard by user_id" | Asks "What are the access patterns and what queries must stay single-shard?" |
| Shard key | Picks the obvious primary key | Evaluates 3-4 candidates against read patterns, write distribution, and growth trajectory |
| Cross-shard queries | "We'll use scatter-gather" | Quantifies the cost: "Scatter-gather across 64 shards at p99 means the slowest shard dominates latency" |
| Rebalancing | "We'll use consistent hashing" | Asks "What's the blast radius of a rebalance? Can we do it live without downtime?" |
| Ownership | Focuses on the sharding library | Asks "Who owns the shard map? What happens when a new feature needs a query that crosses shard boundaries?" |
The Three Intents (Pick One and Commit)
| Intent | Constraint | Strategy | Growth Model |
|---|---|---|---|
| Write Throughput | Single-master bottleneck | Hash-based, maximize write distribution | Horizontal write scaling |
| Data Isolation | Compliance, tenancy, blast radius | Tenant-per-shard or range-based | Isolation guarantees per partition |
| Query Locality | Read latency, working set size | Range or composite key, co-locate related data | Keep hot data on fewer shards |
🎯 Staff Insight: "I'll assume we're sharding to solve write throughput at scale — our single master is hitting the insert ceiling. This means hash-based partitioning optimized for even write distribution, accepting that some read patterns will require scatter-gather."
The Four Fault Lines (The Core of This Interview)
1. Shard Key Selection — Every key optimizes for one access pattern and penalizes others. Who decides which queries matter most?
2. Cross-Shard Operations — Joins, transactions, and aggregations that span shards. How expensive are they, and who absorbs the cost?
3. Rebalancing & Resharding — When traffic patterns shift, shards become uneven. How do you rebalance without downtime, and who owns the migration?
4. Shard-Aware Application Layer — Does the application know about shards, or is it abstracted away? Who pays for the abstraction leaking?
Each fault line has a tradeoff matrix with explicit "who pays" analysis. See §3.
Quick Reference: What Interviewers Probe
| After You Say... | They Will Ask... |
|---|---|
| "Shard by user_id" | "What about queries that need data across users? Analytics? Admin dashboards?" |
| "Consistent hashing" | "Walk me through what happens when you add a shard. How much data moves?" |
| "Scatter-gather for cross-shard" | "What's the p99 latency when one of 64 shards is slow? What's your timeout strategy?" |
| "We'll use Vitess/Citus" | "What happens when the abstraction leaks? When does the app need to know about shards?" |
| "Range partition by time" | "How do you handle the hot shard problem? All writes go to the latest partition." |
Jump to Practice
→ Active Drills (§7) — 8 practice prompts with expected answer shapes
System Architecture Overview
Interview Walkthrough
The six phases below are compressed for a deep-dive format. Phases 1-3 deliver the crisp answer in 2-3 minutes. If probed, Phase 5 has depth for 15+ minutes.
Phase 1: Requirements & Framing (30 seconds)
Name when to shard, not how:
- "Sharding is the last resort for scaling, not the first. Before sharding: read replicas (read scaling), vertical scaling (bigger machine), caching (reduce load), query optimization (reduce per-query cost). Shard only when data volume exceeds a single node's storage or write throughput exceeds a single node's capacity."
Phase 2: Core Entities & API (30 seconds)
- Shard: a partition of the dataset on a separate database instance
- Shard Key: the column used to determine which shard a row belongs to (e.g., user_id, tenant_id)
- Router: the component that maps a query's shard key to the correct shard
- Shard Map: the mapping from shard key ranges (or hash ranges) to physical database instances
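The four entities above can be sketched in a few lines. This is a hypothetical illustration — `Shard`, `ShardMap`, `route`, and the connection strings are invented names for the concepts, not a real library API:

```python
import zlib
from dataclasses import dataclass

@dataclass
class Shard:
    name: str
    dsn: str  # connection string for the physical database instance

class ShardMap:
    """Maps hash-range boundaries to physical shards (equal slices here)."""
    def __init__(self, shards: list[Shard], space: int = 2**32):
        self.shards = shards
        self.space = space

    def shard_for(self, key_hash: int) -> Shard:
        # Each shard owns an equal, contiguous slice of the hash space
        idx = key_hash * len(self.shards) // self.space
        return self.shards[idx]

def route(shard_map: ShardMap, shard_key: str) -> Shard:
    # The router: shard key -> hash -> shard map lookup -> physical instance
    return shard_map.shard_for(zlib.crc32(shard_key.encode()))

shards = [Shard(f"shard-{i}", f"postgres://db{i}/app") for i in range(4)]
smap = ShardMap(shards)
print(route(smap, "user-42").name)
```

Note that the shard map is state: every router must agree on it, which is why production systems store it in a coordination service and version it.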
Phase 3: The 2-Minute Explanation (2 minutes)
"Choose a shard key; a router consults the shard map to send each query to the shard that owns that key. Queries that include the shard key touch exactly one shard; queries that don't must scatter to all shards and gather results — that's the fundamental tax. Hash partitioning buys even write distribution at the cost of range queries; range partitioning buys locality at the cost of hot shards. Everything hard about sharding — cross-shard queries, rebalancing, the shard-aware application layer — follows from which tradeoff you pick."
Phase 4: Transition to Depth (15 seconds)
"Sharding strategy is table stakes. The hard problems are: choosing the shard key, handling cross-shard queries, and rebalancing when shards become uneven."
Phase 5: Deep Dives (5-15 minutes if probed)
Probe 1: "How do you choose the shard key?" (3-5 min)
"The shard key must satisfy three properties: (1) high cardinality (many distinct values), (2) uniform distribution (no hot spots), (3) query locality (most queries can be answered by one shard)."
Walk through examples:
- user_id as shard key: High cardinality ✓. Uniform distribution ✓ (with hash-based). Query locality ✓ for user-scoped queries. ✗ for queries that span users (admin dashboards, analytics).
- timestamp as shard key: High cardinality ✓. Non-uniform distribution ✗ (recent data is hot, old data is cold). Query locality ✓ for time-range queries.
- tenant_id (multi-tenant SaaS): Medium cardinality. Non-uniform distribution ✗ (one large tenant dominates). Query locality ✓ for tenant-scoped queries. "The one large tenant needs its own shard — that's the 'large tenant' problem."
The anti-pattern: "Never shard by a low-cardinality column (status, country). 'status = active' is 90% of rows — that shard is 9x larger than the others."
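A quick way to pressure-test a candidate shard key is to simulate its distribution against representative data. The sketch below uses synthetic rows — the single "whale" tenant producing half the traffic is an assumed scenario for illustration:

```python
import collections
import random
import zlib

random.seed(7)
N_SHARDS = 8

def shard_of(value: str) -> int:
    return zlib.crc32(value.encode()) % N_SHARDS

# Simulated rows: near-unique user_ids vs a skewed tenant distribution
# where one "whale" tenant produces half of all rows (assumption for demo)
rows = []
for _ in range(100_000):
    user_id = f"user-{random.randrange(1_000_000)}"
    tenant_id = ("tenant-whale" if random.random() < 0.5
                 else f"tenant-{random.randrange(200)}")
    rows.append((user_id, tenant_id))

fair_share = len(rows) / N_SHARDS
for idx, name in [(0, "user_id"), (1, "tenant_id")]:
    counts = collections.Counter(shard_of(r[idx]) for r in rows)
    skew = max(counts.values()) / fair_share
    print(f"{name:10s} hottest shard holds {skew:.1f}x its fair share")
```

With this data, `user_id` lands near 1.0x (uniform) while `tenant_id`'s hottest shard holds several times its fair share — the "large tenant" problem made visible before you commit to the key.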
Probe 2: "How do you handle cross-shard queries?" (3-5 min)
"A query that doesn't include the shard key can't be routed to a single shard. It must scatter to all shards and gather the results. This is the fundamental tax of sharding."
Walk through the options:
- Scatter-gather: Send the query to all N shards, merge results. Latency: max(shard response times). Works for simple aggregations (COUNT, SUM). "At 16 shards, you're sending 16 queries and waiting for the slowest. The tail latency of scatter-gather degrades linearly with shard count."
- Denormalization: Duplicate data across shards to avoid cross-shard lookups. "If orders are sharded by user_id and we need 'orders by product', maintain a secondary index sharded by product_id. The cost: dual writes and eventual consistency."
- Secondary index service: Build a separate search index (Elasticsearch) that indexes across all shards. "Query the index to find shard+key pairs, then fetch full records from the correct shards."
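A scatter-gather aggregation can be sketched as a concurrent fan-out with a per-query timeout; `query_shard` below is a stand-in for a real per-shard database call, and the latencies are simulated:

```python
import concurrent.futures
import random
import time

def query_shard(shard_id: int) -> int:
    # Stand-in for a real per-shard query; simulated latency and partial COUNT
    time.sleep(random.uniform(0.01, 0.05))
    return random.randrange(1000)

def scatter_gather_count(n_shards: int, timeout: float = 0.5) -> int:
    # Fan out to all shards concurrently, merge the partial counts.
    # End-to-end latency is bounded by the slowest shard (or the timeout).
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_shards) as pool:
        futures = [pool.submit(query_shard, s) for s in range(n_shards)]
        return sum(f.result(timeout=timeout) for f in futures)

start = time.perf_counter()
total = scatter_gather_count(16)
print(f"merged count={total}, latency={time.perf_counter() - start:.3f}s")
```

The timeout is the interesting design choice: without one, a single stuck shard stalls every scatter-gather query in the system; with one, you must decide whether a partial result is acceptable for that query.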
Probe 3: "How do you rebalance?" (3-5 min)
"A shard grows to 2x the size of others (hot tenant, uneven data growth). We need to split it without downtime."
Walk through the process:
- Online migration: Start writing to both old and new shard simultaneously (dual-write). Backfill the new shard with existing data. Once backfill is complete and caught up, switch reads to the new shard. Drop the old data.
- Virtual shards (pre-splitting): Instead of N physical shards, create 10×N virtual shards mapped to N physical nodes. Rebalancing moves virtual shards between physical nodes — no data re-partitioning needed. "This is how MongoDB, CockroachDB, and DynamoDB work."
- Consistent hashing: Adding a new shard only remaps ~1/N of the key space. Other shards are unaffected. "Better than modular hashing where adding a shard remaps most keys."
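The virtual-shard idea can be demonstrated directly: keys hash into a fixed number of virtual shards, and a rebalance only edits the virtual-to-physical assignment table, never the hash function. All names here are illustrative:

```python
import zlib

N_VIRTUAL = 40  # fixed forever; 10x the initial physical node count

def virtual_shard(key: str) -> int:
    # Key -> virtual shard mapping never changes after this is chosen
    return zlib.crc32(key.encode()) % N_VIRTUAL

# Initial assignment: 40 virtual shards spread across 4 physical nodes
assignment = {v: f"node-{v % 4}" for v in range(N_VIRTUAL)}

def node_for(key: str) -> str:
    return assignment[virtual_shard(key)]

before = {k: node_for(k) for k in (f"user-{i}" for i in range(10_000))}

# Rebalance: bring node-4 online and hand it 8 of the 40 virtual shards.
# Only keys in those virtual shards move; nothing is re-hashed.
for v in range(0, N_VIRTUAL, 5):
    assignment[v] = "node-4"

after = {k: node_for(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
print(f"{moved / len(before):.0%} of keys moved")  # roughly 20%: 8 of 40 virtual shards
```

The rebalance is a metadata edit plus a bounded data copy — which is why pre-split systems can move load in minutes rather than re-partitioning for days.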
Phase 6: Wrap-Up
"Database sharding is an organizational commitment, not just a technical decision. Once you shard, every query must be shard-aware. Every schema migration runs N times. Every backup is N backups. The Staff-level insight: delay sharding as long as possible, choose the shard key based on your primary access pattern, and use virtual shards so rebalancing is minutes, not days."
Quick-Reference: The 30-Second Cheat Sheet
| Topic | The L5 Answer | The L6 Answer (say this) |
|---|---|---|
| When to shard | "When data grows" | "Last resort — after read replicas, caching, vertical scaling, and query optimization" |
| Strategy | "Hash the primary key" | "Hash-based default; range-based for time-series; directory-based for key-level migration" |
| Shard key | "Use the primary key" | "High cardinality + uniform distribution + query locality — choose which to sacrifice" |
| Cross-shard queries | "Join across shards" | "Scatter-gather degrades linearly — design to minimize cross-shard queries" |
| Rebalancing | "Migrate data" | "Virtual shards for minutes-not-days rebalancing; consistent hashing for shard additions" |