Technologies referenced in this playbook: PostgreSQL · Cassandra · DynamoDB · ZooKeeper & etcd
How to Use This Playbook
This playbook supports three reading modes:
| Mode | Time | What to Read |
|---|---|---|
| Quick Review | 15 min | Executive Summary → Interview Walkthrough → Fault Lines (§3) → Drills (§7) |
| Targeted Study | 1-2 hrs | Executive Summary → Interview Walkthrough → Core Flow, expand appendices where you're weak |
| Deep Dive | 3+ hrs | Everything, including all appendices |
What is Database Sharding? — Why interviewers pick this topic
The Problem
When a single database instance can no longer handle the write throughput, storage volume, or query load your application demands, you distribute the data horizontally across multiple instances. Sharding splits a single database into multiple smaller databases (shards), each holding a subset of the data. Each shard is an independent database that owns a portion of the keyspace.
Common Use Cases
- Write-heavy workloads: Social feeds, messaging, IoT telemetry where a single master can't keep up with insert volume
- Storage ceiling: Tables exceeding single-node storage limits (multi-TB datasets)
- Latency isolation: Keeping per-shard working sets in memory for consistent query performance
- Tenant isolation: SaaS platforms where each customer's data lives on dedicated shards for compliance or performance guarantees
Why Interviewers Ask About This
Sharding is the canonical "simple concept, brutal execution" problem. Everyone knows you split data across nodes. The Staff question is: how do you choose the shard key, what happens when you choose wrong, and who owns the rebalancing when traffic patterns shift? Interviewers want to see you reason about the organizational cost of a sharding decision that's nearly impossible to reverse.
Mechanics Refresher: Partitioning Strategies — Hash vs range vs directory-based routing
| Strategy | How It Works | Sweet Spot | Failure Mode |
|---|---|---|---|
| Hash partitioning | shard = hash(key) % N | Uniform distribution, no range queries needed | Adding shards requires rehashing (consistent hashing mitigates) |
| Range partitioning | Contiguous key ranges per shard (e.g., A-M, N-Z) | Range scans, time-series data, locality | Hot ranges (recent data, popular prefixes) |
| Directory-based | Lookup table maps key → shard | Flexible, supports rebalancing | Directory is SPOF, adds latency |
| Consistent hashing | Hash ring with virtual nodes | Adding/removing shards moves minimal data | Uneven load without enough virtual nodes |
Why this matters for fault lines: Hash partitioning eliminates hot ranges but kills range queries. Range partitioning preserves locality but creates hot shards. Every shard key decision forces this tradeoff — and the wrong choice is a 6-month migration to fix.
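The rehashing cost in the table above is easy to make concrete. A minimal Python sketch (the names `stable_hash` and `mod_shard` are illustrative, not a real library API) comparing shard assignments before and after growing a modular-hash cluster from 4 to 5 shards:

```python
import hashlib

def stable_hash(key: str) -> int:
    # md5 rather than built-in hash() so results are stable across runs
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def mod_shard(key: str, n_shards: int) -> int:
    # Modular hash partitioning: shard = hash(key) % N
    return stable_hash(key) % n_shards

keys = [f"user-{i}" for i in range(10_000)]
before = {k: mod_shard(k, 4) for k in keys}
after = {k: mod_shard(k, 5) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys change shards going 4 -> 5")
# roughly 80% of keys move — only keys whose hash agrees mod 4 and mod 5 stay put
```

This is exactly the failure mode consistent hashing mitigates: a hash ring remaps only ~1/N of the key space when a shard is added.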
What This Interview Actually Tests
Sharding is not a scaling question. Everyone knows "split the data across machines."
This is a data architecture ownership question that tests:
- Whether you reason about shard key selection as a business decision, not a technical one
- Whether you understand that resharding is the most dangerous operation in your system
- Whether you can articulate who pays when cross-shard queries are needed
- Whether you design for the rebalancing that will inevitably happen
The key insight: Sharding is the most expensive decision to reverse in a database architecture. Staff engineers treat the shard key as a contract with the organization, not a config parameter.
The L5 vs L6 Contrast (Memorize This)
| Behavior | L5 (Senior) | L6 (Staff) |
|---|---|---|
| First move | "We'll shard by user_id" | Asks "What are the access patterns and what queries must stay single-shard?" |
| Shard key | Picks the obvious primary key | Evaluates 3-4 candidates against read patterns, write distribution, and growth trajectory |
| Cross-shard queries | "We'll use scatter-gather" | Quantifies the cost: "Scatter-gather across 64 shards at p99 means the slowest shard dominates latency" |
| Rebalancing | "We'll use consistent hashing" | Asks "What's the blast radius of a rebalance? Can we do it live without downtime?" |
| Ownership | Focuses on the sharding library | Asks "Who owns the shard map? What happens when a new feature needs a query that crosses shard boundaries?" |
The Three Intents (Pick One and Commit)
| Intent | Constraint | Strategy | Growth Model |
|---|---|---|---|
| Write Throughput | Single-master bottleneck | Hash-based, maximize write distribution | Horizontal write scaling |
| Data Isolation | Compliance, tenancy, blast radius | Tenant-per-shard or range-based | Isolation guarantees per partition |
| Query Locality | Read latency, working set size | Range or composite key, co-locate related data | Keep hot data on fewer shards |
🎯 Staff Insight: "I'll assume we're sharding to solve write throughput at scale — our single master is hitting the insert ceiling. This means hash-based partitioning optimized for even write distribution, accepting that some read patterns will require scatter-gather."
The Four Fault Lines (The Core of This Interview)
1. Shard Key Selection — Every key optimizes for one access pattern and penalizes others. Who decides which queries matter most?
2. Cross-Shard Operations — Joins, transactions, and aggregations that span shards. How expensive are they, and who absorbs the cost?
3. Rebalancing & Resharding — When traffic patterns shift, shards become uneven. How do you rebalance without downtime, and who owns the migration?
4. Shard-Aware Application Layer — Does the application know about shards, or is it abstracted away? Who pays for the abstraction leaking?
Each fault line has a tradeoff matrix with explicit "who pays" analysis. See §3.
Quick Reference: What Interviewers Probe
| After You Say... | They Will Ask... |
|---|---|
| "Shard by user_id" | "What about queries that need data across users? Analytics? Admin dashboards?" |
| "Consistent hashing" | "Walk me through what happens when you add a shard. How much data moves?" |
| "Scatter-gather for cross-shard" | "What's the p99 latency when one of 64 shards is slow? What's your timeout strategy?" |
| "We'll use Vitess/Citus" | "What happens when the abstraction leaks? When does the app need to know about shards?" |
| "Range partition by time" | "How do you handle the hot shard problem? All writes go to the latest partition." |
Jump to Practice
→ Active Drills (§7) — 8 practice prompts with expected answer shapes
System Architecture Overview
Interview Walkthrough
The six phases below are compressed for a deep-dive format. Phases 1-3 deliver the crisp answer in 2-3 minutes. If probed, Phase 5 has depth for 15+ minutes.
Phase 1: Requirements & Framing (30 seconds)
Name when to shard, not how:
- "Sharding is the last resort for scaling, not the first. Before sharding: read replicas (read scaling), vertical scaling (bigger machine), caching (reduce load), query optimization (reduce per-query cost). Shard only when data volume exceeds a single node's storage or write throughput exceeds a single node's capacity."
Phase 2: Core Entities & API (30 seconds)
- Shard: a partition of the dataset on a separate database instance
- Shard Key: the column used to determine which shard a row belongs to (e.g., user_id, tenant_id)
- Router: the component that maps a query's shard key to the correct shard
- Shard Map: the mapping from shard key ranges (or hash ranges) to physical database instances
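The four entities above can be sketched in a few lines. This is a hypothetical illustration — `Shard`, `ShardMap`, `route`, and the connection strings are invented names for the concepts, not a real library API:

```python
import zlib
from dataclasses import dataclass

@dataclass
class Shard:
    name: str
    dsn: str  # connection string for the physical database instance

class ShardMap:
    """Maps hash-range boundaries to physical shards (equal slices here)."""
    def __init__(self, shards: list[Shard], space: int = 2**32):
        self.shards = shards
        self.space = space

    def shard_for(self, key_hash: int) -> Shard:
        # Each shard owns an equal, contiguous slice of the hash space
        idx = key_hash * len(self.shards) // self.space
        return self.shards[idx]

def route(shard_map: ShardMap, shard_key: str) -> Shard:
    # The router: shard key -> hash -> shard map lookup -> physical instance
    return shard_map.shard_for(zlib.crc32(shard_key.encode()))

shards = [Shard(f"shard-{i}", f"postgres://db{i}/app") for i in range(4)]
smap = ShardMap(shards)
print(route(smap, "user-42").name)
```

Note that the shard map is state: every router must agree on it, which is why production systems store it in a coordination service and version it.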
Phase 3: The 2-Minute Explanation (2 minutes)
"Choose a shard key; a router consults the shard map to send each query to the shard that owns that key. Queries that include the shard key touch exactly one shard; queries that don't must scatter to all shards and gather results — that's the fundamental tax. Hash partitioning buys even write distribution at the cost of range queries; range partitioning buys locality at the cost of hot shards. Everything hard about sharding — cross-shard queries, rebalancing, the shard-aware application layer — follows from which tradeoff you pick."
Phase 4: Transition to Depth (15 seconds)
"Sharding strategy is table stakes. The hard problems are: choosing the shard key, handling cross-shard queries, and rebalancing when shards become uneven."
Phase 5: Deep Dives (5-15 minutes if probed)
Probe 1: "How do you choose the shard key?" (3-5 min)
"The shard key must satisfy three properties: (1) high cardinality (many distinct values), (2) uniform distribution (no hot spots), (3) query locality (most queries can be answered by one shard)."
Walk through examples:
- user_id as shard key: High cardinality ✓. Uniform distribution ✓ (with hash-based). Query locality ✓ for user-scoped queries. ✗ for queries that span users (admin dashboards, analytics).
- timestamp as shard key: High cardinality ✓. Non-uniform distribution ✗ (recent data is hot, old data is cold). Query locality ✓ for time-range queries.
- tenant_id (multi-tenant SaaS): Medium cardinality. Non-uniform distribution ✗ (one large tenant dominates). Query locality ✓ for tenant-scoped queries. "The one large tenant needs its own shard — that's the 'large tenant' problem."
The anti-pattern: "Never shard by a low-cardinality column (status, country). 'status = active' is 90% of rows — that shard is 9x larger than the others."
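A quick way to pressure-test a candidate shard key is to simulate its distribution against representative data. The sketch below uses synthetic rows — the single "whale" tenant producing half the traffic is an assumed scenario for illustration:

```python
import collections
import random
import zlib

random.seed(7)
N_SHARDS = 8

def shard_of(value: str) -> int:
    return zlib.crc32(value.encode()) % N_SHARDS

# Simulated rows: near-unique user_ids vs a skewed tenant distribution
# where one "whale" tenant produces half of all rows (assumption for demo)
rows = []
for _ in range(100_000):
    user_id = f"user-{random.randrange(1_000_000)}"
    tenant_id = ("tenant-whale" if random.random() < 0.5
                 else f"tenant-{random.randrange(200)}")
    rows.append((user_id, tenant_id))

fair_share = len(rows) / N_SHARDS
for idx, name in [(0, "user_id"), (1, "tenant_id")]:
    counts = collections.Counter(shard_of(r[idx]) for r in rows)
    skew = max(counts.values()) / fair_share
    print(f"{name:10s} hottest shard holds {skew:.1f}x its fair share")
```

With this data, `user_id` lands near 1.0x (uniform) while `tenant_id`'s hottest shard holds several times its fair share — the "large tenant" problem made visible before you commit to the key.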
Probe 2: "How do you handle cross-shard queries?" (3-5 min)
"A query that doesn't include the shard key can't be routed to a single shard. It must scatter to all shards and gather the results. This is the fundamental tax of sharding."
Walk through the options:
- Scatter-gather: Send the query to all N shards, merge results. Latency: max(shard response times). Works for simple aggregations (COUNT, SUM). "At 16 shards, you're sending 16 queries and waiting for the slowest. The tail latency of scatter-gather degrades linearly with shard count."
- Denormalization: Duplicate data across shards to avoid cross-shard lookups. "If orders are sharded by user_id and we need 'orders by product', maintain a secondary index sharded by product_id. The cost: dual writes and eventual consistency."
- Secondary index service: Build a separate search index (Elasticsearch) that indexes across all shards. "Query the index to find shard+key pairs, then fetch full records from the correct shards."
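A scatter-gather aggregation can be sketched as a concurrent fan-out with a per-query timeout; `query_shard` below is a stand-in for a real per-shard database call, and the latencies are simulated:

```python
import concurrent.futures
import random
import time

def query_shard(shard_id: int) -> int:
    # Stand-in for a real per-shard query; simulated latency and partial COUNT
    time.sleep(random.uniform(0.01, 0.05))
    return random.randrange(1000)

def scatter_gather_count(n_shards: int, timeout: float = 0.5) -> int:
    # Fan out to all shards concurrently, merge the partial counts.
    # End-to-end latency is bounded by the slowest shard (or the timeout).
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_shards) as pool:
        futures = [pool.submit(query_shard, s) for s in range(n_shards)]
        return sum(f.result(timeout=timeout) for f in futures)

start = time.perf_counter()
total = scatter_gather_count(16)
print(f"merged count={total}, latency={time.perf_counter() - start:.3f}s")
```

The timeout is the interesting design choice: without one, a single stuck shard stalls every scatter-gather query in the system; with one, you must decide whether a partial result is acceptable for that query.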
Probe 3: "How do you rebalance?" (3-5 min)
"A shard grows to 2x the size of others (hot tenant, uneven data growth). We need to split it without downtime."
Walk through the process:
- Online migration: Start writing to both old and new shard simultaneously (dual-write). Backfill the new shard with existing data. Once backfill is complete and caught up, switch reads to the new shard. Drop the old data.
- Virtual shards (pre-splitting): Instead of N physical shards, create 10×N virtual shards mapped to N physical nodes. Rebalancing moves virtual shards between physical nodes — no data re-partitioning needed. "This is how MongoDB, CockroachDB, and DynamoDB work."
- Consistent hashing: Adding a new shard only remaps ~1/N of the key space. Other shards are unaffected. "Better than modular hashing where adding a shard remaps most keys."
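The virtual-shard idea can be demonstrated directly: keys hash into a fixed number of virtual shards, and a rebalance only edits the virtual-to-physical assignment table, never the hash function. All names here are illustrative:

```python
import zlib

N_VIRTUAL = 40  # fixed forever; 10x the initial physical node count

def virtual_shard(key: str) -> int:
    # Key -> virtual shard mapping never changes after this is chosen
    return zlib.crc32(key.encode()) % N_VIRTUAL

# Initial assignment: 40 virtual shards spread across 4 physical nodes
assignment = {v: f"node-{v % 4}" for v in range(N_VIRTUAL)}

def node_for(key: str) -> str:
    return assignment[virtual_shard(key)]

before = {k: node_for(k) for k in (f"user-{i}" for i in range(10_000))}

# Rebalance: bring node-4 online and hand it 8 of the 40 virtual shards.
# Only keys in those virtual shards move; nothing is re-hashed.
for v in range(0, N_VIRTUAL, 5):
    assignment[v] = "node-4"

after = {k: node_for(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
print(f"{moved / len(before):.0%} of keys moved")  # roughly 20%: 8 of 40 virtual shards
```

The rebalance is a metadata edit plus a bounded data copy — which is why pre-split systems can move load in minutes rather than re-partitioning for days.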
Phase 6: Wrap-Up
"Database sharding is an organizational commitment, not just a technical decision. Once you shard, every query must be shard-aware. Every schema migration runs N times. Every backup is N backups. The Staff-level insight: delay sharding as long as possible, choose the shard key based on your primary access pattern, and use virtual shards so rebalancing is minutes, not days."
Quick-Reference: The 30-Second Cheat Sheet
| Topic | The L5 Answer | The L6 Answer (say this) |
|---|---|---|
| When to shard | "When data grows" | "Last resort — after read replicas, caching, vertical scaling, and query optimization" |
| Strategy | "Hash the primary key" | "Hash-based default; range-based for time-series; directory-based for key-level migration" |
| Shard key | "Use the primary key" | "High cardinality + uniform distribution + query locality — choose which to sacrifice" |
| Cross-shard queries | "Join across shards" | "Scatter-gather degrades linearly — design to minimize cross-shard queries" |
| Rebalancing | "Migrate data" | "Virtual shards for minutes-not-days rebalancing; consistent hashing for shard additions" |