Sharding & Partitioning — Staff Interview Quick Reference
The 60-Second Version
- Hash-based partitioning gives even distribution but destroys key ordering, so range queries degrade to scatter-gather. Use it for point lookups (user profiles, session stores).
- Range-based partitioning supports range queries naturally but creates hot partitions on sequential keys (timestamps, auto-increment IDs).
- Shard key selection is a multi-year commitment. Changing it later means full data migration: dual-write, backfill, verify, cutover. This is not a refactor; it is a project.
- Cross-shard queries use scatter-gather, which fans out to every shard and merges results. Design your data model so the dominant access pattern hits a single shard.
- Hot keys break the sharding contract. One viral user or popular product concentrates load on one shard. Mitigate with compound keys, virtual shards, or request-level caching.
- The real question is never "how many shards?" It is "what does the access pattern look like in three years?"
- Re-sharding is an operational migration, not a code change. Plan for it from day one by using logical shards mapped to physical nodes.
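A minimal sketch of that last point, with illustrative shard counts and node names (128 logical shards on 4 physical nodes). The key idea: the key-to-logical-shard hash never changes; only the logical-to-physical map does.

```python
import hashlib

NUM_LOGICAL_SHARDS = 128  # fixed for the life of the system

# Logical -> physical assignment lives in config. Rebalancing means editing
# this map and copying the affected logical shards, not re-hashing keys.
SHARD_MAP = {ls: f"db-node-{ls % 4}" for ls in range(NUM_LOGICAL_SHARDS)}

def logical_shard(key: str) -> int:
    """Stable hash of the shard key -> logical shard id."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_LOGICAL_SHARDS

def physical_node(key: str) -> str:
    """Route a key to whichever node currently owns its logical shard."""
    return SHARD_MAP[logical_shard(key)]

print(physical_node("user:42"))  # a point lookup touches exactly one node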
What Staff Engineers Say (That Seniors Don't)
| Concept | Senior Response | Staff Response |
|---|---|---|
| Shard key choice | "Pick the highest-cardinality column" | "Optimize for query isolation first. Cardinality matters, but cross-shard fan-out kills p99 latency." |
| Hot partitions | "Add more shards" | "More shards don't fix skew. Introduce virtual shards or split the hot key with a suffix so load disperses without resharding." (sketched after this table) |
| Re-sharding | "We'll reshard when we need to" | "We provision 2-4x logical shards from day one and rebalance by remapping, not migrating." |
| Cross-shard joins | "Use a distributed query layer" | "Scatter-gather is O(shards). Denormalize or co-locate related entities on the same shard key to avoid it entirely." |
| Range vs. hash | "Range for time-series, hash for everything else" | "Range works until write volume skews to the latest partition. Combine range with salting or use a two-level scheme." |
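The hot-key mitigation in the table (splitting the hot key with a suffix, i.e. salting) can be sketched roughly like this; the bucket count and key format are illustrative:

```python
import random

SALT_BUCKETS = 8  # how many sub-keys one hot key is split into

def write_key(hot_key: str) -> str:
    """Writes append a random suffix, spreading one hot key over SALT_BUCKETS partitions."""
    return f"{hot_key}#{random.randrange(SALT_BUCKETS)}"

def read_keys(hot_key: str) -> list[str]:
    """Reads fan out to every sub-key and merge, since the write suffix was random."""
    return [f"{hot_key}#{i}" for i in range(SALT_BUCKETS)]
```

The trade-off: writes for the hot key disperse, but every read of that key now fans out to SALT_BUCKETS partitions, so reserve salting for keys that are genuinely hot.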
The Numbers That Matter
- Single PostgreSQL instance: ~10K write TPS before saturation
- Single MySQL instance: ~15K write TPS under typical workloads
- Sweet spot per shard: 100-500 GB of data; beyond 500 GB, backup, restore, and replication lag become operational risks
- Scatter-gather penalty: latency is bounded by the slowest shard, so p99 degrades as shard count grows
- Logical-to-physical ratio: start with 4-8x logical shards per physical node to allow rebalancing without data movement
Common Interview Traps
- Jumping to sharding before exhausting simpler options. Vertical scaling, read replicas, connection pooling, and better indexing solve most problems below 10K write TPS. State that threshold before proposing shards.
- Ignoring the re-sharding story. If your design cannot add capacity without downtime, the interviewer will press on it. Always mention logical shards and live migration.
- Choosing shard key based on write path alone. The read path usually dominates. A shard key that distributes writes perfectly but forces scatter-gather on every read is the wrong key.
- Treating sharding as only a database concern. Caches, queues, and event streams all need consistent partitioning. Mismatched partition strategies between layers cause data inconsistency.
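One way to avoid the mismatched-layers trap from the last bullet is a single deterministic routing function shared by every layer; a rough sketch, with illustrative bucket counts:

```python
import zlib

def bucket(entity_key: str, num_buckets: int) -> int:
    """The same deterministic hash for every layer, so one entity always maps consistently."""
    return zlib.crc32(entity_key.encode()) % num_buckets

DB_SHARDS = 32
STREAM_PARTITIONS = 32  # keep counts aligned across layers where ordering matters

db_shard = bucket("user:42", DB_SHARDS)
stream_partition = bucket("user:42", STREAM_PARTITIONS)
cache_slot = bucket("user:42", 1024)
```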
Sharding Strategy Selection
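A rough decision helper that condenses the trade-offs above into code; the inputs and return strings are deliberately simplified, and a real decision also weighs re-sharding cost and operational maturity.

```python
def pick_strategy(point_lookups: bool, range_scans: bool, sequential_keys: bool) -> str:
    """Condensed selection logic; see the trade-offs in the sections above."""
    if point_lookups and not range_scans:
        return "hash partitioning on the entity key"
    if range_scans and not point_lookups:
        if sequential_keys:
            return "range partitioning plus salting (or a two-level scheme) to spread the hot tail"
        return "range partitioning on the scan key"
    # Both patterns matter: no single shard key serves both well.
    return "two specialized stores (entity-sharded OLTP + range-partitioned OLAP) linked by CDC"
```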
Re-Sharding: The Operation Nobody Wants
| Phase | What Happens | Duration | Risk |
|---|---|---|---|
| 1. Dual-write | New writes go to both old and new shard layout (sketched after this table) | Days–weeks | Data divergence if writes conflict |
| 2. Backfill | Historical data copied to new layout | Hours–days | Must be idempotent; re-runnable on failure |
| 3. Verify | Checksums compared between old and new | Hours | Missing rows surface as silent data loss |
| 4. Cutover | Read path switches to new layout | Minutes | Rollback plan must exist; test it before cutover |
| 5. Cleanup | Old shard layout decommissioned | Days | Keep old layout for 1-2 weeks as safety net |
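A rough sketch of the dual-write phase, assuming duck-typed layout clients and an external divergence log (all names illustrative); a production version also needs reconciliation tooling:

```python
def dual_write(record: dict, old_layout, new_layout, divergence_log: list) -> None:
    """Phase 1: every write goes to both layouts. The old layout stays authoritative;
    failures against the new layout are recorded, not raised, so the backfill and
    verify phases can reconcile them later."""
    old_layout.write(record)       # source of truth until cutover
    try:
        new_layout.write(record)   # best-effort shadow write
    except Exception as exc:
        divergence_log.append((record.get("id"), str(exc)))
```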
Staff insight: The teams that re-shard smoothly are the ones who started with logical shards from day one. Re-mapping logical → physical is a config change. Re-hashing keys is a migration.
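Concretely, "re-mapping is a config change" can look like this (shard and node counts illustrative, continuing the routing sketch near the top of the page):

```python
# Day one: 128 logical shards spread over 4 physical nodes.
shard_map_v1 = {ls: f"db-node-{ls % 4}" for ls in range(128)}

# Capacity doubles: add 4 nodes and reassign. Keys never re-hash; only the
# logical -> physical map changes, and only the moved logical shards get copied.
shard_map_v2 = {ls: f"db-node-{ls % 8}" for ls in range(128)}

moved = [ls for ls in range(128) if shard_map_v1[ls] != shard_map_v2[ls]]
print(f"{len(moved)} of 128 logical shards move; the other {128 - len(moved)} stay put")
```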
Practice Prompt
Your events table is sharded by user_id across 32 shards. Per-user queries are fast, but analytics now needs date-range queries across all users. How do you evolve the design?
Staff-Caliber Answer Shape
- Acknowledge the conflict. user_id sharding optimizes for per-user queries. Date-range queries require scatter-gather across all 32 shards — that's 32 parallel queries with the result bounded by the slowest shard.
- Don't re-shard. Changing the shard key breaks the per-user path. Instead, add a secondary index layer: replicate events into a date-partitioned analytical store (ClickHouse, BigQuery, or a date-range-sharded read replica).
- CDC pipeline. Use change data capture to stream from the user_id-sharded OLTP store to the date-partitioned OLAP store. Latency: seconds to minutes, acceptable for analytics.
- Query routing. Per-user queries hit the OLTP shards. Date-range analytics hit the OLAP store. Each store is optimized for its access pattern.
The Staff move: Recognize that one sharding strategy cannot serve both access patterns. The answer is two specialized stores connected by a replication pipeline, not a compromise shard key.
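A rough sketch of that routing split, assuming integer user ids, modulo shard routing, and illustrative client methods (fetch_events and scan_events are placeholders, not real APIs):

```python
def run_query(query: dict, oltp_shards: list, olap_store):
    """Send each query to the store built for its access pattern."""
    if "user_id" in query:
        # Per-user path: exactly one user_id-sharded OLTP shard.
        shard = oltp_shards[query["user_id"] % len(oltp_shards)]
        return shard.fetch_events(user_id=query["user_id"])
    # Date-range analytics: the CDC-fed, date-partitioned OLAP store.
    return olap_store.scan_events(start=query["start"], end=query["end"])
```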
Additional Traps
- "We'll use a UUID as shard key." UUIDs distribute well but make debugging impossible — you can't tell which shard holds a specific user's data without a lookup.
- Ignoring cross-shard transaction cost. Any operation spanning multiple shards requires distributed transactions (2PC/Saga) or accepts eventual consistency. Name which one.
- Underestimating shard count growth. Starting with 4 shards and planning to double at each threshold means you need 3 re-sharding operations to reach 32. Start with 128 logical shards mapped to 4 physical nodes.