Why This Matters
System design interviews are estimation conversations. Every capacity plan, every SLA negotiation, every sharding decision starts with a number. Not a precise number — an order-of-magnitude number that tells you whether your architecture is in the right ballpark or off by a factor of 100.
Interviewers do not care whether an SSD random read takes 100 microseconds or 150 microseconds. They care whether you think it takes 100 milliseconds — because that changes your entire design. A candidate who places a network hop at 0.5ms (correct) designs a microservice chain differently from one who places it at 50ms (wrong). The numbers are not trivia. They are the language of trade-off conversations.
Staff candidates do something that senior candidates rarely do: they connect numbers to decisions. "At 100K QPS, a single Postgres instance can serve reads from its buffer pool. At 500K QPS, we need a caching layer. At 5M QPS, we need sharded caches." The numbers narrow the design space. Without them, you're guessing — and the interviewer knows it.
This guide teaches you not just what the numbers are, but how to use them. By the end, you should be able to take any "design X for Y users" prompt and produce a capacity estimate in under two minutes — the same estimate that a Staff engineer would produce on a whiteboard.
The 60-Second Version
- System design interviews test whether you can size a system without a calculator. Wrong orders of magnitude signal you have not operated production infrastructure.
- Interviewers do not expect exact figures. They expect you to stay within 2x of reality. Being 10x off on latency or throughput raises immediate credibility concerns.
- Memorize order of magnitude, not decimal places. L1 cache is nanoseconds, disk seek is milliseconds, cross-region is tens to hundreds of milliseconds.
- Numbers anchor every capacity plan, every SLA discussion, and every sharding decision. They are not trivia — they are the language of trade-off conversations.
- Staff candidates connect numbers to architectural choices: "At 100K QPS we need horizontal scaling; at 1K QPS a single Postgres instance is fine."
- Round aggressively. Use powers of 10. State your assumptions out loud. This is what interviewers actually evaluate.
- Seconds in a day: ~100K. This single conversion is the most-used tool in estimation. 1M events/day = ~10 QPS.
How Estimation Works
The Core Skill
Back-of-envelope estimation is a three-step process:
- Start with a known anchor. A real-world number you're confident about — daily active users, average payload size, a throughput ceiling from the numbers table below.
- Chain multiplications. Walk from the anchor to the number you need. Each step should involve a single, justifiable assumption.
- Sanity check against a known system. Does your result make sense compared to systems at similar scale?
The entire process takes 60–90 seconds in an interview. The interviewer is evaluating three things: whether your anchors are in the right ballpark, whether your chain of reasoning is logical, and whether you catch your own mistakes before they have to point them out.
The Two Conversions You'll Use Every Time
Daily volume to QPS:
QPS = daily_volume ÷ 86,400
≈ daily_volume ÷ 100,000 (round for speed)
1M events/day ÷ 100K = 10 QPS
100M events/day ÷ 100K = 1,000 QPS
1B events/day ÷ 100K = 10,000 QPS
Storage per year:
annual_storage = daily_events × avg_size × 365
1M events/day × 1 KB = 1 GB/day → ~365 GB/year
1B events/day × 100 bytes = 100 GB/day → ~36 TB/year
These two conversions handle 80% of estimation questions. Practice them until they're automatic.
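As a sanity check, the two conversions can be sketched in a few lines of Python. The constants (100K seconds/day, 365 days/year) are the deliberately rounded rules of thumb above, not exact values.

```python
# Illustrative sketch of the two core conversions, using the rounded
# constants from the rules of thumb above (~100K seconds/day).

def daily_to_qps(daily_volume: float) -> float:
    """Average QPS via the ~100K seconds/day shortcut."""
    return daily_volume / 100_000

def annual_storage_bytes(daily_events: float, avg_size_bytes: float) -> float:
    """Raw annual storage, before replication and index overhead."""
    return daily_events * avg_size_bytes * 365

print(daily_to_qps(1_000_000))                 # 10.0 QPS for 1M events/day
print(annual_storage_bytes(1e9, 100) / 1e12)   # 36.5 TB/year for 1B x 100 B
```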
Peak vs. Average
Average QPS is useless for capacity planning. You provision for peak load, which is typically 3–10x average depending on the workload type:
| Workload | Peak / Average Ratio | Why |
|---|---|---|
| Social media feeds | 3–5x | Morning/evening usage spikes |
| E-commerce | 5–10x | Flash sales, Black Friday |
| Gaming | 2–3x | Predictable session-based usage |
| Financial trading | 10–50x | Market open, news events |
| SaaS / B2B | 2–3x | Business hours concentration |
Staff rule: Always state both. "Average 1K QPS, peak 5K QPS" tells the interviewer you understand capacity planning. "1K QPS" alone tells them you don't.
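The "state both" habit is easy to mechanize. A hypothetical helper (the peak ratios are taken from the table above; pick the one matching your workload):

```python
# Illustrative helper: always report average AND peak QPS together.
# Ratios are assumptions drawn from the Peak/Average table above.
PEAK_RATIO = {"social": 5, "ecommerce": 10, "gaming": 3, "trading": 50, "saas": 3}

def capacity_line(daily_volume: float, workload: str) -> str:
    avg_qps = daily_volume / 100_000            # seconds/day shortcut
    peak_qps = avg_qps * PEAK_RATIO[workload]   # provision for this number
    return f"average {avg_qps:,.0f} QPS, peak {peak_qps:,.0f} QPS"

print(capacity_line(100_000_000, "ecommerce"))  # average 1,000 QPS, peak 10,000 QPS
```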
The Numbers
Latency Numbers
These are the numbers that shape every architecture decision. Memorize the order-of-magnitude column — the exact numbers are for reference.
| Operation | Latency | Order of Magnitude | Design Implication |
|---|---|---|---|
| L1 cache reference | 1 ns | nanoseconds | CPU-bound work is measured here |
| L2 cache reference | 4 ns | nanoseconds | Still "free" — no design concern |
| Main memory reference | 100 ns | nanoseconds | In-process caches operate at this speed |
| SSD random read | 100 μs | microseconds | 10,000 reads/sec per SSD |
| SSD sequential read (1 MB) | 1 ms | milliseconds | Sequential is 10x faster than random |
| Network round-trip, same AZ | 0.5 ms | sub-millisecond | Baseline for microservice call chains |
| Network round-trip, same region | 1–2 ms | milliseconds | Cost of synchronous cross-AZ replication |
| Disk seek (HDD) | 10 ms | milliseconds | 100 seeks/sec — this is why SSDs won |
| Network round-trip, cross-region | 50–150 ms | tens of milliseconds | Hard floor for global architectures |
The key insight: There is a 1000x gap between memory (100ns) and network (0.5ms), and another 100x gap between same-region (1ms) and cross-region (100ms). These gaps are where architectural decisions live. When someone says "add a cache," they're exploiting the memory-network gap. When someone says "deploy to multiple regions," they're paying the same-region-to-cross-region tax.
Throughput Numbers
| System | Throughput | Notes | Design Implication |
|---|---|---|---|
| Single web server | ~10K req/s | CPU-bound; I/O-bound varies | Fine for most internal services |
| Redis (single thread) | ~100K ops/s | ~500K with pipelining | One instance handles most applications |
| Memcached (multithreaded) | ~500K ops/s | Pure key-value, no data structures | Choose over Redis for raw throughput |
| Kafka (per partition) | ~1M msgs/s | Throughput scales linearly with partitions | Add partitions, not clusters |
| Postgres | ~10K writes/s, ~50K reads/s | Tuned config, SSDs | The ceiling before you need sharding |
| MySQL (InnoDB) | ~15K writes/s | Commodity hardware | Similar to Postgres for write ceiling |
| 1 Gbps network link | ~120 MB/s | Practical ceiling after protocol overhead | Plan for 70% utilization |
| 10 Gbps network link | ~1.2 GB/s | Standard data center interconnect | Rarely the bottleneck |
Storage & Scale Numbers
| Calculation | Result | Rule of Thumb |
|---|---|---|
| 1M users × 1 KB each | 1 GB | Fits in RAM on a single machine |
| 10M users × 10 KB each | 100 GB | Fits on one SSD, queryable from one Postgres |
| 1B events/day × 100 bytes | 100 GB/day, ~36 TB/year | Plan for compression + retention policy |
| 500M tweets/day × 300 bytes | 150 GB/day, ~55 TB/year | Before indexes or replicas |
| Seconds in a day | ~86,400 (~100K) | The most-used conversion in estimation |
| Seconds in a month | ~2.6M (~2.5M) | For monthly volume calculations |
| Seconds in a year | ~31.5M (~30M) | For annual projections |
Availability Nines
Every SLA conversation uses the "nines" shorthand. Know these cold — and know how to compute composite availability across dependencies.
| Availability | Annual Downtime | Monthly Downtime | Common Tier |
|---|---|---|---|
| 99% (two nines) | 3.65 days | 7.3 hours | Batch systems, internal tools |
| 99.9% (three nines) | 8.76 hours | 43 minutes | Most SaaS, business apps |
| 99.95% | 4.38 hours | 21.6 minutes | Standard cloud SLA (EC2, RDS) |
| 99.99% (four nines) | 52.6 minutes | 4.3 minutes | Payment systems, core infra |
| 99.999% (five nines) | 5.26 minutes | 26 seconds | Telecom, emergency services |
Composite availability: If your service depends on N independent services in series, the composite availability is the product. Three dependencies at 99.95% each: 0.9995³ = 99.85%, not 99.95%.
Parallel redundancy: Two instances of a 99.9% service in active-active: unavailability = 0.001² = 0.000001 → 99.9999%. Redundancy is how you buy nines.
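Both formulas are one-liners. A quick sketch of the series and parallel math above:

```python
# Composite availability, as described above: multiply availabilities
# for series dependencies, multiply UNavailabilities for redundancy.

def series_availability(*avail: float) -> float:
    """Dependencies in series: the product of their availabilities."""
    p = 1.0
    for a in avail:
        p *= a
    return p

def parallel_availability(avail: float, replicas: int) -> float:
    """Active-active redundancy: 1 minus the product of unavailabilities."""
    return 1 - (1 - avail) ** replicas

def annual_downtime_hours(avail: float) -> float:
    return (1 - avail) * 365 * 24

print(round(series_availability(0.9995, 0.9995, 0.9995), 4))  # 0.9985
print(parallel_availability(0.999, 2))                        # 0.999999
print(round(annual_downtime_hours(0.999), 2))                 # 8.76
```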
Quick Conversion Table
| Conversion | Rule | Example | Example / Note |
|---|---|---|---|
| Daily volume → QPS | ÷ 100K | 1M/day ≈ 10 QPS | 500M/day ≈ 5K QPS |
| QPS → daily volume | × 100K | 100 QPS ≈ 10M/day | 50K QPS ≈ 5B/day |
| GB/day → MB/s | ÷ 86,400 × 1,000 | 100 GB/day ≈ 1.2 MB/s | 1 TB/day ≈ 12 MB/s |
| Users → concurrent | × 0.01 to 0.10 | 10M users → 100K–1M concurrent | Depends on session length |
| MAU → DAU | × 0.30 to 0.50 | 100M MAU → 30–50M DAU | Highly product-dependent |
| Average → peak | × 3 to 10 | Average 1K QPS → peak 3K–10K | E-commerce is toward 10x |
| Raw storage → actual | × 3 (replicas) × 1.4 (indexes) | 100 TB raw → ~420 TB actual | Include replication and indexing |
Back-of-Envelope Reasoning
The value of these numbers is in chaining them to reach architectural conclusions. Here are the reasoning patterns you'll use in interviews.
Pattern 1: Volume → QPS → Database Fit
Question: "A URL shortener handles 100M new URLs per day. Can a single database handle the writes?"
100M writes/day ÷ 100K seconds/day = 1,000 writes/sec
Postgres handles ~10K writes/sec
→ A single Postgres instance handles this comfortably (10% capacity)
→ No sharding needed for writes at this scale
Pattern 2: Volume → Storage → Retention Strategy
Question: "A chat app handles 1B messages per day, average 200 bytes each. How much storage per year?"
1B × 200 bytes = 200 GB/day raw
× 365 = 73 TB/year raw
× 3 (replicas) = 219 TB/year
× 1.3 (indexes) = ~285 TB/year actual
→ This needs sharding and a retention strategy
→ Hot data (last 30 days): 200 GB × 30 × 3 = 18 TB — fits in a sharded cluster
→ Cold data: tier to object storage after 90 days
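Pattern 2's chain is just multiplication. A hedged sketch using the walkthrough's assumed multipliers (3x replication, 1.3x index overhead):

```python
# Pattern 2 as code. The multipliers are the assumptions from the
# walkthrough above (3 replicas, 1.3x for indexes), not universal constants.

def annual_storage_tb(events_per_day: float, bytes_per_event: float,
                      replicas: int = 3, index_overhead: float = 1.3) -> float:
    raw_gb_per_day = events_per_day * bytes_per_event / 1e9
    return raw_gb_per_day * 365 * replicas * index_overhead / 1000  # in TB

print(round(annual_storage_tb(1e9, 200)))  # ~285 TB/year actual
```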
Pattern 3: Users → Bandwidth → CDN Need
Question: "An image service serves 10M images per day, average 500 KB each. Do we need a CDN?"
10M × 500 KB = 5 TB/day outbound
5 TB / 86,400 seconds = ~58 MB/s sustained
At 120 MB/s per 1 Gbps link → 50% of one link's capacity
Peak at 5x average → 290 MB/s → saturates 2+ links
→ CDN absorbs 80%+ of this traffic at the edge
→ Origin sees ~12 MB/s (20% miss rate), easily handled
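The same chain in code, under Pattern 3's assumptions (decimal KB, 5x peak, 80% CDN hit rate):

```python
# Pattern 3 as code: daily bytes -> sustained MB/s, then peak and
# origin load after the CDN. 5x peak and 80% hit rate are the
# walkthrough's assumptions, not fixed constants.

def sustained_mb_per_s(daily_bytes: float) -> float:
    return daily_bytes / 1e6 / 86_400

daily = 10_000_000 * 500_000      # 10M images x 500 KB (decimal)
avg = sustained_mb_per_s(daily)   # ~58 MB/s sustained
peak = avg * 5                    # ~289 MB/s at 5x average
origin = avg * 0.20               # ~12 MB/s after 80% CDN hit rate
print(round(avg), round(peak), round(origin))
```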
Pattern 4: Availability → Downtime → Dependency Budget
Question: "We target 99.9% availability. We have 3 critical dependencies."
99.9% = 8.76 hours downtime/year = 43 minutes/month
If each dependency is 99.95%:
Composite = 0.9995³ = 0.9985 = 99.85%
= 13 hours downtime/year — exceeds budget by 4+ hours
→ Need circuit breakers, fallbacks, and graceful degradation
→ Or: accept 99.85% and set SLO accordingly
Pattern 5: Write Amplification → True Cost
Question: "A social post generates how many actual writes?"
1 user post = 1 write to posts table
+ 1 write to user timeline
+ N writes for follower fan-out (avg 500 followers)
+ 1 write to search index
+ 1 write to notification queue
+ 1 write to analytics stream
= ~505 actual writes per user action
At 500M posts/day:
500M × 505 = ~250B write operations/day
250B ÷ 100K = 2.5M writes/sec across all systems
→ This is why fan-out is the Staff-level conversation, not the post write
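Pattern 5 reduces to summing a fan-out table. A sketch with the example's assumed breakdown (500 average followers, one write each to search, notifications, and analytics):

```python
# Pattern 5 as code: count the fan-out before sizing the write path.
# The per-action breakdown is the assumed example above, not a universal model.
WRITES_PER_POST = {
    "posts_table": 1, "author_timeline": 1, "follower_fanout": 500,
    "search_index": 1, "notifications": 1, "analytics": 1,
}

def writes_per_second(actions_per_day: float) -> float:
    amplification = sum(WRITES_PER_POST.values())     # 505x here
    return actions_per_day * amplification / 100_000  # seconds/day shortcut

print(f"{writes_per_second(500_000_000):,.0f} writes/sec")  # 2,525,000 writes/sec
```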
Visual Guide
(Diagrams: Latency Hierarchy; Estimation Decision Tree.)
Common Scale Anchors
Use these as sanity checks. If your estimate for a similar system is 10x higher or lower than a known real-world system, re-examine your assumptions.
| System | Known Scale | Useful As |
|---|---|---|
| Twitter/X | ~500M tweets/day, ~300K QPS peak | High-write social benchmark |
| Google Search | ~8.5B queries/day, ~100K QPS | Read-heavy search benchmark |
| Uber | ~20M rides/day, 5M location updates/sec | Real-time location at scale |
| WhatsApp | ~100B messages/day | Messaging throughput ceiling |
| YouTube | ~500 hours uploaded/min, ~1B hours watched/day | Media storage + bandwidth |
| Stripe | Millions of transactions/day | Payment processing benchmark |
| Netflix | ~200M subscribers, ~15% concurrent in peak | Streaming bandwidth reference |
| Instagram | ~2B+ MAU, ~100M photos uploaded/day | Media upload + storage reference |
How This Shows Up in Interviews
Scenario 1: "Estimate the storage for a new feature"
The interviewer is testing your estimation chain, not the final number. Do not say "it'll be a lot of data." Say: "100M users × 5 posts/day × 1 KB average = 500 GB/day raw. With 3x replication and 1.3x for indexes, that's ~2 TB/day actual. Over a year: ~730 TB. We need sharding, and a retention policy that tiers cold data to object storage after 90 days brings the hot dataset to under 200 TB." Show the chain, state assumptions aloud, and end with an architectural conclusion.
Scenario 2: "Design a notification system for 500M users" (Full Walkthrough)
This is a classic estimation-first question. Here is how a Staff engineer works through it before touching any architecture:
Step 1 — Anchor on user behavior. "Let me start with user activity. 500M DAU. Each user triggers roughly 20 notification-worthy events per day — likes, comments, follows, messages. That's 10B events/day."
Step 2 — Convert to throughput. "10B events/day ÷ 100K seconds/day = 100K events/sec average. Peak at 5x during morning/evening: 500K events/sec. This is well beyond a single database — we're in distributed queue territory. Kafka handles ~1M msgs/sec per partition, so a single partition could absorb even peak load on paper; in practice we'd use 10–20 partitions for headroom and parallel consumers."
Step 3 — Estimate notification fan-out. "Not every event generates a notification. Maybe 30% result in a push notification (3B/day), and each push payload is ~500 bytes. That's 1.5 TB/day of notification payload. With 90-day retention and 3x replication, the notification store holds 1.5 TB × 90 × 3 = 405 TB."
Step 4 — Check the delivery path. "3B push notifications/day = 30K/sec average, 150K/sec peak. APNS and FCM have rate limits per connection — typically 1K-5K/sec. We need 30-150 concurrent connections to the push providers. This is a connection pool problem, not a throughput problem."
Step 5 — Sanity check. "WhatsApp handles 100B messages/day. Our 10B events/day is 10x smaller. Instagram has 2B+ users generating a similar notification volume. Our numbers are in the right ballpark."
Why this is a Staff answer: Every number leads to an architectural conclusion. 100K events/sec → Kafka. 3B push/day → connection pooling. 405 TB → retention policy. The candidate never said a number without immediately connecting it to a design decision.
Scenario 3: "Can a single machine handle this?"
Do not say "probably not at scale." Say: "Let me check three axes. Reads: 50K QPS — Postgres handles 50K reads/s, so we're at the ceiling. One more feature doubles it past capacity. Writes: 2K QPS — well within the 10K writes/s ceiling. Storage: 200 GB — fits on one SSD. So reads are the bottleneck. Two read replicas double our read capacity to 100K, which buys us headroom. No sharding needed yet." Always check all three axes — reads, writes, storage — and identify which one forces the scaling decision.
Scenario 4: "What's the cost of this design?"
Cloud cost estimation is increasingly asked at Staff+ interviews. The quick rules:
| Resource | Approximate Cost | Quick Math |
|---|---|---|
| Compute (EC2/GCE) | ~$0.05/hr per vCPU | $35/month per core |
| SSD storage (EBS/PD) | ~$0.10/GB/month | $100/TB/month |
| Object storage (S3/GCS) | ~$0.023/GB/month | $23/TB/month |
| Data transfer (egress) | ~$0.09/GB | $90/TB — often the surprise cost |
| Redis (managed) | ~$0.06/GB/hour | ~$43/GB/month |
"100TB raw across 3 replicas with 1.4x index overhead is ~420TB actual. At $0.10/GB that's ~$42K/month for storage alone — do we need 7-year retention or can we tier to object storage after 90 days?"
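A rough cost sketch using the table's approximate rates and the conversion table's 3x/1.4x storage multipliers (illustrative list prices only — real pricing varies by provider, region, and tier):

```python
# Rough monthly storage cost. Rates are the approximate figures from
# the cost table above; multipliers (3 replicas, 1.4x indexes) come
# from the conversion table. All values are illustrative assumptions.
RATES = {"ssd_gb_month": 0.10, "s3_gb_month": 0.023, "egress_gb": 0.09}

def monthly_storage_cost(raw_tb: float, replicas: int = 3,
                         index_overhead: float = 1.4,
                         rate: float = RATES["ssd_gb_month"]) -> float:
    actual_gb = raw_tb * 1000 * replicas * index_overhead
    return actual_gb * rate

print(f"${monthly_storage_cost(100):,.0f}/month")  # 100 TB raw -> $42,000/month on SSD
```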
In the Wild
Google: The Jeff Dean Numbers
The most famous estimation reference in system design interviews traces back to Jeff Dean's 2009 presentation "Numbers Everyone Should Know." These numbers — L1 cache at 0.5ns, memory at 100ns, SSD at 100μs, network at 150ms cross-continent — became the canonical set that interviewers expect candidates to know. Google's entire infrastructure philosophy is built on these gaps: Bigtable exploits the sequential-vs-random disk gap, Spanner exploits the same-region-vs-cross-region gap with TrueTime, and the move to SSD across all GFS storage in the 2010s was driven by the 100x latency improvement over spinning disk.
The Staff-level insight: These numbers haven't changed dramatically in 15 years. Memory got faster, SSDs got cheaper, but the ratios between tiers remain roughly the same. The 1000x gap between memory and network is as real in 2026 as it was in 2009. This is why the estimation skill is durable — you're learning ratios, not absolute values.
Slack: Message Volume Estimation in Practice
Slack processes roughly 1.5 billion messages per day across all workspaces. Their engineering team publicly shared how they reason about capacity: average message size is ~1 KB (including metadata), giving ~1.5 TB/day of raw message data. But the actual storage cost is dominated by read amplification — every message is read by every channel member, and popular channels might have 1,000 members. The read fan-out turns 1.5B writes into ~150B reads/day, or roughly 1.7M reads/sec. This is why Slack moved from a single MySQL cluster to a sharded architecture — not because write volume exceeded capacity, but because read amplification at scale demanded it.
The Staff-level insight: The naive estimate (1.5B messages × 1 KB = 1.5 TB/day) suggests a manageable problem. The Staff estimate accounts for read amplification: every message is read N times where N is the channel size. The writes are easy; the reads determine the architecture.
Uber: Real-Time Location at 5M Updates/Second
Uber processes 5 million driver location updates per second at peak. Each update is roughly 100 bytes (lat/lng, timestamp, driver ID, trip ID). That's 500 MB/sec of raw location data, or ~43 TB/day. But the interesting estimation challenge is on the read side: every active rider is querying for nearby drivers multiple times per minute. With 20M rides/day and an average ride-matching window of 2 minutes with 10 queries, that's ~200M read queries/day for proximity matching alone — about 2,300 QPS. The location writes dwarf the reads by roughly 2,000x, which is why Uber uses an in-memory spatial index (not a database) for real-time matching and batches the writes to persistent storage asynchronously.
The Staff-level insight: The write-to-read ratio flips the normal assumption. Most systems are read-heavy. Uber's location system is write-heavy by three orders of magnitude, which demands a fundamentally different architecture — in-memory write-optimized stores instead of read-optimized databases.
Staff Calibration
The sections below are calibration tools for Staff-level interviews. If you already understand estimation mechanics, start here to sharpen the framing that separates L5 from L6 answers.
What Staff Engineers Say (That Seniors Don't)
| Number | Senior Engineers Say | Staff Engineers Say |
|---|---|---|
| 99th percentile latency | "We should optimize the p99" | "p99 at 500ms means 1% of our 10M daily users hit this — that's 100K frustrated sessions. Is that acceptable for checkout vs. search?" |
| Throughput (QPS) | "We need to handle 50K QPS" | "50K QPS average means 150–500K peak. A single Postgres instance tops out at 50K reads/s — we need a caching layer, not more replicas" |
| Storage cost | "We'll store everything in S3" | "100TB raw across 3 replicas with indexes is ~420TB actual. At $0.023/GB that's ~$9.7K/month — do we need 7-year retention or can we tier to Glacier after 90 days?" |
| Network bandwidth | "We have 10Gbps links" | "10Gbps theoretical is ~7Gbps goodput after overhead. Our 5TB/day outbound needs 460Mbps sustained — one link handles it, but during peak we'll saturate at 3x average" |
| Failure rate | "We target 99.9% availability" | "99.9% = 43 minutes downtime/month. With 3 dependencies each at 99.95%, our composite availability is 99.85% — we need circuit breakers and fallbacks to close the 0.05% gap" |
| Cache hit ratio | "Our cache hit rate is 95%" | "95% hit rate at 100K QPS means 5K cache misses/second hitting the database. If DB handles 10K reads/s, we're at 50% capacity from misses alone — a cache failure doubles DB load instantly" |
Common Interview Traps
- Confusing latency units. Mixing up microseconds and milliseconds changes your architecture. SSD random read is 100 μs, not 100 ms. State your units explicitly.
- Ignoring replication and indexing overhead. Raw data size is never the full storage cost. Multiply by 3x for replicas, add 30–50% for indexes. This is the difference between "100 TB" and "420 TB."
- Forgetting to convert units consistently. Always normalize to the same time horizon (per second, per day, per year) before comparing. Mixing daily and per-second numbers in the same chain guarantees errors.
- Over-precision. Saying "we need 11,574 QPS" instead of "roughly 12K QPS" signals inexperience with real estimation. Round aggressively — the goal is ballpark, not precision.
- Forgetting peak-to-average ratio. Average QPS is useless for capacity planning. You provision for peak, which is 3–10x average depending on the workload.
- Treating storage as free. "We'll just store everything" ignores that 100 TB of hot data across 3 replicas with indexes is ~420 TB of actual storage cost.
- Ignoring write amplification. One user action (post a tweet) can generate 10+ writes: the tweet itself, timeline fan-out, index updates, notification triggers, analytics events.
- Confusing network throughput with goodput. Protocol overhead, retransmissions, and encryption reduce usable throughput to ~70% of theoretical maximum.
Practice Drill
Prompt: A social feed serves 500M DAU. Each user loads their feed ~8 times/day, and each load reads ~20 posts at ~2 KB per post. Estimate read QPS and bandwidth, and decide whether a single database can serve it.
Staff-Caliber Answer Shape
- Total feed loads/day: 500M × 8 = 4B feed loads
- Total post reads/day: 4B × 20 = 80B post reads
- QPS: 80B / 100K seconds ≈ 800K QPS (peak: 2–3M QPS)
- Bandwidth: 80B × 2 KB = 160 TB/day ≈ 1.8 GB/s sustained
- Can a single DB handle it? No. Postgres handles ~50K reads/s. We need at least 16 read replicas for average load and 40+ for peak. This is a caching problem — a 95% cache hit rate reduces DB load to 40K QPS, within single-instance range.
The Staff move: Don't just compute the number. Follow through to the architectural implication: this volume demands a caching layer, not just database scaling.
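The full drill chain, end to end, with the bullets' assumptions (8 feed loads/user/day, 20 posts per load, 2 KB per post, 95% cache hit rate):

```python
# The drill chain as code. All inputs are the assumed figures from
# the answer shape above.
feed_loads = 500e6 * 8                 # 4B feed loads/day
post_reads = feed_loads * 20           # 80B post reads/day
avg_qps = post_reads / 100_000         # ~800K QPS average
db_qps_after_cache = avg_qps * 0.05    # ~40K QPS at 95% cache hit rate
bandwidth_gb_s = post_reads * 2_000 / 1e9 / 86_400  # ~1.85 GB/s sustained

print(int(avg_qps), int(db_qps_after_cache), round(bandwidth_gb_s, 2))
```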
Where This Appears
These playbooks apply estimation skills to complete system design problems with full Staff-level walkthroughs, evaluator-grade rubrics, and practice drills.
- Capacity Planning — Structured framework for translating business requirements into infrastructure numbers, with worked examples of estimating compute, storage, and bandwidth for real systems
- Rate Limiting — Per-client and per-endpoint rate calculation, token bucket sizing, and the math behind distributed rate limiting across multiple nodes
- Load Balancer — Throughput-based routing decisions, connection pool sizing, and why the throughput ceiling of a single server drives your load balancing strategy
- Database Sharding — When single-instance ceilings force a sharding decision, how to estimate shard count from throughput and storage projections, and rebalancing math
Related Technologies: Redis · PostgreSQL · Kafka · Cassandra · Elasticsearch