StaffSignal — Foundation Study Guide (22 min read)

Back-of-Envelope Estimation

"How many servers do we need?" — answered in 90 seconds. The 20 numbers every Staff engineer has memorized (1MB/s per Postgres connection, 100K ops/s per Redis node, 10ms SSD read), plus the estimation framework to chain them into capacity decisions on a whiteboard.

Why This Matters

System design interviews are estimation conversations. Every capacity plan, every SLA negotiation, every sharding decision starts with a number. Not a precise number — an order-of-magnitude number that tells you whether your architecture is in the right ballpark or off by a factor of 100.

Interviewers do not care whether an SSD random read takes 100 microseconds or 150 microseconds. They care whether you think it takes 100 milliseconds — because that changes your entire design. A candidate who places a network hop at 0.5ms (correct) designs a microservice chain differently from one who places it at 50ms (wrong). The numbers are not trivia. They are the language of trade-off conversations.

Staff candidates do something that senior candidates rarely do: they connect numbers to decisions. "At 100K QPS, a single Postgres instance can serve reads from its buffer pool. At 500K QPS, we need a caching layer. At 5M QPS, we need sharded caches." The numbers narrow the design space. Without them, you're guessing — and the interviewer knows it.

This guide teaches you not just what the numbers are, but how to use them. By the end, you should be able to take any "design X for Y users" prompt and produce a capacity estimate in under two minutes — the same estimate that a Staff engineer would produce on a whiteboard.

The 60-Second Version

  • System design interviews test whether you can size a system without a calculator. Wrong orders of magnitude signal you have not operated production infrastructure.
  • Interviewers do not expect exact figures. They expect you to stay within 2x of reality. Being 10x off on latency or throughput raises immediate credibility concerns.
  • Memorize order of magnitude, not decimal places. L1 cache is nanoseconds, disk seek is milliseconds, cross-region is tens to hundreds of milliseconds.
  • Numbers anchor every capacity plan, every SLA discussion, and every sharding decision. They are not trivia — they are the language of trade-off conversations.
  • Staff candidates connect numbers to architectural choices: "At 100K QPS we need horizontal scaling; at 1K QPS a single Postgres instance is fine."
  • Round aggressively. Use powers of 10. State your assumptions out loud. This is what interviewers actually evaluate.
  • Seconds in a day: ~100K. This single conversion is the most-used tool in estimation. 1M events/day = ~10 QPS.

How Estimation Works

The Core Skill

Back-of-envelope estimation is a three-step process:

  1. Start with a known anchor. A real-world number you're confident about — daily active users, average payload size, a throughput ceiling from the numbers table below.
  2. Chain multiplications. Walk from the anchor to the number you need. Each step should involve a single, justifiable assumption.
  3. Sanity check against a known system. Does your result make sense compared to systems at similar scale?

The entire process takes 60–90 seconds in an interview. The interviewer is evaluating three things: whether your anchors are in the right ballpark, whether your chain of reasoning is logical, and whether you catch your own mistakes before they have to point them out.

The Two Conversions You'll Use Every Time

Daily volume to QPS:

QPS = daily_volume ÷ 86,400
    ≈ daily_volume ÷ 100,000   (round for speed)

1M events/day  ÷ 100K = 10 QPS
100M events/day ÷ 100K = 1,000 QPS
1B events/day  ÷ 100K = 10,000 QPS

Storage per year:

annual_storage = daily_events × avg_size × 365

1M events/day × 1 KB = 1 GB/day → ~365 GB/year
1B events/day × 100 bytes = 100 GB/day → ~36 TB/year

These two conversions handle 80% of estimation questions. Practice them until they're automatic.
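As a sketch, the two conversions translate directly into a pair of helper functions (Python; the names and the `rounded` flag are illustrative, not from any library):

```python
SECONDS_PER_DAY = 86_400      # rounded to ~100K for mental math
DAYS_PER_YEAR = 365

def daily_to_qps(daily_volume, rounded=True):
    """Average QPS from daily volume; rounded=True uses the 100K shortcut."""
    return daily_volume / (100_000 if rounded else SECONDS_PER_DAY)

def annual_storage_bytes(daily_events, avg_size_bytes):
    """annual_storage = daily_events x avg_size x 365."""
    return daily_events * avg_size_bytes * DAYS_PER_YEAR

print(daily_to_qps(1_000_000))                           # 10.0 QPS
print(annual_storage_bytes(1_000_000_000, 100) / 1e12)   # 36.5 TB/year
```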

Peak vs. Average

Average QPS is useless for capacity planning. You provision for peak load, which is typically 3–10x average depending on the workload type:

| Workload | Peak / Average Ratio | Why |
|---|---|---|
| Social media feeds | 3–5x | Morning/evening usage spikes |
| E-commerce | 5–10x | Flash sales, Black Friday |
| Gaming | 2–3x | Predictable session-based usage |
| Financial trading | 10–50x | Market open, news events |
| SaaS / B2B | 2–3x | Business hours concentration |

Staff rule: Always state both. "Average 1K QPS, peak 5K QPS" tells the interviewer you understand capacity planning. "1K QPS" alone tells them you don't.
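A minimal sketch of the average-plus-peak habit, using the ratios from the table above (the `PEAK_FACTOR` map and function name are hypothetical):

```python
# Hypothetical peak factors taken from the workload table above.
PEAK_FACTOR = {"social": 5, "ecommerce": 10, "gaming": 3, "trading": 50, "saas": 3}

def provision_qps(daily_volume, workload):
    """Return (average, peak) QPS using the 100K seconds/day shortcut."""
    avg = daily_volume / 100_000
    return avg, avg * PEAK_FACTOR[workload]

avg, peak = provision_qps(100_000_000, "ecommerce")
print(f"Average {avg:,.0f} QPS, peak {peak:,.0f} QPS")  # Average 1,000 QPS, peak 10,000 QPS
```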

The Numbers

Latency Numbers

These are the numbers that shape every architecture decision. Memorize the order-of-magnitude column — the exact numbers are for reference.

| Operation | Latency | Order of Magnitude | Design Implication |
|---|---|---|---|
| L1 cache reference | 1 ns | nanoseconds | CPU-bound work is measured here |
| L2 cache reference | 4 ns | nanoseconds | Still "free" — no design concern |
| Main memory reference | 100 ns | nanoseconds | In-process caches operate at this speed |
| SSD random read | 100 μs | microseconds | 10,000 reads/sec per SSD |
| SSD sequential read (1 MB) | 1 ms | milliseconds | Sequential is 10x faster than random |
| Network round-trip, same AZ | 0.5 ms | sub-millisecond | Baseline for microservice call chains |
| Network round-trip, same region | 1–2 ms | milliseconds | Cost of synchronous cross-AZ replication |
| Disk seek (HDD) | 10 ms | milliseconds | 100 seeks/sec — this is why SSDs won |
| Network round-trip, cross-region | 50–150 ms | tens of milliseconds | Hard floor for global architectures |

The key insight: There is a 1000x gap between memory (100ns) and network (0.5ms), and another 100x gap between same-region (1ms) and cross-region (100ms). These gaps are where architectural decisions live. When someone says "add a cache," they're exploiting the memory-network gap. When someone says "deploy to multiple regions," they're paying the same-region-to-cross-region tax.

Throughput Numbers

| System | Throughput | Notes | Design Implication |
|---|---|---|---|
| Single web server | ~10K req/s | CPU-bound; I/O-bound varies | Fine for most internal services |
| Redis (single thread) | ~100K ops/s | ~500K with pipelining | One instance handles most applications |
| Memcached (multithreaded) | ~500K ops/s | Pure key-value, no data structures | Choose over Redis for raw throughput |
| Kafka (cluster) | ~1M msgs/s | Tens of thousands per partition; scales linearly with partitions | Add partitions, not clusters |
| Postgres | ~10K writes/s, ~50K reads/s | Tuned config, SSDs | The ceiling before you need sharding |
| MySQL (InnoDB) | ~15K writes/s | Commodity hardware | Similar to Postgres for write ceiling |
| 1 Gbps network link | ~120 MB/s | Practical ceiling after protocol overhead | Plan for 70% utilization |
| 10 Gbps network link | ~1.2 GB/s | Standard data center interconnect | Rarely the bottleneck |

Storage & Scale Numbers

| Calculation | Result | Rule of Thumb |
|---|---|---|
| 1M users × 1 KB each | 1 GB | Fits in RAM on a single machine |
| 10M users × 10 KB each | 100 GB | Fits on one SSD, queryable from one Postgres |
| 1B events/day × 100 bytes | 100 GB/day, ~36 TB/year | Plan for compression + retention policy |
| 500M tweets/day × 300 bytes | 150 GB/day, ~55 TB/year | Before indexes or replicas |
| Seconds in a day | ~86,400 (~100K) | The most-used conversion in estimation |
| Seconds in a month | ~2.6M (~2.5M) | For monthly volume calculations |
| Seconds in a year | ~31.5M (~30M) | For annual projections |

Availability Nines

Every SLA conversation uses the "nines" shorthand. Know these cold — and know how to compute composite availability across dependencies.

| Availability | Annual Downtime | Monthly Downtime | Common Tier |
|---|---|---|---|
| 99% (two nines) | 3.65 days | 7.3 hours | Batch systems, internal tools |
| 99.9% (three nines) | 8.76 hours | 43 minutes | Most SaaS, business apps |
| 99.95% | 4.38 hours | 21.6 minutes | Standard cloud SLA (EC2, RDS) |
| 99.99% (four nines) | 52.6 minutes | 4.3 minutes | Payment systems, core infra |
| 99.999% (five nines) | 5.26 minutes | 26 seconds | Telecom, emergency services |

Composite availability: If your service depends on N independent services in series, the composite availability is the product. Three dependencies at 99.95% each: 0.9995³ = 99.85%, not 99.95%.

Parallel redundancy: Two instances of a 99.9% service in active-active: unavailability = 0.001² = 0.000001 → 99.9999%. Redundancy is how you buy nines.
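Both formulas are two-liners. A sketch with illustrative function names:

```python
def composite_availability(*deps):
    """Serial dependencies: availabilities multiply."""
    total = 1.0
    for a in deps:
        total *= a
    return total

def redundant_availability(a, copies=2):
    """Active-active copies: unavailabilities multiply."""
    return 1 - (1 - a) ** copies

def annual_downtime_hours(a):
    return (1 - a) * 24 * 365

print(composite_availability(0.9995, 0.9995, 0.9995))  # ~0.9985 (99.85%)
print(redundant_availability(0.999))                   # ~0.999999 (six nines)
print(annual_downtime_hours(0.999))                    # ~8.76 hours/year
```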

Quick Conversion Table

| Conversion | Rule | Example | Notes / Second Example |
|---|---|---|---|
| Daily volume → QPS | ÷ 100K | 1M/day ≈ 10 QPS | 500M/day ≈ 5K QPS |
| QPS → daily volume | × 100K | 100 QPS ≈ 10M/day | 50K QPS ≈ 5B/day |
| GB/day → MB/s | ÷ 86,400 × 1,000 | 100 GB/day ≈ 1.2 MB/s | 1 TB/day ≈ 12 MB/s |
| Users → concurrent | × 0.01 to 0.10 | 10M users → 100K–1M concurrent | Depends on session length |
| MAU → DAU | × 0.30 to 0.50 | 100M MAU → 30–50M DAU | Highly product-dependent |
| Average → peak QPS | × 3 to 10 | Average 1K QPS → peak 3K–10K | E-commerce is toward 10x |
| Raw storage → actual | × 3 (replicas) × 1.4 (indexes) | 100 TB raw → ~420 TB actual | Include replication and indexing |

Back-of-Envelope Reasoning

The value of these numbers is in chaining them to reach architectural conclusions. Here are the reasoning patterns you'll use in interviews.

Pattern 1: Volume → QPS → Database Fit

Question: "A URL shortener handles 100M new URLs per day. Can a single database handle the writes?"

100M writes/day ÷ 100K seconds/day = 1,000 writes/sec
Postgres handles ~10K writes/sec
→ A single Postgres instance handles this comfortably (10% capacity)
→ No sharding needed for writes at this scale
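The same fit check as a hedged sketch (the function name is hypothetical; the 10K writes/s ceiling comes from the throughput table):

```python
def single_db_write_fit(daily_writes, ceiling_wps=10_000):
    """Check a daily write volume against a Postgres-class write ceiling."""
    wps = daily_writes / 100_000          # seconds/day shortcut
    return wps, wps / ceiling_wps         # (writes/sec, fraction of one instance)

wps, utilization = single_db_write_fit(100_000_000)
print(wps, utilization)   # 1000.0 0.1 -> one instance at 10% capacity
```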

Pattern 2: Volume → Storage → Retention Strategy

Question: "A chat app handles 1B messages per day, average 200 bytes each. How much storage per year?"

1B × 200 bytes = 200 GB/day raw
× 365 = 73 TB/year raw
× 3 (replicas) = 219 TB/year
× 1.3 (indexes) = ~285 TB/year actual

→ This needs sharding and a retention strategy
→ Hot data (last 30 days): 200 GB × 30 × 3 = 18 TB — fits in a sharded cluster
→ Cold data: tier to object storage after 90 days

Pattern 3: Users → Bandwidth → CDN Need

Question: "An image service serves 10M images per day, average 500 KB each. Do we need a CDN?"

10M × 500 KB = 5 TB/day outbound
5 TB / 86,400 seconds = ~58 MB/s sustained
At 120 MB/s per 1 Gbps link → 50% of one link's capacity

Peak at 5x average → 290 MB/s → saturates 2+ links
→ CDN absorbs 80%+ of this traffic at the edge
→ Origin sees ~12 MB/s (20% miss rate), easily handled
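The bandwidth chain above, spelled out in Python (all inputs are the pattern's illustrative numbers):

```python
# Pattern 3 inputs (illustrative)
images_per_day = 10_000_000
avg_size_kb = 500
link_mb_s = 120                                  # practical 1 Gbps ceiling
peak_factor = 5

outbound_tb_day = images_per_day * avg_size_kb / 1e9   # KB/day -> TB/day
sustained_mb_s = outbound_tb_day * 1e6 / 86_400        # TB/day -> MB/s
peak_mb_s = sustained_mb_s * peak_factor
links_at_peak = peak_mb_s / link_mb_s

print(f"{outbound_tb_day:.0f} TB/day, {sustained_mb_s:.0f} MB/s sustained, "
      f"{links_at_peak:.1f} links saturated at peak")   # ~5 TB/day, ~58 MB/s, ~2.4 links
```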

Pattern 4: Availability → Downtime → Dependency Budget

Question: "We target 99.9% availability. We have 3 critical dependencies."

99.9% = 8.76 hours downtime/year = 43 minutes/month

If each dependency is 99.95%:
  Composite = 0.9995³ = 0.9985 = 99.85%
  = 13 hours downtime/year — exceeds budget by 4+ hours

→ Need circuit breakers, fallbacks, and graceful degradation
→ Or: accept 99.85% and set SLO accordingly

Pattern 5: Write Amplification → True Cost

Question: "A social post generates how many actual writes?"

1 user post = 1 write to posts table
+ 1 write to user timeline
+ N writes for follower fan-out (avg 500 followers)
+ 1 write to search index
+ 1 write to notification queue
+ 1 write to analytics stream
= ~505 actual writes per user action

At 500M posts/day:
500M × 505 = ~250B write operations/day
250B ÷ 100K = 2.5M writes/sec across all systems
→ This is why fan-out is the Staff-level conversation, not the post write
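A sketch of the amplification math (the per-action write breakdown mirrors the list above; numbers are illustrative):

```python
def writes_per_post(avg_followers=500):
    """Hypothetical breakdown of one user post into actual writes."""
    post = 1
    own_timeline = 1
    fan_out = avg_followers       # one timeline write per follower
    search_index = 1
    notification_queue = 1
    analytics_stream = 1
    return post + own_timeline + fan_out + search_index + notification_queue + analytics_stream

posts_per_day = 500_000_000
total_writes_day = posts_per_day * writes_per_post()
print(writes_per_post())                  # 505
print(total_writes_day / 100_000)         # ~2.5M writes/sec (100K-seconds shortcut)
```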

Visual Guide

Latency Hierarchy (diagram omitted)

Estimation Decision Tree (diagram omitted)

Common Scale Anchors

Use these as sanity checks. If your estimate for a similar system is 10x higher or lower than a known real-world system, re-examine your assumptions.

| System | Known Scale | Useful As |
|---|---|---|
| Twitter/X | ~500M tweets/day, ~300K QPS peak | High-write social benchmark |
| Google Search | ~8.5B queries/day, ~100K QPS | Read-heavy search benchmark |
| Uber | ~20M rides/day, 5M location updates/sec | Real-time location at scale |
| WhatsApp | ~100B messages/day | Messaging throughput ceiling |
| YouTube | ~500 hours uploaded/min, ~1B hours watched/day | Media storage + bandwidth |
| Stripe | ~millions of transactions/day | Payment processing benchmark |
| Netflix | ~200M subscribers, ~15% concurrent in peak | Streaming bandwidth reference |
| Instagram | ~2B+ MAU, ~100M photos uploaded/day | Media upload + storage reference |

How This Shows Up in Interviews

Scenario 1: "Estimate the storage for a new feature"

The interviewer is testing your estimation chain, not the final number. Do not say "it'll be a lot of data." Say: "100M users × 5 posts/day × 1 KB average = 500 GB/day raw. With 3x replication and 1.3x for indexes, that's ~2 TB/day actual. Over a year: ~730 TB. We need sharding, and a retention policy that tiers cold data to object storage after 90 days brings the hot dataset to under 200 TB." Show the chain, state assumptions aloud, and end with an architectural conclusion.

Scenario 2: "Design a notification system for 500M users" (Full Walkthrough)

This is a classic estimation-first question. Here is how a Staff engineer works through it before touching any architecture:

Step 1 — Anchor on user behavior. "Let me start with user activity. 500M DAU. Each user triggers roughly 20 notification-worthy events per day — likes, comments, follows, messages. That's 10B events/day."

Step 2 — Convert to throughput. "10B events/day ÷ 100K seconds/day = 100K events/sec average. Peak at 5x during morning/evening: 500K events/sec. This is well beyond a single database — we're in distributed queue territory. A Kafka cluster absorbs this comfortably; with a single partition sustaining tens of thousands of msgs/sec, 10–20 partitions cover peak load with headroom for parallel consumers."

Step 3 — Estimate notification fan-out. "Not every event generates a notification. Maybe 30% result in a push notification (3B/day), and each push payload is ~500 bytes. That's 1.5 TB/day of notification payload. With 90-day retention and 3x replication, the notification store holds 1.5 TB × 90 × 3 = 405 TB."

Step 4 — Check the delivery path. "3B push notifications/day = 30K/sec average, 150K/sec peak. APNs and FCM have rate limits per connection — typically 1K-5K/sec. We need 30-150 concurrent connections to the push providers. This is a connection pool problem, not a throughput problem."

Step 5 — Sanity check. "WhatsApp handles 100B messages/day. Our 10B events/day is 10x smaller. Instagram has 2B+ users generating a similar notification volume. Our numbers are in the right ballpark."

Why this is a Staff answer: Every number leads to an architectural conclusion. 100K events/sec → Kafka. 3B push/day → connection pooling. 405 TB → retention policy. The candidate never said a number without immediately connecting it to a design decision.
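The five steps reduce to a short chain of arithmetic. A sketch with the walkthrough's assumed inputs (every name and rate here is illustrative):

```python
dau = 500_000_000
events_per_user_day = 20

events_day = dau * events_per_user_day        # 10B notification-worthy events/day
avg_eps = events_day / 100_000                # ~100K events/sec average
peak_eps = avg_eps * 5                        # ~500K events/sec at 5x peak

push_rate = 0.30                              # assume 30% of events become pushes
push_day = events_day * push_rate             # ~3B pushes/day
payload_tb_day = push_day * 500 / 1e12        # 500-byte payloads -> ~1.5 TB/day
store_tb = payload_tb_day * 90 * 3            # 90-day retention, 3x replication -> ~405 TB

push_peak_per_sec = (push_day / 100_000) * 5  # ~150K pushes/sec at peak
connections = push_peak_per_sec / 1_000       # ~150 conns at ~1K/sec per provider conn

print(f"{avg_eps:,.0f} eps avg, {peak_eps:,.0f} eps peak, "
      f"{store_tb:,.0f} TB store, {connections:,.0f} provider connections")
```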

Scenario 3: "Can a single machine handle this?"

Do not say "probably not at scale." Say: "Let me check three axes. Reads: 50K QPS — Postgres handles 50K reads/s, so we're at the ceiling. One more feature doubles it past capacity. Writes: 2K QPS — well within the 10K writes/s ceiling. Storage: 200 GB — fits on one SSD. So reads are the bottleneck. Two read replicas give us 100K of dedicated read capacity and take reads off the primary, which buys us headroom. No sharding needed yet." Always check all three axes — reads, writes, storage — and identify which one forces the scaling decision.

Scenario 4: "What's the cost of this design?"

Cloud cost estimation is increasingly asked at Staff+ interviews. The quick rules:

| Resource | Approximate Cost | Quick Math |
|---|---|---|
| Compute (EC2/GCE) | ~$0.05/hr per vCPU | $35/month per core |
| SSD storage (EBS/PD) | ~$0.10/GB/month | $100/TB/month |
| Object storage (S3/GCS) | ~$0.023/GB/month | $23/TB/month |
| Data transfer (egress) | ~$0.09/GB | $90/TB — often the surprise cost |
| Redis (managed) | ~$0.06/GB/hour | ~$43/GB/month |

"100TB across 3 replicas with indexes is 500TB actual. At $0.10/GB that's $50K/month for storage alone — do we need 7-year retention or can we tier to object storage after 90 days?"
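A hedged sketch of that storage-cost math, using the list prices above and the 3x-replicas + 1.4x-index rule from the conversion table (which gives ~420 TB rather than the 500 TB the quote rounds up to; the dict keys and function name are illustrative):

```python
# Approximate list prices from the cost table above.
COST_PER_GB_MONTH = {"ssd": 0.10, "object": 0.023}

def monthly_storage_cost(raw_tb, tier="ssd", replicas=3, index_overhead=1.4):
    """Raw TB -> actual provisioned GB -> monthly dollars."""
    actual_gb = raw_tb * 1_000 * replicas * index_overhead
    return actual_gb * COST_PER_GB_MONTH[tier]

# 100 TB raw -> ~420 TB actual
print(round(monthly_storage_cost(100)))             # 42000 -> $42K/month on SSD
print(round(monthly_storage_cost(100, "object")))   # 9660 -> ~$9.7K/month tiered to object storage
```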

In the Wild

Google: The Jeff Dean Numbers

The most famous estimation reference in system design interviews traces back to Jeff Dean's 2009 presentation "Numbers Everyone Should Know." These numbers — L1 cache at 0.5ns, memory at 100ns, SSD at 100μs, network at 150ms cross-continent — became the canonical set that interviewers expect candidates to know. Google's entire infrastructure philosophy is built on these gaps: Bigtable exploits the sequential-vs-random disk gap, Spanner exploits the same-region-vs-cross-region gap with TrueTime, and the move to SSD across all GFS storage in the 2010s was driven by the 100x latency improvement over spinning disk.

The Staff-level insight: These numbers haven't changed dramatically in 15 years. Memory got faster, SSDs got cheaper, but the ratios between tiers remain roughly the same. The 1000x gap between memory and network is as real in 2026 as it was in 2009. This is why the estimation skill is durable — you're learning ratios, not absolute values.

Slack: Message Volume Estimation in Practice

Slack processes roughly 1.5 billion messages per day across all workspaces. Their engineering team publicly shared how they reason about capacity: average message size is ~1 KB (including metadata), giving ~1.5 TB/day of raw message data. But the actual storage cost is dominated by read amplification — every message is read by every channel member, and popular channels might have 1,000 members. The read fan-out turns 1.5B writes into ~150B reads/day, or roughly 1.7M reads/sec. This is why Slack moved from a single MySQL cluster to a sharded architecture — not because write volume exceeded capacity, but because read amplification at scale demanded it.

The Staff-level insight: The naive estimate (1.5B messages × 1 KB = 1.5 TB/day) suggests a manageable problem. The Staff estimate accounts for read amplification: every message is read N times where N is the channel size. The writes are easy; the reads determine the architecture.

Uber: Real-Time Location at 5M Updates/Second

Uber processes 5 million driver location updates per second at peak. Each update is roughly 100 bytes (lat/lng, timestamp, driver ID, trip ID). That's 500 MB/sec of raw location data, or ~43 TB/day. But the interesting estimation challenge is on the read side: every active rider is querying for nearby drivers multiple times per minute. With 20M rides/day and an average ride-matching window of 2 minutes with 10 queries, that's ~140M read queries/day for proximity matching alone — about 1,600 QPS. The location writes dwarf the reads by 3,000x, which is why Uber uses an in-memory spatial index (not a database) for real-time matching and batches the writes to persistent storage asynchronously.

The Staff-level insight: The write-to-read ratio flips the normal assumption. Most systems are read-heavy. Uber's location system is write-heavy by 3,000x, which demands a fundamentally different architecture — in-memory write-optimized stores instead of read-optimized databases.


Staff Calibration

The sections below are calibration tools for Staff-level interviews. If you already understand estimation mechanics, start here to sharpen the framing that separates L5 from L6 answers.

What Staff Engineers Say (That Seniors Don't)

| Number | Senior Engineers Say | Staff Engineers Say |
|---|---|---|
| 99th percentile latency | "We should optimize the p99" | "p99 at 500ms means 1% of our 10M daily users hit this — that's 100K frustrated sessions. Is that acceptable for checkout vs. search?" |
| Throughput (QPS) | "We need to handle 50K QPS" | "50K QPS average means 150–500K peak. A single Postgres instance tops out at 50K reads/s — we need a caching layer, not more replicas" |
| Storage cost | "We'll store everything in S3" | "100TB across 3 replicas with indexes is 500TB actual. At $0.023/GB that's $11.5K/month — do we need 7-year retention or can we tier to Glacier after 90 days?" |
| Network bandwidth | "We have 10Gbps links" | "10Gbps theoretical is ~7Gbps goodput after overhead. Our 5TB/day outbound needs 460Mbps sustained — one link handles it, but during peak we'll saturate at 3x average" |
| Failure rate | "We target 99.9% availability" | "99.9% = 43 minutes downtime/month. With 3 dependencies each at 99.95%, our composite availability is 99.85% — we need circuit breakers and fallbacks to close the 0.05% gap" |
| Cache hit ratio | "Our cache hit rate is 95%" | "95% hit rate at 100K QPS means 5K cache misses/second hitting the database. If DB handles 10K reads/s, we're at 50% capacity from misses alone — a cache failure doubles DB load instantly" |

Common Interview Traps

  • Confusing latency units. Mixing up microseconds and milliseconds changes your architecture. SSD random read is 100 μs, not 100 ms. State your units explicitly.
  • Ignoring replication and indexing overhead. Raw data size is never the full storage cost. Multiply by 3x for replicas, add 30–50% for indexes. This is the difference between "100 TB" and "420 TB."
  • Forgetting to convert units consistently. Always normalize to the same time horizon (per second, per day, per year) before comparing. Mixing daily and per-second numbers in the same chain guarantees errors.
  • Over-precision. Saying "we need 11,574 QPS" instead of "roughly 12K QPS" signals inexperience with real estimation. Round aggressively — the goal is ballpark, not precision.
  • Forgetting peak-to-average ratio. Average QPS is useless for capacity planning. You provision for peak, which is 3–10x average depending on the workload.
  • Treating storage as free. "We'll just store everything" ignores that 100 TB of hot data across 3 replicas with indexes is 500+ TB of actual storage cost.
  • Ignoring write amplification. One user action (post a tweet) can generate 10+ writes: the tweet itself, timeline fan-out, index updates, notification triggers, analytics events.
  • Confusing network throughput with goodput. Protocol overhead, retransmissions, and encryption reduce usable throughput to ~70% of theoretical maximum.

Practice Drill

Prompt: A social feed serves 500M DAU. Each user loads their feed ~8 times per day, each load reads ~20 posts, and each post averages ~2 KB. Estimate read QPS and bandwidth — and decide whether a single database can handle it.

Staff-Caliber Answer Shape
  1. Total feed loads/day: 500M × 8 = 4B feed loads
  2. Total post reads/day: 4B × 20 = 80B post reads
  3. QPS: 80B / 100K seconds ≈ 800K QPS (peak: 2–3M QPS)
  4. Bandwidth: 80B × 2 KB = 160 TB/day ≈ 1.8 GB/s sustained
  5. Can a single DB handle it? No. Postgres handles ~50K reads/s. We need at least 16 read replicas for average load and 40+ for peak. This is a caching problem — a 95% cache hit rate reduces DB load to 40K QPS, within single-instance range.

The Staff move: Don't just compute the number. Follow through to the architectural implication: this volume demands a caching layer, not just database scaling.
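The drill's arithmetic, verified as a script (inputs are the drill's assumptions; all names are illustrative):

```python
dau = 500_000_000
feed_loads_per_user = 8
posts_per_load = 20
kb_per_post = 2

feed_loads_day = dau * feed_loads_per_user          # 4B feed loads/day
post_reads_day = feed_loads_day * posts_per_load    # 80B post reads/day
avg_qps = post_reads_day / 100_000                  # 800K QPS average
bandwidth_gb_s = post_reads_day * kb_per_post / 1e6 / 86_400  # ~1.85 GB/s sustained

db_ceiling = 50_000                                 # Postgres read ceiling
replicas_for_avg = avg_qps / db_ceiling             # 16 replicas just for average load
db_qps_after_cache = avg_qps * 0.05                 # 95% hit rate -> ~40K QPS to the DB

print(avg_qps, replicas_for_avg, round(db_qps_after_cache))
```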

Where This Appears

These playbooks apply estimation skills to complete system design problems with full Staff-level walkthroughs, evaluator-grade rubrics, and practice drills.

  • Capacity Planning — Structured framework for translating business requirements into infrastructure numbers, with worked examples of estimating compute, storage, and bandwidth for real systems
  • Rate Limiting — Per-client and per-endpoint rate calculation, token bucket sizing, and the math behind distributed rate limiting across multiple nodes
  • Load Balancer — Throughput-based routing decisions, connection pool sizing, and why the throughput ceiling of a single server drives your load balancing strategy
  • Database Sharding — When single-instance ceilings force a sharding decision, how to estimate shard count from throughput and storage projections, and rebalancing math

Related Technologies: Redis · PostgreSQL · Kafka · Cassandra · Elasticsearch

This is one of 9 foundation guides. The full library also includes deep-dive system design playbooks with evaluator-grade breakdowns, practice drills, and failure-mode analysis. Explore the full library