Why This Matters
Every distributed system is a network conversation. Your microservices, databases, caches, and users communicate over a network that introduces latency, packet loss, and bandwidth constraints. The architecture decisions you make — where to terminate TLS, how to pool connections, which protocol to use between services — are all network decisions. And yet, most system design candidates treat the network as an invisible wire that moves data instantly between boxes on their whiteboard.
Staff-level interviewers probe networking because it exposes a candidate's depth of operational experience. A senior engineer might say "we add a cache to reduce latency." A Staff engineer says "our latency budget is 200ms, and cross-region RTT alone consumes 150ms. Caching must happen at the edge, or we've blown the budget before application logic runs." That response demonstrates that the candidate understands latency budgets, the physics of network propagation, and the constraint that geography imposes on architecture.
This guide teaches you the networking concepts that actually appear in system design interviews: RTT and latency budgets, connection lifecycle costs, protocol selection, and the tail-latency amplification that kills microservice architectures. You will not need to explain TCP congestion control in detail — but you will need to explain why connection pooling matters and when HTTP/3 is the right choice.
The 60-Second Version
- RTT is the fundamental design constraint. Every network decision is a latency budget decision. Same-AZ: ~0.5ms. Same-region: ~1ms. Cross-continent: ~150ms. These numbers shape every architecture choice.
- DNS TTL is a staleness injection point. DNS-based load balancing means clients cache resolved IPs. A 60s TTL means up to 60s of traffic to a dead backend after failover.
- TLS has a connection tax. TLS 1.3 costs 1 RTT; TLS 1.2 costs 2 RTTs. Connection reuse (keep-alive, pooling) eliminates this cost for all subsequent requests.
- TCP slow start throttles new connections. The first ~14KB trickles through a fresh connection. Connection pooling amortizes this ramp-up across many requests.
- HTTP/2 multiplexing solves HTTP head-of-line blocking but not TCP's. A single dropped packet stalls all streams on that connection. HTTP/3 (QUIC) over UDP eliminates this.
- Connection lifecycle cost drives architecture. Where you terminate TLS, where you pool connections, and where you place proxies are the decisions that determine your tail latency.
- Tail-latency amplification is the microservice killer. P99 of N sequential service calls is worse than the p99 of any single call. The fix is architectural (parallelize, batch, eliminate), not operational.
How Networking Works in Distributed Systems
The Latency Budget Model
Every user-facing request has a latency budget — the maximum time the user will wait before the experience feels broken. For a web application, this is typically 200–500ms. For a real-time system, it might be 50ms. For batch processing, it might be 5 seconds.
The network consumes a portion of this budget before your code runs. Understanding how much it consumes is the first step in any architecture decision:
| Network Hop | Latency Cost | Budget Consumed (of 200ms) |
|---|---|---|
| Client → CDN edge | 5–20ms | 10% |
| CDN → origin (same region) | 1–2ms | 1% |
| Service → Service (same AZ) | 0.5ms | 0.25% |
| Service → Service (cross-AZ) | 1–2ms | 1% |
| Service → Database | 0.5–2ms | 1% |
| Cross-region replication | 50–150ms | 75% |
The key insight: A single cross-region hop can consume 75% of your latency budget. This is why multi-region architectures require local reads. It's not an optimization — it's a mathematical necessity.
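The table's arithmetic can be sketched directly. A minimal budget decomposition (the hop latencies below are illustrative midpoints of the ranges above, not measurements):

```python
# Decompose a 200ms latency budget across network hops.
# Hop latencies are illustrative midpoints of the table's ranges.
BUDGET_MS = 200

def budget_report(hops: dict, budget_ms: float = BUDGET_MS) -> dict:
    """Return the percentage of the budget each hop consumes."""
    return {name: round(100 * ms / budget_ms, 2) for name, ms in hops.items()}

same_region_path = {
    "client->cdn_edge": 10.0,
    "cdn->origin": 1.5,
    "service->service": 0.5,
    "service->db": 1.0,
}

report = budget_report(same_region_path)
network_total = sum(same_region_path.values())   # ~13ms: 6.5% of budget
cross_region_pct = 100 * 150 / BUDGET_MS         # a single 150ms hop: 75%
```

The point the numbers make: the entire same-region path costs single-digit percentages, while one cross-region hop consumes three quarters of the budget on its own.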
Connection Lifecycle
Every network connection has a startup cost that most candidates ignore. Understanding this cost explains why connection pooling is not an optimization — it's a requirement.
A fresh HTTPS connection requires:
- DNS resolution: 1–50ms (cached: <1ms, cold: 50ms+)
- TCP handshake: 1 RTT (SYN → SYN-ACK → ACK)
- TLS handshake: 1 RTT for TLS 1.3, 2 RTTs for TLS 1.2
- First data transfer: Limited by TCP slow start (~14KB initial window)
Total cost for a fresh cross-region connection (150ms RTT, TLS 1.3): 2 RTTs ≈ 300ms of handshakes before the request can even be sent, with slow start then throttling the response itself.
A reused connection (keep-alive or pooled): 0 RTTs overhead. Data flows immediately at the full congestion window.
```text
# Fresh connection (cross-region, 150ms RTT):
DNS:        0ms   (cached)
TCP:        150ms (1 RTT)
TLS 1.3:    150ms (1 RTT)
Slow start: 450ms (3 RTTs to ramp up for a 200 KB response)
Total:      750ms before the full response is delivered

# Pooled connection:
Overhead:   0ms
Data:       150ms (1 RTT for request/response)
Total:      150ms
```
This 5x difference is why connection pooling is mandatory for any latency-sensitive path.
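A minimal model of the same arithmetic, assuming TLS 1.3 and a congestion window that doubles from ~14 KB each round (as in the example above, the request is assumed to ride out as soon as the handshake completes, so the first response window costs no extra round trip):

```python
INITIAL_WINDOW_KB = 14  # ~10 segments x 1,460 bytes

def slow_start_rtts(response_kb: float, iw_kb: float = INITIAL_WINDOW_KB) -> int:
    """Extra RTTs beyond the first window needed to deliver response_kb,
    with the congestion window doubling each round."""
    sent, window, rtts = 0.0, iw_kb, 0
    while sent + window < response_kb:
        sent += window
        window *= 2
        rtts += 1
    return rtts

def fresh_connection_ms(rtt_ms: float, response_kb: float, tls_rtts: int = 1) -> float:
    """TCP handshake + TLS handshake + slow-start ramp (simplified model)."""
    return (1 + tls_rtts + slow_start_rtts(response_kb)) * rtt_ms

def pooled_connection_ms(rtt_ms: float) -> float:
    """Warm connection: full congestion window, one request/response RTT."""
    return rtt_ms

fresh = fresh_connection_ms(rtt_ms=150, response_kb=200)  # 750.0
pooled = pooled_connection_ms(rtt_ms=150)                 # 150.0
```

Swapping in `tls_rtts=2` shows the TLS 1.2 penalty: one more full RTT on every fresh connection.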
TCP Slow Start
A fresh TCP connection starts with an initial congestion window (IW) of 10 segments × 1,460 bytes = ~14 KB. After each RTT, the window roughly doubles:
| RTT # | Congestion Window | Cumulative Data |
|---|---|---|
| 0 | 14 KB | 14 KB |
| 1 | 28 KB | 42 KB |
| 2 | 56 KB | 98 KB |
| 3 | 112 KB | 210 KB |
| 4 | 224 KB | 434 KB |
Implication: A 200 KB API response on a fresh connection takes 3 RTTs just to deliver the payload — on top of the TLS handshake. On a cross-region connection (150ms RTT), that's 450ms of slow-start overhead alone. Connection pooling eliminates this entirely because the congestion window is already warm.
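The table above follows directly from the doubling rule; a quick sketch that reproduces it:

```python
def congestion_windows(rounds: int, iw_kb: int = 14) -> list:
    """Rows of (rtt_number, window_kb, cumulative_kb) under slow-start doubling."""
    rows, window, cumulative = [], iw_kb, 0
    for rtt in range(rounds):
        cumulative += window          # data delivered this round
        rows.append((rtt, window, cumulative))
        window *= 2                   # window doubles each RTT
    return rows

rows = congestion_windows(5)
# A 200 KB response needs the cumulative column to reach 200 KB: RTT #3.
```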
Head-of-Line Blocking
This concept is the key to understanding protocol selection:
HTTP/1.1: Each connection handles one request at a time. To parallelize, browsers open 6 connections per domain, and each connection pays its slow-start ramp-up independently.
HTTP/2: Multiplexes multiple request/response streams over a single TCP connection. One connection, many concurrent requests. But: if a single TCP packet is lost, all streams on that connection stall while TCP retransmits. This is TCP-level head-of-line blocking.
HTTP/3 (QUIC): Runs over UDP with its own reliability per stream. A lost packet only stalls the stream it belongs to. Other streams continue unimpeded. This is why HTTP/3 wins on lossy networks (mobile, Wi-Fi) — the tail-latency improvement comes from eliminating cross-stream blocking.
Protocol Selection
Choosing the right protocol at each layer of your architecture is a Staff-level decision that most candidates skip. Here's the decision framework:
| Boundary | Protocol | Why |
|---|---|---|
| Browser → API Gateway | HTTP/2 over TLS | Broad compatibility, multiplexing, TLS required |
| Mobile → API (lossy network) | HTTP/3 (QUIC) | Independent stream recovery, 0-RTT resumption |
| Service → Service (same region) | gRPC over HTTP/2 | Binary protobuf, streaming, strong typing, efficient |
| Service → Service (cross-region) | gRPC with retries + deadlines | High RTT demands efficient protocol + explicit timeout budgets |
| Real-time bidirectional | WebSocket over TLS | Persistent connection, low per-message overhead |
| Static assets | CDN + HTTP/2 | Edge caching eliminates origin RTT entirely |
| Event streaming | TCP (Kafka, NATS) | High throughput, binary protocol, persistent connections |
gRPC vs REST: The Staff-Level Framing
Do not say "gRPC is faster than REST." Say: "gRPC uses protobuf (binary, ~10x smaller than JSON, with schema enforcement) over HTTP/2 (multiplexed). It is the right choice for internal service-to-service communication where we control both ends and need type safety. REST is the right choice at organizational boundaries — external APIs, partner integrations — where discoverability, tooling, and human readability matter more than wire efficiency."
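The wire-size gap is easy to demonstrate. A sketch using Python's `struct` as a stand-in for a binary encoding (real protobuf uses varints and field tags, so exact sizes differ, but the shape of the comparison holds):

```python
import json
import struct

# A typical internal RPC payload: three 64-bit numeric fields.
record = {"user_id": 18446744, "timestamp_ms": 1700000000000, "balance_cents": 129900}

json_bytes = json.dumps(record).encode()              # field names repeat on every message
binary_bytes = struct.pack("<QQQ", *record.values())  # schema lives in code, not on the wire

# binary: 24 bytes; JSON: ~77 bytes for the same three fields
```

Multiply that per-message difference by billions of RPCs and the serialization choice becomes a capacity decision, which is exactly the gRPC argument.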
The Numbers That Matter
| Metric | Value | Design Implication |
|---|---|---|
| Same-AZ RTT | ~0.5ms | Baseline for microservice call chains; 10 sequential calls = 5ms |
| Same-region, cross-AZ RTT | ~1ms | Cost of synchronous replication; 3 AZs = 2ms for quorum write |
| Cross-continent RTT | ~150ms | Hard floor for global user-facing requests; must cache at edge |
| TLS 1.3 handshake | 1 RTT | Connection reuse eliminates this entirely |
| TLS 1.2 handshake | 2 RTTs | Upgrade to TLS 1.3 or use session resumption |
| TCP slow start initial window | ~14KB | First request on a fresh connection is bandwidth-limited |
| TCP ramp to full speed | ~4–5 RTTs | New connections are slow for the first ~100ms |
| 1 Gbps link | ~125 MB/s theoretical | Plan for ~70% sustained utilization: ~700 Mbps (~87 MB/s) usable |
| DNS resolution (cached) | <1ms | Negligible with warm resolver |
| DNS resolution (cold/miss) | 10–50ms | First request from new container pays this cost |
| WebSocket keepalive overhead | ~50 bytes/30s | Negligible per connection; at 1M connections = 1.6 MB/s |
Tail-Latency Amplification
This is the most important networking concept for microservice architectures, and the one most candidates miss.
When a user request involves N sequential service calls, the user-visible latency is the sum of all calls. But the p99 latency is worse than the sum of individual p99s because you're taking the worst case across multiple independent probabilistic events.
Example: 4 sequential services, each with p99 = 50ms.
```text
P(single call < 50ms) = 0.99
P(all 4 calls < 50ms) = 0.99⁴ = 0.961
→ p99 of the chain ≈ p96 of any single call
→ To keep chain p99 at 200ms, each service needs p99.75 < 50ms
```
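The same probability stacking generalizes to any chain length, assuming the calls are independent:

```python
def chain_percentile(per_call_p: float, n_calls: int) -> float:
    """P(every one of n independent sequential calls beats its latency threshold)."""
    return per_call_p ** n_calls

def required_per_call(target_p: float, n_calls: int) -> float:
    """Per-call percentile needed for the whole chain to hit target_p."""
    return target_p ** (1 / n_calls)

chain = chain_percentile(0.99, 4)      # ~0.961: the chain's "p99" behaves like ~p96
per_call = required_per_call(0.99, 4)  # ~0.9975: each service needs its p99.75 in budget
```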
The fix is architectural, not operational:
| Strategy | Effect |
|---|---|
| Parallelize calls | Max(latencies) instead of Sum(latencies) |
| Batch calls | 1 network hop instead of N |
| Eliminate calls | Cache locally, embed data, denormalize |
| Set explicit deadlines | Each service knows its budget; no unbounded waits |
| Hedged requests | Send to 2 backends, take first response; tail probability is squared (1% → 0.01%) |
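Hedging from the last row can be sketched with a thread pool. Here `call_backend` is a hypothetical stand-in for a real RPC client, and hedging is only safe for idempotent requests:

```python
import concurrent.futures as cf
import random
import time

def call_backend(name: str) -> str:
    """Hypothetical stand-in for a real RPC client call."""
    time.sleep(random.uniform(0.001, 0.005))  # simulated service latency
    return f"response-from-{name}"

def hedged_request(backends: list) -> str:
    """Fire the same idempotent request at every backend; first response wins.
    With two independent backends, P(slow response) drops from p to p**2."""
    pool = cf.ThreadPoolExecutor(max_workers=len(backends))
    futures = [pool.submit(call_backend, b) for b in backends]
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    pool.shutdown(wait=False, cancel_futures=True)  # abandon the slower call
    return next(iter(done)).result()

result = hedged_request(["backend-a", "backend-b"])
```

Production hedging usually delays the second request slightly (send the hedge only if the first hasn't answered within, say, the p95) to avoid doubling backend load.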
Visual Guide
[Diagram: Connection Lifecycle]
[Diagram: DNS Failover Timeline]
How This Shows Up in Interviews
Scenario 1: "Your API latency is too high"
Do not start with code profiling. Say: "First, let me decompose the network path. How many hops? Our request goes client → CDN → LB → app → DB — that's 4 hops. Same-AZ at 0.5ms each: 2ms of network time. But if any hop is cross-region (150ms RTT), that single hop dominates the entire latency budget. Second, are connections pooled? A fresh TLS + TCP connection to the DB adds 2ms same-AZ but 300ms cross-region. Let me check connection reuse before looking at application code." The network topology is always the first diagnostic, not the last.
Scenario 2: "Design for users in multiple regions" (Full Walkthrough)
This tests whether you understand the physics of network propagation. Here's how a Staff engineer works through it:
Step 1 — Quantify the problem. "If our servers are in us-east-1 and a user in Tokyo makes a request, the cross-Pacific RTT is ~150ms. The TCP handshake costs 1 RTT, the TLS 1.3 handshake another, and the request/response a third. That's 450ms minimum before any application logic. Our latency budget is 200ms. The math doesn't work — we cannot serve Tokyo from us-east."
Step 2 — CDN for static content. "Static assets and cacheable API responses go through a CDN with edge PoPs in Tokyo. The user-to-edge RTT drops to 5–10ms. This handles 60–80% of requests — page loads, images, public API responses with cache headers."
Step 3 — Regional deployment for dynamic content. "For user-specific dynamic content (feed, notifications, dashboard), we deploy application servers and read replicas in ap-northeast-1 (Tokyo). Reads are local: user → Tokyo LB → Tokyo app → Tokyo read replica. RTT per hop: 0.5ms. Total: ~3ms for the network path."
Step 4 — The write path is the hard problem. "Writes must reach the primary database. If the primary is in us-east, every write pays 150ms cross-region RTT. Options: (a) async replication with eventual consistency — user sees their write locally, primary catches up within 1–2s. (b) Multi-master with conflict resolution — both regions accept writes, conflicts resolved by timestamp or application logic. (c) Route writes to the primary synchronously and accept 150ms write latency."
Step 5 — Choose based on the product. "For a social media feed, option (a) is correct. Users can tolerate 1–2s before their post is visible globally. For a payment system, option (c) is correct — we accept the write latency because we cannot tolerate conflicts on financial transactions."
Why this is a Staff answer: The candidate starts with physics (RTT), quantifies why the naive approach fails, layers solutions (CDN → regional reads → write strategy), and makes the final choice based on the product's consistency requirements, not a technical preference.
Scenario 3: "Our microservices have unpredictable latency spikes"
This tests tail-latency amplification. Say: "4 sequential services, each with p99 of 50ms. The chain p99 isn't 200ms — it's worse because you're rolling the dice 4 times. P(all 4 under 50ms) = 0.99⁴ = 96.1%, so our chain p99 is actually closer to the p96 of each service. First fix: can any calls be parallelized? Parallel calls are max-of instead of sum-of latencies. Second: set explicit per-hop deadline budgets. Third: are we hedging? Send the same request to two backends, take whichever responds first — that squares the tail probability, turning the 1% slow case into 0.01%."
Scenario 4: "How would you handle a DNS failover?"
This tests operational maturity around DNS TTLs. A 300s TTL means 5 minutes of degraded traffic post-failover. The Staff answer: short TTLs (30–60s) for critical endpoints, client-side retry with fallback IPs, health-check-based routing (Route 53 health checks), and acceptance that DNS failover has an inherent propagation delay — it is not instantaneous.
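Client-side retry with fallback IPs can be sketched at the socket level (the addresses below are illustrative placeholders; a real client would load them from config or a discovery service):

```python
import socket

# Illustrative placeholder endpoints: primary, then fallback.
ENDPOINTS = [("203.0.113.10", 443), ("198.51.100.20", 443)]

def connect_with_fallback(endpoints, timeout_s: float = 0.5) -> socket.socket:
    """Try each endpoint in order with a short timeout, so failover is
    bounded by the timeout rather than by DNS TTL propagation."""
    last_error = None
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout_s)
        except OSError as exc:
            last_error = exc  # try the next endpoint
    raise ConnectionError("all endpoints failed") from last_error
```

With a 500ms connect timeout, the client reaches a healthy backend within ~0.5s of a failure, while pure DNS failover clients wait out the remaining TTL.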
In the Wild
Cloudflare: The Edge Network Architecture
Cloudflare operates 300+ edge PoPs worldwide, terminating TLS as close to the user as possible. Their key architectural insight: by terminating TLS at the edge, the expensive handshake (1 RTT) happens over a short path (user to nearest PoP, typically <20ms RTT). The connection from the edge to the origin uses persistent, pre-warmed connection pools over Cloudflare's private backbone — eliminating both TLS overhead and TCP slow start on the origin path.
The Staff-level insight: Cloudflare's architecture is a physical implementation of the connection lifecycle optimization. They've pushed TLS termination to where the RTT is smallest (the edge) and used connection pooling where RTTs are larger (edge to origin). The same principle applies at smaller scale: terminate TLS at your load balancer, not at each application server.
Google: gRPC and the Internal Network
Google built gRPC because their internal network carries tens of billions of RPCs per second across millions of machines. At this scale, the overhead of REST (JSON serialization at 1–5ms, verbose headers, no streaming) is architectural, not incidental. gRPC's protobuf encoding is ~10x smaller and ~10x faster to serialize than JSON. Multiplied by billions of RPCs, this translates to measurable reductions in CPU utilization and network bandwidth.
The Staff-level insight: Google didn't choose gRPC because it's "faster." They chose it because at 10B+ RPCs/sec, the serialization overhead of JSON becomes a significant fraction of their total compute cost. The protocol choice was a capacity planning decision, not a performance optimization. At smaller scale (<10K RPS), the difference is negligible and REST's tooling advantage dominates.
Netflix: The Open Connect CDN
Netflix built its own CDN, Open Connect, with embedded cache servers inside ISP networks worldwide. When a user in São Paulo watches a show, the video streams from a cache box physically inside their ISP's data center — RTT is effectively zero (sub-millisecond). Netflix pre-positions content overnight during off-peak hours, using available bandwidth to fill caches rather than competing with peak-hour traffic.
The Staff-level insight: Netflix's innovation isn't technical — it's operational. They solved the cross-region latency problem not by building faster networks but by moving the data to where the user already is. This is the ultimate expression of "compute at the edge": if the latency budget is too tight for any network architecture, eliminate the network entirely.
Staff Calibration
The sections below are calibration tools for Staff-level interviews. If you already understand networking mechanics, start here to sharpen the framing that separates L5 from L6 answers.
What Staff Engineers Say (That Seniors Don't)
| Concept | Senior Response | Staff Response |
|---|---|---|
| Latency | "We can add a cache to reduce latency" | "Our latency budget is 200ms. Cross-region RTT alone consumes 150ms, so caching must be at the edge or we fail budget before application logic runs" |
| DNS failover | "DNS will route to the healthy region" | "DNS TTL of 300s means 5 minutes of degraded traffic post-failover. We need client-side retry with a fallback IP, or we accept that SLA gap" |
| TLS | "We terminate TLS at the load balancer" | "We terminate TLS at the edge to pay the handshake cost once, then run plaintext inside the VPC to avoid re-encryption overhead per hop" |
| Connection reuse | "We use connection pooling" | "Each new TCP connection costs 1 RTT for handshake plus slow start. A warm pool of 50 connections per backend eliminates that for p99, but we size the pool to avoid file descriptor exhaustion" |
| HTTP/2 vs HTTP/3 | "HTTP/2 is faster because of multiplexing" | "HTTP/2 multiplexing helps, but a single TCP packet loss stalls every stream. For mobile or lossy networks, QUIC gives us independent stream recovery — that is where the real tail latency win lives" |
| Call chain latency | "Each service is fast, so the chain is fast" | "P99 of 4 sequential services is worse than individual p99s due to probability stacking. We parallelize independent calls and set per-hop deadline budgets to cap the chain p99" |
Common Interview Traps
- Ignoring RTT in call chain math. Five sequential microservice calls at 1ms each is 5ms, not "negligible." Cross-region, that same chain is 750ms and your design is broken.
- Treating DNS as instant and reliable. Candidates propose DNS failover without accounting for TTL propagation delay or client-side caching behavior.
- Proposing HTTP/2 as a silver bullet. Multiplexing helps, but TCP head-of-line blocking remains. Interviewers probe whether you understand the layer at which the problem actually lives.
- Forgetting connection lifecycle costs. Adding a new proxy hop means a new TLS termination and TCP slow start unless you explicitly design for connection pooling at that layer.
- Assuming same-region means low latency. Cross-AZ RTT (1ms) × a 10-hop microservice chain = 10ms of pure network overhead before any computation.
- Designing for bandwidth when latency is the constraint. Most microservice payloads are <10 KB. The bottleneck is RTT count, not throughput.
- Ignoring keepalive configuration. HTTP keepalive defaults vary by language and framework. A 5-second idle timeout means connections are frequently re-established under bursty traffic.
- Forgetting DNS resolution latency. Each DNS lookup can add 1–50ms depending on caching. In a fresh container, the first request pays full resolution cost.
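Keepalive and pool settings are framework-specific, but the shape of a connection pool is universal; a minimal sketch (a real pool also needs health checks, idle-timeout eviction, and backpressure when exhausted):

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool: pay the handshake once per slot,
    then reuse warm connections across many requests."""
    def __init__(self, factory, size: int):
        self._slots = queue.Queue(maxsize=size)
        for _ in range(size):
            self._slots.put(factory())  # handshake cost paid up front

    def acquire(self, timeout_s: float = 1.0):
        return self._slots.get(timeout=timeout_s)  # blocks when pool is exhausted

    def release(self, conn) -> None:
        self._slots.put(conn)

# Usage with a hypothetical factory that counts "handshakes":
handshakes = 0
def fake_connect():
    global handshakes
    handshakes += 1
    return object()  # stand-in for a TCP+TLS connection

pool = ConnectionPool(fake_connect, size=4)
for _ in range(100):       # 100 requests...
    conn = pool.acquire()
    pool.release(conn)     # ...reuse the same 4 warm connections
```

The pool size is the file-descriptor tradeoff from the calibration table: big enough to cover concurrency, small enough not to exhaust descriptors on the backend.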
Practice Drill
Drill: "Your endpoint's p99 latency is 800ms, and each request fans through 4 sequential service calls. Diagnose and fix it."
Staff-Caliber Answer Shape
- Decompose the 800ms. Instrument each hop: what's the RTT between each service pair? Are these same-AZ (0.5ms expected) or cross-AZ (1ms)? Is any leg cross-region?
- Check connection reuse. Are connections being pooled or re-established per request? Four fresh TLS 1.3 handshakes at 1ms each is 4ms — negligible. But four fresh connections at 150ms cross-region is 600ms just in handshakes.
- Measure serialization overhead. Is this JSON over REST (parsing cost) or protobuf over gRPC (binary, fast)? For large payloads, serialization can dominate.
- Look at the dependency graph. Can any of the 4 calls be parallelized? Sequential calls are additive latency; parallel calls are max-of-group latency.
- Check tail latency amplification. P99 of 4 sequential calls is worse than p99 of any single call. If each service has p99 of 200ms, the chain p99 is higher than 200ms due to probability stacking.
The Staff move: Don't start with code profiling. Start with the network topology and ask whether the call chain can be restructured (parallel, batched, or eliminated).
Where This Appears
These playbooks apply networking foundations to complete system design problems with full Staff-level walkthroughs, evaluator-grade rubrics, and practice drills.
- CDN & Edge Caching — Edge PoP architecture, TLS termination at the edge, cache key design, and the operational cost of purge propagation across a global edge network
- Load Balancer — L4 vs L7 load balancing, TCP connection termination, health checking, and why connection-aware routing outperforms round-robin under uneven payload sizes
- API Gateway — TLS termination strategy, connection pooling between gateway and backends, request timeout budgets, and protocol translation (REST → gRPC)
- Service Discovery — DNS-based discovery with TTL tradeoffs, client-side vs server-side discovery, health check propagation latency, and why DNS is not a real-time routing mechanism
- Chat & Messaging — Persistent WebSocket connections at scale, keepalive overhead at 1M+ connections, and the connection-per-user model vs multiplexed protocols