StaffSignal
Foundation — Study Guide · 19 min read

Network Latency & Protocols

Every millisecond of RTT is a design decision you can't optimize away later. Latency budgets, connection lifecycle costs, protocol selection (HTTP/2 vs gRPC vs WebSocket), and how tail-latency amplification turns a 5-service call chain into a p99 nightmare.

Why This Matters

Every distributed system is a network conversation. Your microservices, databases, caches, and users communicate over a network that introduces latency, packet loss, and bandwidth constraints. The architecture decisions you make — where to terminate TLS, how to pool connections, which protocol to use between services — are all network decisions. And yet, most system design candidates treat the network as an invisible wire that moves data instantly between boxes on their whiteboard.

Staff-level interviewers probe networking because it exposes a candidate's depth of operational experience. A senior engineer might say "we add a cache to reduce latency." A Staff engineer says "our latency budget is 200ms, and cross-region RTT alone consumes 150ms. Caching must happen at the edge, or we've blown the budget before application logic runs." That response demonstrates that the candidate understands latency budgets, the physics of network propagation, and the constraint that geography imposes on architecture.

This guide teaches you the networking concepts that actually appear in system design interviews: RTT and latency budgets, connection lifecycle costs, protocol selection, and the tail-latency amplification that kills microservice architectures. You will not need to explain TCP congestion control in detail — but you will need to explain why connection pooling matters and when HTTP/3 is the right choice.

The 60-Second Version

  • RTT is the fundamental design constraint. Every network decision is a latency budget decision. Same-AZ: ~0.5ms. Same-region: ~1ms. Cross-continent: ~150ms. These numbers shape every architecture choice.
  • DNS TTL is a staleness injection point. DNS-based load balancing means clients cache resolved IPs. A 60s TTL means up to 60s of traffic to a dead backend after failover.
  • TLS has a connection tax. TLS 1.3 costs 1 RTT; TLS 1.2 costs 2 RTTs. Connection reuse (keep-alive, pooling) eliminates this cost for all subsequent requests.
  • TCP slow start throttles new connections. The first ~14KB trickles through a fresh connection. Connection pooling amortizes this ramp-up across many requests.
  • HTTP/2 multiplexing solves HTTP head-of-line blocking but not TCP's. A single dropped packet stalls all streams on that connection. HTTP/3 (QUIC) over UDP eliminates this.
  • Connection lifecycle cost drives architecture. Where you terminate TLS, where you pool connections, and where you place proxies are the decisions that determine your tail latency.
  • Tail-latency amplification is the microservice killer. P99 of N sequential service calls is worse than the p99 of any single call. The fix is architectural (parallelize, batch, eliminate), not operational.

How Networking Works in Distributed Systems

The Latency Budget Model

Every user-facing request has a latency budget — the maximum time the user will wait before the experience feels broken. For a web application, this is typically 200–500ms. For a real-time system, it might be 50ms. For batch processing, it might be 5 seconds.

The network consumes a portion of this budget before your code runs. Understanding how much it consumes is the first step in any architecture decision:

Network Hop | Latency Cost | Budget Consumed (of 200ms)
Client → CDN edge | 5–20ms | 10%
CDN → origin (same region) | 1–2ms | 1%
Service → Service (same AZ) | 0.5ms | 0.25%
Service → Service (cross-AZ) | 1–2ms | 1%
Service → Database | 0.5–2ms | 1%
Cross-region replication | 50–150ms | 75%

The key insight: A single cross-region hop can consume 75% of your latency budget. This is why multi-region architectures require local reads. It's not an optimization — it's a mathematical necessity.
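The budget math above can be scripted as a sanity check. The hop values below are illustrative midpoints of the ranges in the table, not measurements:

```python
# Hop latencies in ms — illustrative midpoints of the ranges in the table above.
HOP_LATENCY_MS = {
    "client_to_cdn_edge": 15,
    "cdn_to_origin": 2,
    "service_to_service_same_az": 0.5,
    "service_to_db": 1,
    "cross_region_replication": 150,
}

def budget_consumed(hops, budget_ms=200.0):
    """Fraction of the latency budget consumed by network hops alone."""
    return sum(HOP_LATENCY_MS[h] for h in hops) / budget_ms

local_path = ["client_to_cdn_edge", "cdn_to_origin", "service_to_db"]
print(budget_consumed(local_path))                                 # 0.09 — comfortable
print(budget_consumed(local_path + ["cross_region_replication"]))  # 0.84 — budget blown
```

Adding the single cross-region hop takes the same path from 9% of budget to 84% — before a line of application code runs.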

Connection Lifecycle

Every network connection has a startup cost that most candidates ignore. Understanding this cost explains why connection pooling is not an optimization — it's a requirement.

A fresh HTTPS connection requires:

  1. DNS resolution: 1–50ms (cached: <1ms, cold: 50ms+)
  2. TCP handshake: 1 RTT (SYN → SYN-ACK → ACK)
  3. TLS handshake: 1 RTT for TLS 1.3, 2 RTTs for TLS 1.2
  4. First data transfer: Limited by TCP slow start (~14KB initial window)

Total cost for a fresh cross-region connection: 1 RTT for the TCP handshake, 1 for TLS 1.3, and 1 for the first request/response — at 150ms RTT, that's ≈ 450ms before the first byte of application data arrives, with slow start still throttling the rest of the transfer.

A reused connection (keep-alive or pooled): 0 RTTs overhead. Data flows immediately at the full congestion window.

# Fresh connection (cross-region, 150ms RTT):
DNS:        0ms   (cached)
TCP:      150ms   (1 RTT)
TLS 1.3:  150ms   (1 RTT)
Slow start: 450ms  (3 RTTs to ramp up for 200KB response)
Total:     750ms   before full data transfer

# Pooled connection:
Overhead:    0ms
Data:      150ms   (1 RTT for request/response)
Total:     150ms

This 5x difference is why connection pooling is mandatory for any latency-sensitive path.
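A minimal additive model of the lifecycle (ignoring DNS variance, packet loss, and slow start) makes the fresh-vs-pooled gap concrete:

```python
def first_byte_ms(rtt_ms: float, tls_rtts: int = 1, pooled: bool = False,
                  dns_ms: float = 0.0) -> float:
    """Time until the first response byte arrives.

    Fresh connection: DNS + TCP handshake (1 RTT) + TLS handshake
    (1 RTT for TLS 1.3, 2 for TLS 1.2) + request/response (1 RTT).
    Pooled connection: request/response only.
    """
    if pooled:
        return rtt_ms
    return dns_ms + rtt_ms + tls_rtts * rtt_ms + rtt_ms

rtt = 150  # cross-region RTT in ms
print(first_byte_ms(rtt))               # 450.0 (fresh, TLS 1.3)
print(first_byte_ms(rtt, tls_rtts=2))   # 600.0 (fresh, TLS 1.2)
print(first_byte_ms(rtt, pooled=True))  # 150.0 (warm pool)
```

The 5x figure in the breakdown above is larger than this 3x because it also counts slow-start ramp-up for a 200 KB response on top of the handshake RTTs.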

TCP Slow Start

A fresh TCP connection starts with an initial congestion window (IW) of 10 segments × 1,460 bytes = ~14 KB. After each RTT, the window roughly doubles:

RTT # | Congestion Window | Cumulative Data
0 | 14 KB | 14 KB
1 | 28 KB | 42 KB
2 | 56 KB | 98 KB
3 | 112 KB | 210 KB
4 | 224 KB | 434 KB

Implication: A 200 KB API response on a fresh connection takes 3 RTTs just to deliver the payload — on top of the TLS handshake. On a cross-region connection (150ms RTT), that's 450ms of slow-start overhead alone. Connection pooling eliminates this entirely because the congestion window is already warm.
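The doubling table reduces to a small calculation. This sketch assumes the ~14 KB initial window and clean doubling each round; real stacks vary with MSS, pacing, and loss:

```python
def slow_start_extra_rtts(response_bytes: int, init_window: int = 14_600) -> int:
    """Extra RTTs (beyond the first flight) needed to deliver a response
    while the congestion window doubles each round from ~14 KB."""
    window = init_window
    delivered = window
    rtts = 0
    while delivered < response_bytes:
        window *= 2          # congestion window doubles each RTT
        delivered += window
        rtts += 1
    return rtts

print(slow_start_extra_rtts(200_000))  # 3 — matches the table: 210 KB cumulative by RTT 3
print(slow_start_extra_rtts(10_000))   # 0 — fits entirely in the initial window
```

On a warm pooled connection the window is already large, so the same 200 KB response needs no ramp-up rounds at all.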

Head-of-Line Blocking

This concept is the key to understanding protocol selection:

HTTP/1.1: Each connection handles one request at a time. To parallelize, browsers open 6 connections per domain. Each connection wastes slow-start ramp-up independently.

HTTP/2: Multiplexes multiple request/response streams over a single TCP connection. One connection, many concurrent requests. But: if a single TCP packet is lost, all streams on that connection stall while TCP retransmits. This is TCP-level head-of-line blocking.

HTTP/3 (QUIC): Runs over UDP with its own reliability per stream. A lost packet only stalls the stream it belongs to. Other streams continue unimpeded. This is why HTTP/3 wins on lossy networks (mobile, Wi-Fi) — the tail-latency improvement comes from eliminating cross-stream blocking.
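A toy Monte Carlo simulation makes the tail-latency difference visible. All parameters here (0.5% per-stream loss, 200ms stall, 6 streams, 50ms base latency) are illustrative assumptions, not measurements:

```python
import random

def p99_completion_ms(shared_stall: bool, per_stream_loss: float = 0.005,
                      stall_ms: int = 200, base_ms: int = 50,
                      streams: int = 6, trials: int = 20_000) -> int:
    """p99 of per-stream completion time.

    shared_stall=True models HTTP/2 over TCP: any lost packet stalls every
    stream on the connection. False models HTTP/3 (QUIC): a loss stalls
    only the stream it belongs to.
    """
    times = []
    for _ in range(trials):
        lost = [random.random() < per_stream_loss for _ in range(streams)]
        for i in range(streams):
            stalled = any(lost) if shared_stall else lost[i]
            times.append(base_ms + (stall_ms if stalled else 0))
    times.sort()
    return times[int(len(times) * 0.99)]

random.seed(42)
h2_p99 = p99_completion_ms(shared_stall=True)   # any-of-6 losses stall ~3% of trials
random.seed(42)
h3_p99 = p99_completion_ms(shared_stall=False)  # only ~0.5% of streams stall
print(h2_p99, h3_p99)  # 250 50 — same loss rate, very different tails
```

Per-stream loss of 0.5% stays below the p99 threshold, but the any-of-six coupling in the shared-stall model pushes ~3% of streams into the stall — which is exactly where the HTTP/3 tail-latency win comes from.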

Protocol Selection

Choosing the right protocol at each layer of your architecture is a Staff-level decision that most candidates skip. Here's the decision framework:

Boundary | Protocol | Why
Browser → API Gateway | HTTP/2 over TLS | Broad compatibility, multiplexing, TLS required
Mobile → API (lossy network) | HTTP/3 (QUIC) | Independent stream recovery, 0-RTT resumption
Service → Service (same region) | gRPC over HTTP/2 | Binary protobuf, streaming, strong typing, efficient
Service → Service (cross-region) | gRPC with retries + deadlines | High RTT demands efficient protocol + explicit timeout budgets
Real-time bidirectional | WebSocket over TLS | Persistent connection, low per-message overhead
Static assets | CDN + HTTP/2 | Edge caching eliminates origin RTT entirely
Event streaming | TCP (Kafka, NATS) | High throughput, binary protocol, persistent connections

gRPC vs REST: The Staff-Level Framing

Do not say "gRPC is faster than REST." Say: "gRPC uses protobuf (binary, ~10x smaller than JSON, with schema enforcement) over HTTP/2 (multiplexed). It is the right choice for internal service-to-service communication where we control both ends and need type safety. REST is the right choice at organizational boundaries — external APIs, partner integrations — where discoverability, tooling, and human readability matter more than wire efficiency."
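To make the size argument concrete without pulling in the protobuf library, here's a rough comparison using struct as a stand-in for a binary, schema-driven encoding. The record and its field layout are hypothetical:

```python
import json
import struct

# A hypothetical user-event record. struct stands in for protobuf's binary
# encoding (real protobuf adds field tags and varint packing, but the shape
# of the saving — no repeated field names, fixed-width numbers — is similar).
event = {"user_id": 184467, "event_type": 3,
         "timestamp_ms": 1700000000000, "score": 0.87}

json_bytes = json.dumps(event).encode()
binary_bytes = struct.pack("<QBQd", event["user_id"], event["event_type"],
                           event["timestamp_ms"], event["score"])

print(len(json_bytes), len(binary_bytes))  # 82 25 — ~3x smaller on a single tiny record
```

The ~10x figure quoted for protobuf shows up on realistic payloads with many records, where JSON repeats every field name and spells numbers out as text.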

The Numbers That Matter

Metric | Value | Design Implication
Same-AZ RTT | ~0.5ms | Baseline for microservice call chains; 10 sequential calls = 5ms
Same-region, cross-AZ RTT | ~1ms | Cost of synchronous replication; 3 AZs = 2ms for quorum write
Cross-continent RTT | ~150ms | Hard floor for global user-facing requests; must cache at edge
TLS 1.3 handshake | 1 RTT | Connection reuse eliminates this entirely
TLS 1.2 handshake | 2 RTTs | Upgrade to TLS 1.3 or use session resumption
TCP slow start initial window | ~14 KB | First request on a fresh connection is bandwidth-limited
TCP ramp to full speed | ~4–5 RTTs | New connections are slow for the first ~100ms
1 Gbps link | ~125 MB/s theoretical | Plan for ~70% utilization; ~700 Mbps usable
DNS resolution (cached) | <1ms | Negligible with warm resolver
DNS resolution (cold/miss) | 10–50ms | First request from new container pays this cost
WebSocket keepalive overhead | ~50 bytes/30s | Negligible per connection; at 1M connections ≈ 1.7 MB/s

Tail-Latency Amplification

This is the most important networking concept for microservice architectures, and the one most candidates miss.

When a user request involves N sequential service calls, the user-visible latency is the sum of all calls. But the p99 latency is worse than the sum of individual p99s because you're taking the worst case across multiple independent probabilistic events.

Example: 4 sequential services, each with p99 = 50ms.

P(single call < 50ms) = 0.99
P(all 4 calls < 50ms) = 0.99⁴ = 0.961
→ p99 of the chain ≈ p96 of any single call
→ To keep chain p99 at 200ms, each service needs p99.75 < 50ms
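The probability stacking above is two one-liners:

```python
def chain_success(per_call_quantile: float, n_calls: int) -> float:
    """P(every call in a sequential chain beats its latency target)."""
    return per_call_quantile ** n_calls

def required_per_call_quantile(chain_quantile: float, n_calls: int) -> float:
    """Quantile each service must hit for the whole chain to hit chain_quantile."""
    return chain_quantile ** (1 / n_calls)

print(round(chain_success(0.99, 4), 4))               # 0.9606 — the chain's "p99" is really ~p96
print(round(required_per_call_quantile(0.99, 4), 4))  # 0.9975 — each service needs p99.75
```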

The fix is architectural, not operational:

Strategy | Effect
Parallelize calls | Max(latencies) instead of Sum(latencies)
Batch calls | 1 network hop instead of N
Eliminate calls | Cache locally, embed data, denormalize
Set explicit deadlines | Each service knows its budget; no unbounded waits
Hedged requests | Send to 2 backends, take first response; p99 → p99²
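Hedging is easy to demonstrate with a toy latency distribution. The 2% slow-request rate and the 20ms/500ms timings below are illustrative, not measured:

```python
import random

def percentile(samples, q):
    """Nearest-rank percentile of a list of samples."""
    s = sorted(samples)
    return s[int(len(s) * q)]

def backend_latency_ms():
    """Toy backend: 20ms normally, 500ms for the unlucky 2%."""
    return 20 if random.random() > 0.02 else 500

random.seed(7)
single = [backend_latency_ms() for _ in range(100_000)]
hedged = [min(backend_latency_ms(), backend_latency_ms()) for _ in range(100_000)]

print(percentile(single, 0.99))  # 500 — the 2% tail lands squarely on p99
print(percentile(hedged, 0.99))  # 20  — both copies slow only 0.04% of the time
```

The cost: naive hedging doubles load on the hedged path. Production systems typically hedge after a delay — fire the second request only if the first hasn't answered within, say, the p95 — to capture most of the tail win at a few percent of extra traffic.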

Visual Guide

Connection Lifecycle (diagram)

DNS Failover Timeline (diagram)

How This Shows Up in Interviews

Scenario 1: "Your API latency is too high"

Do not start with code profiling. Say: "First, let me decompose the network path. How many hops? Our request goes client → CDN → LB → app → DB — that's 4 hops. Same-AZ at 0.5ms each: 2ms of network time. But if any hop is cross-region (150ms RTT), that single hop dominates the entire latency budget. Second, are connections pooled? A fresh TCP + TLS 1.3 connection to the DB costs 2 RTTs — about 1ms same-AZ but 300ms cross-region. Let me check connection reuse before looking at application code." The network topology is always the first diagnostic, not the last.

Scenario 2: "Design for users in multiple regions" (Full Walkthrough)

This tests whether you understand the physics of network propagation. Here's how a Staff engineer works through it:

Step 1 — Quantify the problem. "If our servers are in us-east-1 and a user in Tokyo makes a request, the cross-Pacific RTT is ~150ms. The TCP handshake costs 1 RTT, the TLS 1.3 handshake another, and the request/response a third. That's 450ms minimum before any application logic. Our latency budget is 200ms. The math doesn't work — we cannot serve Tokyo from us-east."

Step 2 — CDN for static content. "Static assets and cacheable API responses go through a CDN with edge PoPs in Tokyo. The user-to-edge RTT drops to 5–10ms. This handles 60–80% of requests — page loads, images, public API responses with cache headers."

Step 3 — Regional deployment for dynamic content. "For user-specific dynamic content (feed, notifications, dashboard), we deploy application servers and read replicas in ap-northeast-1 (Tokyo). Reads are local: user → Tokyo LB → Tokyo app → Tokyo read replica. RTT per hop: 0.5ms. Total: ~3ms for the network path."

Step 4 — The write path is the hard problem. "Writes must reach the primary database. If the primary is in us-east, every write pays 150ms cross-region RTT. Options: (a) async replication with eventual consistency — user sees their write locally, primary catches up within 1–2s. (b) Multi-master with conflict resolution — both regions accept writes, conflicts resolved by timestamp or application logic. (c) Route writes to the primary synchronously and accept 150ms write latency."

Step 5 — Choose based on the product. "For a social media feed, option (a) is correct. Users can tolerate 1–2s before their post is visible globally. For a payment system, option (c) is correct — we accept the write latency because we cannot tolerate conflicts on financial transactions."

Why this is a Staff answer: The candidate starts with physics (RTT), quantifies why the naive approach fails, layers solutions (CDN → regional reads → write strategy), and makes the final choice based on the product's consistency requirements, not a technical preference.

Scenario 3: "Our microservices have unpredictable latency spikes"

This tests tail-latency amplification. Say: "4 sequential services, each with p99 of 50ms. The chain p99 isn't 200ms — it's worse because you're rolling the dice 4 times. P(all 4 under 50ms) = 0.99⁴ = 96.1%, so our chain p99 is actually closer to the p96 of each service. First fix: can any calls be parallelized? Parallel calls are max-of instead of sum-of latencies. Second: set explicit per-hop deadline budgets. Third: are we hedging? Send the same request to two backends, take whichever responds first — that turns p99 into p99²."

Scenario 4: "How would you handle a DNS failover?"

This tests operational maturity around DNS TTLs. A 300s TTL means 5 minutes of degraded traffic post-failover. The Staff answer: short TTLs (30–60s) for critical endpoints, client-side retry with fallback IPs, health-check-based routing (Route 53 health checks), and acceptance that DNS failover has an inherent propagation delay — it is not instantaneous.
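A sketch of why DNS failover is not instantaneous: a minimal client-side cache driven by a fake clock. The hostname, IPs, and lookup function are hypothetical stand-ins for a real resolver:

```python
import time

class TtlDnsCache:
    """Minimal client-side DNS cache: after a failover, stale entries keep
    pointing at the dead IP until their TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._cache = {}  # hostname -> (ip, expires_at)

    def resolve(self, hostname, authoritative_lookup, ttl_s=60):
        entry = self._cache.get(hostname)
        if entry and entry[1] > self._clock():
            return entry[0]                  # served from cache — possibly stale
        ip = authoritative_lookup(hostname)  # hits the real resolver
        self._cache[hostname] = (ip, self._clock() + ttl_s)
        return ip

# Simulated failover, driven by a fake clock:
now = [0.0]
records = {"api.example.com": "10.0.0.1"}
cache = TtlDnsCache(clock=lambda: now[0])
lookup = lambda host: records[host]

print(cache.resolve("api.example.com", lookup))  # 10.0.0.1 — first resolution, cached for 60s
records["api.example.com"] = "10.0.0.2"          # failover flips the authoritative record
now[0] = 30.0
print(cache.resolve("api.example.com", lookup))  # 10.0.0.1 — stale until the TTL expires
now[0] = 61.0
print(cache.resolve("api.example.com", lookup))  # 10.0.0.2 — TTL expired, re-resolved
```

Shortening the TTL shrinks the stale window but multiplies resolver traffic — which is exactly the tradeoff the Staff answer above is naming.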

In the Wild

Cloudflare: The Edge Network Architecture

Cloudflare operates 300+ edge PoPs worldwide, terminating TLS as close to the user as possible. Their key architectural insight: by terminating TLS at the edge, the expensive handshake (1 RTT) happens over a short path (user to nearest PoP, typically <20ms RTT). The connection from the edge to the origin uses persistent, pre-warmed connection pools over Cloudflare's private backbone — eliminating both TLS overhead and TCP slow start on the origin path.

The Staff-level insight: Cloudflare's architecture is a physical implementation of the connection lifecycle optimization. They've pushed TLS termination to where the RTT is smallest (the edge) and used connection pooling where RTTs are larger (edge to origin). The same principle applies at smaller scale: terminate TLS at your load balancer, not at each application server.

Google: gRPC and the Internal Network

Google built gRPC because their internal network carries tens of billions of RPCs per second across millions of machines. At this scale, the overhead of REST (JSON serialization at 1–5ms, verbose headers, no streaming) is architectural, not incidental. gRPC's protobuf encoding is ~10x smaller and ~10x faster to serialize than JSON. Multiplied by billions of RPCs, this translates to measurable reductions in CPU utilization and network bandwidth.

The Staff-level insight: Google didn't choose gRPC because it's "faster." They chose it because at 10B+ RPCs/sec, the serialization overhead of JSON becomes a significant fraction of their total compute cost. The protocol choice was a capacity planning decision, not a performance optimization. At smaller scale (<10K RPS), the difference is negligible and REST's tooling advantage dominates.

Netflix: The Open Connect CDN

Netflix built its own CDN, Open Connect, with embedded cache servers inside ISP networks worldwide. When a user in São Paulo watches a show, the video streams from a cache box physically inside their ISP's data center — RTT is effectively zero (sub-millisecond). Netflix pre-positions content overnight during off-peak hours, using available bandwidth to fill caches rather than competing with peak-hour traffic.

The Staff-level insight: Netflix's innovation isn't technical — it's operational. They solved the cross-region latency problem not by building faster networks but by moving the data to where the user already is. This is the ultimate expression of "compute at the edge": if the latency budget is too tight for any network architecture, eliminate the network entirely.


Staff Calibration

The sections below are calibration tools for Staff-level interviews. If you already understand networking mechanics, start here to sharpen the framing that separates L5 from L6 answers.

What Staff Engineers Say (That Seniors Don't)

Concept | Senior Response | Staff Response
Latency | "We can add a cache to reduce latency" | "Our latency budget is 200ms. Cross-region RTT alone consumes 150ms, so caching must be at the edge or we fail budget before application logic runs"
DNS failover | "DNS will route to the healthy region" | "DNS TTL of 300s means 5 minutes of degraded traffic post-failover. We need client-side retry with a fallback IP, or we accept that SLA gap"
TLS | "We terminate TLS at the load balancer" | "We terminate TLS at the edge to pay the handshake cost once, then run plaintext inside the VPC to avoid re-encryption overhead per hop"
Connection reuse | "We use connection pooling" | "Each new TCP connection costs 1 RTT for handshake plus slow start. A warm pool of 50 connections per backend eliminates that for p99, but we size the pool to avoid file descriptor exhaustion"
HTTP/2 vs HTTP/3 | "HTTP/2 is faster because of multiplexing" | "HTTP/2 multiplexing helps, but a single TCP packet loss stalls every stream. For mobile or lossy networks, QUIC gives us independent stream recovery — that is where the real tail latency win lives"
Call chain latency | "Each service is fast, so the chain is fast" | "P99 of 4 sequential services is worse than individual p99s due to probability stacking. We parallelize independent calls and set per-hop deadline budgets to cap the chain p99"

Common Interview Traps

  • Ignoring RTT in call chain math. Five sequential microservice calls at 1ms each is 5ms, not "negligible." Cross-region, that same chain is 750ms and your design is broken.
  • Treating DNS as instant and reliable. Candidates propose DNS failover without accounting for TTL propagation delay or client-side caching behavior.
  • Proposing HTTP/2 as a silver bullet. Multiplexing helps, but TCP head-of-line blocking remains. Interviewers probe whether you understand the layer at which the problem actually lives.
  • Forgetting connection lifecycle costs. Adding a new proxy hop means a new TLS termination and TCP slow start unless you explicitly design for connection pooling at that layer.
  • Assuming same-region means low latency. Cross-AZ RTT (1ms) × a 10-hop microservice chain = 10ms of pure network overhead before any computation.
  • Designing for bandwidth when latency is the constraint. Most microservice payloads are <10 KB. The bottleneck is RTT count, not throughput.
  • Ignoring keepalive configuration. HTTP keepalive defaults vary by language and framework. A 5-second idle timeout means connections are frequently re-established under bursty traffic.
  • Forgetting DNS resolution latency. Each DNS lookup can add 1–50ms depending on caching. In a fresh container, the first request pays full resolution cost.

Practice Drill

Staff-Caliber Answer Shape
  1. Decompose the 800ms. Instrument each hop: what's the RTT between each service pair? Are these same-AZ (0.5ms expected) or cross-AZ (1ms)? Is any leg cross-region?
  2. Check connection reuse. Are connections being pooled or re-established per request? Four fresh TLS 1.3 handshakes at 1ms each is 4ms — negligible. But four fresh connections at 150ms cross-region is 600ms just in handshakes.
  3. Measure serialization overhead. Is this JSON over REST (parsing cost) or protobuf over gRPC (binary, fast)? For large payloads, serialization can dominate.
  4. Look at the dependency graph. Can any of the 4 calls be parallelized? Sequential calls are additive latency; parallel calls are max-of-group latency.
  5. Check tail latency amplification. P99 of 4 sequential calls is worse than p99 of any single call. If each service has p99 of 200ms, the chain p99 is higher than 200ms due to probability stacking.

The Staff move: Don't start with code profiling. Start with the network topology and ask whether the call chain can be restructured (parallel, batched, or eliminated).

Where This Appears

These playbooks apply networking foundations to complete system design problems with full Staff-level walkthroughs, evaluator-grade rubrics, and practice drills.

  • CDN & Edge Caching — Edge PoP architecture, TLS termination at the edge, cache key design, and the operational cost of purge propagation across a global edge network
  • Load Balancer — L4 vs L7 load balancing, TCP connection termination, health checking, and why connection-aware routing outperforms round-robin under uneven payload sizes
  • API Gateway — TLS termination strategy, connection pooling between gateway and backends, request timeout budgets, and protocol translation (REST → gRPC)
  • Service Discovery — DNS-based discovery with TTL tradeoffs, client-side vs server-side discovery, health check propagation latency, and why DNS is not a real-time routing mechanism
  • Chat & Messaging — Persistent WebSocket connections at scale, keepalive overhead at 1M+ connections, and the connection-per-user model vs multiplexed protocols

Related Technologies: API Gateway · Redis

This is one of 9 foundation guides. The full library also includes deep-dive system design playbooks with evaluator-grade breakdowns, practice drills, and failure-mode analysis.