StaffSignal

Design a Load Balancer

Staff-Level Playbook

Technologies referenced in this playbook: ZooKeeper & etcd

How to Use This Playbook

If you have 2 hours before your interview, read the Interview Walkthrough and §3 (Fault Lines). Everything else is depth you can pull on where you're weak. Appendices are collapsed — expand them for targeted review.

What is a Load Balancer? — Why interviewers pick this topic

The Problem

A load balancer distributes incoming network traffic across multiple backend servers to prevent any single server from becoming overwhelmed. Without it, one server handles all traffic until it crashes, while others sit idle. It's the traffic cop that keeps your system responsive and resilient.

Common Use Cases

  • Horizontal Scaling: Spread requests across a fleet of servers to handle more traffic than one machine could
  • High Availability: Route around failed servers automatically so users don't notice outages
  • Zero-Downtime Deployments: Drain traffic from servers during updates, then add them back
  • Geographic Distribution: Route users to the nearest datacenter for lower latency
  • SSL Termination: Offload encryption/decryption from backend servers

Why Interviewers Ask About This

Load balancing seems simple—just round-robin, right? But Staff-level interviews probe the hidden complexity: When do you need L7 vs L4? What happens when the load balancer itself fails? How do you handle sticky sessions without killing scalability? This topic reveals whether you understand the operational realities of running distributed systems at scale, not just the happy-path architecture diagrams.

Executive Summary

What This Interview Actually Tests

Load balancing is not a "just add nginx" question. Everyone knows round-robin.

This is a distributed systems ownership question that tests:

  • Whether you understand the L4 vs L7 tradeoff and when each matters
  • Whether you reason about health checking failure modes proactively
  • Whether you recognize session affinity as a scalability anti-pattern
  • Whether you can design for load balancer failure itself

The key insight: Load balancing is a single point of failure disguised as a reliability feature. Staff engineers reason about what happens when the "reliable" component fails.

The L5 vs L6 Contrast (Memorize This)

Level Calibration

| Behavior | L5 (Senior) | L6 (Staff) |
| --- | --- | --- |
| First move | "We'll add an nginx load balancer" | Asks "What's our latency budget? Do we need application-layer routing?" |
| Algorithm | "Round-robin is fine" | Identifies when round-robin fails: long-lived connections, heterogeneous backends, stateful requests |
| Health checks | "We'll ping every 5 seconds" | Asks "What's the blast radius of a false positive? What's our detection-to-removal latency budget?" |
| Session affinity | "We'll use sticky sessions" | Warns that sticky sessions break horizontal scaling and asks "Can we make the backend stateless instead?" |
| Failure | Assumes LB is reliable | Designs for LB failure: redundant LBs, DNS failover, client-side fallback |
| Ownership | "DevOps handles load balancing" | Defines SLOs for routing latency, health check accuracy, and failover time |

Default Staff Positions

These are your opening stances. Adjust based on requirements.

| Dimension | Default Position | Rationale |
| --- | --- | --- |
| L4 vs L7 | Start with L4, upgrade to L7 only if needed | L4 is faster and simpler; L7 adds latency but enables content-based routing |
| Algorithm | Least connections for most workloads | Round-robin fails with variable request durations; least-connections adapts |
| Health checks | Active + passive, tuned for workload | Active catches silent failures; passive reduces detection latency |
| Session affinity | Avoid if possible; use external session store | Sticky sessions are a scalability trap; externalize state to Redis/DB |
| Redundancy | Active-passive LB pair minimum | Single LB is a SPOF; active-active adds complexity but improves capacity |


Interview Walkthrough

The six phases below are compressed for a deep-dive format. Phases 1-3 fit in 2-3 minutes. If the interviewer keeps probing, you expand into Phase 5's detailed fault lines. Most interviews don't go beyond the first probe — know when to stop talking.

Phase 1: Requirements & Framing (30 seconds)

Name the intent before drawing a single box:

  • "Load balancing serves three purposes: distribute traffic for horizontal scaling, detect and route around unhealthy instances, and enable zero-downtime deployments."

Then immediately frame the four decisions:

  • "Four things matter: L4 vs L7, algorithm selection, health checking strategy, and LB redundancy. Let me walk through each."

Phase 2: Core Entities & API (30 seconds)

State the components (not entities — this is infrastructure):

  • VIP (Virtual IP): the stable endpoint clients connect to; maps to the load balancer
  • Backend pool: set of healthy server instances, each with weight and health status
  • Health check: active probe (HTTP GET /health) + passive monitoring (error rate tracking)
  • Connection drain: graceful removal of a backend — finish in-flight requests before cutting traffic
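The components above can be sketched in code. This is a minimal, illustrative model, assuming hypothetical names (`Backend`, `BackendPool`) — no real load balancer exposes exactly this shape — but it captures the pool membership, weights, draining, and the least-connections pick that the defaults table recommends.

```python
from dataclasses import dataclass, field

@dataclass
class Backend:
    host: str
    weight: float = 1.0          # reduced during gradual drain / slow-start
    healthy: bool = True         # set by active + passive health checks
    active_connections: int = 0
    draining: bool = False       # drained backends accept no NEW requests

@dataclass
class BackendPool:
    backends: list = field(default_factory=list)

    def eligible(self):
        return [b for b in self.backends if b.healthy and not b.draining]

    def pick_least_connections(self) -> Backend:
        # Least-connections adapts to variable request durations,
        # unlike round-robin, which assumes uniform request cost.
        candidates = self.eligible()
        if not candidates:
            raise RuntimeError("no healthy backends in pool")
        return min(candidates, key=lambda b: b.active_connections / b.weight)

pool = BackendPool([Backend("10.0.0.1"), Backend("10.0.0.2")])
pool.backends[0].active_connections = 5
chosen = pool.pick_least_connections()  # the less-loaded 10.0.0.2
```

The VIP is deliberately absent here: it lives in front of the pool, and the whole point is that clients never see these structures.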

Phase 3: The 2-Minute Architecture (2 minutes)

Deliver this in ~90 seconds, hitting the four decisions you framed in Phase 1: traffic enters through a VIP on an L4 load balancer (upgrade to L7 only if content-based routing is required); least-connections distributes requests across the backend pool; active plus passive health checks with blast-radius protection decide pool membership; and an active-passive LB pair removes the load balancer as a single point of failure.

Then stop. Let the interviewer steer.

Phase 4: Transition to Depth (15 seconds)

If the interviewer wants more, offer choices:

"I can go deeper on any of these four. The most interesting tradeoffs are: health check tuning and cascading failure, sticky sessions as a scalability trap, or what happens to in-flight requests when the LB itself fails."

Phase 5: Deep Dives (5-15 minutes if probed)

Probe 1: "What happens when a health check is wrong?" (3-5 min)

Walk through the fix:

  1. Blast radius protection: Never remove more than 20% of the fleet at once. If >20% are failing health checks, assume the problem is the health check, not the servers. "Panic mode: if half your fleet fails health checks simultaneously, the health check is lying."
  2. Gradual drain, not instant removal: When a server fails health checks, reduce its weight over 30 seconds before removing. This gives time for false positives to self-correct.
  3. Health check circuit breaker: If the health check endpoint itself is slow (because the server is under load), don't fail the check — that creates a death spiral. Separate the "is the server alive?" check from the "is the server healthy?" check.
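Step 1 is mechanical enough to sketch. The 20% cap and the "panic mode" threshold come straight from the text; the function name is hypothetical.

```python
# Blast-radius protection: cap how much of the fleet health checks
# may remove, and distrust the check itself past that threshold.
MAX_REMOVAL_FRACTION = 0.20

def apply_health_checks(fleet_size: int, failing: set) -> set:
    """Return the set of backends actually removed from rotation."""
    if fleet_size == 0:
        return set()
    if len(failing) / fleet_size > MAX_REMOVAL_FRACTION:
        # Panic mode: if this many backends "fail" at once, assume the
        # health check is lying and keep serving from the full fleet.
        return set()
    return failing

# 1 of 10 failing: safe to remove. 5 of 10 failing: remove nothing.
```

The counterintuitive part is worth saying out loud in the interview: when things look worst, the protection deliberately does less, because mass removal would convert a monitoring bug into an outage.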

Probe 2: "What about sticky sessions?" (3-5 min)

Walk through the alternatives:

  1. Best answer: Externalize session state to Redis. Any server handles any request. Sticky sessions become unnecessary.
  2. If sticky sessions are unavoidable (legacy app, WebSocket connections): Use consistent hashing so adding/removing servers only remaps ~1/N sessions. "With 10 servers, adding one remaps ~10% of sessions. Round-robin remaps 100%."
  3. WebSocket sticky sessions: These are legitimate — the connection IS the session. Use a connection registry (Redis hash: connection_id → server_id) so other services know where to route messages for a specific connection.
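The ~1/N remap claim in point 2 is easy to demonstrate with a toy consistent-hash ring. This is an assumed sketch (MD5-based ring with virtual nodes), not a production design, but the arithmetic it shows is the point.

```python
import bisect
import hashlib

def _h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self._points = sorted(
            (_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def lookup(self, session_id: str) -> str:
        # First point clockwise of the key's hash owns the session.
        i = bisect.bisect(self._keys, _h(session_id)) % len(self._keys)
        return self._points[i][1]

before = Ring([f"srv{i}" for i in range(10)])
after = Ring([f"srv{i}" for i in range(11)])  # add one server

sessions = [f"session-{i}" for i in range(10_000)]
moved = sum(before.lookup(s) != after.lookup(s) for s in sessions)
# moved / 10_000 comes out near 1/11 — only ~1/N sessions remap,
# versus nearly all of them under modulo or round-robin assignment.
```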

Probe 3: "What if the load balancer itself fails?" (3-5 min)

  1. Active-passive pair: Two LB instances, one active. VRRP failover in 1-5 seconds. During failover, active TCP connections are reset — clients must reconnect.
  2. Active-active with ECMP: Multiple LB instances sharing the same VIP via BGP/ECMP. The network distributes packets across all instances. No failover — just capacity reduction when one fails.
  3. Cloud-managed LB: AWS ALB/NLB, GCP Cloud LB. Already multi-AZ redundant. "The LB is someone else's problem — but you still need to reason about cross-region failure."

The deeper question: "What happens to in-flight requests? L4 LBs reset TCP connections on failover — the client sees a connection timeout and must retry. L7 LBs can retry transparently for idempotent requests. Clients need retry logic regardless."
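The "clients need retry logic regardless" point can be made concrete. A minimal sketch, with hypothetical names, assuming the transport surfaces a failover-induced reset as `ConnectionError`: retry blindly only for idempotent methods, with exponential backoff.

```python
import time

IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE"}

def request_with_retry(send, method, attempts=3, backoff=0.1):
    """send() raises ConnectionError on a reset; returns a response otherwise."""
    for attempt in range(attempts):
        try:
            return send()
        except ConnectionError:
            if method not in IDEMPOTENT_METHODS or attempt == attempts - 1:
                raise  # non-idempotent or out of attempts: surface the error
            time.sleep(backoff * (2 ** attempt))  # exponential backoff

calls = {"n": 0}
def flaky_send():
    # Simulates two connection resets during an LB failover, then success.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection reset during LB failover")
    return "200 OK"

result = request_with_retry(flaky_send, "GET")
```

For non-idempotent requests (a POST that charges a card), the retry decision has to move up a layer: idempotency keys or deduplication on the server side, not blind client retries.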

Probe 4: "How do you handle zero-downtime deployments?" (3-5 min)

"Connection draining is the key. When removing a server for deployment: (1) stop sending NEW requests, (2) let existing requests complete (drain timeout = p99 request duration × 2), (3) once all connections are closed or timeout expires, shut down the server."

"The reverse — bringing a new server online — uses slow-start. Don't immediately give it full traffic weight. Ramp from 10% to 100% over 30 seconds. This lets the new server warm up (JIT compilation, cache population) before taking full load."
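Both rules of thumb above reduce to small formulas. A sketch, using the numbers from the text (drain timeout = p99 × 2; linear ramp from 10% to 100% over 30 seconds) with illustrative function names:

```python
def drain_timeout_seconds(p99_request_seconds: float) -> float:
    # Drain timeout = p99 request duration × 2: long enough that almost
    # every in-flight request finishes before the server is cut off.
    return p99_request_seconds * 2

def slow_start_weight(seconds_since_join: float,
                      ramp_seconds: float = 30.0,
                      initial: float = 0.10) -> float:
    """Traffic weight for a newly added backend, ramping 10% → 100%."""
    if seconds_since_join >= ramp_seconds:
        return 1.0
    return initial + (1.0 - initial) * (seconds_since_join / ramp_seconds)

# A service with p99 = 4s gets an 8s drain timeout; a backend 15s into
# its ramp carries just over half of its full traffic weight.
```

The ramp shape (linear here) matters less than having one at all: the failure mode it prevents is a cold server taking full load before its caches and JIT are warm.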

Phase 6: Wrap-Up

If you've gone deep, close with the organizational insight:

"Load balancing is the most over-engineered and under-monitored part of most architectures. Teams spend weeks choosing between nginx and HAProxy, then never set up alerting for connection drain failures or health check false positives. The Staff question isn't 'which load balancer' — it's 'what are the failure modes, who gets paged, and what does the runbook say.'"

Quick-Reference: The 30-Second Cheat Sheet

| Topic | The L5 Answer | The L6 Answer (say this) |
| --- | --- | --- |
| Layer choice | "L7 because we might need it" | "L4 unless we need content-based routing — L7 adds 5-10ms per request" |
| Algorithm | "Round-robin" | "Least-connections — round-robin fails with variable request durations" |
| Health checks | "Ping every 5 seconds" | "Active + passive combined, blast radius protected — never remove >20% of fleet" |
| Session state | "Sticky sessions" | "Externalize to Redis — sticky sessions are a scalability trap" |
| LB failure | "Use a managed LB" | "Active-passive minimum — and plan for in-flight TCP resets during failover" |
| Deployment | "Zero-downtime deploy" | "Connection draining with timeout + slow-start on recovery" |

Key Numbers Worth Memorizing

| Metric | Value | Why It Matters |
| --- | --- | --- |
| L4 vs L7 latency | 5-10ms per request difference | L7 tax on every request |
| Health check cascade | Remove 1 of 5 = 25% load spike | Why blast radius protection matters |
| Active-passive failover | 1-5 seconds (VRRP) | Sets expectation for connection resets |
| Connection drain timeout | p99 request duration × 2 | Too short = dropped requests |
| Slow-start ramp | 10% → 100% over 30 seconds | Prevents cold-cache overload |

Core sections
  • 1. The Staff Lens
  • 2. Problem Framing & Intent
  • 3. The Fault Lines
  • 4. Failure Modes & Degradation
  • 5. Evaluation Rubric
  • 6. Interview Flow & Pivots
  • 7. Active Drills
  • 8. Deep Dive Scenarios
  • 9. Level Expectations Summary
  • 10. Staff Insiders: Controversial Opinions
Practice & Reference
  • A.1 Layer 4 Load Balancing
  • A.2 Layer 7 Load Balancing
  • A.3 Decision Matrix
  • B.1 Round-Robin
  • B.2 Weighted Round-Robin
  • B.3 Least Connections
  • B.4 Least Response Time
  • B.5 Consistent Hashing
  • B.6 Random
  • C.1 Active Health Checks
  • C.2 Passive Health Checks
  • C.3 Combining Active + Passive
  • C.4 Tuning Parameters
  • D.1 Cookie-Based Affinity
  • D.2 IP-Based Affinity
  • D.3 Consistent Hashing
  • D.4 Externalized Session State
  • D.5 Decision Tree
  • E.1 Active-Passive (VRRP/Keepalived)
  • E.2 Active-Active
  • E.3 Cloud Load Balancers
  • E.4 Anycast
  • E.5 Client-Side Load Balancing
  • F.1 GeoDNS
  • F.2 Anycast + BGP
  • F.3 Global Server Load Balancing (GSLB)
  • F.4 Multi-Region Failover Strategy
  • G.1 Key Metrics
  • G.2 Alerting Thresholds
  • G.3 Dashboards
  • H.1 nginx
  • H.2 HAProxy
  • H.3 AWS Application Load Balancer (ALB)
  • H.4 AWS Network Load Balancer (NLB)
  • H.5 GCP Cloud Load Balancer
  • H.6 Envoy (Service Mesh)