StaffSignal
Cross-Cutting Framework

Real-time Updates

WebSockets vs SSE vs polling and fan-out strategies. Every persistent connection is operational state.

Real-time Updates — Cross-Cutting Pattern

The Problem

Users expect live data. But every persistent connection is operational state — memory per socket, file descriptors, health checking, reconnection logic, and graceful drain during deploys. The transport you choose determines your operational ceiling long before it determines your latency floor. Pick wrong and you spend more time managing connections than building features.

The Core Tradeoff

| Strategy | What Works | What Breaks | Who Pays |
| --- | --- | --- | --- |
| WebSockets | True bidirectional, low latency, efficient for high-frequency updates | Sticky sessions, connection draining on deploy, load balancer complexity, no auto-reconnect | Infra team — every deploy is a connection migration event |
| Server-Sent Events (SSE) | Server-push with auto-reconnect built in, works through HTTP proxies, simple ops | Unidirectional only, limited browser connection cap (~6 per domain on HTTP/1.1) | Nobody, if server-push is all you need |
| Long Polling | Works everywhere, no special infra, simple client logic | Connection churn, thundering herd on reconnect, wasted server threads holding idle connections | Backend team — thread/connection pool pressure |
| Short Polling | Operationally trivial, stateless, cacheable | Latency proportional to interval, wasted requests when nothing changes | CDN/API layer — cost scales with poll frequency, not user activity |

Staff Default Position

"Every persistent connection is operational state." Staff default: SSE for server-to-client push (simpler ops, built-in reconnect, works through every proxy and load balancer). WebSockets only when bidirectional communication is genuinely required — collaborative editing, gaming, interactive whiteboards. Short polling for low-frequency updates (<1/min) because the operational simplicity outweighs the latency cost.

Before reaching for WebSockets, multiply: connection_count x memory_per_connection x deploy_frequency. If the answer makes you uncomfortable, SSE or polling is the right call.
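That multiplication can be run as a quick sketch. The per-connection memory figure (~24KB, the WebSocket number used elsewhere in this piece) and the deploy cadence are illustrative assumptions; substitute your own measurements.

```python
def websocket_ops_cost(connections, memory_per_conn_kb=24, deploys_per_week=10):
    """Back-of-envelope WebSocket fleet cost.

    memory_per_conn_kb (~24KB with TLS + framing buffers) and
    deploys_per_week are illustrative assumptions, not measurements.
    """
    memory_gb = connections * memory_per_conn_kb / 1024 / 1024
    # Every deploy drains and migrates every live connection.
    migrations_per_week = connections * deploys_per_week
    return {
        "memory_gb": round(memory_gb, 1),
        "connection_migrations_per_week": migrations_per_week,
    }
```

At 500K connections and 10 deploys a week, the fleet holds ~11GB of pure connection state and performs 5M connection migrations weekly, which is exactly the kind of number the heuristic is probing for.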

Fan-out strategy matters more than transport. Fan-out-on-write (push to all subscribers at write time) gives low read latency but amplifies write cost. Fan-out-on-read (pull on demand) is cheaper to write but shifts cost to every reader. Most systems need a hybrid — fan-out-on-write for active users, fan-out-on-read for the rest.
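One way to sketch the hybrid: partition subscribers by recent activity at write time. The 15-minute activity window and the function names here are assumptions for illustration, not a prescribed API.

```python
import time

ACTIVE_WINDOW_SEC = 15 * 60  # "active" = seen within 15 minutes (assumed threshold)

def partition_followers(followers, now=None):
    """Split followers into fan-out-on-write vs fan-out-on-read groups.

    followers: dict of user_id -> last_seen unix timestamp.
    Active users get the message pushed at write time; everyone
    else pulls it on their next read.
    """
    now = time.time() if now is None else now
    push_now = [u for u, seen in followers.items() if now - seen <= ACTIVE_WINDOW_SEC]
    pull_later = [u for u, seen in followers.items() if now - seen > ACTIVE_WINDOW_SEC]
    return push_now, pull_later
```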


When to Deviate

  • Bidirectional data flow is real, not speculative. Collaborative editing, multiplayer state sync, and interactive drawing all require the client to push structured data upstream continuously. SSE cannot do this.
  • Sub-50ms latency is a hard product requirement. Financial tickers, live auctions, competitive gaming. Short polling and SSE add latency floors that matter here.
  • You already operate sticky-session infrastructure. If your load balancers and deploy pipeline handle connection draining, the operational tax of WebSockets is already paid.
  • Clients are not browsers. Mobile apps and backend services don't share the browser's 6-connection-per-domain limit, making SSE's main drawback irrelevant.

Common Interview Mistakes

| What Candidates Say | What Interviewers Hear | What Staff Engineers Say |
| --- | --- | --- |
| "We'll use WebSockets for everything" | "I haven't considered operational cost" | "SSE for push, WebSockets only where bidirectional is required" |
| "WebSockets are faster than polling" | "I'm comparing transports without considering fan-out" | "Fan-out strategy determines perceived latency more than transport" |
| "We'll add a reconnection layer" | "I don't know SSE has this built in" | "SSE gives us auto-reconnect and Last-Event-ID replay for free" |
| "Long polling as a fallback" | "I haven't sized the connection pool impact" | "Short polling at a reasonable interval is cheaper than holding idle long-poll connections" |
| "We need real-time for all updates" | "I haven't triaged what actually needs sub-second delivery" | "Scores need sub-second push. Profile updates can poll every 60s. Different paths for different SLAs." |


Implementation Deep Dive

1. WebSocket Connection Management — Redis Connection Registry

Every WebSocket server in the fleet maintains local connections, but the system needs to know which user is connected to which server. A Redis-backed connection registry solves this.

Connection Registry Pattern

# When a client connects to gateway server
function onConnect(userId, serverId, connectionId):
    # Register connection with TTL (heartbeat-refreshed)
    redis.HSET("conn:" + userId,
        "server", serverId,
        "connId", connectionId,
        "connectedAt", now())
    redis.EXPIRE("conn:" + userId, 90)       # 90s TTL, refreshed by heartbeat

    # Add to server's connection set (for drain enumeration)
    redis.SADD("server:" + serverId + ":connections", userId)

    # Publish presence event
    redis.PUBLISH("presence", serialize({ userId, status: "online" }))

# Heartbeat every 30 seconds
function onHeartbeat(userId):
    redis.EXPIRE("conn:" + userId, 90)        # Refresh TTL

# When a client disconnects
function onDisconnect(userId, serverId):
    redis.DEL("conn:" + userId)
    redis.SREM("server:" + serverId + ":connections", userId)
    redis.PUBLISH("presence", serialize({ userId, status: "offline" }))

Why Redis HASH over plain SET: The hash stores connection metadata (server ID, connection time) alongside registration. When routing a message to a user, you look up the server and send directly — no broadcast to the entire fleet.
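The hash semantics above can be mirrored with a plain in-memory dict for local testing. This is a stand-in sketch with invented class and method names, not the Redis-backed implementation; it keeps the same behavior of per-user metadata plus a heartbeat-refreshed TTL.

```python
import time

class ConnectionRegistry:
    """In-memory stand-in for the Redis hash registry (for local tests)."""

    def __init__(self, ttl_sec=90):
        self.ttl = ttl_sec
        self._conns = {}  # userId -> {"server", "connId", "expires"}

    def on_connect(self, user_id, server_id, conn_id):
        self._conns[user_id] = {
            "server": server_id,
            "connId": conn_id,
            "expires": time.monotonic() + self.ttl,
        }

    def on_heartbeat(self, user_id):
        # Equivalent of EXPIRE refresh on the Redis key.
        if user_id in self._conns:
            self._conns[user_id]["expires"] = time.monotonic() + self.ttl

    def lookup(self, user_id):
        conn = self._conns.get(user_id)
        if conn is None or conn["expires"] < time.monotonic():
            self._conns.pop(user_id, None)  # lazily expire, like Redis TTL
            return None
        return {"server": conn["server"], "connId": conn["connId"]}
```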

Message Routing with Connection Registry

function sendToUser(targetUserId, message):
    conn = redis.HGETALL("conn:" + targetUserId)

    if conn is empty:
        # User offline — queue for later delivery or drop
        messageQueue.enqueue(targetUserId, message)
        return

    targetServer = conn["server"]

    if targetServer == THIS_SERVER:
        # Local delivery — find local WebSocket and send
        localSocket = localConnections.get(targetUserId)
        if localSocket:
            localSocket.send(serialize(message))
    else:
        # Remote delivery — publish to server-specific channel
        redis.PUBLISH("server:" + targetServer + ":inbox", serialize({
            targetUserId, message
        }))

Connection Draining During Deploy

function gracefulDrain(serverId, drainDurationSec=60):
    # Step 1 — Stop accepting new connections
    loadBalancer.removeServer(serverId)

    # Step 2 — Notify all local clients to reconnect elsewhere
    clients = redis.SMEMBERS("server:" + serverId + ":connections")
    for userId in clients:
        socket = localConnections.get(userId)
        if socket:
            socket.send(serialize({ type: "reconnect",
                                    reason: "server_drain",
                                    delay: random(0, drainDurationSec * 1000) }))

    # Step 3 — Wait for clients to reconnect with jitter
    sleep(drainDurationSec)

    # Step 4 — Force-close any remaining connections
    for userId in localConnections.keys():
        localConnections.get(userId).close(1001, "server_shutdown")
        redis.DEL("conn:" + userId)

    redis.DEL("server:" + serverId + ":connections")

Why randomized delay: Without jitter, all clients reconnect to remaining servers simultaneously — a thundering herd that can overwhelm the fleet. Spreading reconnections over 60 seconds keeps the load manageable. For a 200-server fleet with 100K connections each, draining one server means 100K reconnections spread over 60 seconds = ~1,700/sec, which is routine.

2. Server-Sent Events (SSE) — The Simpler Default

SSE is HTTP-based, unidirectional (server-to-client), and has auto-reconnect built into the browser specification. For server-push use cases (notifications, feed updates, live scores), SSE is operationally simpler than WebSockets.

SSE Implementation

# Server (Node.js / Express style)
function handleSSE(request, response):
    response.setHeader("Content-Type", "text/event-stream")
    response.setHeader("Cache-Control", "no-cache")
    response.setHeader("Connection", "keep-alive")

    userId = authenticate(request)

    # Send initial connection event
    response.write("event: connected\ndata: {\"connected\":true}\n\n")

    # Subscribe to user's event channel
    subscription = eventBus.subscribe("user:" + userId, (event) =>
        # Last-Event-ID enables replay on reconnect
        response.write("id: " + event.id + "\n")
        response.write("event: " + event.type + "\n")
        response.write("data: " + serialize(event.payload) + "\n\n")
    )

    # Handle disconnect
    request.on("close", () =>
        subscription.unsubscribe()
        eventBus.publish("presence", { userId, status: "offline" })
    )

Auto-reconnect: When the connection drops, the browser automatically reconnects and sends the Last-Event-ID header. The server uses this to replay any missed events from a bounded buffer. No client-side reconnection logic needed.
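A minimal sketch of the bounded buffer that backs Last-Event-ID replay, assuming monotonically increasing integer event IDs (class and method names are invented here):

```python
from collections import deque

class ReplayBuffer:
    """Bounded per-channel event buffer for SSE Last-Event-ID replay.

    Sketch only: production systems typically bound by age as well as count.
    """

    def __init__(self, max_events=1000):
        self._events = deque(maxlen=max_events)  # (event_id, payload), ids monotonic
        self._next_id = 1

    def append(self, payload):
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, payload))
        return event_id

    def replay_after(self, last_event_id):
        """Events the client missed; empty if it is already caught up."""
        return [(i, p) for i, p in self._events if i > last_event_id]
```

On reconnect, the server reads the Last-Event-ID header and writes `replay_after(id)` to the stream before resuming live events. Events older than the buffer are gone, which is why the buffer is described as bounded: replay is best-effort, not a durable log.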

SSE vs WebSocket: Operational Comparison

| Concern | SSE | WebSocket |
| --- | --- | --- |
| Load balancer | Standard HTTP LB (ALB, nginx) | Requires WebSocket-aware LB or sticky sessions |
| Auto-reconnect | Browser-native | Must implement manually |
| Message replay | Last-Event-ID header | Custom cursor protocol |
| Bidirectional | No — use POST for client→server | Yes — native |
| Connection limit (HTTP/1.1) | 6 per domain per browser | No browser limit |
| Connection limit (HTTP/2) | 100+ multiplexed streams | N/A |
| Deploy drain | Close response → auto-reconnect | Custom drain protocol (see above) |
| Memory per connection | ~8KB (no upgrade overhead) | ~24KB (TLS + WebSocket framing + buffers) |

3. Fan-Out Strategies — Small vs Large Groups

The fan-out strategy determines system cost more than the transport choice. A message sent to a 5-person group chat is fundamentally different from a message sent to 50K live stream viewers.

Small Group Fan-Out (Direct Push)

# For groups with < 500 members
function fanOutSmallGroup(groupId, message, senderId):
    members = db.query("SELECT user_id FROM group_members WHERE group_id = ?", groupId)

    for memberId in members:
        if memberId == senderId:
            continue                          # Don't echo to sender

        conn = redis.HGETALL("conn:" + memberId)
        if conn:
            routeToServer(conn["server"], memberId, message)
        else:
            # Offline: persist for later delivery
            redis.LPUSH("offline:" + memberId, serialize(message))
            redis.LTRIM("offline:" + memberId, 0, 999)   # Bounded queue

Why direct push for small groups: Querying 500 members and routing individually is O(N) but N is small. No pub/sub overhead, no channel management, no subscription lifecycle. Simple and predictable.

Large Group Fan-Out (Pub/Sub with Subscription Channels)

# For groups with > 500 members (live streams, public channels)
function fanOutLargeGroup(channelId, message):
    # Publish once — all gateway servers with subscribed users receive it
    redis.PUBLISH("channel:" + channelId, serialize(message))

# Each gateway server subscribes on behalf of its local clients
function onClientJoinChannel(userId, channelId, serverId):
    # Track subscription locally
    localSubscriptions.add(userId, channelId)

    # Subscribe to Redis channel if this is the first local subscriber
    if localSubscriptions.countForChannel(channelId) == 1:
        redis.SUBSCRIBE("channel:" + channelId)

function onRedisMessage(channelId, message):
    # Fan out to all local subscribers for this channel
    for userId in localSubscriptions.getUsersForChannel(channelId):
        socket = localConnections.get(userId)
        if socket:
            socket.send(message)

Fan-Out Decision Matrix

| Group Size | Strategy | Publish Cost | Read Cost | Operational Complexity |
| --- | --- | --- | --- | --- |
| 1-1 (DM) | Direct push | O(1) lookup + route | O(1) | Minimal |
| 2-500 (group chat) | Direct push with member list | O(N) lookups | O(1) per member | Low |
| 500-50K (large channel) | Redis pub/sub | O(1) publish | O(N) local fan-out per server | Medium — manage subscriptions |
| 50K+ (broadcast) | Tiered pub/sub with edge servers | O(1) publish to backbone | O(N) tiered fan-out | High — edge server fleet |
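The matrix reduces to a small routing helper. Thresholds mirror the table above and should be tuned per workload; the function name and strategy labels are illustrative.

```python
def choose_fanout(recipients):
    """Pick a fan-out strategy from recipient count (thresholds from the matrix)."""
    if recipients == 1:
        return "direct-push"          # DM: O(1) registry lookup + route
    if recipients <= 500:
        return "direct-push-members"  # enumerate the member list, push each
    if recipients <= 50_000:
        return "pubsub"               # one publish, per-gateway local fan-out
    return "tiered-pubsub"            # backbone publish + edge-server tiers
```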

4. Presence Tracking — Redis SET with TTL and Heartbeat

"Online" status is one of the most expensive features in real-time systems. Naive implementations create O(N) fan-out on every status change.

Presence Implementation

# Presence data in Redis
function setOnline(userId):
    redis.SET("presence:" + userId, "online", EX=45)   # 45s TTL
    # Publish to contacts who are currently online
    contacts = getOnlineContacts(userId)
    for contactId in contacts:
        sendToUser(contactId, { type: "presence", userId, status: "online" })

function heartbeat(userId):
    redis.SET("presence:" + userId, "online", EX=45)    # Refresh TTL
    # No presence publish on heartbeat — only on state change

function setOffline(userId):
    redis.DEL("presence:" + userId)
    contacts = getOnlineContacts(userId)
    for contactId in contacts:
        sendToUser(contactId, { type: "presence", userId, status: "offline" })

function isOnline(userId):
    return redis.EXISTS("presence:" + userId)

function getOnlineContacts(userId):
    contactIds = db.query("SELECT contact_id FROM contacts WHERE user_id = ?", userId)
    pipeline = redis.pipeline()
    for cid in contactIds:
        pipeline.EXISTS("presence:" + cid)
    results = pipeline.execute()
    return [contactIds[i] for i, r in enumerate(results) if r == 1]

Presence Accuracy vs Cost

| Approach | Accuracy | Fan-Out Cost | Memory | Use Case |
| --- | --- | --- | --- | --- |
| Heartbeat + TTL (45s) | "Online within last 45s" | Per state change | O(online users) | Chat apps — good enough for most |
| Last-seen timestamp | "Last seen 3 min ago" | Zero fan-out | O(all users) | WhatsApp-style — cheapest |
| Real-time presence | Instant online/offline | O(contacts) per change | O(online users) | Slack-style — expensive |
| Lazy presence | On-demand check | Zero proactive fan-out | O(online users) | LinkedIn-style — minimal cost |

Architecture Diagram


Data flow: Clients connect to gateway servers via WebSocket or SSE. Gateways register connections in Redis. When the API tier needs to push a message, it looks up the target user's gateway via the connection registry and publishes to that server's channel. The gateway delivers locally. For large groups, a single Redis PUBLISH fans out to all gateways with subscribers.


Failure Scenarios

1. Gateway Server Crash — 25K Connections Lost

Timeline: Gateway server 3 crashes at 14:00:00. 25K WebSocket connections are severed instantly. All 25K clients begin reconnection simultaneously. Remaining 49 gateways receive a thundering herd of 25K new connections within 2 seconds.

Blast radius: The 25K affected users lose ~5-30 seconds of messages (depending on reconnection speed). Remaining gateway servers experience a CPU spike from connection setup overhead (TLS handshake, authentication, subscription setup). If servers are near capacity, the spike can cascade.

Detection: Gateway health check fails. Redis connection registry for the crashed server shows stale entries (TTL has not yet expired). Connected-user count drops by 25K.

Recovery:

  1. Client-side exponential backoff with jitter spreads reconnections over 30 seconds instead of 2 seconds
  2. The connection-registry TTL (90s) automatically expires stale Redis entries for the dead server — no manual intervention
  3. Missed messages are replayed via Last-Event-ID (SSE) or cursor-based recovery (WebSocket) from the message buffer
  4. N+2 fleet sizing ensures remaining servers have capacity to absorb the reconnections
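Step 1 above is commonly implemented as full-jitter exponential backoff; a minimal sketch, with base and cap values that are assumptions to tune per fleet:

```python
import random

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter exponential backoff: a uniform random delay in
    [0, min(cap, base * 2^attempt)]. base/cap values are illustrative."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Because every client draws its own random delay, 25K reconnections spread across the whole window instead of landing in the same two seconds; the cap keeps long-suffering clients from waiting minutes between retries.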

2. Redis Pub/Sub Backpressure — Message Delivery Stall

Timeline: A popular live stream channel has 50K viewers. The streamer generates 20 messages/sec (chat + reactions). Each message fans out to 50 gateway servers. Redis pub/sub throughput is sufficient, but gateway server 7 has a slow consumer — its local fan-out to 5K connections takes 200ms per message. Redis pub/sub has no backpressure — messages are dropped for slow subscribers.

Blast radius: Users connected to gateway 7 miss messages silently. No error, no retry — Redis pub/sub is fire-and-forget. The users see gaps in the chat stream.

Detection: Per-gateway message delivery rate monitoring. If gateway 7 delivers 15 messages/sec while others deliver 20, it is dropping messages. Client-side sequence number gap detection.

Recovery:

  1. Switch from Redis pub/sub to Redis Streams (XREADGROUP) for durable delivery with consumer acknowledgment
  2. Buffer messages in a per-gateway queue (Redis list) with a consumer that processes at its own pace
  3. If a gateway falls too far behind (>5 seconds of lag), disconnect its clients and let them reconnect to healthier servers
  4. Rate-limit the source — cap chat messages at 10/sec per channel to bound fan-out load
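The client-side sequence-gap detection mentioned under Detection can be sketched as follows, assuming per-channel monotonically increasing sequence numbers (names invented here):

```python
class GapDetector:
    """Detect dropped pub/sub messages via per-channel sequence numbers."""

    def __init__(self):
        self._last_seq = {}  # channel -> highest sequence number seen

    def observe(self, channel, seq):
        """Returns the list of missed sequence numbers, empty if none."""
        last = self._last_seq.get(channel)
        self._last_seq[channel] = max(seq, last or 0)
        if last is None or seq <= last:
            return []                      # first message, duplicate, or reorder
        return list(range(last + 1, seq))  # anything skipped in between
```

A non-empty result is the client's cue to fetch the missed range over a request/response path, since the pub/sub channel itself will never redeliver.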

3. Presence Storm During Morning Login Wave

Timeline: 9:00 AM — 500K users log in within a 15-minute window. Each login triggers a presence fan-out to online contacts (average 50 online contacts per user). Total presence messages: 500K users x 50 contacts = 25M presence notifications in 15 minutes = ~28K messages/sec.

Blast radius: Presence messages compete with actual content messages (chat, notifications) for gateway bandwidth. Users experience delayed message delivery because the pipeline is saturated with presence updates.

Detection: Message delivery latency increases. Gateway queue depth grows. Presence message volume spikes relative to content message volume.

Recovery:

  1. Batch presence updates — instead of sending individual "user X is online" events, send a batch update every 5 seconds: "these 200 contacts came online"
  2. Deprioritize presence — route presence through a separate, lower-priority channel so content messages are never delayed
  3. Lazy presence — stop proactively pushing presence; let the client fetch contact status when the user opens their contact list
  4. Throttle login-wave presence — during detected login spikes, temporarily disable proactive presence fan-out and switch to lazy mode
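Recovery step 1 (batching) amounts to coalescing flips per user within each flush window; a minimal sketch, where the class name and the 5-second flush timer are assumptions:

```python
class PresenceBatcher:
    """Coalesce per-user presence flips into one batch per flush interval.

    Only the latest status per user survives within a window, so a user
    who flaps online/offline generates one event, not many.
    """

    def __init__(self):
        self._pending = {}  # userId -> latest status

    def record(self, user_id, status):
        self._pending[user_id] = status  # later flips overwrite earlier ones

    def flush(self):
        """Called by a timer (e.g. every 5s); returns one batch instead of N events."""
        batch, self._pending = self._pending, {}
        return batch
```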

Staff Interview Application

How to Introduce This Pattern

Lead with the transport decision framework, not a technology choice. Then immediately address fan-out: "The transport is the easy part. The harder question is fan-out strategy — how do we route a message to the right user on the right server without broadcasting to the entire fleet."

When NOT to Use This Pattern

  • Update frequency < 1/minute: Short polling is operationally simpler and the latency difference is invisible to users. A 30-second poll interval means average staleness of 15 seconds — acceptable for dashboards, leaderboards, and profile updates.
  • No live audience: If users are not actively watching (email summaries, batch reports), push infrastructure is wasted. Queue the updates and deliver on next session.
  • Single-server system: If you have one application server with <1K concurrent users, in-process event emitters suffice. No Redis, no pub/sub, no connection registry. Don't build fleet infrastructure for a single-node system.
  • Data is cacheable and shared: If all users see the same data (stock ticker, weather, sports scores), a CDN with short TTL and client-side polling is cheaper than maintaining 100K persistent connections.

Follow-Up Questions to Anticipate

| Interviewer Asks | What They Are Testing | How to Respond |
| --- | --- | --- |
| "Why not WebSockets for everything?" | Operational cost awareness | "Every WebSocket connection is operational state — memory, file descriptors, drain on deploy. SSE gives us server-push with auto-reconnect and standard HTTP infrastructure. I use WebSockets only when bidirectional flow is genuinely required." |
| "How do you handle message ordering?" | Distributed systems fundamentals | "Per-user ordering via the connection registry — messages to a user route to one gateway. Cross-user ordering (group chat) uses a sequence number assigned at the application tier before fan-out." |
| "What about mobile clients on flaky networks?" | Reliability engineering | "Cursor-based recovery: each message has a monotonic ID. On reconnect, the client sends its last-seen ID and the server replays from there. This handles both network flaps and server-side connection migrations." |
| "How do you deploy without dropping messages?" | Operational maturity | "Graceful drain: stop accepting new connections, send reconnect-with-jitter to existing clients, wait for drain window, then shut down. Clients reconnect to other servers and replay missed messages via cursor." |
| "How do you scale to 10M connections?" | Architecture scaling | "Horizontal gateway fleet. At 50K connections/server, that is 200 servers. The connection registry in Redis handles routing. The bottleneck shifts to fan-out — large channels need tiered pub/sub with regional edge servers." |