Why This Matters
Contention is the reason distributed systems are hard. Not networking, not storage, not latency — contention. Two writers targeting the same resource at the same instant: that is the root cause behind nearly every double-charge, oversold inventory incident, lost update, and data-corruption bug in production. The moment your system has more than one concurrent writer, you have a contention problem — whether you acknowledge it or not.
Most candidates treat contention as a locking problem. Staff engineers treat it as a data modeling problem. The best concurrency control strategy is the one you never need: reshape the data model so writers do not collide. Shard the counter. Partition the inventory. Pre-allocate the capacity. When avoidance is impossible, the choice between optimistic, pessimistic, and serialized strategies is not a preference — it is a function of the measured conflict rate, the latency SLA, and the business cost of a lost update.
If you can walk an interviewer through the contention spectrum — avoidance first, then CAS for low conflict, queues for high conflict, and locks only when external calls demand mutual exclusion — you demonstrate the systems reasoning that Staff interviews demand.
The 60-Second Version
- Contention avoidance beats contention resolution. Before picking a locking strategy, ask: can I reshape the data model so these writers don't collide? Shard the counter. Partition by region. Pre-allocate inventory blocks.
- CAS (optimistic) works below ~5% conflict rate. Above that, retry storms cascade — each retry itself can conflict. At 20% conflict rate, effective throughput drops to roughly 60%. Track cas.retry per key and switch strategies when it crosses your threshold.
- Queue serialization trades latency for correctness. Every mutation goes through a single-partition topic; the consumer processes them one at a time. Throughput ceiling is ~50-100K ops/sec per partition. Latency is "accepted" vs "committed" — the caller must tolerate the gap.
- Pessimistic locks are the last resort, not the first. They block all concurrent access — even non-conflicting requests. Use only when the critical section spans external service calls or when you need mutual exclusion across distributed processes.
- Sharded counters distribute write contention across N sub-counters. Writes are O(1) and contention-free. Reads aggregate across N shards — O(N) per read, eventually consistent. 16 shards is a good default.
- All distributed locks — including Redlock — have partial failure modes. Clock skew, GC pauses, and network partitions can cause two clients to believe they hold the same lock. This is true of Redlock, ZooKeeper leases, and etcd locks alike (Kleppmann's analysis applies broadly, not just to Redis). Prefer queue serialization for safety-critical operations; use distributed locks only when mutual exclusion is best-effort.
The Problem
Multiple writers targeting the same resource at the same time. Two users booking the last seat. Ten thousand carts decrementing the same inventory counter. Contention is not a bug — it's a design constraint. The question isn't whether it happens, but whether you prevent it (avoidance), detect and resolve it (optimistic), or serialize it (pessimistic).
Playbooks That Use This Pattern
- Reservation & Inventory Systems — Seat/room contention under concurrent booking
- Flash Sales & Virtual Queues — Extreme burst contention on limited inventory
- Order Matching & Exchanges — Order book contention at matching time
- Payment Processing — Balance contention during concurrent transactions
- Distributed Consensus — Leader election contention
The Core Tradeoff
| Strategy | What Works | What Breaks | Who Pays |
|---|---|---|---|
| Pessimistic locking | Simple correctness, easy to reason about | Throughput ceiling, deadlock risk, lock convoy | Every request waits — even non-conflicting ones |
| Optimistic concurrency (CAS) | High throughput under low contention | Retry storms when contention spikes | Losers retry — cost grows with conflict rate |
| Queue serialization | Clean ordering, no lost updates | Throughput bound by single consumer | Latency — every request waits in line |
| Reservation / hold pattern | Business-level TTL lock, graceful expiry | Complexity in hold cleanup, phantom inventory | Operational cost of expiry reconciliation |
| Sharded counters | Distributes write contention across N shards | Reads require aggregation, eventual consistency on totals | Read latency and complexity to get accurate counts |
Staff Default Position
Contention avoidance > contention resolution. Before picking a locking strategy, ask: can I reshape the data model so these writers don't collide?
Shard by user. Partition by region. Pre-allocate inventory blocks. Move from a single global counter to per-shard counters that reconcile asynchronously. If contention is truly unavoidable, serialize via queue rather than lock — queues give you backpressure and ordering; locks give you deadlocks and convoys.
When to Deviate
- Low-contention, high-value writes — Optimistic concurrency with CAS is the right call. Retry cost is negligible when conflicts are rare.
- Sub-millisecond latency requirements — Queue serialization adds latency. A sharded lock or CAS may be the only option that meets SLOs.
- Strict global ordering — If the business requires total order (e.g., exchange matching), a single serialization point is the honest answer. Acknowledge the throughput ceiling and design around it.
- User-facing holds — Reservation/hold pattern is the right abstraction when the business already thinks in terms of "hold this for me while I decide." Don't fight the domain model.
Common Interview Mistakes
| What Candidates Say | What Interviewers Hear | What Staff Engineers Say |
|---|---|---|
| "We'll use distributed locks" | "I haven't thought about failure modes" | "Distributed locks have partial failure semantics — I'd prefer CAS or queue serialization unless we need mutual exclusion across services" |
| "Optimistic locking solves this" | "I haven't considered high-contention scenarios" | "Optimistic concurrency works at low contention. Past a threshold, retry storms make it worse than serialization" |
| "Just use a database transaction" | "I'm hoping the database handles it" | "I need to name the isolation level and understand what anomalies it permits under this workload" |
| "We'll lock the row" | "Single-strategy thinking" | "Before locking, can I shard the data so these writes don't compete? Contention avoidance beats contention resolution" |
| "Redis SETNX for the lock" | "No TTL, no fencing, no plan for crashes" | "If we must lock, we need TTL, fencing tokens, and a clear owner-crash recovery path" |
Implementation Deep Dive
1. Optimistic Concurrency with CAS — Redis WATCH/MULTI/EXEC
Optimistic concurrency assumes conflicts are rare. Instead of acquiring a lock before the operation, you attempt the operation and detect if another writer modified the data between your read and write. If so, you retry. Redis provides this natively via WATCH.
Redis CAS Pattern
function decrementInventory(itemId):
maxRetries = 5
for attempt in range(maxRetries):
redis.WATCH("inventory:" + itemId) # Mark key for monitoring
current = int(redis.GET("inventory:" + itemId))
if current <= 0:
redis.UNWATCH()
return { success: false, reason: "out_of_stock" }
# MULTI/EXEC is atomic only if WATCH'd key unchanged
tx = redis.MULTI()
tx.SET("inventory:" + itemId, current - 1)
result = tx.EXEC()
if result != null: # Transaction succeeded
return { success: true, remaining: current - 1 }
# result == null means another writer changed the key — retry
metrics.increment("cas.retry", tags=["item:" + itemId])
return { success: false, reason: "contention_exceeded" }
How it works: WATCH tells Redis to track a key. If any client modifies that key between WATCH and EXEC, the transaction aborts (returns null). The client retries with fresh data. This is lock-free — no other client is blocked.
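The same loop written against redis-py makes the WATCH/MULTI/EXEC mechanics concrete. A minimal sketch, assuming a local Redis instance; the key naming and retry budget simply mirror the pseudocode above:

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def decrement_inventory(item_id: str, max_retries: int = 5) -> dict:
    key = f"inventory:{item_id}"
    for _ in range(max_retries):
        with r.pipeline() as pipe:
            try:
                pipe.watch(key)                    # track the key; any other write aborts us
                current = int(pipe.get(key) or 0)  # immediate-mode read while watching
                if current <= 0:
                    pipe.unwatch()
                    return {"success": False, "reason": "out_of_stock"}
                pipe.multi()                       # switch to buffered transaction mode
                pipe.set(key, current - 1)
                pipe.execute()                     # raises WatchError if the key changed
                return {"success": True, "remaining": current - 1}
            except redis.WatchError:
                continue                           # another writer won the race; retry
    return {"success": False, "reason": "contention_exceeded"}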
When CAS Breaks Down
| Conflict Rate | Avg Attempts/Request | Effective Throughput | Verdict |
|---|---|---|---|
| 1% | 1.01 | ~99% | CAS is ideal |
| 5% | 1.05 | ~95% | CAS works well |
| 10% | 1.11 | ~85% | Monitor closely |
| 20% | 1.25 | ~60% | Switch to serialization |
| 50% | 2.0 | ~30% | CAS is actively harmful |
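The attempts column comes from a simple geometric model: if each attempt conflicts independently with probability p, the expected number of attempts per request is 1 / (1 - p). A quick sketch of that arithmetic (the throughput column above is deliberately worse than 1/E[attempts], because retries add load and push p higher):

def expected_attempts(p: float) -> float:
    # Geometric model: retry until one attempt succeeds (probability 1 - p per attempt)
    return 1.0 / (1.0 - p)

for p in (0.01, 0.05, 0.10, 0.20, 0.50):
    print(f"conflict={p:.0%}  expected attempts={expected_attempts(p):.2f}")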
2. Pessimistic Locking — PostgreSQL Advisory Locks
When correctness requires mutual exclusion and you cannot tolerate retries, pessimistic locking is the honest choice. PostgreSQL advisory locks are application-level locks that don't interfere with row-level locking.
Advisory Lock Pattern
function processPayment(accountId, amount):
lockKey = hashToInt64("payment:" + accountId) # Advisory locks use bigint keys
    # Non-blocking attempt: returns immediately instead of waiting on the lock
acquired = db.query("SELECT pg_try_advisory_lock($1)", lockKey)
if not acquired:
return { success: false, reason: "concurrent_payment_in_progress" }
try:
balance = db.query("SELECT balance FROM accounts WHERE id = $1 FOR UPDATE", accountId)
if balance < amount:
return { success: false, reason: "insufficient_funds" }
db.query("UPDATE accounts SET balance = balance - $1 WHERE id = $2", amount, accountId)
db.query("INSERT INTO transactions (account_id, amount, type) VALUES ($1, $2, 'debit')",
accountId, amount)
db.commit()
return { success: true, newBalance: balance - amount }
finally:
db.query("SELECT pg_advisory_unlock($1)", lockKey) # Always release
Why advisory locks over SELECT ... FOR UPDATE: Advisory locks span transactions. A FOR UPDATE lock releases at commit, but an advisory lock can be held across multiple queries, API calls, or even external service calls. Use FOR UPDATE for single-transaction correctness; use advisory locks when the critical section spans more than one transaction.
Deadlock Prevention Rules
| Rule | Implementation | Cost of Violation |
|---|---|---|
| Lock ordering | Always acquire locks in ascending key order | Deadlock requiring timeout + retry |
| Lock timeout | Set lock_timeout = 5s at session level | Indefinite hang consuming a connection |
| Lock scope minimization | Hold lock only during the critical section, not the whole request | Throughput degradation proportional to non-critical work |
| Lock monitoring | Track pg_stat_activity for wait_event = 'advisory' | Silent throughput cliff with no alert |
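The lock-ordering rule is easiest to see in code. A sketch using transaction-level advisory locks through psycopg2, assuming conn is an open connection and the accounts table from the payment example; because both sides of a transfer always lock the lower account id first, two opposing transfers cannot deadlock:

import psycopg2  # driver assumed; conn below is an open psycopg2 connection

def transfer(conn, from_account: int, to_account: int, amount) -> None:
    first, second = sorted((from_account, to_account))   # deterministic lock order
    with conn.cursor() as cur:
        cur.execute("SELECT pg_advisory_xact_lock(%s)", (first,))
        cur.execute("SELECT pg_advisory_xact_lock(%s)", (second,))
        cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                    (amount, from_account))
        cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                    (amount, to_account))
    conn.commit()   # xact-level advisory locks release here (and on rollback too)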
3. Queue Serialization — Kafka Single-Partition Ordering
When contention is high and correctness requires ordering, serialization through a queue eliminates all concurrency on the contended resource. Every mutation goes through a single-partition topic; the consumer processes them one at a time.
Kafka Serialization Pattern
# Producer: route all mutations for a resource to the same partition
function submitOrderBookUpdate(orderId, symbol, action):
producer.send(
topic = "order-mutations",
key = symbol, # All orders for AAPL go to one partition
value = serialize({ orderId, symbol, action, timestamp: now() }),
acks = "all"
)
return { status: "accepted", position: "queued" }
# Consumer: single-threaded processing per partition
function orderMutationConsumer(partition):
for record in consumer.poll():
mutation = deserialize(record.value)
# Process in strict order — no concurrent access to order book
match mutation.action:
case "place":
orderBook.place(mutation)
case "cancel":
orderBook.cancel(mutation.orderId)
case "match":
orderBook.executeMatch(mutation)
consumer.commitOffset(record)
Why this works: Kafka guarantees ordering within a partition. By routing all mutations for the same resource (e.g., stock symbol) to the same partition, you get total ordering without locks, without retries, and without CAS failures.
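Key-based routing gives per-resource ordering because the default partitioner maps the same key to the same partition. Roughly, and this is a sketch of the idea rather than Kafka's exact murmur2 implementation:

import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Kafka's Java client uses murmur2(key) % num_partitions; any stable hash gives
    # the property that matters here: one key, one partition, one total order.
    return zlib.crc32(key) % num_partitions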
Throughput Ceiling and Escape Hatch
A single Kafka partition processes ~50K-100K messages/sec with a single consumer. That is the hard ceiling per contention domain.
To scale beyond this:
- Partition by sub-resource — instead of one partition for all AAPL orders, partition by price band: AAPL:100-110, AAPL:110-120. Cross-partition matching requires a coordination step.
- Batch processing — the consumer processes 100 mutations in a single database transaction, amortizing commit overhead (see the sketch after this list).
- Accept eventual consistency — if strict ordering is not required for all operations (e.g., cancel does not need to be ordered relative to place), split into separate topics with different ordering guarantees.
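A sketch of the batching escape hatch using kafka-python-style calls; db.transaction() and apply_mutation() are assumed helpers, and mutations are assumed idempotent because offsets are committed only after the database transaction, which makes replay at-least-once:

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "order-mutations",
    bootstrap_servers="localhost:9092",
    group_id="order-book-applier",
    enable_auto_commit=False,            # commit offsets only after the DB commit
)

def run_batched(db):
    while True:
        batches = consumer.poll(timeout_ms=100, max_records=100)
        records = [rec for recs in batches.values() for rec in recs]
        if not records:
            continue
        with db.transaction():           # one DB commit amortized over the whole batch
            for record in records:
                apply_mutation(db, record.value)
        consumer.commit()                # only now is it safe to advance the offsets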
4. Sharded Counters — Distributed Increment with Periodic Aggregation
When thousands of writes target a single counter (likes, views, inventory), shard the counter across N sub-counters. Each writer increments a random shard; reads aggregate across all shards.
Redis Sharded Counter
SHARD_COUNT = 16
function incrementCounter(entityId):
shard = random(0, SHARD_COUNT - 1)
key = "counter:" + entityId + ":shard:" + shard
redis.INCR(key)
function getCounter(entityId):
keys = ["counter:" + entityId + ":shard:" + i for i in range(SHARD_COUNT)]
values = redis.MGET(keys)
return sum(int(v or 0) for v in values)
# Periodic aggregation (background job every 30s)
function aggregateCounter(entityId):
total = getCounter(entityId)
redis.SET("counter:" + entityId + ":total", total)
    # If shards are reset here to bound memory, fold the drained values into
    # :total with INCRBY instead of the SET above, or the running count is lost
The tradeoff: Writes hit random shards — no contention, O(1) per increment. Reads aggregate across N shards — O(N) per read, eventually consistent with the true count. The :total key provides a cached aggregate for fast reads, refreshed every 30 seconds.
When to Aggregate vs. Read All Shards
| Use Case | Strategy | Latency | Consistency |
|---|---|---|---|
| Display "1.2M likes" on a post | Read :total (cached) | <1ms | ~30s stale |
| Check if inventory > 0 for checkout | Read all shards live | ~2ms for 16 shards | Real-time |
| Analytics dashboard counts | Read :total (cached) | <1ms | ~30s stale |
| Flash sale "X remaining" display | Read all shards live | ~2ms | Real-time |
Architecture Diagram
Selection logic: The application tier selects the contention strategy based on the resource type and measured conflict rate. CAS for low-contention key-value updates. Advisory locks for payment-critical mutual exclusion. Queue for order book mutations where ordering matters. Sharded counters for high-volume increment workloads.
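Rendered as a dispatch function, with the thresholds used in this guide (illustrative only; a real system keys this off measured cas.retry rates per resource rather than hard-coded constants):

def choose_strategy(resource_type: str, conflict_rate: float) -> str:
    if resource_type == "counter":
        return "sharded_counter"        # high-volume increments, approximate reads acceptable
    if resource_type == "order_book":
        return "queue_serialization"    # total ordering required
    if resource_type == "payment":
        return "advisory_lock"          # mutual exclusion spanning external calls
    # Generic key-value updates: CAS while conflicts are rare, serialize past ~5%
    return "cas" if conflict_rate < 0.05 else "queue_serialization"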
Failure Scenarios
1. CAS Retry Storm During Flash Sale
Timeline: Flash sale begins at 12:00:00. 50K users attempt to decrement inventory of 1,000 items simultaneously. CAS conflict rate exceeds 80%. Each request retries 3-5 times before succeeding or exhausting retries. Effective load is 200K+ Redis operations in 10 seconds.
Blast radius: Redis CPU saturates at 90%. All CAS operations — not just inventory — experience elevated latency. Unrelated cache lookups sharing the same Redis cluster degrade. App server thread pools fill with retrying requests.
Detection: cas.retry rate per key spikes beyond configured threshold. Redis CPU utilization exceeds 80%. App server thread pool utilization exceeds 90%.
Recovery:
- Immediate: switch inventory operations from CAS to queue serialization (feature flag)
- Short-term: pre-shard inventory before the sale — allocate 100 units to each of 10 shards so writers do not collide
- Long-term: implement a virtual queue system that serializes access before reaching the inventory layer
2. Advisory Lock Leak After Application Crash
Timeline: An app server acquires pg_advisory_lock(42) for a payment operation, then crashes mid-transaction. Session-level advisory locks persist until explicit unlock or session disconnect, so a direct connection would release the lock once PostgreSQL notices the severed connection. Behind a connection pooler, though, the server-side session survives the client crash: the lock is never explicitly released, and the pooler hands that same session, lock still held, to the next client.
Blast radius: All subsequent payment operations for the same account wait indefinitely (if using blocking locks) or fail immediately (if using non-blocking). One crashed request blocks all future requests for that resource.
Detection: Query pg_locks WHERE locktype = 'advisory' and correlate with active PIDs. Alert when advisory lock hold time exceeds 30 seconds (normal payment processing completes in <5 seconds).
Recovery:
- Immediate: identify the leaked lock via pg_locks, terminate the orphaned session with pg_terminate_backend(pid)
- Short-term: switch to transaction-level advisory locks (pg_advisory_xact_lock), which auto-release on transaction end — even if the application forgets to unlock
- Long-term: wrap all advisory lock usage in a library that enforces timeout and automatic cleanup
3. Queue Consumer Lag Creating Phantom Inventory
Timeline: Order mutations are serialized through Kafka. The consumer falls behind — lag grows to 50K messages. During the lag window, the database shows 200 units of inventory, but 180 of those have pending decrements in the queue. The website displays "200 available" while only 20 are actually available.
Blast radius: Customers purchase items that are already sold. The system must handle 180 over-commitments — refunds, apology emails, and trust damage.
Detection: Consumer group lag monitoring. Threshold alert when lag exceeds N seconds of processing time (not just message count — a 10K message lag at 100K messages/sec throughput is only 100ms, which is fine).
Recovery:
- Immediate: pause accepting new orders until consumer catches up
- Short-term: add a "pessimistic available" counter that is decremented at write time (producer side) — may undercount but never overcount
- Long-term: maintain a real-time shadow counter that is decremented synchronously at order acceptance time, independently of the queue. The queue handles order processing, not inventory accounting.
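A sketch of that shadow-counter idea; the key names are illustrative, the available counter is assumed to be pre-seeded with the starting stock, and producer is an already-configured Kafka producer. The live counter is decremented synchronously at acceptance time, so the display can undercount during consumer lag but never overcount:

import redis

r = redis.Redis(decode_responses=True)

def accept_order(item_id: str, order_payload: bytes) -> bool:
    remaining = r.decr(f"available:{item_id}")     # claim a unit before enqueueing
    if remaining < 0:
        r.incr(f"available:{item_id}")             # roll the claim back: sold out
        return False
    producer.send("order-mutations", key=item_id.encode(), value=order_payload)
    return True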
In the Wild
Contention patterns are easier to internalize when you see how they were solved under real production pressure. These are public, documented examples — not speculation.
Ticketmaster: Virtual Queues for Extreme Burst Contention
When Taylor Swift tickets go on sale, millions of users compete for thousands of seats simultaneously. Ticketmaster's original architecture — direct database access with row-level locks — collapsed under this load. Their solution was a virtual queue: users enter a FIFO queue before reaching the inventory layer. The queue serializes access so that only N users at a time can interact with the seat map. Everyone else sees their queue position and estimated wait time.
The Staff-level insight: Ticketmaster separated the contention problem into two layers. The queue itself handles the burst (millions of concurrent requests) using stateless queue position tracking that scales horizontally. The inventory layer handles the contention (seat selection with mutual exclusion) at a manageable rate — hundreds of concurrent seat selections, not millions. By rate-limiting access to the contended resource, they turned an impossible contention problem into a manageable one.
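A minimal sketch of that admission-control idea, not Ticketmaster's actual implementation: a Redis list holds the FIFO queue and a pacing loop admits a fixed number of users per second into the purchase flow (LPOP with a count argument needs Redis 6.2+):

import time
import redis

r = redis.Redis(decode_responses=True)
ADMIT_PER_SECOND = 1000              # only this many users reach the inventory layer per second

def join_queue(user_id: str) -> int:
    r.rpush("vq:onsale", user_id)    # FIFO: append to the tail
    return r.llen("vq:onsale")       # rough position shown to the user

def admission_loop():
    while True:
        admitted = r.lpop("vq:onsale", ADMIT_PER_SECOND) or []
        for user_id in admitted:
            r.setex(f"vq:admitted:{user_id}", 600, "1")   # 10-minute purchase window
        time.sleep(1)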
Stripe: Idempotency Keys for Payment Contention
Payment processing has an inherent contention problem: a user clicks "Pay" twice, a webhook retries, or a network timeout triggers a client retry. Without protection, the same payment executes multiple times. Stripe's solution is idempotency keys: every payment request includes a client-generated key. The server stores the key and its result. If the same key appears again, the server returns the cached result without re-executing the payment.
The Staff-level insight: Stripe turned a contention problem into a deduplication problem. Instead of locking the account balance during payment processing, they detect duplicate intent at the API boundary. The idempotency key acts as a "reservation" for the operation — the first request claims it, and all subsequent requests with the same key are no-ops. This is contention avoidance through API design, not through locking.
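A minimal sketch of the same mechanic; this is not Stripe's implementation, and the Redis key names, the 24-hour retention, and process_payment() are assumptions. The first request atomically claims the key and executes once; duplicates read the stored result instead of charging again:

import json
import redis

r = redis.Redis(decode_responses=True)
RETENTION_SECONDS = 86400   # keep results for 24 hours

def handle_payment(idempotency_key: str, request: dict) -> dict:
    cache_key = f"idem:{idempotency_key}"
    if r.set(cache_key, "IN_PROGRESS", nx=True, ex=RETENTION_SECONDS):
        result = process_payment(request)          # assumed helper: the real charge, runs once
        r.set(cache_key, json.dumps(result), ex=RETENTION_SECONDS)
        return result                              # (production code also clears the key on failure)
    cached = r.get(cache_key)
    if cached == "IN_PROGRESS":
        return {"status": "in_progress", "retry_after_ms": 500}
    return json.loads(cached)                      # duplicate request: return the stored result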
Reddit: Sharded Vote Counters at Scale
Reddit's vote system handles tens of thousands of votes per second on popular posts during viral events. A single atomic counter would be a massive contention bottleneck. Their approach: sharded counters where each vote increments a random shard. The displayed count is a periodic aggregation of all shards, updated every few seconds.
The Staff-level insight: Reddit accepts that the displayed vote count is approximate (stale by a few seconds) in exchange for unlimited write throughput. This is a business-level tradeoff: users don't notice if "15,234 upvotes" is actually "15,247 upvotes" two seconds later. But they absolutely notice if the upvote button takes 2 seconds to respond because of lock contention. The sharded counter trades read precision for write speed — the right tradeoff for a social platform.
Practice Drill
Staff-Caliber Answer Shape
Scenario: a flash sale where 500K users compete for 10K items the moment the sale opens.
- Quantify the contention. 500K users, 10K items. If all 500K users click "Buy" at 12:00:00, that is 500K concurrent writes to the inventory counter. A single Redis CAS at 80% conflict rate would generate ~2.5M retry operations in the first second. CAS is the wrong strategy here.
- Layer 1 — Virtual queue (contention avoidance). Before users reach the inventory, put them in a virtual queue. Rate-limit to 1,000 users entering the purchase flow per second. Draining 500K users at 1K/sec takes roughly 8 minutes. Users see their position and estimated wait time.
- Layer 2 — Sharded inventory (distributed contention). Pre-shard 10K units across 100 shards (100 units per shard). Users are assigned to random shards. Each shard handles ~10 concurrent decrements — well within CAS tolerance (<5% conflict rate per shard).
- Layer 3 — Reservation with TTL (business-level lock). When a user enters checkout, reserve their item for 5 minutes (TTL). If they don't complete payment, the item returns to the shard. This prevents phantom inventory from abandoned carts (a sketch follows the summary below).
- Handle the edge cases. What if a shard runs out? Route to the next shard with available inventory (round-robin fallback). What if the queue is longer than the inventory? Show "Sold Out" when aggregate inventory hits 0 — no need for exact per-shard accounting for this display.
- State what you rejected. Not pessimistic locks — they would create a queue inside the database, defeating the purpose of the application-level queue. Not a single CAS counter — the conflict rate at 500K concurrent users would be catastrophic.
The Staff move: Show the layered approach: queue to manage burst, sharding to distribute contention, reservations to handle business logic. Each layer addresses a different aspect of the contention problem.
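If the interviewer pushes on the Layer 3 mechanics, a small sketch helps. This one uses a Lua script so the stock check and decrement are atomic; the key names, the 5-minute TTL, and the background job that re-credits expired holds (which needs its own index of holds, elided here) are all assumptions:

import redis

r = redis.Redis(decode_responses=True)
HOLD_TTL_SECONDS = 300   # 5-minute checkout window

# Atomically: refuse if the shard is empty, otherwise decrement it and record a hold
# that expires on its own. The hold value stores the shard so an expired, unconverted
# hold can be re-credited to the right place.
RESERVE = r.register_script("""
if tonumber(redis.call('GET', KEYS[1]) or '0') <= 0 then return 0 end
redis.call('DECR', KEYS[1])
redis.call('SET', KEYS[2], ARGV[1], 'EX', ARGV[2])
return 1
""")

def reserve_unit(item_id: str, shard: int, user_id: str) -> bool:
    shard_key = f"inventory:{item_id}:shard:{shard}"
    hold_key = f"hold:{item_id}:{user_id}"
    return bool(RESERVE(keys=[shard_key, hold_key], args=[shard, HOLD_TTL_SECONDS]))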
Staff Interview Application
How to Introduce This Pattern
Lead with data model first, concurrency control second. This signals that you think in systems, not in locks.
When NOT to Use This Pattern
- No concurrent writers exist: If updates are single-threaded (batch jobs, cron tasks), adding concurrency control is pure overhead.
- Idempotent operations: If the operation is naturally idempotent (SET, not INCREMENT), concurrent writes produce the same result regardless of ordering. Last-writer-wins may be acceptable.
- Read-heavy workload: If the contention is on reads (thundering herd on cache miss), this pattern does not apply — look at Scaling Reads instead.
- Single-digit QPS: If you have 5 writes/sec to the same resource, a simple FOR UPDATE row lock is the right answer. Don't over-engineer.
Follow-Up Questions to Anticipate
| Interviewer Asks | What They Are Testing | How to Respond |
|---|---|---|
| "What happens under high contention with CAS?" | Understanding of failure modes | "Retry storms. At 20%+ conflict rate, retries compound — each retry itself may conflict. I'd set a retry budget of 3 attempts and fail fast, or switch to queue serialization." |
| "How do you handle deadlocks?" | Practical lock management | "Lock ordering prevents deadlocks: always acquire locks in ascending key order. For detection, set lock_timeout so deadlocked transactions fail instead of hanging." |
| "How do sharded counters handle reads?" | Read-consistency tradeoffs | "Aggregate across all shards for real-time reads, or read a periodically-updated total for approximate reads. The choice depends on whether the consumer needs real-time accuracy or display-quality approximation." |
| "What is a fencing token?" | Distributed systems depth | "A monotonically increasing value assigned when a lock is acquired. The protected resource rejects operations with stale fencing tokens. This prevents split-brain where two processes both believe they hold the lock." |
| "Why not use distributed locks with Redis?" | Understanding of distributed lock limitations | "All distributed locks — including Redis Redlock — have partial failure modes. Clock skew, GC pauses, and network partitions can cause two clients to believe they hold the same lock. This isn't unique to Redis; ZooKeeper and etcd leases have similar edge cases. I'd prefer queue serialization for safety-critical operations and reserve distributed locks for best-effort mutual exclusion." |