Caching Fundamentals — Staff Interview Quick Reference
The 60-Second Version
- Cache = consistency debt. Every cached value is a promise that staleness is acceptable for this use case.
- Three core strategies: cache-aside (app manages reads), write-through (sync to cache + DB), write-behind (async DB writes, durability risk).
- TTL is not a technical constant — it is business staleness tolerance expressed in seconds. If product cannot define acceptable staleness, you cannot set a TTL.
- Invalidation is the hard problem. TTL-based is simple but stale. Event-based is fresh but operationally complex. Manual is a last resort that becomes everyone's first choice under pressure.
- Thundering herd on cold start or mass expiry will turn a cache miss into a database incident. Plan for it before it happens.
- Cache penetration (non-existent keys bypassing cache) is a quiet cost that shows up as unexplained DB load, not as cache errors.
- A cache with a 90% hit rate is not "almost there": compared to a 99% cache it sends 10x the load to your database, because database load scales with the miss rate, not the hit rate.
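The last bullet is just arithmetic worth internalizing: database load is proportional to the miss rate. A minimal sketch (the function name is illustrative):

```python
def db_load_multiplier(hit_rate: float, baseline_hit_rate: float = 0.99) -> float:
    """How many times more DB traffic a cache at `hit_rate` produces
    compared to one at `baseline_hit_rate`, for the same request volume."""
    return (1 - hit_rate) / (1 - baseline_hit_rate)

# Dropping from 99% to 90% hit rate multiplies DB read load by 10.
print(round(db_load_multiplier(0.90)))  # 10
print(round(db_load_multiplier(0.95)))  # 5
```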
What Staff Engineers Say (That Seniors Don't)
| Concept | Senior Response | Staff Response |
|---|---|---|
| Strategy choice | "We use Redis as a cache" | "We use cache-aside here because eventual staleness up to 30s is acceptable and write volume doesn't justify write-through overhead" |
| TTL selection | "Set TTL to 5 minutes" | "Product tolerates 60s stale for feed ranking; TTL is 45s with jitter to avoid synchronized expiry storms" |
| Invalidation | "Invalidate on write" | "Event-driven invalidation via CDC stream, with TTL as a consistency backstop, not the primary mechanism" |
| Failure mode | "Cache miss falls through to DB" | "Under cache failure we shed load with request coalescing and circuit-break to degraded responses rather than stampeding the database" |
| Hit rate | "Our hit rate is 95%" | "Hit rate is 95% aggregate but 70% for long-tail queries — that tail drives 40% of DB read load, so we added negative caching with short TTL" |
The Numbers That Matter
- Redis: ~100K ops/s single-threaded, sub-ms p99 latency typical
- Memcached: ~500K ops/s multithreaded, sub-ms p99 latency typical
- Healthy hit rate: 95%+ aggregate; investigate any key space segment below 85%
- Thundering herd threshold: a single popular key expiring under 1K+ RPS triggers hundreds of redundant DB reads — singleflight / request coalescing reduces this to one
- Negative cache TTL: 30-120s is typical; long enough to absorb bursts, short enough not to mask real data arrival
Common Interview Traps
- Naming a strategy without justifying the tradeoff. "Cache-aside" is not an answer. Why cache-aside and not write-through? What staleness does the business accept?
- Ignoring cache failure as a mode. Caches fail. If your design falls over when the cache is cold or unavailable, you have a single point of failure you called an "optimization."
- Treating TTL as the invalidation strategy. TTL is a safety net. If TTL is your only invalidation mechanism, you have chosen to serve stale data for the duration of every TTL window and you should say so explicitly.
- Overlooking cache penetration. Candidates almost always address thundering herd. They rarely address the steady-state cost of requests for keys that will never exist. Bloom filters or negative caching are the standard mitigations.
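Negative caching, the mitigation named in the last trap, fits in a few lines. A sketch using an in-process dict as a stand-in for Redis; `get`, `fetch_from_db`, and the TTL values are hypothetical:

```python
import time

NEGATIVE_SENTINEL = object()   # marks "key known not to exist"
NEGATIVE_TTL = 60              # seconds; short, per the guidance above
POSITIVE_TTL = 300

cache: dict = {}               # {key: (value, expires_at)} — stand-in for Redis

def get(key, fetch_from_db):
    entry = cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return None if value is NEGATIVE_SENTINEL else value
    value = fetch_from_db(key)  # may return None for a missing row
    if value is None:
        # Cache the absence, so repeated lookups stop reaching the DB.
        cache[key] = (NEGATIVE_SENTINEL, time.monotonic() + NEGATIVE_TTL)
        return None
    cache[key] = (value, time.monotonic() + POSITIVE_TTL)
    return value
```

The short negative TTL is the safety valve: if the row is created later, the lie expires within a minute.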
Cache Strategy Decision Tree
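The original diagram did not survive this export. As a stand-in, the strategy choice laid out in the 60-second summary can be sketched as a function; the inputs and thresholds are illustrative, not prescriptive:

```python
def choose_strategy(staleness_ok: bool, write_heavy: bool,
                    loss_on_crash_ok: bool) -> str:
    """Illustrative decision tree over the three core strategies."""
    if not staleness_ok:
        # Reads must see writes immediately: keep cache and DB in sync.
        return "write-through"
    if write_heavy and loss_on_crash_ok:
        # Absorb write bursts in the cache, flush to the DB asynchronously.
        return "write-behind (accept durability risk)"
    # Default: the app populates the cache on read misses; TTL bounds staleness.
    return "cache-aside with TTL + jitter"
```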
Advanced Patterns
| Pattern | How It Works | When to Use |
|---|---|---|
| Singleflight | Concurrent misses for the same key collapse into one DB query | Popular keys above ~100 RPS |
| Negative caching | Cache the fact that a key doesn't exist (short TTL) | 10%+ lookups for non-existent keys |
| Read-through | Cache itself fetches from DB on miss | Uniform miss handling across all consumers |
| Cache warming | Pre-populate before traffic arrives | Deployments, new regions, predictable spikes |
| Multi-tier | L1 (in-process) → L2 (Redis) → DB | When L2 network hop is too costly for hot keys |
| Stampede lock | On miss, one request acquires lock; others wait or get stale | Expensive DB queries (>100ms) on popular keys |
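A minimal in-process singleflight, assuming Python threads; error propagation to waiting followers is omitted for brevity, and the class name is illustrative:

```python
import threading

class Singleflight:
    """Collapse concurrent lookups for one key into a single loader call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Event carrying .result

    def do(self, key, loader):
        with self._lock:
            ev = self._inflight.get(key)
            leader = ev is None
            if leader:
                ev = threading.Event()
                self._inflight[key] = ev
        if leader:
            try:
                ev.result = loader()       # only one caller hits the DB
            finally:
                with self._lock:
                    del self._inflight[key]
                ev.set()                   # wake the waiters
            return ev.result
        ev.wait()                          # followers block, then reuse the result
        return ev.result
```

Five concurrent misses for the same key produce one `loader()` call; the other four reuse its result.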
Hit Rate Segmentation
A 95% aggregate hit rate hides the real story:
| Key Segment | Hit Rate | Query Share | DB Load Share |
|---|---|---|---|
| Popular (top 1K) | 99.5% | 60% | 3% |
| Mid-tail (next 100K) | 92% | 30% | 28% |
| Long-tail (rest) | 40% | 10% | 69% |
Staff insight: Aggregate hit rate is a vanity metric. The long tail generates the majority of database load despite being a minority of traffic.
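The load shares in the table above follow directly from miss rate × query share; a quick check with the table's own numbers:

```python
segments = {
    "popular":   (0.995, 0.60),  # (hit rate, query share)
    "mid-tail":  (0.92,  0.30),
    "long-tail": (0.40,  0.10),
}
misses = {name: (1 - hit) * share for name, (hit, share) in segments.items()}
total = sum(misses.values())
for name, m in misses.items():
    print(f"{name}: {m / total:.0%} of DB read load")
# long-tail: ~69% of DB load from only 10% of queries
```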
Practice Prompt
Staff-Caliber Answer Shape
- Audit current usage. What's the TTL distribution? Are there keys with TTLs >24h that could be shortened? Stale data from deprecated features?
- Analyze key-size distribution. Run `redis-cli --bigkeys`. Often 5% of keys consume 50% of memory. Compress large values or move them to a dedicated store.
- Evaluate eviction policy. `allkeys-lru` self-manages; `noeviction` (a common mistake) means writes fail when full.
- Consider tiering. Move cold keys to a cheaper store. Keep only the hot working set in Redis.
- Scale horizontally. Add Redis Cluster shards rather than resizing the instance.
The Staff move: Don't just ask for more memory. Show you've exhausted optimization first: "After TTL tuning and compression, we need 15% more capacity, not 30%."
Additional Traps
- Using cache as a primary data store. Redis with no persistence is a speed layer. If it dies and data is gone, you had a SPOF you called "caching."
- TTL jitter neglect. Same TTL on all batch keys = simultaneous expiry. Add ±10% random jitter.
- Ignoring serialization cost. JSON serialization of complex objects can take 1-5ms. Consider protobuf or msgpack for hot paths.
- Cache key collision. Using `user:{id}` for different endpoints caching different projections leads to wrong data. Include query shape in the key.
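The jitter and key-collision traps both have small, mechanical fixes. A sketch; the helper names and the ±10% figure (from the jitter bullet above) are illustrative:

```python
import hashlib
import random

def ttl_with_jitter(base_ttl: int, jitter: float = 0.10) -> int:
    """Spread expiry of keys written in the same batch by ±jitter."""
    return int(base_ttl * random.uniform(1 - jitter, 1 + jitter))

def cache_key(entity: str, entity_id: int, projection: str) -> str:
    """Include the query shape so different projections never collide."""
    shape = hashlib.sha1(projection.encode()).hexdigest()[:8]
    return f"{entity}:{entity_id}:{shape}"

# user 42 cached by two endpoints with different projections
# gets two distinct keys instead of one colliding `user:42`:
# cache_key("user", 42, "id,name") != cache_key("user", 42, "id,name,email")
```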
Where This Appears
- Distributed Caching — Cache topologies, invalidation
- CDN & Edge Caching — Edge cache TTL ownership
- Feed Generation — Pre-computed feed caching
- Search & Indexing — Query result caching