StaffSignal
Foundation — Study Guide · 23 min read

Caching Fundamentals

Cache is consistency debt — every cached value is a promise that staleness is acceptable. Cache-aside, write-through, write-behind tradeoffs, thundering herd prevention, and how Facebook TAO, Netflix EVCache, and Discord's L1 cache handle billions of reads/second.

Why This Matters

Every system design interview touches caching. Not because it is a clever optimization, but because caching is where you make your first real tradeoff: speed vs. correctness. The moment you put data in a cache, you have introduced a second copy of the truth. That copy can go stale, go missing, or go wrong — and how you handle each of those scenarios tells an interviewer whether you think in systems or just in features.

Caching appears in almost every architecture diagram, but most candidates treat it as a black box labeled "Redis" sitting between the app and the database. Staff-level engineers understand caching as a consistency problem first and a performance problem second. They can articulate why a particular caching strategy fits a given workload, what breaks when the cache disappears, and how the business defines "acceptable staleness."

If you can design a caching layer that gracefully handles cold starts, invalidation races, and tail-latency amplification — and explain why you made those choices — you can handle the distributed state reasoning that Staff interviews demand.

The 60-Second Version

  • Cache = consistency debt. Every cached value is a promise that staleness is acceptable for this use case.
  • Three core strategies: cache-aside (app manages reads), write-through (sync to cache + DB), write-behind (async DB writes, durability risk). If you can't explain why you chose one over the other, you haven't made a design decision — you've made a default.
  • TTL is not a technical constant — it is business staleness tolerance expressed in seconds. If product cannot define acceptable staleness, you cannot set a TTL.
  • Invalidation is the hard problem. TTL-based is simple but stale. Event-based is fresh but operationally complex. In production, you use both: event-driven as the primary mechanism, TTL as the consistency backstop.
  • Thundering herd on cold start or mass expiry will turn a cache miss into a database incident. Singleflight / request coalescing is mandatory for any key above 100 RPS.
  • Cache penetration (non-existent keys bypassing cache) is a quiet cost that shows up as unexplained DB load, not as cache errors. Negative caching is the standard mitigation.
  • A cache with a 90% hit rate is not "almost there" — dropping from a 99% hit rate to 90% sends 10x the miss traffic to your database. Segment your hit rate by key type, not in aggregate.

How Caching Works

The Basic Idea

A cache stores a copy of frequently accessed data in a faster storage layer so the application avoids repeatedly computing or fetching the same result from a slower source.

Without a cache, every request follows the same path:

  1. Application receives request
  2. Application queries the database
  3. Database reads from disk (or its own buffer pool), processes the query, returns the result
  4. Application responds to the user

The database query might take 5–50ms. Under high read throughput — thousands of requests per second for the same data — you are paying that cost repeatedly for identical results.

A cache short-circuits this path. The first request still goes to the database, but the result is stored in the cache. Subsequent requests for the same data hit the cache instead, returning in under 1ms.

The cost of this speedup is simple to state and hard to manage: the cached copy can diverge from the source of truth. Every caching decision you make is fundamentally about managing that divergence.

Cache Hit vs. Cache Miss

When the application looks up a key in the cache:

  • Cache hit: The key exists and the value is returned immediately. This is the fast path — sub-millisecond for Redis, microseconds for in-process caches.
  • Cache miss: The key does not exist. The application must fall back to the database, fetch the result, and (usually) populate the cache for future requests.

The hit rate — the percentage of requests served from cache — determines whether your cache is earning its keep. A 95% hit rate means 1 in 20 requests still hits the database. Whether that is acceptable depends entirely on the total request volume and the database's capacity.

Where Caches Live

Caches can sit at multiple levels of the stack, each with different latency and consistency characteristics:

| Cache Layer | Latency | Scope | Invalidation Complexity |
|---|---|---|---|
| CPU cache (L1/L2/L3) | 1–30ns | Single core/socket | Managed by hardware |
| In-process cache (HashMap, Guava) | ~100ns | Single application instance | Requires coordination across instances |
| Distributed cache (Redis, Memcached) | 0.2–1ms | Shared across all instances | Central but requires network hop |
| CDN/Edge cache (Cloudflare, CloudFront) | 1–50ms | Global edge network | Propagation delay across PoPs |

In system design interviews, you will almost always be talking about distributed caches (Redis/Memcached) and occasionally CDN caches for static or semi-static content. In-process caches appear when you need to avoid the network hop for extremely hot keys.

Core Caching Strategies

There are three fundamental caching strategies. Each makes a different tradeoff between read performance, write complexity, and consistency. Every other caching pattern is a variation or combination of these three.

Cache-Aside (Lazy Loading)

The application manages the cache explicitly. On read, the application checks the cache first. On miss, it queries the database, writes the result to the cache, and returns.

read(key):
    value = cache.get(key)
    if value != null:
        return value              # cache hit

    value = db.query(key)         # cache miss → fetch from DB
    cache.set(key, value, ttl)    # populate cache for next time
    return value

write(key, value):
    db.write(key, value)          # write to DB only
    cache.delete(key)             # invalidate stale cache entry

Why this is the default strategy:

  • The cache only contains data that has actually been requested. No wasted memory on unused keys.
  • Read path is simple and fast. Cache misses are self-healing — the next read populates the cache.
  • The application controls exactly what gets cached, with what TTL, and under what conditions.

What can go wrong:

  • Cold start: A new deployment or cache restart means every request is a miss. If your application handles 10K RPS, that is 10K database queries per second until the cache warms up.
  • Race condition on write: If a read and a write happen concurrently, the cache can store stale data. The read fetches old data from the DB while the write updates the DB and deletes the cache entry — but the read's cache.set happens after the delete, putting the old value back.
  • Thundering herd: A popular key expires, and hundreds of concurrent requests all miss at the same time, all query the database for the same row. One request's work would have been enough; instead the database absorbs hundreds of redundant queries.
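The race condition above can be replayed deterministically. In this sketch, plain dicts stand in for the database and the cache, and the four steps mirror the interleaving described in the bullet:

```python
# Deterministic replay of the cache-aside write race.
# `db` and `cache` are plain dicts standing in for the real stores.
db = {"user:1": "old"}
cache = {}

# Step 1: reader misses the cache and fetches the soon-to-be-stale value.
reader_value = db["user:1"]

# Steps 2-3: writer persists the new value and invalidates the cache.
db["user:1"] = "new"
cache.pop("user:1", None)

# Step 4: reader resumes and populates the cache AFTER the invalidation,
# putting the old value back. Cache and DB now disagree until TTL expiry.
cache["user:1"] = reader_value
```

The standard mitigations are a TTL that bounds the divergence window, or compare-and-set/versioned cache writes that reject a set carrying an older version.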

Write-Through

Every write goes to both the cache and the database synchronously. The cache is always up to date.

write(key, value):
    db.write(key, value)           # persist to DB first (durability)
    cache.set(key, value)          # then update cache (freshness)

read(key):
    value = cache.get(key)         # kept current by the write path
    if value == null:              # misses still happen: eviction, restart
        value = db.query(key)
        cache.set(key, value)
    return value

When to use it:

  • The application cannot tolerate any staleness. Financial balances, inventory counts, session state.
  • Write volume is moderate. Every write pays the cost of two operations (cache + DB).

What can go wrong:

  • Write latency doubles. Every write now waits for both the cache and the database. Under high write volume, this becomes the bottleneck.
  • Cache fills with unread data. If you write 100K keys per hour but only 10K are ever read, 90% of your cache memory holds data nobody wants.
  • Partial failure. If the cache write succeeds but the DB write fails (or vice versa), you have an inconsistency. You need to handle this with retries or compensating writes.

Write-Behind (Write-Back)

Writes go to the cache immediately and are asynchronously flushed to the database in the background, typically in batches.

write(key, value):
    cache.set(key, value)            # write to cache immediately
    write_queue.enqueue(key, value)  # async flush to DB later

# Background worker:
flush():
    batch = write_queue.drain()
    db.batch_write(batch)

When to use it:

  • Write throughput is the bottleneck and the application can tolerate a brief window where data exists only in the cache.
  • Batching writes to the database provides significant throughput gains (e.g., gaming leaderboards, analytics counters).

What can go wrong:

  • Data loss on cache failure. If the cache dies before the write queue flushes to the database, those writes are gone. This is the fundamental risk of write-behind — you have accepted a durability window.
  • Ordering issues. If writes arrive out of order and the flush batching does not preserve order, the database may end up with an older value overwriting a newer one.
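One common mitigation for the ordering hazard, not specific to any library, is to tag each queued write with a monotonically increasing version and coalesce the drained batch before flushing, so only the newest value per key reaches the database:

```python
def coalesce(batch):
    """Reduce a drained batch of (key, version, value) tuples to the
    newest value per key, so an older queued write can never overwrite
    a newer one at flush time."""
    newest = {}
    for key, version, value in batch:
        if key not in newest or version > newest[key][0]:
            newest[key] = (version, value)
    return {key: value for key, (version, value) in newest.items()}
```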

Cache Invalidation

Populating a cache is easy. Keeping it correct is the hard problem. There are three approaches to invalidation, each with a different reliability-complexity tradeoff.

TTL-Based Expiry

Every cached entry has a time-to-live. After the TTL expires, the entry is removed and the next request re-fetches from the database.

  • Simplicity: No coordination needed. Set the TTL and walk away.
  • Staleness: The cache serves stale data for the entire duration of the TTL window. A 5-minute TTL means up to 5 minutes of stale data after every write.
  • Mass expiry risk: If you batch-load 100K keys with the same TTL, they all expire at the same instant. The result is a thundering herd. Add jitter (random ±10% variation) to stagger expiry times.

TTL is not an invalidation strategy — it is a consistency safety net. Relying on TTL alone means you have chosen to accept the staleness window as a product decision.
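The jitter mentioned above is a one-liner. A sketch, where the ±10% default is the figure this guide uses rather than a universal constant:

```python
import random

def jittered_ttl(base_ttl, jitter=0.10):
    """Spread expiry times by +/- `jitter` so batch-loaded keys
    don't all expire in the same instant."""
    return base_ttl * random.uniform(1 - jitter, 1 + jitter)
```

A 300-second base TTL comes back as anything from 270 to 330 seconds, staggering the expiry of keys loaded together.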

Event-Driven Invalidation

The write path publishes an invalidation event (via CDC stream, message queue, or direct call) and the cache deletes or updates the affected entry.

  • Freshness: Near-real-time. The cache is stale only for the propagation latency of the event (milliseconds to low seconds).
  • Complexity: You now have a distributed system problem. Events can be delayed, duplicated, or lost. The invalidation pipeline itself needs monitoring and SLOs.
  • Coupling: Every service that writes data must publish events that the cache layer consumes. This is an organizational dependency, not just a technical one.

Use event-driven invalidation when the business does not tolerate the staleness window that TTL provides, and combine it with a TTL backstop so the cache self-heals if an event is lost.

Manual / Application-Level Invalidation

The application explicitly deletes or updates cache entries on the write path. This is the simplest form of event-driven invalidation.

update_user_profile(user_id, new_data):
    db.update("users", user_id, new_data)
    cache.delete(f"user:{user_id}")           # explicit invalidation

This works well for simple cases but breaks down when:

  • Multiple services write to the same data (which one is responsible for invalidation?)
  • The cached value is derived from multiple database rows (invalidating one row does not invalidate the aggregate)
  • The write path has no knowledge of the cache keys affected


Implementation Patterns

These are the patterns that separate production-grade caching from textbook caching. Each one addresses a specific failure mode that the basic strategies leave open.

Singleflight / Request Coalescing

When a popular key expires, hundreds of concurrent requests miss the cache simultaneously. Without protection, every one of them queries the database for the same row.

Singleflight collapses concurrent misses for the same key into a single database query. The first request acquires a lock, fetches from the database, and populates the cache. All other concurrent requests for the same key wait for the first to complete and share its result.

# Pseudocode for singleflight
inflight = ConcurrentMap<Key, Future<Value>>

read(key):
    value = cache.get(key)
    if value != null:
        return value

    # Check if another request is already fetching this key
    if key in inflight:
        return inflight[key].await()

    # This request wins — fetch from DB
    future = new Future()
    inflight[key] = future

    try:
        value = db.query(key)
        cache.set(key, value, ttl)
        future.resolve(value)
    except error:
        future.reject(error)       # wake waiters; a silent failure would wedge the key
        raise error
    finally:
        inflight.remove(key)
    return value

Use this when: you have keys accessed at >100 RPS and cache misses trigger expensive database queries (>10ms).
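A runnable sketch of the same pattern, using one `threading.Event` per in-flight key. This is an illustration under the assumption of a synchronous fetch callable, not a production client (for brevity, a fetch that raises leaves waiters without a value rather than propagating the original error):

```python
import threading

class Singleflight:
    """Collapse concurrent fetches for the same key into one call.
    The first caller (the leader) runs `fetch`; everyone else blocks
    on an event and shares the leader's result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (threading.Event, result box)

    def do(self, key, fetch):
        with self._lock:                       # atomic check-and-register
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        event, box = entry
        if leader:
            try:
                box["value"] = fetch()
            finally:
                with self._lock:
                    del self._inflight[key]    # clear even on failure
                event.set()                    # wake all waiters
            return box["value"]
        event.wait()                           # follower: share leader's result
        return box["value"]
```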

Negative Caching

If the application frequently looks up keys that do not exist in the database (nonexistent user IDs, deleted records, typos), every lookup is a cache miss followed by a database query that returns nothing. The cache never stores the result because there is no result to store.

Negative caching stores the absence of a value. When a database lookup returns empty, cache a sentinel value with a short TTL (30–120 seconds).

read(key):
    result = cache.get(key)
    if result == NEGATIVE_SENTINEL:
        return null                    # known nonexistent
    if result != null:
        return result

    value = db.query(key)
    if value == null:
        cache.set(key, NEGATIVE_SENTINEL, short_ttl)  # 30-120s
        return null
    cache.set(key, value, normal_ttl)
    return value

Use this when: 10%+ of lookups are for nonexistent keys. Common in user-facing search, URL shorteners, and APIs exposed to external clients.

Multi-Tier Caching

When the network hop to Redis (0.2–1ms) is too costly for extremely hot keys — keys accessed thousands of times per second per instance — add an in-process L1 cache in front of the distributed L2 cache.

| Tier | Technology | Latency | Scope |
|---|---|---|---|
| L1 | In-process (Caffeine, HashMap) | ~100ns | Single instance |
| L2 | Redis / Memcached | 0.2–1ms | Shared across all instances |
| Origin | Database | 5–50ms | Source of truth |

L1 caches introduce a new consistency challenge: each application instance has its own copy. When the data changes, L2 is invalidated centrally but L1 copies across N instances may remain stale until their TTL expires. Keep L1 TTLs very short (5–30 seconds) and use this only for data where brief staleness is acceptable.
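A minimal sketch of the L1 → L2 → origin read path, with a plain dict standing in for Redis and a short per-entry L1 TTL:

```python
import time

class TwoTierCache:
    """L1 (per-instance dict with a short TTL) in front of L2 (a shared
    store; a plain dict stands in for Redis here) in front of the origin."""

    def __init__(self, l2, load_from_origin, l1_ttl=5.0):
        self.l1 = {}                       # key -> (value, expires_at)
        self.l2 = l2
        self.load = load_from_origin
        self.l1_ttl = l1_ttl

    def get(self, key):
        hit = self.l1.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]                  # L1 hit: no network hop
        if key in self.l2:
            value = self.l2[key]           # L2 hit: one network hop
        else:
            value = self.load(key)         # origin: slowest path
            self.l2[key] = value
        self.l1[key] = (value, time.monotonic() + self.l1_ttl)
        return value
```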

Cache Warming

Instead of waiting for traffic to populate the cache (lazy loading), pre-populate the cache before traffic arrives.

Use cache warming for:

  • New deployments where the cache is empty and traffic is immediate
  • New regions being brought online with no organic traffic to build up the cache
  • Predictable spikes (Black Friday, marketing campaigns) where you know which keys will be hot

Warming is typically implemented as a background job that reads the most-accessed keys from the database and pre-populates the cache during a maintenance window or before a traffic shift.

The Numbers in Context

Raw numbers are meaningless without context. Here is what each number means for your design decisions.

| Number | Value | What It Means for Your Design |
|---|---|---|
| Redis throughput | ~100K ops/s (single-threaded) | One Redis instance handles most applications. If you need more, shard the key space — don't add complexity prematurely. |
| Memcached throughput | ~500K ops/s (multithreaded) | Choose Memcached over Redis when you need pure key-value throughput and don't need Redis data structures. |
| Redis p99 latency | <1ms in-region | Network hop to Redis is ~0.2ms. If your p99 is higher, check serialization costs and key sizes, not Redis itself. |
| Healthy hit rate | 95%+ aggregate | But segment by key type. A 95% aggregate rate can hide a 40% rate on long-tail keys that generate 70% of DB load. |
| Thundering herd threshold | 1K+ RPS on a single key | At this request rate, a single key expiry triggers hundreds of redundant DB queries. Singleflight is mandatory. |
| TTL jitter | ±10% of base TTL | Without jitter, batch-loaded keys expire simultaneously. A 300s TTL becomes 270–330s with jitter. |
| Serialization overhead | 1–5ms for JSON on complex objects | On hot paths (>1K RPS), JSON serialization can cost more than the cache lookup itself. Use protobuf or msgpack. |
| Negative cache TTL | 30–120s | Long enough to absorb burst lookups for nonexistent keys. Short enough that real data is visible within 2 minutes of creation. |

Hit Rate Segmentation

A 95% aggregate hit rate is a vanity metric. The real story is in the segments:

| Key Segment | Hit Rate | Query Share | DB Load Share |
|---|---|---|---|
| Popular (top 1K) | 99.5% | 60% | 3% |
| Mid-tail (next 100K) | 92% | 30% | 28% |
| Long-tail (rest) | 40% | 10% | 69% |

The long tail generates the majority of database load despite being a minority of traffic. If you optimize only for aggregate hit rate, you are ignoring the segment that actually stresses your database. Address the long tail with negative caching, shorter TTLs, and bloom filters for nonexistent-key filtering.
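The DB-load-share column falls out of simple arithmetic: each segment's share of database load is proportional to its query share times its miss rate. A quick sketch reproduces the numbers above:

```python
segments = {
    "popular":   {"hit_rate": 0.995, "query_share": 0.60},
    "mid_tail":  {"hit_rate": 0.92,  "query_share": 0.30},
    "long_tail": {"hit_rate": 0.40,  "query_share": 0.10},
}

# A segment's DB load is proportional to query_share * (1 - hit_rate).
misses = {name: s["query_share"] * (1 - s["hit_rate"])
          for name, s in segments.items()}
total_misses = sum(misses.values())
db_load_share = {name: m / total_misses for name, m in misses.items()}
```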

How This Shows Up in Interviews

Scenario 1: "Add caching to this read-heavy service"

The interviewer is testing whether you can justify which caching strategy and why. Do not say "add Redis." Say: "The read-to-write ratio is 100:1 and product tolerates 30-second staleness, so cache-aside with a 25-second TTL plus jitter is the right fit. I would add singleflight for the top-1K keys to prevent thundering herd on expiry."

Scenario 2: "The cache goes down — what happens?" (Full Walkthrough)

This tests whether your system degrades gracefully or collapses. Here's how a Staff engineer works through it:

Step 1 — Size the blast radius. "First, let me quantify the impact. If our cache serves 95% of reads at 10K RPS, a full cache failure means 10K RPS hitting the database. Our database is provisioned for 500 RPS of direct traffic. So this isn't 'reads are a bit slower' — this is a cascading failure that takes the database down within seconds."

Step 2 — Immediate protection: circuit breaker. "The application has a circuit breaker on the cache client. When cache error rates exceed 50% over a 10-second window, the circuit opens. In open state, we stop attempting cache reads entirely — no point adding timeout latency to every request."

Step 3 — Degraded mode, not failure mode. "With the circuit open, we enter degraded mode. The critical path (checkout, auth, account balance) routes directly to the database with aggressive connection pooling — this is ~5% of total read traffic, well within database capacity. The non-critical path (feed, recommendations, search suggestions) returns stale data from a local in-process fallback or a simplified response. Users see a slightly degraded experience, not an error page."

Step 4 — Rate limiting to protect the database. "For the non-critical reads that still need fresh data, we apply a request rate limiter: at most 200 RPS to the database for non-critical paths. Requests beyond this threshold get the degraded response. This ensures the database stays healthy for critical operations."

Step 5 — Recovery without a thundering herd. "When the cache comes back online, we don't flip all traffic back at once. We ramp the circuit breaker from open → half-open, allowing 10% of traffic through to warm the cache. As the hit rate climbs, we gradually increase to 50%, then 100%. This prevents a cold-cache thundering herd from immediately overwhelming the database again."

Why this is a Staff answer: It quantifies the impact before proposing a solution, distinguishes critical from non-critical traffic, and addresses the recovery path — not just the failure response.

Scenario 3: "Users are seeing stale data after updating their profile"

This tests invalidation and read-your-writes consistency. The answer involves routing the writing user to the primary for a brief window post-write (5 seconds), while other users continue reading from replicas that converge within the TTL window. Explain that this is a per-user consistency guarantee, not a global one.
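One way to implement that per-user window is to record each user's last write and pin their reads to the primary while the window is open. A sketch with an injectable clock for testability; names like `read_target` are illustrative, not from any framework:

```python
import time

PIN_WINDOW = 5.0   # seconds a writer's reads stay pinned to the primary
_last_write = {}   # user_id -> monotonic timestamp of their last write

def record_write(user_id, now=None):
    _last_write[user_id] = time.monotonic() if now is None else now

def read_target(user_id, now=None):
    """Route the writing user to the primary briefly; everyone else
    reads from replicas that converge within the TTL window."""
    now = time.monotonic() if now is None else now
    if now - _last_write.get(user_id, float("-inf")) < PIN_WINDOW:
        return "primary"
    return "replica"
```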

Scenario 4: "Your Redis cluster is at 90% memory"

This tests operational maturity. Do not say "add more memory." First audit: What is the TTL distribution? Are there keys with TTLs >24 hours? Run MEMORY USAGE on the largest keys — often 5% of keys consume 50% of memory. Evaluate the eviction policy (allkeys-lru is self-managing; noeviction means writes fail). After optimization, state the remaining capacity need and whether horizontal scaling (more shards) or vertical scaling (bigger instances) is appropriate.

Advanced Patterns

| Pattern | How It Works | When to Use |
|---|---|---|
| Singleflight | Concurrent misses for the same key collapse into one DB query | Popular keys above 100 RPS |
| Negative caching | Cache the fact that a key doesn't exist (short TTL) | 10%+ lookups for non-existent keys |
| Read-through | Cache itself fetches from DB on miss | Uniform miss handling across all consumers |
| Cache warming | Pre-populate before traffic arrives | Deployments, new regions, predictable spikes |
| Multi-tier | L1 (in-process) → L2 (Redis) → DB | When L2 network hop is too costly for hot keys |
| Stampede lock | On miss, one request acquires lock; others wait or get stale | Expensive DB queries (>100ms) on popular keys |

In the Wild

Abstract patterns are easier to internalize when you see how they were applied at scale. These are public, documented examples — not speculation.

Facebook TAO: The Graph Cache

Facebook's TAO (The Associations and Objects) cache handles billions of reads per second for the social graph — friend lists, posts, likes, comments. The architecture is a two-tier cache: per-region leader caches that handle writes and follower caches that handle reads. Read-through is the pattern — on miss, TAO fetches from MySQL and populates the cache. Write-through is used for mutations: every write goes to the leader cache, then synchronously to MySQL, then asynchronously invalidates follower caches across regions.

The Staff-level insight: TAO's per-object TTLs are set by access pattern, not by a global default. A celebrity's profile (read millions of times per second) has an aggressive TTL with event-driven invalidation. A dormant user's profile (read once a month) has a long TTL or no cache entry at all. This is hit-rate segmentation implemented at Facebook scale.

Netflix EVCache: Memcached at Planetary Scale

Netflix's EVCache serves 30+ million requests per second across AWS regions. It is a wrapper around Memcached with zone-aware replication: every write goes to Memcached instances in multiple availability zones. Reads prefer the local zone for latency but fall back to cross-zone replicas on miss.

The Staff-level insight: Netflix chose Memcached over Redis for EVCache because their workload is pure key-value — they don't need Redis data structures and Memcached's multithreaded architecture gives 3–5x more throughput per node. Technology selection is a tradeoff decision, not a default.

Discord: In-Process Caching for Hot Guilds

Discord found that their largest servers (guilds with millions of members) generated enough traffic to overwhelm their distributed cache. Their solution: an in-process L1 cache for hot guild data in front of their distributed cache. The L1 cache uses a 5-second TTL — brief enough that staleness is invisible in a chat context, long enough to absorb thousands of redundant reads per second for viral servers.

The Staff-level insight: This is multi-tier caching driven by observed hot-key behavior, not a premature optimization. They measured the problem (specific guilds saturating cache throughput) and applied L1 caching only where the data justified it.


Staff Calibration

The sections below are calibration tools for Staff-level interviews. If you already understand caching mechanics, start here to sharpen the framing that separates L5 from L6 answers.

What Staff Engineers Say (That Seniors Don't)

| Concept | Senior Response | Staff Response |
|---|---|---|
| Strategy choice | "We use Redis as a cache" | "We use cache-aside here because eventual staleness up to 30s is acceptable and write volume doesn't justify write-through overhead" |
| TTL selection | "Set TTL to 5 minutes" | "Product tolerates 60s stale for feed ranking; TTL is 45s with jitter to avoid synchronized expiry storms" |
| Invalidation | "Invalidate on write" | "Event-driven invalidation via CDC stream, with TTL as a consistency backstop, not the primary mechanism" |
| Failure mode | "Cache miss falls through to DB" | "Under cache failure we shed load with request coalescing and circuit-break to degraded responses rather than stampeding the database" |
| Hit rate | "Our hit rate is 95%" | "Hit rate is 95% aggregate but 70% for long-tail queries — that tail drives 40% of DB read load, so we added negative caching with short TTL" |

Common Interview Traps

  • Naming a strategy without justifying the tradeoff. "Cache-aside" is not an answer. Why cache-aside and not write-through? What staleness does the business accept?
  • Ignoring cache failure as a mode. Caches fail. If your design falls over when the cache is cold or unavailable, you have a single point of failure you called an "optimization."
  • Treating TTL as the invalidation strategy. TTL is a safety net. If TTL is your only invalidation mechanism, you have chosen to serve stale data for the duration of every TTL window and you should say so explicitly.
  • Overlooking cache penetration. Candidates almost always address thundering herd. They rarely address the steady-state cost of requests for keys that will never exist. Bloom filters or negative caching are the standard mitigations.
  • Using cache as a primary data store. Redis with no persistence is a speed layer. If it dies and data is gone, you had a SPOF you called "caching."
  • TTL jitter neglect. Same TTL on all batch keys = simultaneous expiry. Add ±10% random jitter.
  • Ignoring serialization cost. JSON serialization of complex objects can take 1–5ms. Consider protobuf or msgpack for hot paths.
  • Cache key collision. Using user:{id} for different endpoints caching different projections leads to wrong data. Include query shape in the key.

Practice Drill

Staff-Caliber Answer Shape
  1. Audit current usage. What's the TTL distribution? Are there keys with TTLs >24h that could be shortened? Stale data from deprecated features?
  2. Analyze key-size distribution. Run redis-cli --bigkeys. Often 5% of keys consume 50% of memory. Compress large values or move them to a dedicated store.
  3. Evaluate eviction policy. allkeys-lru self-manages. noeviction (common mistake) means writes fail when full.
  4. Consider tiering. Move cold keys to a cheaper store. Keep only the hot working set in Redis.
  5. Scale horizontally. Add Redis Cluster shards rather than resizing the instance.

The Staff move: Don't just ask for more memory. Show you've exhausted optimization first: "After TTL tuning and compression, we need 15% more capacity, not 30%."

Where This Appears

These playbooks apply caching foundations to complete system design problems with full Staff-level walkthroughs, evaluator-grade rubrics, and practice drills.

  • Distributed Caching — Multi-region cache topologies, CDC-based invalidation pipelines, cache-aside vs read-through at scale, and 6 Staff-level practice drills with answer shapes
  • CDN & Edge Caching — Edge TTL ownership, cache key design for personalized content, purge propagation, and the operational cost of stale-while-revalidate
  • Feed Generation — Fan-out-on-write vs fan-out-on-read caching, pre-computed feed materialization, and ranking freshness vs relevance tradeoffs
  • Search & Indexing — Query result caching, index warming strategies, and the consistency challenge of caching search results that change with every new document

Related Technologies: Redis · Elasticsearch

This is one of 9 foundation guides. The full library also includes deep-dive system design playbooks with evaluator-grade breakdowns, practice drills, and failure-mode analysis.