StaffSignal
Foundation — Quick Reference

Database Indexing

B-tree vs LSM-tree, composite indexes, covering indexes. Index design as a cost conversation, not a performance trick.

Database Indexing — Staff Interview Quick Reference

The 60-Second Version

  • B-tree: balanced tree, O(log N) lookup. Read-optimized. Default for most RDBMS (Postgres, MySQL). The workhorse of OLTP with mixed read/write workloads.
  • LSM-tree: log-structured merge tree. Write-optimized via sequential writes. Powers RocksDB, Cassandra, LevelDB. Trade-off: read amplification and compaction overhead.
  • Composite indexes obey the leftmost prefix rule. An index on (A, B, C) serves queries on A, (A, B), and (A, B, C) — but NOT B alone. Column order is a design decision, not an afterthought.
  • Covering indexes include every column the query needs, eliminating heap/table lookups entirely. The fastest query is one that never touches the table.
  • Write amplification is the hidden cost. Every index adds a write on every insert/update. Five indexes means roughly 5x write overhead. Index design is always a cost conversation.
  • Partial/filtered indexes target only rows matching a condition. Smaller footprint, faster lookups for high-frequency query patterns.
  • Index-only scans occur when the index contains all requested data. No table access. Massive performance wins, especially for analytics queries.

What Staff Engineers Say (That Seniors Don't)

ConceptSenior ResponseStaff Response
Index selection"Add an index on the slow column""What's the read/write ratio? Five indexes on a write-heavy table is a cost we need to quantify"
B-tree vs LSM"B-tree is standard""B-tree for read-heavy OLTP, LSM for write-heavy ingest — and we need to budget for LSM compaction IO"
Composite indexes"Create an index on those columns""Column order follows query selectivity and the leftmost prefix rule; reordering changes which queries benefit"
Covering indexes"It avoids a table lookup""It turns a random IO into zero IO. At our query volume that eliminates N million disk seeks per day"
Index bloat"We should rebuild indexes""Our index-to-data ratio is 1.4x. Partial indexes on the hot partition cut storage and keep the working set in memory"

The Numbers That Matter

  • B-tree lookup on 1B rows: 3-4 disk reads (tree depth)
  • Index entry size: 8-16 bytes typical (key + pointer)
  • Full table scan at scale: effectively unusable for user-facing latency
  • Each additional index: +1 write per insert/update (plus page splits, WAL entries)
  • Covering index benefit: eliminates random IO entirely — the gap between 0 disk reads and 1 disk read is larger than the gap between 1 and 10

Common Interview Traps

  • "Just add an index" without discussing write amplification. Staff candidates frame indexing as a trade-off between read latency, write throughput, and storage cost.
  • Ignoring composite column order. Candidates who treat (A, B) and (B, A) as equivalent reveal a gap in understanding the leftmost prefix rule.
  • Forgetting LSM compaction costs. Citing LSM as "faster for writes" without acknowledging compaction IO, space amplification, and read penalty is an incomplete answer.
  • Missing the B-tree vs LSM decision boundary. The choice is workload-dependent, not universal. Interviewers probe whether you can articulate when each structure wins and why.

B-Tree vs LSM Decision

Rendering diagram...

Index Design Cheat Sheet

Query PatternIndex DesignWhy
WHERE user_id = ?Single-column index on user_idPoint lookup, B-tree ideal
WHERE user_id = ? AND created_at > ?Composite (user_id, created_at)Equality first, then range
WHERE status = 'active' AND ...Partial index WHERE status = 'active'Indexes only matching rows, smaller footprint
SELECT user_id, email FROM users WHERE ...Covering index (filter_col) INCLUDE (user_id, email)Index-only scan, zero table access
WHERE text LIKE '%search%'Full-text index (GIN in Postgres)B-tree can't help with leading wildcard
ORDER BY score DESC LIMIT 10Index on score DESCAvoids sorting; reads top 10 directly from index

Write Amplification Math

Each index adds write overhead. For a table with N indexes:

IndexesWrite Cost per INSERTExample Table
01 write (heap only)Append-only log
1~2 writesSimple lookup table
3~4 writesTypical OLTP table
5~6 writesOver-indexed table
10~11 writesAnti-pattern: audit table with analytics indexes

Staff rule: Every index proposal must justify the read gain against the write cost. At 5+ indexes on a write-heavy table, the overhead is architectural, not incidental.

Practice Prompt

Staff-Caliber Answer Shape
Expand
  1. Audit all 8 indexes. Use pg_stat_user_indexes to check which indexes are actually scanned. Unused indexes cost writes with zero read benefit — drop them.
  2. Check for redundant indexes. If you have both (user_id) and (user_id, created_at), the composite index covers the single-column queries. Drop the single-column index.
  3. Move the dashboard query off OLTP. COUNT(*) GROUP BY status on 500M rows is an analytics query. It should hit a materialized view or a replica — not the primary. Refresh the materialized view every 5 minutes.
  4. Partition the table. 500M rows with 2M/day growth means indexes are growing too. Range-partition by created_at (monthly). Old partitions are cold; new partitions are hot. Index maintenance is per-partition, not per-table.
  5. Consider partial indexes. If 95% of orders are status = 'completed', a partial index WHERE status != 'completed' covers the active-order queries with a fraction of the storage.

The Staff move: Don't add more indexes. Remove unnecessary ones, move analytics queries elsewhere, and partition to keep index sizes manageable.

Additional Traps

  • Creating indexes during peak traffic. CREATE INDEX on a large table acquires a write lock (without CONCURRENTLY in Postgres) and can block writes for minutes. Always use CREATE INDEX CONCURRENTLY in production.
  • Ignoring index bloat. B-tree indexes accumulate dead tuples from updates/deletes. Without REINDEX or autovacuum tuning, an index can be 2-3x its necessary size, wasting memory and slowing scans.
  • Composite index column order by intuition. The correct order is: equality columns first, then range columns, then sort columns. Getting this wrong means the range condition breaks the sort optimization.

Where This Appears