Technologies referenced in this playbook: Elasticsearch
How to Use This Playbook
This playbook supports three reading modes:
| Mode | Time | What to Read |
|---|---|---|
| Quick Review | 15 min | Executive Summary → Interview Walkthrough → Fault Lines (§3) → Drills (§7) |
| Targeted Study | 1-2 hrs | Executive Summary → Interview Walkthrough → Core Flow, expand appendices where you're weak |
| Deep Dive | 3+ hrs | Everything, including all appendices |
What is Search & Indexing? — Why interviewers pick this topic
The Problem
Search infrastructure allows users to find relevant data across large, unstructured, or semi-structured datasets. Unlike direct lookups (query by primary key), search requires building and maintaining secondary data structures — indexes — that map content characteristics to document locations. The challenge isn't building a search box; it's building a search platform that stays fresh, relevant, and fast as data and query patterns evolve.
Common Use Cases
- Product search: E-commerce catalogs with filters, facets, ranking, and personalization
- Full-text search: Document retrieval, knowledge bases, support tickets
- Autocomplete/typeahead: Sub-100ms suggestions as users type
- Log search: Operational search across billions of log lines (ELK stack)
- Geo-search: Location-based queries with distance ranking
Why Interviewers Ask About This
Search exposes the core Staff-level skill: reasoning about the gap between what the user expects and what the system can deliver. Users expect instant, relevant results. The system deals with stale indexes, relevance tuning that's never "done," and an indexing pipeline that can fall hours behind. Interviewers want to see you navigate the tension between freshness, relevance, and cost — not recite inverted index theory.
Mechanics Refresher: Inverted Index Fundamentals — How search engines map terms to documents
| Concept | What It Does | Why It Matters for Fault Lines |
|---|---|---|
| Inverted index | Maps each term → list of document IDs containing it | The core data structure. Rebuild cost is the dominant operational concern. |
| Tokenization | Splits text into searchable terms ("New York" → ["new", "york"]) | Tokenization bugs silently break search — users search for terms that don't exist in the index |
| TF-IDF / BM25 | Ranks documents by term frequency vs rarity across corpus | Default relevance scoring. Works well until business rules override it. |
| Analyzers | Chain of tokenizer + filters (lowercase, stemming, synonyms) | Analyzer mismatches between index-time and query-time cause silent zero-result queries |
| Segments & merging | Lucene writes immutable segments; background merge compacts them | Merge storms spike CPU and latency. Over-segmentation degrades query performance. |
Why this matters for fault lines: The indexing pipeline's health determines search quality. A stale index serves wrong results. A misconfigured analyzer returns zero results. Staff engineers own the pipeline, not just the query layer.
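The table's mechanics can be made concrete with a toy sketch. This is illustrative Python, not Lucene's actual implementation, but it shows both the term → doc-ID mapping and the silent-zero-results failure mode when the query side skips the analyzer:

```python
from collections import defaultdict

def analyze(text):
    """Toy analyzer: tokenize on whitespace, then lowercase each term."""
    return [tok.lower() for tok in text.split()]

def build_index(docs):
    """Inverted index: term -> sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "New York pizza", 2: "York Minster", 3: "Chicago pizza"}
index = build_index(docs)

print(index["york"])      # analyzed query term: matches docs 1 and 2
print(index.get("York"))  # raw, unanalyzed query term: silent zero results
```

The second lookup is the analyzer-mismatch bug from the table: the document was indexed through the lowercase filter, so a query that bypasses the same filter finds nothing, with no error raised anywhere.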
What This Interview Actually Tests
Search is not an Elasticsearch question; anyone can set up an index.
This is a data pipeline ownership and relevance engineering question that tests:
- Whether you separate the indexing pipeline from the query layer (and who owns each)
- Whether you reason about freshness guarantees as a business SLA, not a technical aspiration
- Whether you understand that relevance is never "solved" — it's a continuous tuning loop
- Whether you can own the operational cost of search infrastructure at scale
The key insight: Search infrastructure is a platform problem. The search box is the tip of the iceberg — underneath it is an indexing pipeline, a relevance model, a freshness guarantee, and an operational burden that grows with every new data source.
The L5 vs L6 Contrast (Memorize This)
| Behavior | L5 (Senior) | L6 (Staff) |
|---|---|---|
| First move | "We'll use Elasticsearch" | Asks "What are the search requirements? Freshness SLA? Relevance expectations?" |
| Indexing | "We'll index on write" | Designs the indexing pipeline: source → transform → index, with freshness guarantees and failure handling |
| Relevance | "BM25 is good enough" | Asks "What does 'relevant' mean for this use case? Who defines it? How do we measure it?" |
| Freshness | "Near real-time" | Quantifies: "Our indexing pipeline has a 30-second p99 lag. Is that acceptable for this use case?" |
| Ownership | Focuses on the search cluster | Asks "Who owns the indexing pipeline? Who tunes relevance? Who gets paged when search is stale?" |
The Three Intents (Pick One and Commit)
| Intent | Constraint | Strategy | Freshness Bar |
|---|---|---|---|
| Precision Search | Results must be correct and comprehensive | Full-text + filters + facets, careful analyzer tuning | Seconds to minutes (acceptable) |
| Discovery/Browse | Results should be interesting and engaging | Personalized ranking, collaborative filtering, diversity | Minutes to hours (acceptable) |
| Operational Search | Results must be fast and fresh | Log/event search, time-series focus, minimal ranking | Sub-second freshness (critical) |
🎯 Staff Insight: "I'll assume we're building precision search for an e-commerce catalog — users search by product name, filter by attributes, and expect results to reflect inventory changes within 60 seconds. This means a dedicated indexing pipeline with freshness monitoring and relevance that balances text match with business signals (popularity, margin, stock)."
The Five Fault Lines (The Core of This Interview)
1. Freshness vs Cost — How quickly must the index reflect source data changes, and what's the infrastructure cost of that freshness?
2. Relevance Definition — Who defines "relevant"? Text match, business rules, personalization, or some blend? And how do you measure it?
3. Indexing Pipeline Ownership — The pipeline from source data to searchable index is the most fragile part. Who owns it, and what happens when it breaks?
4. Query Performance vs Index Completeness — Do you index everything (slow writes, fast reads) or index selectively (fast writes, incomplete results)?
5. Search as Platform vs Feature — Is search a shared platform (central team, multi-tenant) or a per-team feature (each team runs their own)?
Each fault line has a tradeoff matrix with explicit "who pays" analysis. See §3.
Quick Reference: What Interviewers Probe
| After You Say... | They Will Ask... |
|---|---|
| "Elasticsearch cluster" | "What happens when the indexing pipeline falls behind? How do users know results are stale?" |
| "BM25 for relevance" | "Product wants to boost promoted items. How does that interact with text relevance?" |
| "Index on write" | "What's the write amplification? What happens during a bulk data migration?" |
| "Near real-time" | "Define 'near.' What's the p99 indexing lag? What happens when it exceeds your SLA?" |
| "We'll add more nodes" | "How do you handle shard rebalancing? What's the query latency during rebalance?" |
Jump to Practice
→ Active Drills (§7) — 8 practice prompts with expected answer shapes
System Architecture Overview
Interview Walkthrough
Phase 1: Requirements & Framing (30 seconds)
- "Search has two sides: indexing (building the searchable data structure) and querying (finding relevant results). The fundamental tradeoff is index freshness vs query latency — how fast new content becomes searchable vs how fast queries return results."
Phase 2: Core Entities (30 seconds)
- Inverted Index: maps terms → list of document IDs containing that term (the core data structure)
- Analyzer: tokenizer + normalizer + stemmer that converts text into indexable terms
- Segment: an immutable chunk of the index; new documents go to new segments; old segments merge periodically
- Relevance Score: TF-IDF or BM25 ranking that determines result ordering
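The analyzer entity above is a chain, and the chain must be identical at index time and query time. A toy sketch (illustrative only; the suffix-stripping "stemmer" here is far cruder than real stemmers like Porter's):

```python
def tokenize(text):
    return text.split()

def lowercase(tokens):
    return [t.lower() for t in tokens]

def stem(tokens):
    # Crude stemmer: strip a common suffix if the stem stays long enough.
    # Real stemmers are far more careful; this only illustrates the chain.
    out = []
    for t in tokens:
        for suffix in ("ing", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out

def analyze(text):
    # Chain: tokenizer -> lowercase filter -> stemmer. Index time and
    # query time must run the SAME chain, or queries silently miss.
    return stem(lowercase(tokenize(text)))

print(analyze("Running Shoes"))
print(analyze("running shoe"))
```

Both inputs reduce to the same terms, so the query matches the document even though the surface forms differ; swap in a different chain on either side and the match silently disappears.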
Phase 3: The 2-Minute Architecture (2 minutes)
Phase 4: Transition to Depth (15 seconds)
"The basics are straightforward. The hard problems are: relevance tuning (why the wrong results rank first), index freshness (how fast new content is searchable), and search at scale (multi-shard query coordination)."
Phase 5: Deep Dives (5-15 minutes if probed)
Probe 1: "How does relevance scoring work?" (3-5 min)
"BM25 is the standard scoring function. It considers two factors: term frequency (TF) — how often the term appears in the document, and inverse document frequency (IDF) — how rare the term is across all documents."
"The insight: 'the' appears in every document (low IDF, nearly zero weight). 'kubernetes' appears in few documents (high IDF, high weight). BM25 automatically prioritizes rare, specific terms over common ones."
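The TF/IDF interaction can be sketched with the standard Okapi BM25 formula (defaults k1=1.2, b=0.75; Lucene's built-in variant uses this same IDF shape, though its length normalization details differ slightly). The corpus numbers below are made up to mirror the "the" vs "kubernetes" example:

```python
import math

def bm25_score(term_freq, doc_len, avg_doc_len, doc_freq, num_docs,
               k1=1.2, b=0.75):
    """Okapi BM25 contribution of one query term to one document's score."""
    idf = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
    tf_norm = term_freq * (k1 + 1) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# Hypothetical 1,000-doc corpus: "the" in 990 docs, "kubernetes" in 5.
# Same term frequency (3) and doc length in both cases.
common = bm25_score(term_freq=3, doc_len=100, avg_doc_len=100,
                    doc_freq=990, num_docs=1000)
rare = bm25_score(term_freq=3, doc_len=100, avg_doc_len=100,
                  doc_freq=5, num_docs=1000)
print(f"'the': {common:.3f}  'kubernetes': {rare:.3f}")
```

With identical term frequencies, the rare term's contribution comes out hundreds of times larger than the common term's, which is exactly the "rare, specific terms win" behavior described above.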
Beyond BM25: "BM25 is text-only. Production search combines BM25 with: (1) popularity signals (click-through rate, purchase count), (2) recency boost (newer content ranks higher), (3) personalization (user's past behavior influences ranking), (4) query intent classification (navigational vs informational vs transactional)."
Probe 2: "How do you handle real-time indexing?" (3-5 min)
"Elasticsearch's default refresh interval is 1 second. This means a newly indexed document is searchable within 1 second. For most use cases, that's sufficient."
When it's not sufficient: "A stock trading platform needs sub-100ms indexing. An e-commerce site publishing a flash sale needs instant searchability. For these cases: (1) reduce the refresh interval to 200ms (at the cost of more small segments and more CPU for merging), (2) use a real-time search layer (Redis with full-text modules) for the most recent content and merge with Elasticsearch for historical search."
When it's too aggressive: "A log aggregation system indexing 1 million events/sec doesn't need 1-second freshness. Increase the refresh interval to 30 seconds and use bulk indexing for throughput."
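The hybrid layering from the flash-sale case (real-time store for the newest content, Elasticsearch for history) needs a merge step at query time. A minimal sketch, with hypothetical hit shapes and scores; fresh-store copies win doc-ID conflicts because they reflect the latest state:

```python
def merge_results(fresh_hits, historical_hits, limit=10):
    """Merge fresh-store hits with historical-index hits.

    On a doc-ID conflict the fresh copy wins (the historical index may
    still hold a stale version); the combined list is re-sorted by score.
    """
    by_id = {}
    for hit in historical_hits:
        by_id[hit["id"]] = hit
    for hit in fresh_hits:  # overwrite any stale historical copies
        by_id[hit["id"]] = hit
    merged = sorted(by_id.values(), key=lambda h: h["score"], reverse=True)
    return merged[:limit]

fresh = [{"id": "sku-1", "score": 9.1}]       # just-published flash sale
historical = [{"id": "sku-1", "score": 2.0},  # stale copy of the same doc
              {"id": "sku-2", "score": 7.5}]
print(merge_results(fresh, historical))
```

The subtle part in production is score comparability: the two layers score with different statistics, so a real implementation would normalize or re-rank before merging rather than comparing raw scores as this sketch does.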
Probe 3: "How does search scale?" (3-5 min)
"Elasticsearch distributes the index across shards. Each shard is a complete Lucene index. Queries scatter to all shards and gather results."
The scaling decisions:
- Shard count: Fixed at index creation. Too few: each shard is too large for fast queries. Too many: coordination overhead dominates. "Rule of thumb: 10-50GB per shard. A 1TB index needs 20-100 shards."
- Replica count: Each shard has N replicas. More replicas = more read throughput but more storage. "For read-heavy search: 2-3 replicas per shard. Reads fan out to replicas, spreading the load."
- Hot-warm-cold architecture: Recent data on fast SSDs (hot), older data on slower/cheaper storage (warm), archived data on S3 (cold). "This reduces cost by 5-10x for indices with time-based access patterns."
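The shard-count rule of thumb above is simple arithmetic, worth being able to do on the spot. A sketch using the 10-50GB-per-shard band quoted in the bullet:

```python
import math

def shard_count_range(index_size_gb, min_shard_gb=10, max_shard_gb=50):
    """Shard count band for the 10-50GB-per-shard rule of thumb.

    Fewest shards: size each shard at the upper bound.
    Most shards: size each shard at the lower bound.
    """
    low = max(1, math.ceil(index_size_gb / max_shard_gb))
    high = max(1, math.ceil(index_size_gb / min_shard_gb))
    return low, high

print(shard_count_range(1000))  # 1TB index → (20, 100)
```

This reproduces the "1TB index needs 20-100 shards" figure; remember that primary shard count is fixed at index creation, so this estimate has to account for expected growth, not just current size.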
Phase 6: Wrap-Up
"Search is an inverted index problem with a ranking problem on top. The Staff-level insight: the index is easy (Elasticsearch handles it). Relevance tuning — making the RIGHT results appear first — is where teams spend 80% of their search engineering effort. And the most common bug isn't a missing document; it's analyzer mismatch between indexing and querying."
Quick-Reference: The 30-Second Cheat Sheet
| Topic | The L5 Answer | The L6 Answer (say this) |
|---|---|---|
| Technology | "Use Elasticsearch" | "Elasticsearch for full-text; the hard part is relevance tuning, not the infrastructure" |
| Indexing | "Index the data" | "CDC from primary DB → analyzer → inverted index with configurable refresh interval" |
| Relevance | "BM25 scoring" | "BM25 + popularity + recency + personalization — pure text relevance isn't enough" |
| Freshness | "Real-time search" | "1-second refresh interval; tune based on freshness SLA vs indexing throughput" |
| Scale | "Add more shards" | "10-50GB per shard, hot-warm-cold for cost optimization" |