Technologies referenced in this playbook: Elasticsearch
How to Use This Playbook
This playbook supports three reading modes:
| Mode | Time | What to Read |
|---|---|---|
| Quick Review | 15 min | Executive Summary → Interview Walkthrough → Fault Lines (§3) → Drills (§7) |
| Targeted Study | 1-2 hrs | Executive Summary → Interview Walkthrough → Core Flow, expand appendices where you're weak |
| Deep Dive | 3+ hrs | Everything, including all appendices |
What is Search & Indexing? — Why interviewers pick this topic
The Problem
Search infrastructure allows users to find relevant data across large, unstructured, or semi-structured datasets. Unlike direct lookups (query by primary key), search requires building and maintaining secondary data structures — indexes — that map content characteristics to document locations. The challenge isn't building a search box; it's building a search platform that stays fresh, relevant, and fast as data and query patterns evolve.
Common Use Cases
- Product search: E-commerce catalogs with filters, facets, ranking, and personalization
- Full-text search: Document retrieval, knowledge bases, support tickets
- Autocomplete/typeahead: Sub-100ms suggestions as users type
- Log search: Operational search across billions of log lines (ELK stack)
- Geo-search: Location-based queries with distance ranking
Why Interviewers Ask About This
Search exposes the core Staff-level skill: reasoning about the gap between what the user expects and what the system can deliver. Users expect instant, relevant results. The system deals with stale indexes, relevance tuning that's never "done," and an indexing pipeline that can fall hours behind. Interviewers want to see you navigate the tension between freshness, relevance, and cost — not recite inverted index theory.
Mechanics Refresher: Inverted Index Fundamentals — How search engines map terms to documents
| Concept | What It Does | Why It Matters for Fault Lines |
|---|---|---|
| Inverted index | Maps each term → list of document IDs containing it | The core data structure. Rebuild cost is the dominant operational concern. |
| Tokenization | Splits text into searchable terms ("New York" → ["new", "york"]) | Tokenization bugs silently break search — users search for terms that don't exist in the index |
| TF-IDF / BM25 | Ranks documents by term frequency vs rarity across corpus | Default relevance scoring. Works well until business rules override it. |
| Analyzers | Chain of tokenizer + filters (lowercase, stemming, synonyms) | Analyzer mismatches between index-time and query-time cause silent zero-result queries |
| Segments & merging | Lucene writes immutable segments; background merge compacts them | Merge storms spike CPU and latency. Over-segmentation degrades query performance. |
Why this matters for fault lines: The indexing pipeline's health determines search quality. A stale index serves wrong results. A misconfigured analyzer returns zero results. Staff engineers own the pipeline, not just the query layer.
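The table's mechanics can be made concrete with a toy sketch. This is illustrative Python, not Lucene's actual implementation, but it shows both the term → doc-ID mapping and the silent-zero-results failure mode when the query side skips the analyzer:

```python
from collections import defaultdict

def analyze(text):
    """Toy analyzer: tokenize on whitespace, then lowercase each term."""
    return [tok.lower() for tok in text.split()]

def build_index(docs):
    """Inverted index: term -> sorted list of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "New York pizza", 2: "York Minster", 3: "Chicago pizza"}
index = build_index(docs)

print(index["york"])      # analyzed query term: matches docs 1 and 2
print(index.get("York"))  # raw, unanalyzed query term: silent zero results
```

The second lookup is the analyzer-mismatch bug from the table: the document was indexed through the lowercase filter, so a query that bypasses the same filter finds nothing, with no error raised anywhere.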
What This Interview Actually Tests
Search is not an Elasticsearch question; anyone can set up an index.
This is a data pipeline ownership and relevance engineering question that tests:
- Whether you separate the indexing pipeline from the query layer (and who owns each)
- Whether you reason about freshness guarantees as a business SLA, not a technical aspiration
- Whether you understand that relevance is never "solved" — it's a continuous tuning loop
- Whether you can own the operational cost of search infrastructure at scale
The key insight: Search infrastructure is a platform problem. The search box is the tip of the iceberg — underneath it is an indexing pipeline, a relevance model, a freshness guarantee, and an operational burden that grows with every new data source.
The L5 vs L6 Contrast (Memorize This)
| Behavior | L5 (Senior) | L6 (Staff) |
|---|---|---|
| First move | "We'll use Elasticsearch" | Asks "What are the search requirements? Freshness SLA? Relevance expectations?" |
| Indexing | "We'll index on write" | Designs the indexing pipeline: source → transform → index, with freshness guarantees and failure handling |
| Relevance | "BM25 is good enough" | Asks "What does 'relevant' mean for this use case? Who defines it? How do we measure it?" |
| Freshness | "Near real-time" | Quantifies: "Our indexing pipeline has a 30-second p99 lag. Is that acceptable for this use case?" |
| Ownership | Focuses on the search cluster | Asks "Who owns the indexing pipeline? Who tunes relevance? Who gets paged when search is stale?" |
The Three Intents (Pick One and Commit)
| Intent | Constraint | Strategy | Freshness Bar |
|---|---|---|---|
| Precision Search | Results must be correct and comprehensive | Full-text + filters + facets, careful analyzer tuning | Seconds to minutes (acceptable) |
| Discovery/Browse | Results should be interesting and engaging | Personalized ranking, collaborative filtering, diversity | Minutes to hours (acceptable) |
| Operational Search | Results must be fast and fresh | Log/event search, time-series focus, minimal ranking | Sub-second freshness (critical) |
🎯 Staff Insight: "I'll assume we're building precision search for an e-commerce catalog — users search by product name, filter by attributes, and expect results to reflect inventory changes within 60 seconds. This means a dedicated indexing pipeline with freshness monitoring and relevance that balances text match with business signals (popularity, margin, stock)."
The Five Fault Lines (The Core of This Interview)
1. Freshness vs Cost — How quickly must the index reflect source data changes, and what's the infrastructure cost of that freshness?
2. Relevance Definition — Who defines "relevant"? Text match, business rules, personalization, or some blend? And how do you measure it?
3. Indexing Pipeline Ownership — The pipeline from source data to searchable index is the most fragile part. Who owns it, and what happens when it breaks?
4. Query Performance vs Index Completeness — Do you index everything (slow writes, fast reads) or index selectively (fast writes, incomplete results)?
5. Search as Platform vs Feature — Is search a shared platform (central team, multi-tenant) or a per-team feature (each team runs their own)?
Each fault line has a tradeoff matrix with explicit "who pays" analysis. See §3.
Quick Reference: What Interviewers Probe
| After You Say... | They Will Ask... |
|---|---|
| "Elasticsearch cluster" | "What happens when the indexing pipeline falls behind? How do users know results are stale?" |
| "BM25 for relevance" | "Product wants to boost promoted items. How does that interact with text relevance?" |
| "Index on write" | "What's the write amplification? What happens during a bulk data migration?" |
| "Near real-time" | "Define 'near.' What's the p99 indexing lag? What happens when it exceeds your SLA?" |
| "We'll add more nodes" | "How do you handle shard rebalancing? What's the query latency during rebalance?" |
Jump to Practice
→ Active Drills (§7) — 8 practice prompts with expected answer shapes
System Architecture Overview
Interview Walkthrough
Phase 1: Requirements & Framing (30 seconds)
- "Search has two sides: indexing (building the searchable data structure) and querying (finding relevant results). The fundamental tradeoff is index freshness vs query latency — how fast new content becomes searchable vs how fast queries return results."
Phase 2: Core Entities (30 seconds)
- Inverted Index: maps terms → list of document IDs containing that term (the core data structure)
- Analyzer: tokenizer + normalizer + stemmer that converts text into indexable terms
- Segment: an immutable chunk of the index; new documents go to new segments; old segments merge periodically
- Relevance Score: TF-IDF or BM25 ranking that determines result ordering
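The analyzer entity above is a chain, and the chain must be identical at index time and query time. A toy sketch (illustrative only; the suffix-stripping "stemmer" here is far cruder than real stemmers like Porter's):

```python
def tokenize(text):
    return text.split()

def lowercase(tokens):
    return [t.lower() for t in tokens]

def stem(tokens):
    # Crude stemmer: strip a common suffix if the stem stays long enough.
    # Real stemmers are far more careful; this only illustrates the chain.
    out = []
    for t in tokens:
        for suffix in ("ing", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out

def analyze(text):
    # Chain: tokenizer -> lowercase filter -> stemmer. Index time and
    # query time must run the SAME chain, or queries silently miss.
    return stem(lowercase(tokenize(text)))

print(analyze("Running Shoes"))
print(analyze("running shoe"))
```

Both inputs reduce to the same terms, so the query matches the document even though the surface forms differ; swap in a different chain on either side and the match silently disappears.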
Phase 3: The 2-Minute Architecture (2 minutes)
Phase 4: Transition to Depth (15 seconds)
"The basics are straightforward. The hard problems are: relevance tuning (why the wrong results rank first), index freshness (how fast new content is searchable), and search at scale (multi-shard query coordination)."
Phase 5: Deep Dives (5-15 minutes if probed)
Probe 1: "How does relevance scoring work?" (3-5 min)
"BM25 is the standard scoring function. It considers two factors: term frequency (TF) — how often the term appears in the document, and inverse document frequency (IDF) — how rare the term is across all documents."
"The insight: 'the' appears in every document (low IDF, nearly zero weight). 'kubernetes' appears in few documents (high IDF, high weight). BM25 automatically prioritizes rare, specific terms over common ones."
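The TF/IDF interaction can be sketched with the standard Okapi BM25 formula (defaults k1=1.2, b=0.75; Lucene's built-in variant uses this same IDF shape, though its length normalization details differ slightly). The corpus numbers below are made up to mirror the "the" vs "kubernetes" example:

```python
import math

def bm25_score(term_freq, doc_len, avg_doc_len, doc_freq, num_docs,
               k1=1.2, b=0.75):
    """Okapi BM25 contribution of one query term to one document's score."""
    idf = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
    tf_norm = term_freq * (k1 + 1) / (
        term_freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# Hypothetical 1,000-doc corpus: "the" in 990 docs, "kubernetes" in 5.
# Same term frequency (3) and doc length in both cases.
common = bm25_score(term_freq=3, doc_len=100, avg_doc_len=100,
                    doc_freq=990, num_docs=1000)
rare = bm25_score(term_freq=3, doc_len=100, avg_doc_len=100,
                  doc_freq=5, num_docs=1000)
print(f"'the': {common:.3f}  'kubernetes': {rare:.3f}")
```

With identical term frequencies, the rare term's contribution comes out hundreds of times larger than the common term's, which is exactly the "rare, specific terms win" behavior described above.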
Beyond BM25: "BM25 is text-only. Production search combines BM25 with: (1) popularity signals (click-through rate, purchase count), (2) recency boost (newer content ranks higher), (3) personalization (user's past behavior influences ranking), (4) query intent classification (navigational vs informational vs transactional)."
Probe 2: "How do you handle real-time indexing?" (3-5 min)
"Elasticsearch's default refresh interval is 1 second. This means a newly indexed document is searchable within 1 second. For most use cases, that's sufficient."
When it's not sufficient: "A stock trading platform needs sub-100ms indexing. An e-commerce site publishing a flash sale needs instant searchability. For these cases: (1) reduce the refresh interval to 200ms (at the cost of more small segments and more CPU for merging), (2) use a real-time search layer (Redis with full-text modules) for the most recent content and merge with Elasticsearch for historical search."
When it's too aggressive: "A log aggregation system indexing 1 million events/sec doesn't need 1-second freshness. Increase the refresh interval to 30 seconds and use bulk indexing for throughput."
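The hybrid layering from the flash-sale case (real-time store for the newest content, Elasticsearch for history) needs a merge step at query time. A minimal sketch, with hypothetical hit shapes and scores; fresh-store copies win doc-ID conflicts because they reflect the latest state:

```python
def merge_results(fresh_hits, historical_hits, limit=10):
    """Merge fresh-store hits with historical-index hits.

    On a doc-ID conflict the fresh copy wins (the historical index may
    still hold a stale version); the combined list is re-sorted by score.
    """
    by_id = {}
    for hit in historical_hits:
        by_id[hit["id"]] = hit
    for hit in fresh_hits:  # overwrite any stale historical copies
        by_id[hit["id"]] = hit
    merged = sorted(by_id.values(), key=lambda h: h["score"], reverse=True)
    return merged[:limit]

fresh = [{"id": "sku-1", "score": 9.1}]       # just-published flash sale
historical = [{"id": "sku-1", "score": 2.0},  # stale copy of the same doc
              {"id": "sku-2", "score": 7.5}]
print(merge_results(fresh, historical))
```

The subtle part in production is score comparability: the two layers score with different statistics, so a real implementation would normalize or re-rank before merging rather than comparing raw scores as this sketch does.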
Probe 3: "How does search scale?" (3-5 min)
"Elasticsearch distributes the index across shards. Each shard is a complete Lucene index. Queries scatter to all shards and gather results."
The scaling decisions:
- Shard count: Fixed at index creation. Too few: each shard is too large for fast queries. Too many: coordination overhead dominates. "Rule of thumb: 10-50GB per shard. A 1TB index needs 20-100 shards."
- Replica count: Each shard has N replicas. More replicas = more read throughput but more storage. "For read-heavy search: 2-3 replicas per shard. Reads fan out to replicas, spreading the load."
- Hot-warm-cold architecture: Recent data on fast SSDs (hot), older data on slower/cheaper storage (warm), archived data on S3 (cold). "This reduces cost by 5-10x for indices with time-based access patterns."
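The shard-count rule of thumb above is simple arithmetic, worth being able to do on the spot. A sketch using the 10-50GB-per-shard band quoted in the bullet:

```python
import math

def shard_count_range(index_size_gb, min_shard_gb=10, max_shard_gb=50):
    """Shard count band for the 10-50GB-per-shard rule of thumb.

    Fewest shards: size each shard at the upper bound.
    Most shards: size each shard at the lower bound.
    """
    low = max(1, math.ceil(index_size_gb / max_shard_gb))
    high = max(1, math.ceil(index_size_gb / min_shard_gb))
    return low, high

print(shard_count_range(1000))  # 1TB index → (20, 100)
```

This reproduces the "1TB index needs 20-100 shards" figure; remember that primary shard count is fixed at index creation, so this estimate has to account for expected growth, not just current size.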
Phase 6: Wrap-Up
"Search is an inverted index problem with a ranking problem on top. The Staff-level insight: the index is easy (Elasticsearch handles it). Relevance tuning — making the RIGHT results appear first — is where teams spend 80% of their search engineering effort. And the most common bug isn't a missing document; it's analyzer mismatch between indexing and querying."
Quick-Reference: The 30-Second Cheat Sheet
| Topic | The L5 Answer | The L6 Answer (say this) |
|---|---|---|
| Technology | "Use Elasticsearch" | "Elasticsearch for full-text; the hard part is relevance tuning, not the infrastructure" |
| Indexing | "Index the data" | "CDC from primary DB → analyzer → inverted index with configurable refresh interval" |
| Relevance | "BM25 scoring" | "BM25 + popularity + recency + personalization — pure text relevance isn't enough" |
| Freshness | "Real-time search" | "1-second refresh interval; tune based on freshness SLA vs indexing throughput" |
| Scale | "Add more shards" | "10-50GB per shard, hot-warm-cold for cost optimization" |