
Elasticsearch

Distributed search and analytics engine built on Apache Lucene. The go-to technology for full-text search, log aggregation, and relevance ranking.

Why This Matters

Search is the feature that turns a database into a product. Every e-commerce platform, content site, and SaaS application needs users to find things — and the moment "find things" means more than exact match on a primary key, you need a search engine. Elasticsearch is the default technology for this problem in system design interviews, and understanding it deeply means understanding the tradeoff at the heart of every search system: freshness vs. relevance vs. cost.

The gap between L5 and L6 on Elasticsearch is not knowing it does full-text search — it is understanding that Elasticsearch is a derived read model, not a database. L5 candidates say "we'll store the data in Elasticsearch." L6 candidates say "PostgreSQL is the source of truth, Elasticsearch is a denormalized search index populated via a Kafka consumer, and here is what happens when the indexing pipeline falls behind." That second answer demonstrates the architectural maturity Staff interviews demand: you understand that search quality depends more on the reliability of your indexing pipeline than on any analyzer or scoring configuration.

If you can explain why your mapping uses keyword for status fields and text with edge n-grams for autocomplete, why your shard count is 10 and not 100, and how your hot/warm/cold architecture cuts storage costs by 80%, you have shown the operational depth that separates Staff engineers from candidates who just know the API.

The 60-Second Pitch

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It combines inverted indexes, BM25 relevance scoring, near-real-time indexing, and horizontal scaling across commodity hardware. In system design interviews, Elasticsearch is the default choice when the question involves full-text search, log aggregation, autocomplete, faceted navigation, or any workload where "find documents matching a complex query across billions of records in <100ms" is the requirement.

The Staff-level insight: Elasticsearch is not a database — it is a search engine that happens to store data. It has no transactions, no referential integrity, no durable single-write acknowledgment by default, and its eventual consistency model means a document indexed 50ms ago may not yet be visible to search. The Staff move is to pair Elasticsearch with a source of truth (PostgreSQL, Kafka) and treat it as a derived, eventually consistent read model optimized for query patterns that relational databases handle poorly.


Architecture & Internals

Cluster, Nodes, and Roles

An Elasticsearch cluster is a group of nodes that collectively store data and provide search capabilities. Each node can serve one or more roles:

Master-eligible nodes manage cluster state — index creation/deletion, shard allocation, node membership. The cluster elects one active master at a time. You need a minimum of 3 master-eligible nodes for quorum-based leader election (avoiding split brain). Master nodes should be lightweight — on clusters with dedicated masters, they do not store data or execute queries.

Data nodes store index shards and execute search and indexing operations. These are the workhorses — they need fast disks (SSD), significant RAM (for OS page cache and field data), and adequate CPU for query execution. In hot/warm/cold architectures, data nodes are tiered by hardware class: hot nodes (fast SSDs for recent data), warm nodes (HDDs for older data), cold nodes (cheapest storage for archived data).

Coordinating nodes (also called client nodes) route requests, scatter queries across shards, gather results, and merge them. Every node can coordinate, but dedicated coordinating nodes prevent expensive aggregation merges from competing with indexing on data nodes. For high-query-rate clusters, dedicated coordinators prevent search latency spikes during bulk indexing.

Ingest nodes run ingest pipelines (transformation, enrichment) before indexing. Useful for log parsing (Grok patterns), field extraction, GeoIP enrichment — preprocessing that would otherwise happen in Logstash or application code.


Shards and Replicas

An index is divided into primary shards (the unit of data distribution) and replica shards (copies of primaries for redundancy and read scaling). A document is routed to a primary shard by hash(routing_key) % num_primary_shards. The routing key defaults to the document _id but can be overridden — a critical optimization for queries that always filter on the same field (e.g., tenant_id).

The number of primary shards is set at index creation and cannot be changed afterward (without reindexing). This is similar to Kafka's partition count — it is a commitment. Too few shards limit parallelism and maximum index size. Too many waste resources (each shard is a Lucene index with its own file handles, memory structures, and thread pool slots). The rule of thumb: 10-50GB per shard, with search-heavy workloads toward the lower end and logging workloads toward the upper end.
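A minimal sketch of both commitments (shard count at creation, custom routing key), assuming a hypothetical products index with tenant-scoped queries:

```json
# Shard count is fixed here; size for ~2x projected data volume
PUT /products
{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 1
  }
}

# Route by tenant so all of a tenant's documents land on one shard
PUT /products/_doc/123?routing=tenant-42
{
  "tenant_id": "tenant-42",
  "name": "Running Shoe"
}

# Passing the same routing value lets the query hit a single shard
# instead of fanning out to all ten
GET /products/_search?routing=tenant-42
{
  "query": { "term": { "tenant_id": "tenant-42" } }
}
```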

Replica shards serve two purposes: (1) fault tolerance — if a node with a primary shard fails, a replica is promoted to primary, and (2) read scaling — search queries can be served by either primary or replica. With 1 replica per shard across 3 data nodes, the cluster tolerates the loss of any single node without data loss or search downtime.

The Inverted Index

Elasticsearch's speed comes from the inverted index — a data structure that maps each unique term to the list of documents containing it. When you index a document with text "The quick brown fox", the analyzer tokenizes it into terms (the, quick, brown, fox), and each term is added to the inverted index with a pointer to the document. A search for "brown fox" looks up both terms in the inverted index, finds the intersection of document lists, and returns matching documents — without scanning a single document's full text.

This is fundamentally different from a B-tree index in a relational database. A B-tree on a text column supports exact match and prefix queries (LIKE 'foo%'). An inverted index supports full-text search, stemming (matching "running" when searching "run"), synonyms, fuzzy matching (typo tolerance), and relevance scoring — all in O(1) per term lookup plus O(N) for posting list intersection, where N is the number of matching documents.

The analyzer pipeline controls how text becomes terms: character filters (strip HTML, normalize unicode) → tokenizer (split on whitespace, ngrams, or language-specific rules) → token filters (lowercase, stemming, stop word removal, synonyms). Choosing the right analyzer is a relevance-critical decision. The standard analyzer (default) tokenizes on word boundaries and lowercases. For autocomplete, use an edge_ngram tokenizer. For exact matching on IDs or enums, use the keyword analyzer (no tokenization).
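As a sketch of the pipeline in practice, here is a hypothetical autocomplete analyzer built from an edge_ngram tokenizer plus a lowercase filter; the search_analyzer override keeps query strings from being n-grammed at search time (index and field names are illustrative):

```json
PUT /catalog
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
```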


BM25 Relevance Scoring

Elasticsearch ranks search results using BM25 (Best Matching 25), a probabilistic relevance scoring algorithm. BM25 considers three factors: term frequency (how often the term appears in the document — more = higher score, with diminishing returns), inverse document frequency (how rare the term is across all documents — rarer terms score higher), and field length normalization (shorter fields score higher for the same term frequency — a match in a title is more relevant than the same match in a 10,000-word body).

BM25 replaced TF-IDF as the default scoring model in Elasticsearch 5.x. The key improvement: BM25 saturates term frequency impact (a term appearing 100 times is not 100x more relevant than appearing once), making it more robust against keyword stuffing and long documents.

In interviews, BM25 is relevant when discussing relevance tuning. The Staff answer to "how do you improve search quality?": "Boosting fields (title^3, body^1), custom analyzers for domain-specific tokenization, and function_score queries that blend BM25 relevance with business signals (recency, popularity, user personalization)."
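A hedged sketch of that answer as a query; the published_at and popularity fields are illustrative business signals, not prescribed names:

```json
GET /articles/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "running shoes",
          "fields": ["title^3", "body"]
        }
      },
      "functions": [
        { "gauss": { "published_at": { "origin": "now", "scale": "30d", "decay": 0.5 } } },
        { "field_value_factor": { "field": "popularity", "modifier": "log1p", "factor": 0.1, "missing": 0 } }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}
```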

Refresh, Flush, and Near-Real-Time Search

Documents are not immediately searchable after indexing. Elasticsearch writes to an in-memory buffer and periodically refreshes (default every 1 second), which creates a new Lucene segment and makes buffered documents visible to search. This 1-second delay is the "near-real-time" in Elasticsearch's description.

For bulk indexing (initial data load, reindexing), set refresh_interval=-1 to disable automatic refresh, index all documents, then manually refresh. This can improve indexing throughput by 5-10x by avoiding the overhead of creating many small segments.
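A sketch of the bulk-load pattern; dropping replicas during the load is a common companion optimization, assuming the data can be rebuilt from the source of truth if a node dies mid-load:

```json
# Before the bulk load: stop automatic refreshes (and, optionally, replication)
PUT /products/_settings
{
  "index": { "refresh_interval": "-1", "number_of_replicas": 0 }
}

# ... run _bulk indexing requests ...

# After the load: restore settings, then make everything searchable at once
PUT /products/_settings
{
  "index": { "refresh_interval": "1s", "number_of_replicas": 1 }
}

POST /products/_refresh
```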

Documents are durable (survive node restart) only after a flush, which writes the in-memory buffer and transaction log (translog) to disk as a Lucene commit. The translog provides durability between flushes — every write operation is appended to the translog, which is fsynced by default. If the node crashes before a flush, the translog is replayed on recovery.


Core Concepts for Interviews

Mapping Design

A mapping defines the schema of an index — field names, types, and analysis settings. Unlike relational schemas, mappings are not enforced at write time by default (dynamic mapping adds fields automatically). In production, always use explicit mappings to prevent mapping explosion — the accidental creation of thousands of fields from unstructured JSON, each consuming cluster state memory and degrading performance.

Key field types: text (analyzed, for full-text search), keyword (not analyzed, for exact match, sorting, aggregations), integer/long/float/double (numeric), date, geo_point, nested (for arrays of objects that need independent querying), object (flattened, fields merged with parent — no independent querying of array elements).

The most common mapping mistake: using text for fields that should be keyword. A status field with values active/inactive should be keyword — analyzing it buys nothing, and a text field cannot be used for exact-match filtering, sorting, or aggregations without workarounds. Use multi-fields ("status": { "type": "keyword", "fields": { "search": { "type": "text" } } }) only when you genuinely need both exact match and full-text search on the same field.
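A minimal explicit mapping illustrating these choices (field names are hypothetical); dynamic: strict rejects any document that contains an unmapped field:

```json
PUT /orders
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "status":     { "type": "keyword" },
      "created_at": { "type": "date" },
      "total":      { "type": "double" },
      "notes": {
        "type": "text",
        "fields": { "raw": { "type": "keyword", "ignore_above": 256 } }
      }
    }
  }
}
```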

Nested vs. Parent-Child

When documents contain arrays of objects, Elasticsearch flattens them by default. An order with items: [{ "name": "shirt", "size": "M" }, { "name": "pants", "size": "L" }] becomes items.name: ["shirt", "pants"] and items.size: ["M", "L"] — losing the association between name and size. A query for "shirt in size L" incorrectly matches this document.

Nested fields preserve the association. Each nested object is stored as a hidden internal document. Queries use nested query syntax to match within a single nested object. Cost: each nested object is a separate Lucene document — a parent with 100 nested objects creates 101 Lucene documents, impacting indexing speed, shard size, and query complexity.

Parent-child (join field) stores parent and child as separate documents in the same index. Children can be updated independently without reindexing the parent. Cost: has_child and has_parent queries are slower than nested queries because they require a global ordinals lookup across the shard.

Decision rule: use nested when the number of nested objects per document is small (<100) and they are always read/written with the parent. Use parent-child when children are updated independently or the number of children per parent is large (>100).
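A sketch of the order/items example above as a nested mapping, with the query that correctly matches only orders containing a single item that is both a shirt and size L:

```json
PUT /orders
{
  "mappings": {
    "properties": {
      "items": {
        "type": "nested",
        "properties": {
          "name": { "type": "keyword" },
          "size": { "type": "keyword" }
        }
      }
    }
  }
}

GET /orders/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            { "term": { "items.name": "shirt" } },
            { "term": { "items.size": "L" } }
          ]
        }
      }
    }
  }
}
```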

Denormalization

Elasticsearch has no joins. Every query operates on a single index (cross-index search is possible but not a join — it is parallel queries with merged results). This means search indexes must be denormalized: embed all data needed for search and display into each document.

A product search index denormalizes: { "product_id": 123, "name": "Running Shoe", "brand_name": "Nike", "category_path": ["Shoes", "Athletic", "Running"], "price": 129.99, "avg_rating": 4.5, "review_count": 2847, "in_stock": true }. The brand name is duplicated in every product document. If Nike changes its name, you reindex every Nike product. This is the tradeoff: query speed and simplicity in exchange for update complexity and storage redundancy.


Query Types

Full-Text Queries

match: the workhorse. Analyzes the query string, then searches for each term in the inverted index. { "match": { "title": "running shoes" } } finds documents containing "running" OR "shoes" (default operator). With "operator": "and", both terms must appear.

multi_match: searches across multiple fields. { "multi_match": { "query": "running shoes", "fields": ["title^3", "description", "brand"] } } boosts title matches 3x. Types: best_fields (score from the best-matching field), most_fields (sum scores across fields), cross_fields (term-centric — each term must appear in at least one field).

match_phrase: terms must appear in order and adjacent. { "match_phrase": { "title": "running shoes" } } matches "running shoes" but not "shoes for running". Use slop parameter to allow N words between terms.

Structured Queries

term: exact match on keyword fields (not analyzed). { "term": { "status": "active" } }. Never use term on text fields — the query is not analyzed but the field was, so "Active" won't match the lowercased token "active".

range: { "range": { "price": { "gte": 50, "lte": 200 } } }. Works on numeric, date, and keyword fields.

bool: combines queries with must (AND, affects score), filter (AND, no scoring — faster), should (OR, boosts score), must_not (NOT, no scoring). The Staff pattern for search: full-text in must, structured filters in filter (price range, category, in-stock), business boosting in should (featured products, high ratings).
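That pattern as a sketch; the featured and avg_rating fields are illustrative business signals:

```json
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "multi_match": { "query": "running shoes", "fields": ["title^3", "description"] } }
      ],
      "filter": [
        { "term":  { "in_stock": true } },
        { "term":  { "category": "athletic" } },
        { "range": { "price": { "gte": 50, "lte": 200 } } }
      ],
      "should": [
        { "term":  { "featured": { "value": true, "boost": 2.0 } } },
        { "range": { "avg_rating": { "gte": 4.5 } } }
      ]
    }
  }
}
```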

Aggregations

Bucket aggregations: group documents — terms (group by field value), date_histogram (group by time interval), range (group by numeric ranges). These power faceted navigation ("500 results in Shoes, 200 in Clothing").

Metric aggregations: compute statistics within buckets — avg, sum, min, max, cardinality (approximate distinct count using HyperLogLog), percentiles.

Pipeline aggregations: operate on other aggregation results — moving averages, cumulative sums, bucket selectors. These power analytics dashboards and time-series analysis within Elasticsearch.
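A sketch of a faceted-navigation request combining a bucket aggregation (terms) with a nested metric (avg) and a cardinality metric; field names are hypothetical, and size: 0 skips fetching hits when only facets are needed:

```json
GET /products/_search
{
  "size": 0,
  "query": { "match": { "title": "shoes" } },
  "aggs": {
    "by_category": {
      "terms": { "field": "category", "size": 20 },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    },
    "unique_brands": { "cardinality": { "field": "brand" } }
  }
}
```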

Autocomplete and Suggestions

Completion suggester: purpose-built for autocomplete. Uses an FST (finite state transducer) data structure for prefix lookups in O(length) time. The field type completion stores suggestions separately from the inverted index — sub-millisecond response time regardless of index size.

Edge n-grams: alternative approach — tokenize "elasticsearch" as ["e", "el", "ela", "elas", ...] at index time. Queries use a standard match query. More flexible than completion suggester (supports fuzzy matching, middle-of-word matching) but uses more disk space and indexing time.

The Staff recommendation: completion suggester for simple prefix autocomplete (product names, user names). Edge n-grams for search-as-you-type with fuzzy tolerance and more complex matching requirements.
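A sketch of the first option, assuming a dedicated title_suggest field:

```json
PUT /products
{
  "mappings": {
    "properties": {
      "title_suggest": { "type": "completion" }
    }
  }
}

PUT /products/_doc/1
{ "title_suggest": "running shoes" }

GET /products/_search
{
  "suggest": {
    "title_prefix": {
      "prefix": "runn",
      "completion": { "field": "title_suggest", "size": 5, "skip_duplicates": true }
    }
  }
}
```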


Scaling Elasticsearch

Shard Sizing

The fundamental scaling unit is the shard. Each shard is an independent Lucene index with its own inverted index, file handles, and heap overhead — the long-standing rule of thumb is to keep fewer than 20 shards per GB of JVM heap on each node. Oversized shards (>50GB) cause slow recoveries and rebalancing. Undersized shards (<1GB) waste resources with excessive per-shard overhead.

Target: 10-50GB per shard for search workloads. For a 500GB dataset with 1 replica: 500GB / 30GB_per_shard ≈ 17 primary shards, 34 total shards (17 primary + 17 replica). Distribute across data nodes so each node holds a manageable number of shards (aim for <500 shards per node).

Index Lifecycle Management (ILM)

For time-series data (logs, metrics, events), ILM automates index rotation through phases:

| Phase | Duration | Storage | Purpose |
|---|---|---|---|
| Hot | 0-7 days | Fast SSD | Active writes and most queries |
| Warm | 7-30 days | Standard SSD | Read-only, still frequently queried |
| Cold | 30-90 days | HDD or S3 (searchable snapshots) | Rarely queried, retained for compliance |
| Delete | >90 days | None | Purged |

Rollover indices: instead of one massive index, create date-based indices (logs-2024.01.15) with an alias (logs-write pointing to the current index). ILM rolls over when the index hits a size threshold (50GB) or age threshold (1 day). Queries use a read alias (logs-read) that spans all active indices. This pattern enables cheap deletion (drop an entire index vs. deleting documents) and per-phase hardware allocation.
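A sketch of an ILM policy implementing the table above, assuming the warm tier is tagged with the node.attr.data_tier attribute described below:

```json
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "1d" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": { "require": { "data_tier": "warm" } }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```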

Hot/Warm/Cold Architecture

Assign data nodes to tiers using node attributes (node.attr.data_tier: hot). ILM policies move indices between tiers automatically. Hot nodes run on fast NVMe SSDs with high CPU. Warm nodes run on standard SSDs with moderate CPU. Cold nodes use searchable snapshots from S3 — data is not stored locally, only fetched on demand.

This architecture reduces infrastructure cost by 60-80% compared to keeping all data on hot-tier hardware. A 10TB cluster with 90-day retention might need only 1TB of hot storage (7 days), 3TB warm (30 days), and 6TB cold (searchable snapshots on S3 at pennies per GB).


Failure Modes

Cluster Yellow / Red State

Yellow: all primary shards are assigned, but some replica shards are not. The cluster functions but has reduced redundancy — one more node failure could cause data loss. Common cause: cluster has fewer data nodes than the replication factor, or a node recently left and replicas are being relocated.

Red: one or more primary shards are unassigned. Queries that hit unassigned shards return partial results or errors. This is a data availability incident.

Detection: GET _cluster/health — status field. Alert immediately on red, within 15 minutes on yellow.

Staff Response: "Yellow after a node bounce is expected — replicas relocate within minutes. Sustained yellow means we have a structural problem: insufficient nodes, disk watermarks exceeded (cluster.routing.allocation.disk.watermark.low defaults to 85%), or shard allocation filtering misconfigured. Red requires immediate investigation — check GET _cluster/allocation/explain to understand why the shard cannot be assigned. Most common cause: disk full on all eligible nodes."

Shard Relocation Storms

Symptom: Cluster performance degrades during node addition, removal, or recovery. Network bandwidth is saturated with shard data transfers. Search latency spikes.

Detection: GET _cat/recovery shows many concurrent recoveries. Network I/O metrics on data nodes spike.

Staff Response: "Limit concurrent recoveries with cluster.routing.allocation.node_concurrent_recoveries (default 2). Set indices.recovery.max_bytes_per_sec to cap recovery bandwidth (default 40MB/s). For planned maintenance, use allocation filtering to drain a node gracefully before removing it: PUT _cluster/settings { "transient": { "cluster.routing.allocation.exclude._name": "node-5" } }."

Mapping Explosion

Symptom: Master node memory usage spikes. Cluster state size grows to hundreds of MB. Index creation and mapping updates become slow. Master node becomes unresponsive.

Detection: GET _cluster/state/metadata size exceeds 100MB. Index mapping has >1,000 fields.

Staff Response: "Set index.mapping.total_fields.limit (default 1000) and dynamic: strict on all production indices. Audit existing mappings for unused fields. For log ingestion with variable fields, use flattened field type (stores the entire object as a single field with keyword-only sub-fields) instead of dynamic mapping."

Slow Queries

Symptom: Search latency p99 exceeds SLA. Some queries take seconds instead of milliseconds.

Detection: Enable slow log (index.search.slowlog.threshold.query.warn: 1s). Monitor search_latency metrics per index.

Staff Response: "Profile with GET index/_search?profile=true. Common causes: (1) wildcard or regexp queries at the beginning of a term (no inverted index optimization — full scan), (2) terms aggregation on high-cardinality fields loading millions of unique values into heap, (3) deep pagination (from: 10000, size: 10 — Elasticsearch must fetch and sort 10,010 documents from every shard, then discard 10,000). Fix deep pagination with search_after or scroll API."

Split Brain (Legacy)

Symptom: Two master nodes active simultaneously. Each half of the cluster accepts writes independently. Data diverges.

Detection: In Elasticsearch 7.x+, this is effectively eliminated by the built-in cluster coordination (quorum-based voting configuration). In older versions, verify that every node agrees on the elected master (GET _cat/master) — the cluster should always report exactly one.

Staff Response: "Elasticsearch 7.x+ uses a quorum-based voting configuration that prevents split brain without manual minimum_master_nodes tuning. For pre-7.x clusters: set discovery.zen.minimum_master_nodes to (master-eligible_nodes / 2) + 1. For any version: use 3 dedicated master nodes (never 2, never an even number) and ensure master nodes are in separate failure domains."


When to Use vs. Alternatives

| Use Case | Elasticsearch | PostgreSQL FTS | Algolia | Typesense | Solr |
|---|---|---|---|---|---|
| Full-text search (>1M docs) | ✅ Best | ⚠️ Viable to ~10M rows | ✅ Managed | ✅ Simpler | ✅ Comparable |
| Relevance tuning | ✅ Extensive | ⚠️ Basic (ts_rank) | ✅ AI-powered | ⚠️ Basic | ✅ Extensive |
| Log aggregation | ✅ Best (ELK stack) | ❌ Not designed for | ❌ Not designed for | ❌ Not designed for | ⚠️ Possible |
| Autocomplete | ✅ Completion suggester | ⚠️ pg_trgm | ✅ Built-in | ✅ Built-in | ✅ Possible |
| Faceted navigation | ✅ Aggregations | ⚠️ Manual GROUP BY | ✅ Built-in | ✅ Built-in | ✅ Built-in |
| Geo search | ✅ Geo queries | ✅ PostGIS (better) | ⚠️ Basic | ⚠️ Basic | ✅ Spatial |
| Ops complexity | ⚠️ High (cluster mgmt) | ✅ Part of existing DB | ✅ Managed SaaS | ✅ Simple | ⚠️ High |
| Real-time indexing | ✅ ~1s delay | ✅ Immediate | ✅ Immediate | ✅ Immediate | ✅ Near-real-time |
| Cost at scale | ⚠️ Memory-intensive | ✅ Cheapest | ❌ Expensive at scale | ✅ Affordable | ⚠️ Similar to ES |

Decision rule: Use Elasticsearch when you need full-text search at scale (>10M documents), log aggregation (ELK/OpenSearch), complex aggregations across large datasets, or relevance tuning beyond basic ranking. Use PostgreSQL tsvector + GIN when search is a secondary feature on <10M rows and you do not want to operate a separate cluster. Use Algolia when search quality and developer experience matter more than cost and you are under 10M records. Use Typesense when you want Algolia-like simplicity with self-hosted control.



Staff-Level Operational Concerns

Heap sizing: Elasticsearch uses Java heap for field data caches, query caches, and cluster state. Set heap to 50% of available RAM, never exceeding 31GB (beyond 31GB, JVM loses compressed ordinary object pointers, and effective heap efficiency drops). The other 50% of RAM is used by the OS page cache for Lucene segment files — this is equally critical. A 64GB server should run with 31GB heap and 33GB for page cache.

Monitoring essentials: Cluster health (red/yellow/green), JVM heap usage and GC frequency (jvm.gc.collectors.old.collection_time_in_millis — long pauses indicate heap pressure), indexing rate and search latency per index, pending tasks queue (master node backlog), thread pool rejections (search.rejected, write.rejected), and disk watermarks.

Reindexing strategy: Unlike schema migrations in PostgreSQL, Elasticsearch mapping changes often require a full reindex. The zero-downtime pattern: create a new index with the updated mapping, reindex from the old index (or from the source of truth), swap the alias from old to new, delete the old index. Use _reindex API for Elasticsearch-to-Elasticsearch, or re-feed from Kafka/PostgreSQL for source-of-truth rebuilds.

Index aliases are non-negotiable in production. Never point application code at concrete index names (products-v1). Always use aliases (products). Aliases enable zero-downtime reindexing (swap alias from v1 to v2), blue-green deployments, and A/B testing of different mappings or analyzers.
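The zero-downtime pattern as a sketch, assuming v1/v2 index names behind a products alias:

```json
# 1. Create the new index with the updated mapping
PUT /products-v2
{
  "mappings": {
    "properties": {
      "title":  { "type": "text" },
      "status": { "type": "keyword" }
    }
  }
}

# 2. Copy documents across (or re-feed from the source of truth)
POST /_reindex
{
  "source": { "index": "products-v1" },
  "dest":   { "index": "products-v2" }
}

# 3. Swap the alias atomically: application code never changes
POST /_aliases
{
  "actions": [
    { "remove": { "index": "products-v1", "alias": "products" } },
    { "add":    { "index": "products-v2", "alias": "products" } }
  ]
}

DELETE /products-v1
```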


Interview Application

Which Playbooks Use Elasticsearch

| Playbook | How Elasticsearch Is Used | Key Pattern |
|---|---|---|
| Search Indexing | Primary search engine | Inverted index with custom analyzers and BM25 tuning |
| Web Crawling | Crawled content indexing | Bulk indexing pipeline with deduplication |
| Feed Generation | Content search and discovery | Multi-field search with recency boosting |
| Notification Systems | Notification search and filtering | Structured queries with date range filters |

What L5 Says vs. What L6 Says

| Topic | L5 Says | L6 Says |
|---|---|---|
| Choice | "We'll use Elasticsearch for search" | "Elasticsearch for the search read model — inverted indexes give us sub-100ms full-text queries across 500M products. PostgreSQL remains the source of truth. We index via a Kafka consumer that denormalizes product data with brand, category, and inventory status." |
| Scaling | "We'll add more nodes" | "10 primary shards at ~30GB each for 300GB of product data. 1 replica for fault tolerance. Hot/warm tiers if we add search analytics logging. Shard count is fixed at index creation so we size for 2x current volume." |
| Relevance | "Elasticsearch handles ranking" | "BM25 base scoring with field boosting (title^3, description^1). Function score query blends relevance with conversion rate and recency decay. A/B test scoring changes via index aliases pointing to different index versions." |
| Failure | "Elasticsearch is distributed" | "Near-real-time means a product indexed now is searchable in ~1 second. During that window, a user who just added a product will not find it via search. We handle this with a read-your-writes pattern: check the source of truth for recently created items and merge with search results." |
| Consistency | "It's eventually consistent" | "The indexing pipeline lag from PostgreSQL change to ES visibility is typically 2-5 seconds via Kafka. If the pipeline falls behind, search results are stale. We monitor consumer lag and alert if the search index is >30 seconds behind the source of truth." |

Common Interview Mistakes

| Mistake | Why It's Wrong | What to Say Instead |
|---|---|---|
| "Elasticsearch as primary database" | No transactions, eventual consistency, no referential integrity | "Elasticsearch as a search read model, PostgreSQL/Kafka as source of truth" |
| "Just index everything" | Mapping explosion, wasted storage, slower queries | "Explicit mappings with dynamic: strict. Index only fields needed for search, filter, sort, and display." |
| LIKE '%query%' in PostgreSQL instead | Full table scan, no relevance scoring, no fuzzy matching | "PostgreSQL tsvector for <10M rows. Elasticsearch for >10M or when relevance tuning, aggregations, or autocomplete are needed." |
| Ignoring near-real-time delay | Documents are not searchable for ~1 second after indexing | "1-second indexing delay is acceptable for most search use cases. For real-time requirements, supplement with a source-of-truth check." |
| "We'll use scroll API for pagination" | Scroll holds a point-in-time snapshot consuming heap; deprecated for user-facing pagination | "search_after with a sort tiebreaker for deep pagination. Scroll only for bulk export." |

In the Wild

Wikipedia — Elasticsearch for Full-Text Search at Scale

Wikipedia migrated from a custom Lucene-based search to Elasticsearch (via CirrusSearch) to handle search across 60M+ articles in 300+ languages. The system processes ~100K search queries per second. Each language wiki has its own index with language-specific analyzers — stemming, stop words, and tokenization rules differ dramatically between English, Chinese, Arabic, and Japanese.

The critical architectural decision: Wikipedia runs dedicated Elasticsearch clusters separate from their MediaWiki application servers. Search indices are populated via a change propagation pipeline — when an editor saves a page, the change flows through a Kafka-like event system to the Elasticsearch indexer. The indexing delay is typically 2-5 seconds, meaning a newly edited article may not appear in search results for a few seconds — an acceptable tradeoff for Wikipedia's use case.

GitHub — Code Search Across 200M+ Repositories

GitHub uses Elasticsearch to power code search across 200M+ repositories. The challenge is not just text volume — it is that code search requires exact-match on symbols (function names, class names), regex support, and scope filtering (search within a specific repo, org, or language). GitHub uses custom analyzers that tokenize code differently from natural language: camelCase splitting, snake_case splitting, and path component tokenization.

The operational insight: GitHub's code search index is measured in petabytes. They use a tiered architecture: hot shards for frequently accessed repositories (popular open-source projects), warm shards for less popular repos, and cold storage for archived repos. The shard routing key is repository ID, ensuring all files from one repo land on the same shard for efficient repo-scoped searches.

Uber — Geo Search for Real-Time Matching

Uber uses Elasticsearch for searching drivers, restaurants, and places. The geo_point field type combined with geo_distance queries enables "find the 10 nearest drivers within 5km" in <50ms. Elasticsearch's combination of geospatial queries with structured filters (driver status = available, vehicle type = UberX) makes it a natural fit — PostGIS could handle the geo queries alone, but combining geo with real-time availability filtering across millions of drivers at sub-100ms latency required Elasticsearch's inverted index + geo capabilities.

The operational insight: driver locations update every 4 seconds. At 5M active drivers, that is 1.25M location updates per second flowing through the indexing pipeline. Uber uses near-real-time indexing with a custom 500ms refresh interval (versus the default 1 second) to minimize the window where a driver's position is stale.


Practice Drill

Scenario: Design the search system for a job board (like Indeed). 50M job listings, 10M searches/day, requirements: full-text search on job title and description, filtering by location (within X miles), salary range, company, and job type (full-time, contract, remote). Autocomplete on job titles. Results ranked by relevance with recency boost (newer postings rank higher). New listings must be searchable within 30 seconds.

Staff-Caliber Answer Shape
  1. Architecture: PostgreSQL as source of truth for job listings. Kafka topic job-events for change propagation. Kafka consumer indexes into Elasticsearch. Read path: search queries go directly to Elasticsearch. Write path: job CRUD goes to PostgreSQL, events flow through Kafka to ES indexer.

  2. Mapping design: title as text with standard analyzer + title.autocomplete as text with edge_ngram tokenizer (min_gram=2, max_gram=15). description as text with standard analyzer. company as both text (search) and keyword (filter/aggregation). location as geo_point. salary_min/salary_max as integer. job_type as keyword. posted_at as date. dynamic: strict to prevent mapping explosion from unstructured job metadata.

  3. Shard sizing: 50M listings × ~2KB average = ~100GB. At 30GB per shard → 4 primary shards with 1 replica = 8 total shards. This handles 10M searches/day (~115 QPS) comfortably. Leave room to grow to 200M listings by the time a reindex is needed.

  4. Query design: bool query: must = multi_match on title^3 and description^1 for relevance. filter = geo_distance for location, range for salary, term for job_type. should = function_score with exp decay on posted_at (half-life = 14 days) to boost recent postings. Filters go in filter context (no scoring overhead), relevance goes in must/should. A sketch of this query follows the list.

  5. Autocomplete: Separate completion suggester field title_suggest of type completion, populated with the job title. Alternatively, the edge_ngram sub-field on title for more flexible matching. Completion suggester for pure prefix autocomplete, edge_ngram if typo tolerance is needed.

  6. Freshness SLA (30 seconds): Default 1-second refresh interval meets this easily. The bottleneck is the Kafka→ES indexing pipeline latency. Monitor consumer lag — alert if lag exceeds 15 seconds. For the first version, a simple Kafka consumer with bulk indexing (batch 500 documents, flush every 5 seconds) achieves <10 second end-to-end latency.
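A sketch of the query from step 4, with illustrative search terms and coordinates; here the recency boost is applied by wrapping the bool query in function_score rather than via should:

```json
GET /jobs/_search
{
  "size": 20,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            { "multi_match": { "query": "senior data engineer", "fields": ["title^3", "description"] } }
          ],
          "filter": [
            { "geo_distance": { "distance": "25mi", "location": { "lat": 40.71, "lon": -74.01 } } },
            { "range": { "salary_max": { "gte": 120000 } } },
            { "term":  { "job_type": "full_time" } }
          ]
        }
      },
      "functions": [
        { "exp": { "posted_at": { "origin": "now", "scale": "14d", "decay": 0.5 } } }
      ],
      "boost_mode": "multiply"
    }
  }
}
```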

