StaffSignal
Technology Guide

Time Series Databases

Specialized databases (InfluxDB, TimescaleDB, Prometheus) optimized for time-stamped data ingestion, retention policies, and downsampling at scale.

Design with Time Series Databases — Staff-Level Technology Guide

The 60-Second Pitch

Time series databases (TSDBs) are specialized storage engines optimized for timestamped data — metrics, IoT sensor readings, financial ticks, application logs, and infrastructure telemetry. They solve a specific problem that general-purpose databases handle poorly: ingesting millions of data points per second, each tagged with a timestamp, and querying them efficiently over time ranges with aggregation (average CPU over the last hour, 95th percentile latency per minute, total revenue per day). In system design interviews, TSDBs appear whenever you design monitoring systems, IoT platforms, analytics dashboards, or any workload where "show me the trend over time" is a core query.

The Staff-level insight: time series data has unique properties that general-purpose databases are not designed to exploit. Writes are append-only (you never update yesterday's CPU reading). Data has a natural expiry (you need per-second granularity for the last hour, per-minute for the last week, per-hour for the last year). Queries are almost always time-range scans with aggregation, not point lookups. TSDBs exploit these properties with columnar storage, aggressive compression (10-20x better than PostgreSQL for the same data), automatic downsampling, and retention policies that age out old data without manual intervention. The result: a TSDB stores 100x more data points per dollar of storage than PostgreSQL and queries them 100x faster for time-range aggregations.


Why Not Just Use PostgreSQL?

This is the question every interviewer will ask, and the Staff answer is concrete.

PostgreSQL with a timestamp column works for small-to-moderate time series workloads — up to a few million rows per day, a few months of retention, and simple queries. TimescaleDB (a PostgreSQL extension) pushes this further. But at scale, general-purpose databases hit fundamental bottlenecks:

Dimension              | PostgreSQL                               | Purpose-Built TSDB
Write throughput       | ~10K inserts/sec (single node, indexed)  | ~1M+ points/sec
Compression            | ~50-100 bytes/row (B-tree overhead)      | ~2-4 bytes/point (delta + gorilla encoding)
Storage for 1B points  | ~50-100 GB                               | ~2-4 GB
Time-range query       | Index scan + heap fetch (random I/O)     | Columnar scan (sequential I/O)
Retention policy       | Manual DELETE + VACUUM                   | Automatic drop by time partition
Downsampling           | Manual ETL job                           | Built-in continuous aggregation
Cardinality management | Not applicable                           | Core design concern

The breaking point is typically 10 million data points per day with 90+ days retention. Below that, PostgreSQL (or TimescaleDB) is the right answer — do not introduce a new database for a workload that PostgreSQL handles. Above that, the storage cost, query latency, and operational burden of PostgreSQL make a purpose-built TSDB the correct choice.


The Technology Landscape

Prometheus

Prometheus is the de facto standard for infrastructure and application monitoring in Kubernetes environments. It uses a pull-based model — Prometheus scrapes metrics from HTTP endpoints exposed by applications and infrastructure components every 15-60 seconds.

Architecture: Prometheus is a single-binary Go server with an embedded time series database (TSDB). It stores data on local disk using a custom block-based format. Each block covers a 2-hour time window, is immutable once compacted, and contains compressed time series data organized by metric name and label set.

Data model: Metrics are identified by a name and a set of key-value labels: http_request_duration_seconds{method="GET", handler="/api/users", status="200"}. Each unique combination of metric name + labels is a "time series." The number of unique time series is the cardinality — the single most important capacity planning dimension for Prometheus.

Query language (PromQL):

# Average request rate over the last 5 minutes, per handler
rate(http_request_duration_seconds_count[5m])

# 95th percentile latency by handler
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Alert: error rate > 5%
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05

Strengths: Battle-tested in Kubernetes (every K8s deployment includes Prometheus), rich ecosystem (Grafana, Alertmanager, exporters for everything), PromQL is expressive for monitoring queries, pull-based model is simple and reliable.

Limitations: Single-node storage (no native clustering), local disk only (not durable across node failures), 15-day default retention (not designed for long-term storage), high cardinality destroys performance (>10M active series requires careful management). For long-term storage and horizontal scaling, use Thanos or Cortex/Mimir as a Prometheus-compatible remote storage backend.

Thanos vs Mimir (Cortex): Both provide long-term Prometheus-compatible storage, but the architecture differs. Thanos uses a sidecar that ships Prometheus blocks to object storage (S3) and a query layer that merges local Prometheus data with remote blocks. Mimir (Grafana's evolution of Cortex) is a horizontally scalable, multi-tenant time series backend that receives remote-write from Prometheus and handles storage, querying, and compaction independently. Thanos is simpler (adds components alongside existing Prometheus), Mimir is more scalable (fully distributed, multi-tenant). For most teams, Thanos is the right starting point; migrate to Mimir when you need multi-tenancy or exceed Thanos's query performance limits.

Pull vs Push — the architectural debate: Prometheus pulls metrics from targets. InfluxDB, Datadog, and OpenTelemetry push metrics to a collector. Pull is simpler (Prometheus controls the scrape rate, automatically detects failed targets via failed scrapes), but requires network reachability from Prometheus to every target. Push works across network boundaries (NAT, firewalls) and supports ephemeral targets (serverless functions, batch jobs). In Kubernetes, pull wins — every pod is reachable from Prometheus via service discovery. Outside Kubernetes, push is often more practical.

InfluxDB

InfluxDB is a purpose-built time series database designed for high write throughput and flexible querying. InfluxDB 3.x (latest) uses Apache Arrow and DataFusion for a columnar query engine with SQL support.

Architecture (InfluxDB 3.x): Data is ingested via line protocol (a simple text format: measurement,tag=value field=value timestamp), buffered in a write-ahead log (WAL), organized into Parquet files in object storage (S3), and queried via SQL or InfluxQL. The separation of storage (object store) and compute (query engine) enables elastic scaling.

Data model: InfluxDB organizes data into measurements (like tables), tags (indexed string key-value pairs for filtering), fields (non-indexed values — the actual metric data), and timestamps. The tag set determines which time series a point belongs to.

# Line protocol format
cpu,host=server01,region=us-east usage_idle=72.3,usage_system=14.2 1609459200000000000
cpu,host=server02,region=eu-west usage_idle=88.1,usage_system=6.5 1609459200000000000
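The line protocol format above is simple enough to emit by hand. As a minimal sketch (the `line_protocol` helper is hypothetical, and it skips the escaping of spaces, commas, and quotes that a real InfluxDB client library must handle):

```python
import time

def line_protocol(measurement, tags, fields, ts_ns=None):
    """Format one point in InfluxDB line protocol (simplified: no escaping)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    ts_ns = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

print(line_protocol("cpu",
                    {"host": "server01", "region": "us-east"},
                    {"usage_idle": 72.3, "usage_system": 14.2},
                    ts_ns=1609459200000000000))
# cpu,host=server01,region=us-east usage_idle=72.3,usage_system=14.2 1609459200000000000
```

In practice, points are batched through a client library or Telegraf rather than formatted and sent one at a time.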

Strengths: High write throughput (~1M points/sec per node), SQL support (InfluxDB 3.x), built-in retention policies and continuous queries for downsampling, cloud-native with object storage backend, rich ecosystem (Telegraf for collection, Grafana for visualization).

Limitations: Tag cardinality limits (high-cardinality tags like user_id degrade performance), schema-on-write model can lead to schema sprawl, InfluxDB 3.x is relatively new (less battle-tested than Prometheus for monitoring).

TimescaleDB

TimescaleDB is a PostgreSQL extension that adds time series capabilities to PostgreSQL. It automatically partitions data into time-based chunks (hypertables), adds compression, continuous aggregations, and retention policies — while retaining full SQL compatibility.

Architecture: TimescaleDB creates a "hypertable" — a virtual table that automatically partitions data into chunks by time (and optionally by a space dimension like device_id). Each chunk is a regular PostgreSQL table. Queries that filter by time range only scan the relevant chunks, not the entire table. Compressed chunks use columnar encoding that achieves 10-20x compression.

Strengths: Full SQL (JOINs, CTEs, subqueries, window functions), no new query language to learn, runs as a PostgreSQL extension (existing tools, backups, replication work), continuous aggregations for automatic downsampling, multi-node clustering (TimescaleDB 2.x+).

Limitations: PostgreSQL overhead for pure metrics workloads (slower than Prometheus or InfluxDB for simple metric queries), write throughput limited by PostgreSQL's row-based storage engine (before compression), multi-node is less mature than single-node.

When to choose TimescaleDB: Your time series data needs to JOIN with relational data (e.g., sensor readings + device metadata + customer records). Your team already knows PostgreSQL. Your workload is moderate scale (millions of points per day, not billions).

ClickHouse — The Analytics Crossover

ClickHouse is not a traditional TSDB, but it is increasingly used for time series analytics where cardinality is too high for Prometheus/InfluxDB. ClickHouse is a columnar OLAP database that excels at aggregating billions of rows with SQL. Unlike TSDBs that struggle with high-cardinality tags (user_id, request_id), ClickHouse handles them natively because it stores data in columnar format with per-column compression.

When ClickHouse beats traditional TSDBs: Log aggregation (billions of log lines with high-cardinality fields like trace_id), ad analytics (impressions/clicks per campaign per user per geo), and real-time dashboards over event-level data rather than pre-aggregated metrics. Uber, Cloudflare, and GitLab use ClickHouse for observability data that exceeds what Prometheus can handle.

Tradeoff: ClickHouse requires cluster management (ZooKeeper for coordination in older versions, ClickHouse Keeper in newer), has a steeper learning curve than Prometheus, and is overkill for simple infrastructure monitoring. Use ClickHouse when your time series problem looks more like an analytics problem.

Comparison Summary

Dimension         | Prometheus               | InfluxDB                   | TimescaleDB                  | ClickHouse
Architecture      | Single binary, pull      | Distributed, push          | PG extension                 | Distributed columnar
Sweet spot        | K8s monitoring, alerting | IoT telemetry, high ingest | SQL analytics on time series | High-cardinality analytics
Cardinality limit | ~10M active series       | ~1M tag combos             | Unlimited (SQL)              | Unlimited (columnar)
Query language    | PromQL                   | SQL / InfluxQL             | SQL (PostgreSQL)             | SQL
Long-term storage | Thanos/Mimir (add-on)    | Native object store        | Native disk/replication      | Native distributed
Operational model | Simple (1 binary)        | Medium (cluster)           | Simple (PG extension)        | Complex (cluster)

Core Concepts

Time Series Data Model

Every TSDB organizes data around the same fundamental concepts, regardless of the specific implementation:

Metric / Measurement: The thing being measured — cpu_usage, http_request_duration, temperature, order_count.

Tags / Labels: Key-value pairs that identify which instance of the metric this is — host=server-01, region=us-east, endpoint=/api/users. Tags are indexed and used for filtering and grouping in queries. The combination of metric name + all tag values forms a unique time series.

Fields / Values: The actual data points — usage=72.3, latency_ms=45. Fields are the numbers you aggregate (avg, sum, max, percentile). They are typically not indexed.

Timestamp: When the data point was recorded. Precision varies: nanoseconds for financial data, seconds for infrastructure metrics, minutes for business metrics. Most TSDBs store timestamps as Unix epoch integers (nanoseconds or milliseconds) for efficient delta compression.

Series vs. Events — the fundamental distinction: A time series is a sequence of (timestamp, value) pairs for a specific metric+tags combination. It is sampled at regular intervals (every 15s, every 1m). An event is a discrete occurrence with a timestamp and arbitrary fields (a log line, a trace span, a transaction record). TSDBs are optimized for series — regular, numeric, aggregatable. Event stores (Elasticsearch, ClickHouse) are optimized for events — irregular, text-heavy, searchable. Mixing the two in one system degrades both. In interviews, this distinction guides your storage choice: "metrics go to Prometheus, logs go to Elasticsearch."

Cardinality: The total number of unique time series (unique metric + tag combinations). This is the most critical capacity dimension. http_requests{method, handler, status} with 4 methods × 50 handlers × 5 status codes = 1,000 time series. Add user_id as a tag and you have 1,000 × 1,000,000 users = 1 billion time series. This is why user_id should never be a tag in a metrics system — it is a field or belongs in a separate database.
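The cardinality arithmetic is worth internalizing, because it is multiplicative: every new label multiplies the series count by its number of distinct values. A two-line sketch (the `series_count` helper is hypothetical):

```python
from math import prod

def series_count(*distinct_values_per_label):
    """Worst-case cardinality: the product of distinct values per label."""
    return prod(distinct_values_per_label)

print(series_count(4, 50, 5))             # method x handler x status = 1000 series
print(series_count(4, 50, 5, 1_000_000))  # adding user_id: 1,000,000,000 series
```

This is why cardinality reviews look at each label's value set in isolation: one unbounded label dominates everything else.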

Write Path — Append-Only Ingestion


The write path is optimized for sequential, append-only writes:

  1. Buffer incoming points in memory — sorted by time, indexed by metric + tags. A WAL (write-ahead log) provides durability before the buffer flushes.
  2. Flush to immutable blocks — periodically (every 1-5 minutes), the buffer is flushed to an immutable on-disk block. The block is compressed using time series-specific algorithms.
  3. Compact blocks — background compaction merges small blocks into larger ones, improving query performance and enabling more aggressive compression.
  4. Retention policy drops old blocks — when a block's time range falls outside the retention window, the entire block is deleted. No row-by-row deletion, no tombstones, no VACUUM — just file deletion.
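The buffer → immutable block → whole-block retention cycle above can be sketched as a toy in-memory engine (the `TinyTSDB` class is hypothetical and ignores the WAL, compression, and compaction — it only illustrates why retention is a cheap whole-block drop):

```python
from collections import defaultdict

class TinyTSDB:
    """Toy write path: per-series in-memory buffer, flushed to immutable
    'blocks', dropped whole once their time range falls past retention."""
    def __init__(self, retention_seconds=900):
        self.retention_seconds = retention_seconds
        self.buffer = defaultdict(list)  # series key -> [(ts, value)]
        self.blocks = []                 # [(min_ts, max_ts, {series: points})]

    def append(self, series_key, ts, value):
        self.buffer[series_key].append((ts, value))

    def flush(self):
        if not self.buffer:
            return
        all_ts = [ts for pts in self.buffer.values() for ts, _ in pts]
        self.blocks.append((min(all_ts), max(all_ts), dict(self.buffer)))
        self.buffer = defaultdict(list)

    def enforce_retention(self, now):
        cutoff = now - self.retention_seconds
        # Drop whole blocks entirely past retention: no row deletes, no VACUUM.
        self.blocks = [b for b in self.blocks if b[1] >= cutoff]

db = TinyTSDB()
for ts in (100, 115, 130):
    db.append("cpu{host=a}", ts, 0.5)
db.flush()                      # buffer -> one immutable block
db.enforce_retention(now=2000)  # block fully past retention: dropped whole
```

Real engines do the same thing with files on disk, which is why aging out data is effectively free.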

Compression — The TSDB Superpower

TSDBs achieve 10-20x better compression than general-purpose databases because time series data has exploitable structure:

Timestamp compression (Delta-of-delta): Timestamps in a regularly sampled series (every 15 seconds) have nearly identical intervals. Instead of storing each 64-bit timestamp, store the delta between consecutive timestamps. If the delta is constant (15s, 15s, 15s), store the delta-of-delta (0, 0, 0) — which compresses to nearly zero bits per timestamp. Prometheus uses this encoding; combined with value compression, it averages roughly 1.4 bytes per sample (timestamp and value together, the figure reported in the Gorilla paper).

Value compression (Gorilla / XOR encoding): Metric values (like CPU usage) change slowly — 72.3, 72.4, 72.3, 72.5. XOR encoding compares consecutive values at the bit level: if only a few bits change, only those bits are stored. For slowly changing values, this achieves ~1 byte per sample. For constant values (counters that do not change between scrapes), the cost approaches zero.
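Both encodings are easy to demonstrate. A minimal sketch (computing the delta-of-delta and XOR streams only — a real encoder would then variable-length-encode these into a bitstream):

```python
import struct

def delta_of_delta(timestamps):
    """A regular 15s scrape collapses to (first, first_delta, 0, 0, ...)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dod = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0] if deltas else None, dod

def xor_stream(values):
    """Gorilla-style XOR: consecutive float64s that barely change share
    most of their bits, so the XOR is mostly zero."""
    bits = [struct.unpack(">Q", struct.pack(">d", v))[0] for v in values]
    return [a ^ b for a, b in zip(bits, bits[1:])]

print(delta_of_delta([0, 15, 30, 45, 60]))  # (0, 15, [0, 0, 0])
print(xor_stream([72.3, 72.3, 72.3]))       # [0, 0] -- unchanged values cost ~1 bit
```

The runs of zeros in both streams are what the bit-level encoder squeezes down to a fraction of a byte per sample.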

Combined effect: A single data point (timestamp + value) that would cost 16 bytes in PostgreSQL (8-byte timestamp + 8-byte double) costs ~2-4 bytes in a TSDB. For 1 billion data points, that is the difference between 16 GB and 3 GB.

Why this matters in interviews: When designing a monitoring system for 10,000 servers each emitting 100 metrics at 15-second intervals, the math is: 10,000 × 100 × 4 per minute × 60 × 24 = ~5.76 billion points per day. At 16 bytes each (PostgreSQL), that is 92 GB/day or 2.7 TB/month. At 3 bytes each (TSDB), that is 17 GB/day or 510 GB/month. The TSDB stores the same data at roughly one-fifth the cost — equivalently, ~5x more data points per dollar — and queries it faster because columnar compression enables sequential I/O instead of random index lookups. This back-of-envelope calculation demonstrates why TSDBs exist.
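The same back-of-envelope arithmetic, checkable in a few lines:

```python
servers, metrics, interval_s = 10_000, 100, 15
points_per_day = servers * metrics * (86_400 // interval_s)
print(points_per_day)   # 5.76 billion points/day

gb_per_day = lambda bytes_per_point: points_per_day * bytes_per_point / 1e9
print(round(gb_per_day(16)), round(gb_per_day(3)))  # ~92 GB/day vs ~17 GB/day
```

Being able to produce this calculation live — samples/day, bytes/sample, GB/day, then monthly cost — is the expected Staff move before naming a technology.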

Downsampling & Retention

Raw data at 15-second intervals is valuable for the last hour (debugging a current issue) but wasteful for the last year (you do not need per-second CPU readings from 6 months ago). Downsampling reduces resolution over time:

Time Range    | Resolution                | Storage Cost
Last 1 hour   | Raw (15-second intervals) | Full
Last 24 hours | 1-minute averages         | ~4x reduction
Last 30 days  | 5-minute averages         | ~20x reduction
Last 1 year   | 1-hour averages           | ~240x reduction
Beyond 1 year | Daily averages or deleted | ~5,760x reduction

Implementation approaches:

  • Continuous aggregations (TimescaleDB): Materialized views that automatically maintain downsampled data as new points arrive. Defined as SQL: CREATE MATERIALIZED VIEW cpu_hourly AS SELECT time_bucket('1 hour', time), host, avg(usage) FROM cpu GROUP BY 1, 2.
  • Recording rules (Prometheus): PromQL expressions evaluated periodically and stored as new time series: record: http_request_rate_5m; expr: rate(http_requests_total[5m]).
  • Continuous queries (InfluxDB): Queries that run on a schedule and write aggregated data to a new measurement with a longer retention policy.
  • Retention policies: Automatic deletion of data older than a configured threshold. In Prometheus, --storage.tsdb.retention.time=15d. In InfluxDB, retention policies per database. In TimescaleDB, SELECT add_retention_policy('metrics', drop_after => INTERVAL '30 days').
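All of these approaches implement the same core operation: bucket points by a coarser interval and aggregate. A minimal sketch of that rollup (the `time_bucket` helper is hypothetical, named after TimescaleDB's function; continuous aggregates and recording rules maintain this incrementally rather than recomputing):

```python
from collections import defaultdict

def time_bucket(points, bucket_seconds):
    """Downsample (ts, value) points to per-bucket averages."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        bucket = ts - ts % bucket_seconds   # floor to bucket start
        sums[bucket][0] += value
        sums[bucket][1] += 1
    return {b: s / n for b, (s, n) in sorted(sums.items())}

raw = [(0, 10.0), (15, 20.0), (30, 30.0), (45, 40.0), (60, 50.0)]
print(time_bucket(raw, 60))   # {0: 25.0, 60: 50.0} -- 5 raw points -> 2 rollups
```

Note that the choice of aggregate matters: averages hide spikes, so production rollups usually also keep max, min, and count per bucket.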

Why retention policies are free: Unlike PostgreSQL where DELETE + VACUUM is expensive (mark rows dead → reclaim space → rebuild indexes), TSDBs organize data by time range in immutable blocks. Dropping a retention window means deleting entire block files from disk — an O(1) filesystem operation regardless of how much data the block contains. This is why TSDBs can store billions of data points and age them out at zero marginal cost.


Scaling Time Series Workloads

Write Scaling

Time series writes are embarrassingly parallelizable — each time series is independent. Scaling writes means distributing time series across shards by metric name, tag hash, or time range. Unlike relational databases where writes may contend on indexes or locks, TSDB writes are append-only with no cross-series coordination — making horizontal write scaling straightforward.

Prometheus + Thanos/Mimir: Run multiple Prometheus instances, each scraping a subset of targets. Thanos or Mimir provides a unified query layer that fans out to all instances and merges results. Write throughput scales linearly with Prometheus instance count. Target sharding strategies: by namespace (prometheus-infra scrapes infrastructure, prometheus-app scrapes application metrics), by team, or by consistent hashing of target labels.

InfluxDB 3.x: Ingestion is distributed across multiple ingesters. Each ingester writes to object storage (S3/GCS). The query engine reads from object storage and merges results. Write throughput scales by adding ingesters. The separation of ingest and query tiers means you can scale each independently — more ingesters for write-heavy workloads, more query nodes for dashboard-heavy workloads.

TimescaleDB multi-node: Hypertable chunks are distributed across multiple PostgreSQL data nodes. Writes are routed to the correct data node based on the chunk's time and space dimensions. This is the most straightforward scaling model for teams already operating PostgreSQL — standard replication and connection pooling tools apply.

Buffering and batching: All TSDBs benefit from client-side write batching. Instead of sending one data point per network call, batch 1,000-10,000 points per request. Prometheus handles this naturally (scrapes pull hundreds of metrics per target). For push-based systems, use a local agent (Telegraf, OpenTelemetry Collector) that buffers and batches writes. The collector also provides retry logic, back-pressure handling, and protocol translation — never push directly from application code to the TSDB.
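The batching pattern is simple to sketch (the `BatchingWriter` class is hypothetical; real agents like Telegraf and the OTel Collector add retries, back-pressure, and time-based flushing on top of the size trigger shown here):

```python
class BatchingWriter:
    """Buffer points client-side; flush in batches of `max_batch` points."""
    def __init__(self, send, max_batch=5_000):
        self.send, self.max_batch, self.pending = send, max_batch, []

    def write(self, point):
        self.pending.append(point)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(self.pending)   # one network call for the whole batch
            self.pending = []

batches = []
w = BatchingWriter(batches.append, max_batch=3)
for i in range(7):
    w.write(i)
w.flush()
print([len(b) for b in batches])   # [3, 3, 1] -- 7 points, 3 network calls
```

The payoff is that per-request overhead (TLS, HTTP framing, WAL fsync) is amortized over thousands of points instead of paid per point.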

Read Scaling — The Query Challenge

Time series reads are inherently range-based — "give me all CPU readings for host=server-01 from the last 6 hours." The query engine must:

  1. Identify which blocks/chunks contain data in the time range.
  2. Filter by tags (host=server-01).
  3. Decompress and aggregate the matching data points.

For recent data (in memory or on fast storage), this is fast. For historical queries spanning months of data, the query must scan and decompress large amounts of data. This is where pre-computed downsampled data is critical — querying 1-hour rollups for a 1-year range scans ~8,760 points per series instead of ~2.1 million raw points at 15-second resolution (and billions of points when the query spans thousands of series).

Read scaling approaches:

  • Caching: Cache frequent dashboard queries. Most dashboards refresh every 30-60 seconds with the same query.
  • Pre-aggregation: Continuous aggregations and recording rules pre-compute common queries.
  • Query sharding: Distribute query execution across multiple nodes (Thanos/Mimir query frontends split queries by time range and fan out).
  • Tiered storage: Hot data on SSD, warm data on HDD, cold data in object storage. Queries on hot data are fast; queries on cold data are slower but infrequent.

Alerting Architecture

In a production monitoring system, the TSDB is only half the story — alerting is the other half. The canonical architecture:

  1. Prometheus evaluates alerting rules continuously (every 15 seconds). An alerting rule defines a condition (rate(http_5xx_total[5m]) / rate(http_requests_total[5m]) > 0.05) and a duration (for: 5m — the condition must persist for 5 minutes before firing).
  2. Alertmanager receives firing alerts from Prometheus, deduplicates (same alert from multiple Prometheus instances), groups (combine related alerts into one notification), silences (suppress known issues during maintenance), and routes to the appropriate channel (PagerDuty for critical, Slack for warning).
  3. Alert design: The Staff approach uses SLO-based alerting — alert when the error budget is being consumed too quickly, not on raw thresholds. A "burn rate" alert says: "at the current error rate, we will exhaust our 99.9% monthly SLO budget in 2 hours — page someone." This is more actionable than "5xx rate > 1%" because it accounts for the business impact.
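The burn-rate arithmetic behind SLO-based alerting is worth being able to do on a whiteboard. A minimal sketch (the `hours_to_budget_exhaustion` helper is hypothetical):

```python
def hours_to_budget_exhaustion(error_rate, slo=0.999, window_days=30):
    """At the current error rate, how long until the window's error budget
    (budget fraction = 1 - slo) is fully consumed?"""
    budget = 1 - slo                 # 0.001 for a 99.9% SLO
    burn_rate = error_rate / budget  # 1.0 = consuming budget exactly on pace
    return window_days * 24 / burn_rate

# A 1% error rate burns a 99.9% monthly budget at 10x pace:
print(hours_to_budget_exhaustion(0.01))   # ~72 hours instead of the full 720
```

Multi-window alerting then pairs a fast window (page when the budget would exhaust in hours) with a slow window (ticket when it would exhaust in days) to balance detection speed against noise.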

Failure Modes & Recovery

1. Cardinality Explosion

Symptoms: Prometheus OOM kills, increasing memory usage correlated with new deployments, slow queries across all dashboards, ingestion falling behind.

Root cause: A label with unbounded or high cardinality — user_id, request_id, pod_name in a rapidly scaling environment, or a metric emitting one series per unique URL path (including path parameters like /users/12345). Each unique label combination creates a new time series that must be indexed in memory.

Fix: Audit label cardinality with count by (__name__)({__name__=~".+"}) in PromQL. Remove or relabel high-cardinality labels using metric_relabel_configs in Prometheus. Use keep and drop relabeling rules to filter at ingestion. Normalize URL paths before labeling (/users/:id instead of /users/12345). Set cardinality limits per metric.

2. Ingestion Lag & Data Loss

Symptoms: Gaps in metric graphs, alerting on stale data, Prometheus up metric showing target scrape failures, ingestion rate metric flatlines.

Root cause: The TSDB cannot keep up with the write rate. In Prometheus, this means the scrape interval is shorter than the scrape duration (scraping takes longer than the configured interval). In push-based systems (InfluxDB), this means the ingestion endpoint is overloaded or the WAL cannot flush fast enough.

Fix: Increase scrape interval (15s → 30s) for less critical metrics. Reduce the number of scraped metrics per target (filter with metric_relabel_configs). Add Prometheus instances and shard targets. For InfluxDB, scale the ingestion tier horizontally. Monitor scrape health (scrape_duration_seconds against the configured interval, up per target) and prometheus_tsdb_head_series for series growth.

3. Query Timeout on Historical Data

Symptoms: Dashboard queries for historical ranges timeout or return partial data, Grafana shows "query exceeded maximum time limit."

Root cause: A query scanning months of raw data without downsampling. A PromQL query like rate(http_requests_total[30d]) must decompress and process billions of raw data points.

Fix: Use recording rules and continuous aggregations to pre-compute common queries at lower resolution. Configure query timeouts and maximum query range limits. Use Thanos/Mimir query frontend to split large queries into smaller time ranges and execute in parallel. Educate dashboard authors to use downsampled data sources for historical views.

4. Storage Growth Beyond Capacity

Symptoms: Disk usage alerts, Prometheus WAL corruption due to full disk, queries slowing as disk I/O saturates.

Root cause: Retention policy not configured, new high-cardinality metrics adding series without cleanup, or downsampling not implemented (storing raw data indefinitely).

Fix: Configure retention policies — 15-30 days for raw Prometheus data, longer retention in a remote storage backend (Thanos/Mimir with S3). Implement downsampling so long-term storage grows slowly. Monitor disk usage trend and alert at 70% capacity. Use object storage (S3) for the cold tier — effectively unlimited and cheap.

5. Metric Staleness & Phantom Alerting

Symptoms: Alerts firing for services that no longer exist, phantom time series in dashboards showing flat-lined or missing data, queries returning results for decommissioned hosts.

Root cause: When a target is removed (pod scaled down, service decommissioned), Prometheus writes staleness markers for its series (or, failing that, the series go stale after the 5-minute lookback window). But the series still exist in storage for the retention period. Queries like up == 0 can still match these stale series over range queries or remote storage, triggering false alerts for services that were intentionally removed.

Fix: Use absent() or absent_over_time() for liveness checks instead of up == 0. Filter alerts by active labels (e.g., only alert for hosts in the current deployment inventory). Use Prometheus's staleness markers (5-minute lookback window) to automatically exclude stale series from instant queries. For long-term storage, configure Thanos/Mimir to stop querying series that have not received new data.


When to Use vs. Alternatives

Dimension         | Prometheus                          | InfluxDB                    | TimescaleDB                              | ClickHouse
Best for          | K8s monitoring, alerting            | IoT, high-volume telemetry  | SQL-first time series, relational joins  | Analytics, log aggregation, wide tables
Data model        | Metric + labels                     | Measurement + tags + fields | SQL tables (hypertables)                 | Columnar SQL tables
Query language    | PromQL                              | SQL / InfluxQL              | SQL (PostgreSQL)                         | SQL (with extensions)
Write throughput  | ~500K samples/s (single node)       | ~1M+ points/s               | ~100K-500K rows/s                        | ~1M+ rows/s
Compression       | Gorilla (excellent)                 | Columnar (excellent)        | Columnar (good)                          | Columnar (excellent)
Long-term storage | Thanos/Mimir (remote)               | Object storage (native)     | PostgreSQL (disk/replication)            | Native (distributed)
Operational cost  | Low (single binary)                 | Medium (cluster)            | Low (PG extension)                       | Medium-High (cluster)
Ecosystem         | Grafana, Alertmanager, K8s          | Telegraf, Grafana           | PostgreSQL ecosystem                     | Wide analytics ecosystem
When to avoid     | Long-term storage, high cardinality | Need SQL JOINs              | >1B points/day                           | Simple monitoring

The Three Pillars of Observability

Time series databases are one pillar of a broader observability stack. In interviews, knowing how the three pillars connect shows architectural maturity:

Metrics (TSDB — Prometheus): Aggregated numerical measurements over time. "What is the error rate?" Cheap to store (compressed, downsampled), fast to query, best for dashboards and alerting. Metrics tell you something is wrong.

Logs (Elasticsearch, Loki): Discrete text events with context. "What happened?" Expensive to store (full text), slower to query, best for debugging specific incidents. Logs tell you what went wrong.

Traces (Jaeger, Tempo, Zipkin): Request-scoped causality chains across services. "Where did it go wrong?" Each trace shows the full path of a request through microservices with timing. Traces tell you where it went wrong.

The Staff integration pattern: a metric alert fires ("error rate > 5% on /api/checkout"). The engineer clicks through from the Grafana dashboard to exemplar traces (Prometheus → Tempo) that show specific failing requests. The trace reveals the slow span (payment service, 8-second timeout). The engineer jumps to logs filtered by the trace ID (Tempo → Loki) and finds the root cause: a database connection pool exhaustion. Metrics → Traces → Logs is the debugging workflow. TSDBs are the entry point.

Decision framework for interviews:

  • "Design a monitoring system" → Prometheus + Grafana + Alertmanager. This is the standard stack, battle-tested in every K8s deployment.
  • "Design an IoT platform ingesting sensor data" → InfluxDB (or TimescaleDB if you need SQL JOINs with device metadata).
  • "Show metrics on a dashboard with historical trends" → Prometheus for recent data, Thanos/Mimir with S3 for long-term, downsampling at each tier.
  • "We already use PostgreSQL and need time series" → TimescaleDB extension. No new database to learn or operate.
  • "Design a real-time analytics platform for ad clicks" → ClickHouse or Druid. These are OLAP engines, not pure TSDBs, but they excel at high-cardinality analytics queries that TSDBs struggle with.

Interview Application — Staff-Level Plays

The RED and USE Methods

When designing what metrics to collect, two frameworks provide structure:

RED Method (for request-driven services):

  • Rate — requests per second (rate(http_requests_total[5m]))
  • Errors — failed requests per second (rate(http_requests_total{status=~"5.."}[5m]))
  • Duration — request latency distribution (histogram_quantile(0.95, ...))

USE Method (for infrastructure resources):

  • Utilization — percentage of resource capacity used (CPU %, memory %, disk %)
  • Saturation — queue depth, backlog size, waiting threads
  • Errors — error events per second (disk errors, network drops)

In interviews, say "every service exposes RED metrics — rate, errors, duration — and every infrastructure component exposes USE metrics. The TSDB ingests both, and alerting rules are defined against SLOs derived from these metrics."

Concrete Interview Scenarios

"Design a monitoring and alerting system for a microservices platform"

Staff answer: "Prometheus scraping each service's /metrics endpoint every 15 seconds. Each service exposes RED metrics (Rate, Errors, Duration) using a Prometheus client library. Prometheus evaluates alerting rules (error rate >5% for 5 minutes → page), sends alerts to Alertmanager, which deduplicates and routes to PagerDuty/Slack. Grafana dashboards query Prometheus for real-time views. For long-term storage (capacity planning, SLA reporting), Thanos ships Prometheus blocks to S3 with downsampling at 5-minute and 1-hour resolutions. Total retention: 48 hours raw in Prometheus, 1 year downsampled in S3."

"Design an IoT platform ingesting temperature readings from 100,000 sensors"

Staff answer: "Sensors push readings via MQTT to an ingestion service that writes to InfluxDB (or TimescaleDB if we need to JOIN sensor metadata). Tag dimensions: sensor_id, building_id, floor. NOT reading_id — that is a field. At 100K sensors × 1 reading/min = ~1,700 writes/sec — well within a single InfluxDB node. Store raw data for 30 days, 1-hour rollups for 2 years. Alert on readings outside thresholds (temperature >30°C for 10 minutes). Dashboard shows per-building heatmaps using pre-aggregated data."

L5 vs L6 Responses

| Scenario | L5 Answer | L6/Staff Answer |
| --- | --- | --- |
| "How do you store metrics?" | "Use a time series database" | "Prometheus for infrastructure metrics (pull-based, K8s-native, PromQL for alerting). Retention: 48h raw, 30d at 1-min resolution, 1 year at 1-hour resolution in Thanos/S3. Labels are bounded: service, endpoint, status_code, region. Never user_id or request_id as labels." |
| "How do you handle alerting?" | "Check if values exceed thresholds" | "Prometheus alerting rules with multi-window burn rate for SLO-based alerts. A 5xx error rate breaching the 30-day SLO budget at the current burn rate triggers a page. Simple threshold alerts for infrastructure (disk >80%, CPU >90% for 5 min). Alertmanager handles dedup, grouping, silencing, and routing." |
| "What about long-term trends?" | "Store data longer" | "Tiered storage with downsampling. Raw data is expensive to query and store. 1-hour rollups for capacity planning queries ('show me CPU trend for the last year') are 240x cheaper and 240x faster to query. The business question 'are we growing?' does not need 15-second granularity." |
| "How do you scale?" | "Add more nodes" | "Prometheus scales by sharding targets across multiple instances. Thanos or Mimir provides a global query view across all instances. Write scaling is linear — each Prometheus instance handles ~500K samples/sec. Read scaling uses query frontend caching and split-merge for large time range queries." |
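The burn-rate arithmetic behind SLO-based alerting is simple enough to do live in an interview. A minimal sketch; the 14.4x figure is the commonly cited fast-burn threshold from the Google SRE workbook, stated here as an assumption:

```python
def burn_rate(error_rate, slo=0.999):
    """How fast the error budget is being consumed.
    At burn rate 1 the budget lasts exactly the SLO window;
    at 14.4, a 30-day budget is exhausted in ~2 days."""
    return error_rate / (1 - slo)

# A 1% 5xx rate against a 99.9% SLO burns budget at ~10x:
print(round(burn_rate(0.01, slo=0.999), 1))  # 10.0
```

Multi-window alerting pairs a fast window (e.g. 1h at high burn rate) with a slow window (e.g. 6h at a lower rate) so that brief spikes do not page but sustained burn does.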

OpenTelemetry — The Convergence Play

OpenTelemetry (OTel) is an open standard for collecting metrics, logs, and traces that is rapidly replacing proprietary agents. The OTel Collector receives telemetry from applications (via OTel SDKs), transforms it, and exports to multiple backends — Prometheus for metrics, Jaeger for traces, Elasticsearch for logs — from a single integration point.

In interviews, mentioning OpenTelemetry shows awareness of the ecosystem's direction: "We instrument services with OpenTelemetry SDKs, which emit metrics, traces, and logs in a vendor-neutral format. The OTel Collector routes metrics to Prometheus, traces to Tempo, and logs to Loki. If we ever need to switch backends (Prometheus to Mimir, Tempo to Jaeger), we change the Collector config — not the application instrumentation." This vendor-neutral approach is the Staff-level infrastructure play.
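The routing described above lives in the Collector's pipeline config. A minimal sketch: the receiver and exporter component names are standard Collector components, but the endpoints are placeholders and a production config would add processors (batching, memory limits).

```yaml
# OTel Collector sketch: one OTLP ingest point, three backends.
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  prometheusremotewrite:            # metrics → Prometheus/Mimir
    endpoint: http://mimir:9009/api/v1/push
  otlp/tempo:                       # traces → Tempo
    endpoint: tempo:4317
  loki:                             # logs → Loki
    endpoint: http://loki:3100/loki/api/v1/push
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    logs:
      receivers: [otlp]
      exporters: [loki]
```

Swapping a backend means editing one exporter stanza — the application SDKs never change.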

The Staff TSDB Checklist

When proposing time series storage in an interview:

  1. Size the workload: "100K metrics × 15-second scrape = ~6,700 points/sec — well within Prometheus single-node capacity."
  2. Choose the technology: "Prometheus for monitoring, InfluxDB/TimescaleDB for IoT, ClickHouse for high-cardinality analytics."
  3. Define the label/tag strategy: "Labels are bounded dimensions: service, endpoint, method, status. Unbounded identifiers are fields or go to a separate store."
  4. Specify retention + downsampling: "48h raw, 30d at 1-min, 1y at 1-hour. Tiered storage: local SSD → S3."
  5. Address alerting: "PromQL alerting rules with Alertmanager. Multi-window burn rate for SLO-based alerts."
  6. Name the complement: "Prometheus for metrics, Elasticsearch for logs, Jaeger for traces — the three pillars of observability."

Which Playbooks Use Time Series Databases

| Playbook | How TSDBs Are Used | Key Pattern |
| --- | --- | --- |
| Observability & Monitoring | Primary metrics storage and querying | Prometheus + Grafana for dashboards, PromQL alerting rules |
| Capacity Planning | Historical trend analysis and forecasting | Long-term metric retention with downsampling for trend projection |
| Leaderboard & Counting | Time-windowed aggregation for scoring | Sliding window rollups for time-bounded ranking calculations |

Operational Concerns

Capacity Planning Formula

Use this back-of-envelope calculation in interviews:

Data points per day = num_series × (86,400 / scrape_interval)
Storage per day     = data_points × bytes_per_point

Example: 50,000 series × 15s interval
= 50,000 × 5,760 points/day
= 288M points/day
× 3 bytes/point (compressed)
= ~864 MB/day
= ~25 GB/month

This fits comfortably on a single Prometheus node (recommend <10M active series, <500GB local storage). When the calculation exceeds single-node limits, add Prometheus instances and shard by target or metric namespace.
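The formula above, as a reusable sketch you can sanity-check against the worked example:

```python
def daily_storage_mb(num_series, scrape_interval_s, bytes_per_point=3):
    """Back-of-envelope TSDB storage: points/day x compressed bytes/point."""
    points_per_day = num_series * (86_400 // scrape_interval_s)
    return points_per_day * bytes_per_point / 1e6

mb_day = daily_storage_mb(50_000, 15)
print(f"{mb_day:,.0f} MB/day, ~{mb_day * 30 / 1000:,.1f} GB/month")
# 864 MB/day, ~25.9 GB/month
```

The 3 bytes/point assumes Gorilla-style compression; uncompressed, a sample (timestamp + float64 + series reference) is an order of magnitude larger, which is the whole argument for a TSDB.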

Monitoring the Monitor

Your TSDB needs its own monitoring — a meta-monitoring layer:

| Metric | Healthy | Alert |
| --- | --- | --- |
| Ingestion rate (samples/sec) | Stable | Drop >20% (scrape failures) |
| Active time series count | Stable growth | Sudden spike (cardinality bomb) |
| Query duration p99 | <5s | >15s |
| WAL size | Stable | Growing (compaction blocked) |
| Storage utilization | <70% | >85% |
| Scrape duration p95 | < scrape_interval | > scrape_interval (falling behind) |
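Several of these checks map directly onto Prometheus's own self-metrics. A hedged sketch assuming the standard metric names the Prometheus server exposes about itself:

```promql
rate(prometheus_tsdb_head_samples_appended_total[5m])  # ingestion rate
prometheus_tsdb_head_series                            # active series count
prometheus_tsdb_wal_segment_current                    # WAL growth proxy
scrape_duration_seconds                                # per-target scrape duration
```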

Quick Reference Card

Data model:     Metric name + tags/labels (indexed) + fields/values + timestamp
Cardinality:    Number of unique metric+tag combinations — THE capacity dimension
Compression:    Delta-of-delta (timestamps) + Gorilla/XOR (values) = 2-4 bytes/point
Write path:     Buffer → WAL → Immutable block → Compaction → Retention drop
Downsampling:   Raw (48h) → 1-min rollups (30d) → 1-hour rollups (1y)
Tiered storage: Hot (SSD) → Warm (HDD) → Cold (S3/GCS)

Prometheus:     Pull-based, single binary, PromQL, K8s native, 15d local retention
InfluxDB:       Push-based, line protocol, SQL, high write throughput, cloud-native
TimescaleDB:    PostgreSQL extension, full SQL, JOINs, hypertables, compression
ClickHouse:     Columnar OLAP, SQL, high cardinality OK, analytics-first

Anti-patterns:  High-cardinality labels (user_id, request_id), no retention policy,
                querying raw data for historical ranges, no downsampling
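The delta-of-delta trick in the compression line above is easy to demonstrate: with a regular scrape interval, almost every encoded value is zero, which is why a timestamp compresses to a bit or two. A toy sketch of the encoding step only — not Gorilla's actual bit-packing:

```python
def delta_of_delta(timestamps):
    """Encode timestamps as first value, then second-order deltas.
    Regular intervals produce runs of zeros, which bit-pack to ~1 bit each."""
    out = [timestamps[0]]
    prev, prev_delta = timestamps[0], 0
    for t in timestamps[1:]:
        delta = t - prev
        out.append(delta - prev_delta)  # zero when the interval is steady
        prev, prev_delta = t, delta
    return out

# 15s scrape interval with 1s of jitter in the middle:
print(delta_of_delta([1000, 1015, 1030, 1045, 1061, 1076]))
# [1000, 15, 0, 0, 1, -1]
```

Gorilla applies the analogous XOR trick to float values: consecutive samples of a slowly changing gauge share most bits, so the XOR is mostly zeros.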