StaffSignal
Foundation: Study Guide · 19 min read

API Design Patterns

REST at the boundary, gRPC internally, GraphQL only when clients genuinely vary. Cursor pagination (not offset), idempotency keys as a requirement not a feature, and how Stripe's API versioning maintains backward compatibility across 10+ years of changes.

Why This Matters

Every system design interview involves designing at least one API. The interviewer draws a box labeled "API Gateway" or "Service" and asks: "What does the API look like?" The answer to this question reveals whether you think about APIs as technical endpoints or as organizational contracts.

Most candidates default to REST because it's familiar and propose CRUD endpoints without thinking about the consumers, the versioning strategy, the pagination model, or what happens when a client retries a failed POST. Staff-level interviewers probe these blind spots because they map directly to production reliability: a missing idempotency key on a payment endpoint is a double-charge bug waiting to happen. A poorly chosen pagination strategy breaks under concurrent writes. A versioning decision made in month one becomes a maintenance burden for years.

The Staff-level framing is: API style is an organizational boundary decision, not a technology preference. REST at the external boundary for discoverability. gRPC internally for type safety and performance. GraphQL for product teams with diverse frontend needs. Each has a cost — and the candidate who can articulate those costs, not just name the protocols, is the one who earns the Staff signal.

The 60-Second Version

  • REST is resource-oriented, stateless, and built on HTTP verbs. Best for external/public APIs where broad tooling support and discoverability matter.
  • gRPC uses protobuf (binary), supports streaming, and enforces strong typing. Best for internal service-to-service where performance and schema contracts matter. Not browser-friendly without a proxy.
  • GraphQL exposes a single endpoint with client-driven queries. Best when diverse frontends need different slices of the same data. Dangerous when it devolves into an uncontrolled query engine.
  • API style is an organizational boundary decision. External = REST. Internal = gRPC. Heterogeneous frontends = GraphQL. Don't force one style everywhere.
  • Idempotency is a requirement, not a nice-to-have. POST is not idempotent. Client-generated idempotency keys are the only reliable way to prevent duplicate mutations.
  • Pagination choice has correctness implications. Offset-based duplicates or skips rows when rows are inserted or deleted between page requests. Cursor-based can't jump to page N. Keyset scales for large datasets.
  • Every versioning strategy is a deprecation commitment. URL path versioning is simple but forces migration. Field-level deprecation is cleaner but harder to discover.

How API Design Works in System Design Interviews

The Three API Styles

In a system design interview, you'll choose between three styles — and the right choice depends on who consumes the API, not on which protocol is "fastest."

REST (Representational State Transfer)

REST maps resources to URLs and operations to HTTP verbs. It is the lingua franca of external APIs.

# REST API for a marketplace
GET    /v1/products              → List products (paginated)
GET    /v1/products/{id}         → Get product by ID
POST   /v1/products              → Create product
PUT    /v1/products/{id}         → Replace product
PATCH  /v1/products/{id}         → Partial update
DELETE /v1/products/{id}         → Delete product

# Response: JSON, human-readable, broadly supported
{
  "id": "prod_abc123",
  "title": "Widget",
  "price_cents": 1999,
  "created_at": "2024-01-15T10:30:00Z"
}

Why REST works for external APIs:

  • Every developer knows HTTP verbs and JSON. No SDK needed.
  • Cacheable by default — GET responses can be cached at every layer (CDN, proxy, browser).
  • Observable — any HTTP debugging tool can inspect requests.
  • Well-understood status code semantics (200 OK, 201 Created, 404 Not Found, 429 Too Many Requests).

Where REST breaks down:

  • Over-fetching (you get all fields even if you need two) and under-fetching (you need multiple requests for related data).
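  • No built-in schema enforcement: unless you adopt a contract such as OpenAPI and validate against it, the server and client can disagree about the response shape, and you'll only find out at runtime.
  • No first-class streaming in plain request/response REST. Long-running operations need polling or webhooks, and push-style updates need add-ons such as Server-Sent Events or WebSockets.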

gRPC (gRPC Remote Procedure Calls)

gRPC uses Protocol Buffers (protobuf) for binary serialization and HTTP/2 for transport. It enforces schemas and supports streaming.

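// Protobuf definition (proto3)
syntax = "proto3";

import "google/protobuf/timestamp.proto";

service ProductService {
  rpc GetProduct(GetProductRequest) returns (Product);
  // Server streaming: products are sent one message at a time
  rpc ListProducts(ListProductsRequest) returns (stream Product);
  rpc CreateProduct(CreateProductRequest) returns (Product);
}

// Request messages (field choices are illustrative assumptions)
message GetProductRequest { string id = 1; }
message ListProductsRequest { int32 page_size = 1; }
message CreateProductRequest { string title = 1; int64 price_cents = 2; }

message Product {
  string id = 1;
  string title = 2;
  int64 price_cents = 3;
  google.protobuf.Timestamp created_at = 4;
}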

Why gRPC works for internal services:

  • Binary encoding: payloads are typically 2–10x smaller than JSON, and serialization can be up to ~10x faster, depending on message shape and language.
  • Schema enforcement: both sides agree on the contract at compile time. Breaking changes are caught before deployment.
  • Streaming: server streaming, client streaming, and bidirectional streaming are first-class.
  • Code generation: client libraries are auto-generated from .proto files — no manual SDK maintenance.

Where gRPC breaks down:

  • Not browser-friendly without a gRPC-Web proxy (browser JavaScript can't control the HTTP/2 framing and trailers that gRPC relies on).
  • Not human-readable — protobuf is binary, so debugging requires tooling.
  • Proxy and load balancer compatibility is narrower than REST/HTTP/1.1.

GraphQL

GraphQL lets the client specify exactly which fields it needs in a single request.

# Client query
query {
  product(id: "prod_abc123") {
    title
    price_cents
    seller {
      name
      rating
    }
    reviews(first: 5) {
      text
      score
    }
  }
}

Why GraphQL works for diverse frontends:

  • Mobile app needs 3 fields; web app needs 15; admin panel needs 30. One endpoint, different queries.
  • Eliminates over-fetching and under-fetching — the client gets exactly what it asked for.
  • Self-documenting — the schema is introspectable.
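
The "one endpoint, different queries" idea can be sketched without a GraphQL server. The following is a minimal illustration of field selection only, not a GraphQL implementation: `select_fields`, the selection format (field name mapped to `None` or a nested selection), and the sample product are all assumptions made for this example.

```python
def select_fields(resource, selection):
    """Return only the fields named in `selection`.

    `selection` maps a field name to None (scalar field) or to a nested
    selection (for an object or a list of objects).
    """
    result = {}
    for name, sub in selection.items():
        value = resource[name]
        if sub is None:
            result[name] = value
        elif isinstance(value, list):
            result[name] = [select_fields(item, sub) for item in value]
        else:
            result[name] = select_fields(value, sub)
    return result


# Hypothetical full resource as the backend stores it
product = {
    "id": "prod_abc123",
    "title": "Widget",
    "price_cents": 1999,
    "description": "A long description the mobile client never renders",
    "seller": {"name": "Acme", "rating": 4.8, "tax_id": "internal-only"},
}

# Each client sends its own selection against the same resource
mobile_view = select_fields(product, {"title": None, "price_cents": None})
web_view = select_fields(
    product,
    {"title": None, "price_cents": None, "seller": {"name": None, "rating": None}},
)
```

The server does the same work for both clients; only the selection differs, which is what removes the need for per-client endpoints.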

Where GraphQL breaks down:

  • Shifts query complexity to the server. Without depth limits, a single client can generate a query that joins 10 tables.
  • Caching is harder: queries are usually sent as POST requests with a unique body, so HTTP caches can't help unless you adopt persisted queries served over GET.
  • N+1 resolution: fetching products → seller for 100 products naively generates 100 seller queries. DataLoader pattern mitigates this but adds complexity.
  • Operational cost: query cost analysis, rate limiting per query complexity, and abuse prevention are all harder than with REST.
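
The N+1 problem and the batching idea behind DataLoader can be sketched in a few lines. This is an illustration of the technique, not the DataLoader library: `SELLERS`, `PRODUCTS`, `fetch_seller`, `fetch_sellers_batch`, and the `query_count` counter are hypothetical stand-ins for a real data layer.

```python
# Hypothetical in-memory data; a real resolver would issue SQL or RPC calls.
SELLERS = {f"seller_{i}": {"id": f"seller_{i}", "name": f"Seller {i}"} for i in range(10)}
PRODUCTS = [{"id": f"prod_{i}", "seller_id": f"seller_{i % 10}"} for i in range(100)]

query_count = 0  # counts round-trips to the data layer


def fetch_seller(seller_id):
    """One round-trip per seller: the naive resolver path."""
    global query_count
    query_count += 1
    return SELLERS[seller_id]


def fetch_sellers_batch(seller_ids):
    """One round-trip for many sellers (think WHERE id IN (...))."""
    global query_count
    query_count += 1
    return {sid: SELLERS[sid] for sid in seller_ids}


def resolve_naive(products):
    # N+1: one seller lookup per product
    return [{**p, "seller": fetch_seller(p["seller_id"])} for p in products]


def resolve_batched(products):
    # Collect the distinct keys first, then resolve them in one batch
    unique_ids = sorted({p["seller_id"] for p in products})
    sellers = fetch_sellers_batch(unique_ids)
    return [{**p, "seller": sellers[p["seller_id"]]} for p in products]


query_count = 0
naive = resolve_naive(PRODUCTS)
naive_queries = query_count      # one lookup per product

query_count = 0
batched = resolve_batched(PRODUCTS)
batched_queries = query_count    # a single batched lookup
```

Both resolvers return the same data; the batched version trades a little bookkeeping for a data-layer load that no longer grows with the number of products.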

The Decision Framework

| Boundary | Style | Why |
|---|---|---|
| External / public API | REST | Discoverability, tooling, caching, broad compatibility |
| Internal service-to-service | gRPC | Type safety, binary efficiency, streaming |
| Mobile / web BFF layer | GraphQL | Different clients need different data shapes |
| Partner integrations | REST | Partners expect REST; don't make them learn gRPC |
| Real-time streaming | gRPC streaming or WebSocket | Persistent connection with typed messages |
| Cross-team internal | gRPC with .proto as contract | Shared schema definition prevents integration bugs |

Staff rule: Use different API styles at different boundaries. The most common mistake is forcing one style everywhere.

Pagination

Pagination is a correctness problem, not just a UX concern. The wrong pagination strategy breaks under concurrent writes, and interviewers test this specifically.

Strategy Comparison

| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Offset/Limit | ?offset=20&limit=10 | Simple, jump to page N | Skips/duplicates on concurrent inserts; O(N) on large offsets |
| Cursor-based | ?after=eyJ0... | Stable under inserts; efficient | Can't jump to page N; opaque tokens |
| Keyset | ?created_after=2024-01&id_gt=500 | Efficient at any depth; composable | Requires sortable unique column |
| Page token | Server-issued opaque token | Server controls implementation | Token expiry; slightly stateful |

Why Offset Pagination Breaks

# Table state at time T:
Row 1, Row 2, Row 3, Row 4, Row 5

# Page 1: offset=0, limit=2 → [Row 1, Row 2]

# New row inserted before Row 1:
Row NEW, Row 1, Row 2, Row 3, Row 4, Row 5

# Page 2: offset=2, limit=2 → [Row 2, Row 3]
# Row 2 appears on BOTH pages!

For any dataset with concurrent writes (which is most production systems), offset pagination is incorrect. This is a specific, testable claim that separates Staff from senior answers.
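
The failure above is easy to reproduce. The sketch below simulates a newest-first feed as a plain Python list and compares offset paging with keyset paging across a concurrent insert; `offset_page`, `keyset_page`, and the integer IDs are assumptions made for the illustration.

```python
def offset_page(ids, offset, limit):
    """Offset/limit: a position-based slice of whatever the list looks like now."""
    return ids[offset:offset + limit]


def keyset_page(ids, before_id, limit):
    """Keyset: rows strictly after the last sort key seen (ids are newest-first)."""
    remaining = [i for i in ids if before_id is None or i < before_id]
    return remaining[:limit]


feed = [50, 40, 30, 20, 10]            # newest first

offset_p1 = offset_page(feed, 0, 2)    # first page by position
keyset_p1 = keyset_page(feed, None, 2) # first page by key

feed.insert(0, 60)                     # a concurrent insert lands at the top

offset_p2 = offset_page(feed, 2, 2)              # positions shifted by one
keyset_p2 = keyset_page(feed, keyset_p1[-1], 2)  # continues after the last key seen

duplicated = set(offset_p1) & set(offset_p2)
```

With offset paging the item at the page boundary is served twice, because every position shifted by one. Keyset paging continues from the last key it returned, so the insert has no effect on the second page.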

Default recommendation: Cursor-based for feeds and timelines. Keyset for sorted data tables. Offset only for small, static datasets where correctness under concurrent writes doesn't matter.

Pagination Implementation

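# Cursor-based pagination (Python sketch).
# Assumes `db` is a DB-API connection whose rows support key access
# (e.g. sqlite3 with row_factory = sqlite3.Row) and whose engine supports
# row-value comparison (PostgreSQL, MySQL, SQLite 3.15+).
import base64
import json

def encode_cursor(row):
    # Opaque token carrying the last row's sort position
    position = {"created_at": row["created_at"], "id": row["id"]}
    return base64.urlsafe_b64encode(json.dumps(position).encode()).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))

def list_products(db, cursor=None, limit=20):
    if cursor:
        decoded = decode_cursor(cursor)  # {created_at, id}
        rows = db.execute(
            """SELECT * FROM products
               WHERE (created_at, id) > (?, ?)
               ORDER BY created_at, id
               LIMIT ?""",
            (decoded["created_at"], decoded["id"], limit + 1),
        ).fetchall()
    else:
        rows = db.execute(
            """SELECT * FROM products
               ORDER BY created_at, id
               LIMIT ?""",
            (limit + 1,),
        ).fetchall()

    # Fetching limit + 1 rows reveals whether another page exists
    has_next = len(rows) > limit
    if has_next:
        rows = rows[:limit]

    next_cursor = encode_cursor(rows[-1]) if has_next else None

    return {
        "items": rows,
        "next_cursor": next_cursor,
        "has_next": has_next,
    }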

Idempotency

Every mutating API endpoint must have an idempotency strategy. Without one, every network retry is a potential double-write — a double charge, a duplicate order, a duplicated message.

The Problem

# Client sends a payment request
POST /v1/payments
{ "amount": 5000, "currency": "usd", "customer": "cust_123" }

# Server processes the payment → 200 OK
# But the response is lost due to network failure
# Client retries the exact same request
# Server processes it again → second charge!

The Solution: Idempotency Keys

The client generates a unique key (UUID) for each logical operation and sends it with the request. The server stores the key and its result. On retry, the server returns the stored result instead of re-processing.

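# Idempotent payment endpoint (Python sketch).
# `idempotency_store` and `payment_processor` are assumed interfaces.
IDEMPOTENCY_TTL_SECONDS = 48 * 60 * 60    # keys expire after 48h

def create_payment(idempotency_key, amount, currency, customer):
    # Check if this key was already processed
    existing = idempotency_store.get(idempotency_key)
    if existing:
        return existing.response    # return cached result

    # Process the payment.
    # NOTE: in production the check above and this charge must run as one
    # atomic step (lock or compare-and-swap), or concurrent retries can both charge.
    result = payment_processor.charge(amount, currency, customer)

    # Store the result with the idempotency key
    idempotency_store.set(
        idempotency_key,
        response=result,
        ttl=IDEMPOTENCY_TTL_SECONDS,
    )

    return result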

Key design decisions:

  • Client generates the key — the server cannot reliably deduplicate without client cooperation
  • TTL of 24–48 hours — covers retry windows without unbounded storage growth
  • Store the response, not just the key — retries must return the same response, including any generated IDs
  • Atomic check-and-process — use a database lock or compare-and-swap to prevent race conditions between concurrent retries
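
The design decisions above can be put together in a small runnable sketch. It is one possible shape under stated assumptions, not a production design: `InMemoryIdempotencyStore`, `run_once`, `IdempotencyConflict`, and `charge` are hypothetical names, and a single process-wide lock stands in for the database lock or compare-and-swap a real service would use.

```python
import hashlib
import json
import threading
import time
import uuid


class IdempotencyConflict(Exception):
    """Raised when a key is reused with different parameters."""


class InMemoryIdempotencyStore:
    """Hypothetical store; a real service would use a database row or Redis."""

    def __init__(self, ttl_seconds=48 * 3600):
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._entries = {}  # key -> (fingerprint, response, expires_at)

    def run_once(self, key, params, operation):
        fingerprint = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode()
        ).hexdigest()
        # Holding the lock across check-and-process serializes concurrent retries.
        # (Coarse on purpose; a per-key lock would reduce contention.)
        with self._lock:
            entry = self._entries.get(key)
            if entry and entry[2] > time.time():
                if entry[0] != fingerprint:
                    raise IdempotencyConflict(key)
                return entry[1]  # replay the stored response
            response = operation(**params)
            self._entries[key] = (fingerprint, response, time.time() + self._ttl)
            return response


charges = []  # stand-in for the payment processor's ledger


def charge(amount, currency, customer):
    charges.append((amount, currency, customer))
    return {"payment_id": f"pay_{len(charges)}", "amount": amount, "status": "succeeded"}


store = InMemoryIdempotencyStore()
key = str(uuid.uuid4())  # generated once by the client per logical operation
params = {"amount": 5000, "currency": "usd", "customer": "cust_123"}

first = store.run_once(key, params, charge)
retry = store.run_once(key, params, charge)  # network retry with the same key
```

The retry receives the stored response, including the generated `payment_id`, and the ledger records a single charge. Reusing the key with a different amount raises `IdempotencyConflict` instead of silently returning the earlier result.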

Versioning

Every API versioning strategy is a deprecation commitment. Choose based on your consumers and your willingness to maintain multiple versions.

| Strategy | How | Pros | Cons |
|---|---|---|---|
| URL path | /v1/products, /v2/products | Simple, visible, proxy-friendly | Each version is a full API surface to maintain |
| Header | Accept: application/vnd.api.v2+json | Clean URLs, content negotiation | Harder to discover, harder to test |
| Query parameter | ?version=2 | Easy to add to existing APIs | Pollutes query string |
| Field-level (GraphQL) | @deprecated(reason: "Use newField") | Granular, no full version needed | Requires GraphQL |
| Protobuf (gRPC) | Add new fields (backward-compatible) | Zero-downtime evolution | Removing fields or changing field numbers or types is breaking; renames break generated code |

Staff rule: URL path versioning for external REST APIs (simple, discoverable). Protobuf backward compatibility for internal gRPC (add fields freely; never reuse or renumber an existing field number, and reserve the numbers of fields you retire). Field-level deprecation for GraphQL. Don't over-engineer: most APIs need at most 2 concurrent versions.

The Numbers That Matter

| Metric | Value | Design Implication |
|---|---|---|
| REST payload (JSON) | 2–10x larger than protobuf | Matters at >10K RPS; negligible below |
| Protobuf serialization | Up to ~10x faster than JSON (workload-dependent) | At 100K+ RPS, JSON serialization is measurable CPU cost |
| Default page size | 20–50 items | Larger pages waste bandwidth; smaller cause more round-trips |
| Hard page cap | 100–200 items | Prevents clients from requesting 10,000 items |
| Idempotency key TTL | 24–48 hours | Covers retry windows without unbounded storage |
| API gateway timeout | 5–10 seconds | External requests; internal: 500ms–1s |
| Rate limit (default) | 100–1,000 req/s per client | Per-client, not global; return 429 with Retry-After |
| Cursor token expiry | 24 hours | Prevents stale reads; clients must re-query if token expires |
| gRPC max message size | 4 MB default | Streaming for larger payloads |

Visual Guide

API Style Decision Tree

[Diagram not rendered in this copy: API style decision tree]

How This Shows Up in Interviews

Scenario 1: "Design the API for this system"

Do not start listing endpoints. Say: "First — who consumes this API? External partners and third-party developers → REST, because they expect discoverability and standard HTTP tooling. Internal service-to-service → gRPC, because we need schema enforcement and binary efficiency at 50K RPS. Both → REST at the boundary, gRPC internally, with a gateway translating between them." Then: "For the REST layer: resource-oriented endpoints, cursor-based pagination on all list endpoints, Idempotency-Key required on every POST, and URL-path versioning (/v1/) with a commitment to 12 months of support per version."

Scenario 2: "Design the API for a marketplace" (Full Walkthrough)

This tests whether you can choose different API styles at different boundaries and handle the operational details. Here's how a Staff engineer works through it:

Step 1 — Identify the consumers. "We have three consumer types: a web app, an iOS app, and third-party seller integrations. The web and iOS apps are first-party — we control both ends. Third-party integrations are external — we don't control the client."

Step 2 — Choose styles by boundary. "REST for the public API that third-party sellers use. It's discoverable, well-documented, and every developer knows how to call a REST endpoint. For internal service-to-service (product catalog → pricing → inventory), gRPC — binary protobuf for efficiency, schema enforcement to catch integration bugs at compile time, and streaming for real-time inventory updates."

Step 3 — Design the external REST API. "Resource-oriented:

  • GET /v1/products — cursor-based pagination, filterable by category, price range
  • POST /v1/products — requires Idempotency-Key header, returns 201 with product ID
  • GET /v1/orders — keyset pagination by created_at, scoped to authenticated seller
  • POST /v1/orders/{id}/fulfill — idempotent action endpoint with idempotency key

Versioning: URL path /v1/ for the public API. We commit to supporting v1 for 18 months after v2 launches."

Step 4 — Handle the BFF question. "The web app needs 15 fields per product listing. The iOS app needs 5. Rather than building per-client REST endpoints, we add a thin GraphQL BFF (Backend for Frontend) layer that sits between first-party clients and our REST/gRPC backend. The BFF composes data from multiple services into exactly the shape each client needs. Third-party integrations don't touch GraphQL — they use REST."

Step 5 — Address idempotency and pagination. "Every write endpoint requires client-generated idempotency keys stored with a 48-hour TTL. Product listings use cursor-based pagination because sellers add products concurrently — offset pagination would cause duplicates and skips. Order history uses keyset pagination on (created_at, order_id) for efficient deep pagination."

Why this is a Staff answer: Three consumer types → three API strategies, not one. Idempotency is a requirement, not an afterthought. Pagination choice is justified by the data's write pattern. The candidate chose GraphQL specifically for the BFF use case, not as a universal solution.

Scenario 3: "A client is getting duplicate charges"

This is an idempotency question. The answer: the payment endpoint accepts POST without an idempotency key, so retries create duplicate transactions. Fix: require Idempotency-Key header, store processed keys with results for 48 hours, return cached result on retry. Test edge cases: concurrent retries (use database-level locking), key reuse with different parameters (reject — same key must always mean the same operation).

Scenario 4: "Users on page 2 see duplicate items from page 1"

This is a pagination correctness question. The answer: offset pagination breaks under concurrent inserts. Switch to cursor-based pagination where the cursor encodes the last item's sort position, not a numeric offset. This guarantees stable pagination even as new items are inserted.

In the Wild

Stripe: The Gold Standard for REST API Design

Stripe's API is widely considered the industry benchmark for REST API design. Key design decisions: versioning by date (a date-valued Stripe-Version header, shown here as Stripe-Version: 2024-01-01) rather than cutting a new numeric /v2/ path; each version date locks in the exact behavior at that time. Idempotency keys are accepted on all POST requests through the Idempotency-Key header; they are optional, and clients supply one whenever they need safe retries. Expandable nested resources: instead of N+1 requests, clients send ?expand[]=customer&expand[]=subscription to inline related objects in a single response.

The Staff-level insight: Stripe's date-based versioning means they can make backward-compatible changes continuously without bumping a version number. Breaking changes get a new date version, and clients opt in by changing one header. This is more granular than URL path versioning — each client migrates at their own pace, and Stripe can track exactly which clients are on which version for deprecation planning.
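
One way a date-pinned versioning scheme could work internally is to keep a single current response shape and apply a chain of downgrade transforms for clients pinned to older dates. The sketch below illustrates that idea only; it is not Stripe's code, and the version dates, field names, and `render_for_version` are invented for the example.

```python
def _restore_flat_price(resp):
    # Hypothetical change on 2024-03-01: price_cents became a nested price object.
    out = dict(resp)
    price = out.pop("price")
    out["price_cents"] = price["amount"]
    return out


def _restore_name_field(resp):
    # Hypothetical change on 2023-06-01: "name" was renamed to "title".
    out = dict(resp)
    out["name"] = out.pop("title")
    return out


# (date the breaking change shipped, transform that undoes it), newest first
VERSION_CHANGES = [
    ("2024-03-01", _restore_flat_price),
    ("2023-06-01", _restore_name_field),
]


def render_for_version(current_response, client_version):
    """Undo every breaking change the client has not opted into."""
    resp = current_response
    for change_date, downgrade in VERSION_CHANGES:
        # ISO-8601 dates compare correctly as strings
        if client_version < change_date:
            resp = downgrade(resp)
    return resp


current = {
    "id": "prod_abc123",
    "title": "Widget",
    "price": {"amount": 1999, "currency": "usd"},
}

latest = render_for_version(current, "2024-06-01")
middle = render_for_version(current, "2023-12-01")
oldest = render_for_version(current, "2023-01-01")
```

Handlers only ever produce the newest shape. Each breaking change adds one transform to the list, so the cost of supporting an old client is a short chain of functions rather than a parallel API surface.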

Slack: Evolving a REST API at Scale

Slack's Web API started as simple REST endpoints and grew to 200+ methods. Their key lessons (publicly documented): granular OAuth scopes replaced legacy tokens that carried broad access, so each token is limited to the method families it was granted. Rate limiting per method tier: different endpoints have different rate limits based on computational cost, not a flat per-client limit. conversations.history (expensive, hits the database) has a lower limit than chat.postMessage (cheap, append-only).

The Staff-level insight: Slack's per-method rate limiting is a design principle most candidates miss. A flat "1000 req/s per client" treats a cheap read and an expensive aggregation query identically. Slack's tiered approach (Tier 1: 1+ requests per minute, Tier 2: 20+ per minute, Tier 3: 50+ per minute, Tier 4: 100+ per minute, applied per method) reflects the actual server-side cost of each method. This is the Staff-level framing of rate limiting: cost-proportional, not uniform.
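
Cost-proportional limiting can be sketched as a token bucket whose capacity depends on the method's tier. This is a minimal illustration, not Slack's implementation: the tier capacities mirror the numbers in the text, the window length is a constructor parameter, and the method names and tier assignments in `METHOD_TIERS` are hypothetical.

```python
import time

TIER_LIMITS = {1: 1, 2: 20, 3: 50, 4: 100}  # requests allowed per window, by tier
METHOD_TIERS = {"expensive.search": 2, "cheap.write": 4}  # hypothetical assignments


class TieredRateLimiter:
    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock  # injectable so behavior can be checked without sleeping
        self._buckets = {}   # (client_id, method) -> (tokens, last_refill_time)

    def allow(self, client_id, method):
        capacity = TIER_LIMITS[METHOD_TIERS[method]]
        now = self._clock()
        tokens, last = self._buckets.get((client_id, method), (float(capacity), now))
        # Refill in proportion to elapsed time, capped at the tier's capacity
        tokens = min(capacity, tokens + (now - last) * capacity / self._window)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[(client_id, method)] = (tokens, now)
        return allowed  # on False, the API would respond 429 with Retry-After


fake_now = [0.0]
limiter = TieredRateLimiter(clock=lambda: fake_now[0])

cheap = [limiter.allow("client_a", "cheap.write") for _ in range(101)]
expensive = [limiter.allow("client_a", "expensive.search") for _ in range(21)]

fake_now[0] = 60.0  # one full window later
after_refill = limiter.allow("client_a", "expensive.search")
```

The same client exhausts its budget for the expensive method long before the cheap one, which is the point of tiering: the limit tracks server-side cost rather than raw request count.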

Google: gRPC and API Design Guides

Google publishes an API Design Guide that codifies their patterns across all Google Cloud APIs. Key principles: resource-oriented design (APIs model resources, not RPC actions), standard methods (List, Get, Create, Update, Delete mapped to HTTP verbs), and long-running operations (operations that take >10 seconds return an Operation resource that clients poll for completion rather than holding a connection open).

The Staff-level insight: Google's long-running operation pattern solves the timeout problem that trips up many candidates. When an interviewer asks "what if this takes 30 seconds?", the Staff answer is: return a 202 Accepted with an operation ID, provide a GET /operations/{id} endpoint for polling, and optionally push a webhook on completion. This is not a workaround — it's the standard pattern for asynchronous API operations used by AWS, GCP, and Azure.
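
The 202-then-poll pattern can be sketched with plain functions standing in for HTTP handlers. This is an illustration under assumptions: `OPERATIONS` is an in-memory stand-in for a durable operations table, and `start_export`, `complete_operation`, and the POST /v1/exports route are hypothetical; only GET /operations/{id} comes from the text above.

```python
import uuid

OPERATIONS = {}  # operation_id -> operation record (a real system would persist this)


def start_export(params):
    """Handler for a hypothetical POST /v1/exports: enqueue work, return 202 at once."""
    op_id = f"op_{uuid.uuid4().hex}"
    OPERATIONS[op_id] = {"id": op_id, "done": False, "result": None, "params": params}
    return 202, {"operation_id": op_id, "status_url": f"/operations/{op_id}"}


def complete_operation(op_id, result):
    """Called by the background worker when the job finishes."""
    OPERATIONS[op_id].update(done=True, result=result)


def get_operation(op_id):
    """Handler for GET /operations/{id}: clients poll this until done is true."""
    op = OPERATIONS.get(op_id)
    if op is None:
        return 404, {"code": "OPERATION_NOT_FOUND", "message": f"No operation {op_id}"}
    return 200, {"id": op["id"], "done": op["done"], "result": op["result"]}


accepted_status, accepted = start_export({"format": "csv"})
op_id = accepted["operation_id"]

_, pending = get_operation(op_id)      # polled before the worker finishes
complete_operation(op_id, {"download_url": "/exports/file.csv"})
_, finished = get_operation(op_id)     # polled after completion
```

The request that starts the work never waits for it, so gateway timeouts stop being a constraint on how long the job may run.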


Staff Calibration

The sections below are calibration tools for Staff-level interviews. If you already understand API design, start here to sharpen the framing that separates L5 from L6 answers.

What Staff Engineers Say (That Seniors Don't)

| Concept | Senior Response | Staff Response |
|---|---|---|
| Protocol choice | "gRPC is faster than REST" | "REST at the org boundary for discoverability, gRPC internally for contract enforcement and performance: different boundaries, different protocols" |
| Versioning | "We use /v2/ for the new API" | "Every version is a support surface. We version at the field level when possible and only cut a new path version for breaking contract changes" |
| Pagination | "We use limit/offset" | "Offset pagination is incorrect for any dataset with concurrent writes. Cursor-based for feeds, keyset for sorted tables; the choice depends on the write pattern" |
| Idempotency | "We deduplicate on the server" | "Clients generate idempotency keys. The server stores them with a TTL. Without this, every retry is a potential double-write" |
| GraphQL adoption | "GraphQL lets clients get what they need" | "GraphQL shifts query complexity to the server. Without depth limits and query cost analysis, a single client can bring down your data layer" |
| Error handling | "Return a 500 on errors" | "Structured error responses with machine-readable codes, human-readable messages, and correlation IDs. The client should be able to programmatically handle every error type." |

Common Interview Traps

  • Choosing gRPC "because it's faster" without discussing tradeoffs. It is often faster, but you lose browser support, human-readable debugging, and broad proxy compatibility.
  • Ignoring idempotency on write endpoints. If you design a POST endpoint without an idempotency strategy, you've designed a bug.
  • Treating pagination as a UX concern. It's a correctness and scalability concern. Offset pagination breaks under concurrent writes.
  • Defaulting to GraphQL without discussing operational cost. Unbounded query depth, N+1 resolution, and cache invalidation complexity are real production risks.
  • Exposing internal IDs in external APIs. Auto-increment IDs leak information (total count, creation order). Use UUIDs or hashids externally.
  • Missing error contract. Returning bare HTTP status codes without structured error bodies makes client error handling impossible. Define: { "code": "INVENTORY_EXHAUSTED", "message": "...", "details": {...} }.
  • No rate limiting on write endpoints. Every mutating endpoint needs per-client rate limiting. Without it, one misbehaving client overwhelms your write path.
  • Tight coupling via shared types. Services sharing protobuf definitions is fine; services importing each other's domain models is a coupling trap.
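
The error contract from the list above can be sketched as a small type plus a client-side handler. This is an illustrative shape, not a standard: `ApiError`, `to_response`, `handle_client_side`, and the `correlation_id` field name are assumptions, while the `code`, `message`, and `details` fields follow the example in the text.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ApiError:
    status: int                      # HTTP status code
    code: str                        # stable, machine-readable identifier
    message: str                     # human-readable explanation
    details: dict = field(default_factory=dict)
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_response(self):
        """Return (status, JSON body); the status travels in the HTTP status line."""
        body = asdict(self)
        status = body.pop("status")
        return status, json.dumps(body)


def handle_client_side(status, raw_body):
    """Clients branch on the code, never on the message text."""
    error = json.loads(raw_body)
    if error["code"] == "INVENTORY_EXHAUSTED":
        return "show_out_of_stock"
    if status == 429:
        return "retry_later"
    return "generic_failure"


err = ApiError(
    409,
    "INVENTORY_EXHAUSTED",
    "Only 2 units left",
    {"sku": "widget-1", "available": 2},
)
status, body = err.to_response()
action = handle_client_side(status, body)
```

Because the client keys off `code`, the server can reword `message` freely, and the correlation ID gives support a handle for finding the matching server-side log entry.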

Practice Drill

Staff-Caliber Answer Shape
  1. REST for the external API. Third-party integrations expect REST — it's discoverable, well-documented, and proxy-friendly. Resource-oriented: GET /products, POST /products, GET /products/{id}.
  2. GraphQL as an optional BFF layer for first-party clients. Web and iOS need different data shapes. A GraphQL layer lets each client request exactly what it needs.
  3. gRPC for internal services. Product catalog, search, and pricing communicate via gRPC for type safety and performance.
  4. Versioning: URL path for public REST (/v1/products). Field-level deprecation for GraphQL (@deprecated). Protobuf backward compatibility for gRPC.
  5. Idempotency: POST /orders requires Idempotency-Key header. Server stores keys for 48 hours. Retries return the cached response.

The Staff move: Use different API styles at different boundaries. Don't force one style everywhere.

Where This Appears

These playbooks apply API design concepts to complete system design problems with full Staff-level walkthroughs, evaluator-grade rubrics, and practice drills.

  • API Gateway — Gateway as the single entry point for external traffic: rate limiting, authentication, protocol translation (REST → gRPC), and request routing to internal services
  • Service Discovery — How services find each other's API endpoints: DNS-based vs registry-based discovery, health checking, and the contract between service producers and consumers
  • Feed Generation — Feed API design: cursor-based pagination for infinite scroll, fan-out-on-write vs fan-out-on-read as an API latency tradeoff, and real-time update delivery via streaming APIs
  • Search & Indexing — Search API design: query DSLs, faceted search response shapes, typeahead/autocomplete endpoints, and how pagination works differently for search results than for CRUD lists

Related Technologies: API Gateway

This is one of 9 foundation guides. The full library also includes deep-dive system design playbooks with evaluator-grade breakdowns, practice drills, and failure-mode analysis.