Why This Matters
Every system design interview involves designing at least one API. The interviewer draws a box labeled "API Gateway" or "Service" and asks: "What does the API look like?" The answer to this question reveals whether you think about APIs as technical endpoints or as organizational contracts.
Most candidates default to REST because it's familiar and propose CRUD endpoints without thinking about the consumers, the versioning strategy, the pagination model, or what happens when a client retries a failed POST. Staff-level interviewers probe these blind spots because they map directly to production reliability: a missing idempotency key on a payment endpoint is a double-charge bug waiting to happen. A poorly chosen pagination strategy breaks under concurrent writes. A versioning decision made in month one becomes a maintenance burden for years.
The Staff-level framing is: API style is an organizational boundary decision, not a technology preference. REST at the external boundary for discoverability. gRPC internally for type safety and performance. GraphQL for product teams with diverse frontend needs. Each has a cost — and the candidate who can articulate those costs, not just name the protocols, is the one who earns the Staff signal.
The 60-Second Version
- REST is resource-oriented, stateless, and built on HTTP verbs. Best for external/public APIs where broad tooling support and discoverability matter.
- gRPC uses protobuf (binary), supports streaming, and enforces strong typing. Best for internal service-to-service where performance and schema contracts matter. Not browser-friendly without a proxy.
- GraphQL exposes a single endpoint with client-driven queries. Best when diverse frontends need different slices of the same data. Dangerous when it devolves into an uncontrolled query engine.
- API style is an organizational boundary decision. External = REST. Internal = gRPC. Heterogeneous frontends = GraphQL. Don't force one style everywhere.
- Idempotency is a requirement, not a nice-to-have. POST is not idempotent. Client-generated idempotency keys are the only reliable way to prevent duplicate mutations.
- Pagination choice has correctness implications. Offset-based skips rows on insert. Cursor-based can't jump to page N. Keyset scales for large datasets.
- Every versioning strategy is a deprecation commitment. URL path versioning is simple but forces migration. Field-level deprecation is cleaner but harder to discover.
How API Design Works in System Design Interviews
The Three API Styles
In a system design interview, you'll choose between three styles — and the right choice depends on who consumes the API, not on which protocol is "fastest."
REST (Representational State Transfer)
REST maps resources to URLs and operations to HTTP verbs. It is the lingua franca of external APIs.
# REST API for a marketplace
GET /v1/products → List products (paginated)
GET /v1/products/{id} → Get product by ID
POST /v1/products → Create product
PUT /v1/products/{id} → Replace product
PATCH /v1/products/{id} → Partial update
DELETE /v1/products/{id} → Delete product
# Response: JSON, human-readable, broadly supported
{
"id": "prod_abc123",
"title": "Widget",
"price_cents": 1999,
"created_at": "2024-01-15T10:30:00Z"
}
Why REST works for external APIs:
- Every developer knows HTTP verbs and JSON. No SDK needed.
- Cacheable by default — GET responses can be cached at every layer (CDN, proxy, browser).
- Observable — any HTTP debugging tool can inspect requests.
- Well-understood status code semantics (200 OK, 201 Created, 404 Not Found, 429 Too Many Requests).
Where REST breaks down:
- Over-fetching (you get all fields even if you need two) and under-fetching (you need multiple requests for related data).
- No schema enforcement — the server and client can disagree about the response shape, and you'll only find out at runtime.
- No streaming support in standard REST. Long-running operations need polling or webhooks.
gRPC (Google Remote Procedure Call)
gRPC uses Protocol Buffers (protobuf) for binary serialization and HTTP/2 for transport. It enforces schemas and supports streaming.
# Protobuf definition
service ProductService {
rpc GetProduct(GetProductRequest) returns (Product);
rpc ListProducts(ListProductsRequest) returns (stream Product);
rpc CreateProduct(CreateProductRequest) returns (Product);
}
message Product {
string id = 1;
string title = 2;
int64 price_cents = 3;
google.protobuf.Timestamp created_at = 4;
}
Why gRPC works for internal services:
- Binary encoding: payloads are typically 2–10x smaller than JSON, and serialization is roughly 10x faster.
- Schema enforcement: both sides agree on the contract at compile time. Breaking changes are caught before deployment.
- Streaming: server streaming, client streaming, and bidirectional streaming are first-class.
- Code generation: client libraries are auto-generated from .proto files — no manual SDK maintenance.
Where gRPC breaks down:
- Not browser-friendly without a gRPC-Web proxy (browser APIs don't expose the HTTP/2 trailers and framing control that gRPC requires).
- Not human-readable — protobuf is binary, so debugging requires tooling.
- Proxy and load balancer compatibility is narrower than REST/HTTP/1.1.
GraphQL
GraphQL lets the client specify exactly which fields it needs in a single request.
# Client query
query {
product(id: "prod_abc123") {
title
price_cents
seller {
name
rating
}
reviews(first: 5) {
text
score
}
}
}
Why GraphQL works for diverse frontends:
- Mobile app needs 3 fields; web app needs 15; admin panel needs 30. One endpoint, different queries.
- Eliminates over-fetching and under-fetching — the client gets exactly what it asked for.
- Self-documenting — the schema is introspectable.
Where GraphQL breaks down:
- Shifts query complexity to the server. Without depth limits, a single client can generate a query that joins 10 tables.
- Caching is harder — every query is a POST with a unique body, so HTTP caches can't help.
- N+1 resolution: fetching products → seller for 100 products naively generates 100 seller queries. The DataLoader pattern mitigates this but adds complexity.
- Operational cost: query cost analysis, rate limiting per query complexity, and abuse prevention are all harder than with REST.
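The batching idea behind DataLoader can be shown in a few lines. This is a minimal, synchronous sketch (real DataLoader implementations are async and per-request); `fetch_sellers` is a hypothetical bulk query standing in for a database call.

```python
class BatchLoader:
    """Collects requested keys, then resolves them all with ONE bulk call."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn      # list[key] -> dict[key, value]
        self.queue = []
        self.cache = {}

    def load(self, key):
        # Queue the key; the actual fetch is deferred until dispatch()
        if key not in self.cache and key not in self.queue:
            self.queue.append(key)

    def dispatch(self):
        if self.queue:
            self.cache.update(self.batch_fn(self.queue))
            self.queue = []

    def get(self, key):
        return self.cache[key]

calls = []
def fetch_sellers(ids):
    calls.append(list(ids))           # record that one bulk query happened
    return {i: {"id": i, "name": f"seller-{i}"} for i in ids}

loader = BatchLoader(fetch_sellers)
for product_id in range(100):         # 100 products referencing 3 distinct sellers
    loader.load(product_id % 3)
loader.dispatch()
# One bulk query for 3 distinct seller ids instead of 100 single-row queries
```

The resolver-facing win: 100 `product → seller` lookups collapse into a single batched fetch, which is exactly what the N+1 bullet above is warning about.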
The Decision Framework
| Boundary | Style | Why |
|---|---|---|
| External / public API | REST | Discoverability, tooling, caching, broad compatibility |
| Internal service-to-service | gRPC | Type safety, binary efficiency, streaming |
| Mobile / web BFF layer | GraphQL | Different clients need different data shapes |
| Partner integrations | REST | Partners expect REST — don't make them learn gRPC |
| Real-time streaming | gRPC streaming or WebSocket | Persistent connection with typed messages |
| Cross-team internal | gRPC with .proto as contract | Shared schema definition prevents integration bugs |
Staff rule: Use different API styles at different boundaries. The most common mistake is forcing one style everywhere.
Pagination
Pagination is a correctness problem, not just a UX concern. The wrong pagination strategy breaks under concurrent writes, and interviewers test this specifically.
Strategy Comparison
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Offset/Limit | ?offset=20&limit=10 | Simple, jump to page N | Skips/duplicates on concurrent inserts; O(N) on large offsets |
| Cursor-based | ?after=eyJ0... | Stable under inserts; efficient | Can't jump to page N; opaque tokens |
| Keyset | ?created_after=2024-01&id_gt=500 | Efficient at any depth; composable | Requires sortable unique column |
| Page token | Server-issued opaque token | Server controls implementation | Token expiry; slightly stateful |
Why Offset Pagination Breaks
# Table state at time T:
Row 1, Row 2, Row 3, Row 4, Row 5
# Page 1: offset=0, limit=2 → [Row 1, Row 2]
# New row inserted before Row 1:
Row NEW, Row 1, Row 2, Row 3, Row 4, Row 5
# Page 2: offset=2, limit=2 → [Row 2, Row 3]
# Row 2 appears on BOTH pages!
For any dataset with concurrent writes (which is most production systems), offset pagination is incorrect. This is a specific, testable claim that separates Staff from senior answers.
Default recommendation: Cursor-based for feeds and timelines. Keyset for sorted data tables. Offset only for small, static datasets where correctness under concurrent writes doesn't matter.
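The failure above is easy to reproduce in code. This sketch replays the exact sequence: page 1 is read, a row is inserted at the head, and offset-based page 2 re-serves an item; a cursor-style read anchored on the last item seen does not (the `index`-based cursor is a stand-in for a real sort-key comparison, valid here because rows are unique).

```python
rows = ["Row 1", "Row 2", "Row 3", "Row 4", "Row 5"]

def page_by_offset(data, offset, limit):
    return data[offset:offset + limit]

page1 = page_by_offset(rows, 0, 2)       # ['Row 1', 'Row 2']
rows.insert(0, "Row NEW")                # concurrent insert at the head
page2 = page_by_offset(rows, 2, 2)       # ['Row 2', 'Row 3'] -- duplicate!

def page_after(data, last_seen, limit):
    # Cursor-style: continue from the last item the client actually saw
    start = data.index(last_seen) + 1 if last_seen else 0
    return data[start:start + limit]

page2_cursor = page_after(rows, page1[-1], 2)  # ['Row 3', 'Row 4'] -- no overlap
```

"Row 2" appears on both offset pages, while the cursor-based page 2 picks up cleanly after the last delivered row regardless of the insert.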
Pagination Implementation
# Cursor-based pagination pseudocode
list_products(cursor, limit=20):
if cursor:
decoded = decode_cursor(cursor) # {created_at, id}
rows = db.query(
"SELECT * FROM products
WHERE (created_at, id) > (?, ?)
ORDER BY created_at, id
LIMIT ?",
decoded.created_at, decoded.id, limit + 1
)
else:
rows = db.query(
"SELECT * FROM products
ORDER BY created_at, id
LIMIT ?",
limit + 1
)
has_next = len(rows) > limit
if has_next:
rows = rows[:limit]
next_cursor = encode_cursor(rows[-1]) if has_next else None
return {
"items": rows,
"next_cursor": next_cursor,
"has_next": has_next
}
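The `encode_cursor` / `decode_cursor` helpers above can be implemented in many ways; one common choice (an assumption here, not a standard) is to base64url-encode the last row's sort keys as JSON, which keeps the token opaque to clients.

```python
import base64
import json

def encode_cursor(row):
    # Capture exactly the columns the ORDER BY uses: (created_at, id)
    payload = json.dumps({"created_at": row["created_at"], "id": row["id"]})
    return base64.urlsafe_b64encode(payload.encode()).decode().rstrip("=")

def decode_cursor(token):
    # Restore base64 padding stripped at encode time
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded.encode()))

row = {"created_at": "2024-01-15T10:30:00Z", "id": "prod_abc123"}
token = encode_cursor(row)
assert decode_cursor(token) == row   # lossless round-trip
```

Production implementations often sign or encrypt the token so clients can't tamper with it, and attach an expiry (the 24-hour cursor TTL from the numbers table below is one reasonable default).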
Idempotency
Every mutating API endpoint must have an idempotency strategy. Without one, every network retry is a potential double-write — a double charge, a duplicate order, a duplicated message.
The Problem
# Client sends a payment request
POST /v1/payments
{ "amount": 5000, "currency": "usd", "customer": "cust_123" }
# Server processes the payment → 200 OK
# But the response is lost due to network failure
# Client retries the exact same request
# Server processes it again → second charge!
The Solution: Idempotency Keys
The client generates a unique key (UUID) for each logical operation and sends it with the request. The server stores the key and its result. On retry, the server returns the stored result instead of re-processing.
# Idempotent payment endpoint pseudocode
create_payment(idempotency_key, amount, currency, customer):
# Check if this key was already processed
existing = idempotency_store.get(idempotency_key)
if existing:
return existing.response # return cached result
# Process the payment
result = payment_processor.charge(amount, currency, customer)
# Store the result with the idempotency key
idempotency_store.set(
idempotency_key,
response=result,
ttl=48_hours # keys expire after 48h
)
return result
Key design decisions:
- Client generates the key — the server cannot reliably deduplicate without client cooperation
- TTL of 24–48 hours — covers retry windows without unbounded storage growth
- Store the response, not just the key — retries must return the same response, including any generated IDs
- Atomic check-and-process — use a database lock or compare-and-swap to prevent race conditions between concurrent retries
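One way to make the check-and-process step atomic, sketched here with SQLite purely for illustration, is to let a UNIQUE constraint act as the compare-and-swap: the first INSERT of a key wins, and concurrent retries hit a constraint violation instead of double-charging.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT)")

charges = []
def charge(amount):                      # stand-in for the payment processor
    charges.append(amount)
    return f"payment_{len(charges)}"

def create_payment(key, amount):
    try:
        # Atomically claim the key; a duplicate raises IntegrityError
        db.execute("INSERT INTO idempotency_keys (key, response) VALUES (?, NULL)", (key,))
    except sqlite3.IntegrityError:
        row = db.execute("SELECT response FROM idempotency_keys WHERE key = ?",
                         (key,)).fetchone()
        return row[0]   # cached result (None if the first attempt is still in flight)
    result = charge(amount)
    db.execute("UPDATE idempotency_keys SET response = ? WHERE key = ?", (result, key))
    return result

first = create_payment("idem-abc", 5000)
retry = create_payment("idem-abc", 5000)
assert first == retry and len(charges) == 1   # charged exactly once
```

The in-flight case (claimed key, NULL response) is where real implementations return a 409 or ask the client to retry later rather than handing back `None`.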
Versioning
Every API versioning strategy is a deprecation commitment. Choose based on your consumers and your willingness to maintain multiple versions.
| Strategy | How | Pros | Cons |
|---|---|---|---|
| URL path | /v1/products, /v2/products | Simple, visible, proxy-friendly | Each version is a full API surface to maintain |
| Header | Accept: application/vnd.api.v2+json | Clean URLs, content negotiation | Harder to discover, harder to test |
| Query parameter | ?version=2 | Easy to add to existing APIs | Pollutes query string |
| Field-level (GraphQL) | @deprecated(reason: "Use newField") | Granular, no full version needed | Requires GraphQL |
| Protobuf (gRPC) | Add new fields (backward-compatible) | Zero-downtime evolution | Removing/renaming fields is breaking |
Staff rule: URL path versioning for external REST APIs (simple, discoverable). Protobuf backward compatibility for internal gRPC (add fields freely, never remove). Field-level deprecation for GraphQL. Don't over-engineer: most APIs need at most 2 concurrent versions.
The Numbers That Matter
| Metric | Value | Design Implication |
|---|---|---|
| REST payload (JSON) | 2–10x larger than protobuf | Matters at >10K RPS; negligible below |
| Protobuf serialization | ~10x faster than JSON | At 100K+ RPS, JSON serialization is measurable CPU cost |
| Default page size | 20–50 items | Larger pages waste bandwidth; smaller cause more round-trips |
| Hard page cap | 100–200 items | Prevents clients from requesting 10,000 items |
| Idempotency key TTL | 24–48 hours | Covers retry windows without unbounded storage |
| API gateway timeout | 5–10 seconds | External requests; internal: 500ms–1s |
| Rate limit (default) | 100–1,000 req/s per client | Per-client, not global; return 429 with Retry-After |
| Cursor token expiry | 24 hours | Prevents stale reads; clients must re-query if token expires |
| gRPC max message size | 4 MB default | Streaming for larger payloads |
Visual Guide
API Style Decision Tree
How This Shows Up in Interviews
Scenario 1: "Design the API for this system"
Do not start listing endpoints. Say: "First — who consumes this API? External partners and third-party developers → REST, because they expect discoverability and standard HTTP tooling. Internal service-to-service → gRPC, because we need schema enforcement and binary efficiency at 50K RPS. Both → REST at the boundary, gRPC internally, with a gateway translating between them." Then: "For the REST layer: resource-oriented endpoints, cursor-based pagination on all list endpoints, Idempotency-Key required on every POST, and URL-path versioning (/v1/) with a commitment to 12 months of support per version."
Scenario 2: "Design the API for a marketplace" (Full Walkthrough)
This tests whether you can choose different API styles at different boundaries and handle the operational details. Here's how a Staff engineer works through it:
Step 1 — Identify the consumers. "We have three consumer types: a web app, an iOS app, and third-party seller integrations. The web and iOS apps are first-party — we control both ends. Third-party integrations are external — we don't control the client."
Step 2 — Choose styles by boundary. "REST for the public API that third-party sellers use. It's discoverable, well-documented, and every developer knows how to call a REST endpoint. For internal service-to-service (product catalog → pricing → inventory), gRPC — binary protobuf for efficiency, schema enforcement to catch integration bugs at compile time, and streaming for real-time inventory updates."
Step 3 — Design the external REST API. "Resource-oriented:
- GET /v1/products — cursor-based pagination, filterable by category and price range
- POST /v1/products — requires Idempotency-Key header, returns 201 with product ID
- GET /v1/orders — keyset pagination by created_at, scoped to authenticated seller
- POST /v1/orders/{id}/fulfill — idempotent action endpoint with idempotency key
Versioning: URL path /v1/ for the public API. We commit to supporting v1 for 18 months after v2 launches."
Step 4 — Handle the BFF question. "The web app needs 15 fields per product listing. The iOS app needs 5. Rather than building per-client REST endpoints, we add a thin GraphQL BFF (Backend for Frontend) layer that sits between first-party clients and our REST/gRPC backend. The BFF composes data from multiple services into exactly the shape each client needs. Third-party integrations don't touch GraphQL — they use REST."
Step 5 — Address idempotency and pagination. "Every write endpoint requires client-generated idempotency keys stored with a 48-hour TTL. Product listings use cursor-based pagination because sellers add products concurrently — offset pagination would cause duplicates and skips. Order history uses keyset pagination on (created_at, order_id) for efficient deep pagination."
Why this is a Staff answer: Three consumer types → three API strategies, not one. Idempotency is a requirement, not an afterthought. Pagination choice is justified by the data's write pattern. The candidate chose GraphQL specifically for the BFF use case, not as a universal solution.
Scenario 3: "A client is getting duplicate charges"
This is an idempotency question. The answer: the payment endpoint accepts POST without an idempotency key, so retries create duplicate transactions. Fix: require Idempotency-Key header, store processed keys with results for 48 hours, return cached result on retry. Test edge cases: concurrent retries (use database-level locking), key reuse with different parameters (reject — same key must always mean the same operation).
Scenario 4: "Users on page 2 see duplicate items from page 1"
This is a pagination correctness question. The answer: offset pagination breaks under concurrent inserts. Switch to cursor-based pagination where the cursor encodes the last item's sort position, not a numeric offset. This guarantees stable pagination even as new items are inserted.
In the Wild
Stripe: The Gold Standard for REST API Design
Stripe's API is widely considered the industry benchmark for REST API design. Key design decisions: versioning by date (Stripe-Version: 2024-01-01) rather than numeric /v1/, /v2/ — each version date locks in the exact behavior at that time. Idempotency keys are supported on all POST requests via the Idempotency-Key header, so clients can safely retry any mutation. Expanding nested resources — instead of N+1 requests, clients send ?expand[]=customer&expand[]=subscription to inline related objects in a single response.
The Staff-level insight: Stripe's date-based versioning means they can make backward-compatible changes continuously without bumping a version number. Breaking changes get a new date version, and clients opt in by changing one header. This is more granular than URL path versioning — each client migrates at their own pace, and Stripe can track exactly which clients are on which version for deprecation planning.
Slack: Evolving a REST API at Scale
Slack's Web API started as simple REST endpoints and grew to 200+ methods. Their key lessons (publicly documented): conversation-scoped tokens replaced legacy global tokens — each token specifies which API methods and which channels it can access. Rate limiting per method tier — different endpoints have different rate limits based on computational cost, not a flat per-client limit. conversations.history (expensive, hits the database) has a lower limit than chat.postMessage (cheap, append-only).
The Staff-level insight: Slack's per-method rate limiting is a design principle most candidates miss. A flat "1000 req/s per client" treats a cheap read and an expensive aggregation query identically. Slack's tiered approach (Tier 1: 1 req/s, Tier 2: 20 req/s, Tier 3: 50 req/s, Tier 4: 100 req/s) reflects the actual server-side cost of each method. This is the Staff-level framing of rate limiting: cost-proportional, not uniform.
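Cost-proportional limiting like this is usually a token bucket per (client, method tier). The sketch below uses refill rates loosely modeled on the Slack tiers mentioned above; the tier table and function names are illustrative, not Slack's implementation.

```python
import time

TIER_RATES = {1: 1, 2: 20, 3: 50, 4: 100}   # requests per second by method tier

class TokenBucket:
    def __init__(self, rate):
        self.rate = rate                    # tokens refilled per second
        self.tokens = float(rate)           # start full: burst of one second's quota
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the bucket size
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                        # caller returns 429 with Retry-After

buckets = {}
def check(client_id, method_tier):
    key = (client_id, method_tier)
    bucket = buckets.setdefault(key, TokenBucket(TIER_RATES[method_tier]))
    return bucket.allow()

# An expensive Tier-1 method admits 1 request in a burst; a cheap Tier-4 method, 100
tier1_allowed = sum(check("client_a", 1) for _ in range(5))
tier4_allowed = sum(check("client_a", 4) for _ in range(150))
assert tier1_allowed == 1 and tier4_allowed >= 100
```

The point is the key: limiting on (client, tier) rather than just client is what makes the limit reflect server-side cost.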
Google: gRPC and API Design Guides
Google publishes an API Design Guide that codifies their patterns across all Google Cloud APIs. Key principles: resource-oriented design (APIs model resources, not RPC actions), standard methods (List, Get, Create, Update, Delete mapped to HTTP verbs), and long-running operations (operations that take >10 seconds return an Operation resource that clients poll for completion rather than holding a connection open).
The Staff-level insight: Google's long-running operation pattern solves the timeout problem that trips up many candidates. When an interviewer asks "what if this takes 30 seconds?", the Staff answer is: return a 202 Accepted with an operation ID, provide a GET /operations/{id} endpoint for polling, and optionally push a webhook on completion. This is not a workaround — it's the standard pattern for asynchronous API operations used by AWS, GCP, and Azure.
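The 202 + operation-resource pattern is small enough to sketch end to end. All names here are illustrative (an in-memory store stands in for a database, and the result URI is made up); the shape is what matters: the mutation returns immediately with an operation ID, and clients poll a GET endpoint.

```python
import uuid

operations = {}   # stand-in for a persistent operations table

def start_export(params):
    """POST /exports -- returns 202 Accepted; work continues asynchronously."""
    op_id = str(uuid.uuid4())
    operations[op_id] = {"status": "running", "result": None}
    return 202, {"operation_id": op_id}

def complete(op_id, result):
    """Called by the background worker when the job finishes."""
    operations[op_id] = {"status": "done", "result": result}

def get_operation(op_id):
    """GET /operations/{id} -- the polling endpoint clients hit."""
    return 200, operations[op_id]

status, body = start_export({"format": "csv"})
assert status == 202                                  # accepted, not completed
complete(body["operation_id"], "s3://exports/report.csv")
assert get_operation(body["operation_id"])[1]["status"] == "done"
```

A webhook on completion is an optional addition on top of polling, not a replacement: clients that miss the webhook can always fall back to the GET.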
Staff Calibration
The sections below are calibration tools for Staff-level interviews. If you already understand API design, start here to sharpen the framing that separates L5 from L6 answers.
What Staff Engineers Say (That Seniors Don't)
| Concept | Senior Response | Staff Response |
|---|---|---|
| Protocol choice | "gRPC is faster than REST" | "REST at the org boundary for discoverability, gRPC internally for contract enforcement and performance — different boundaries, different protocols" |
| Versioning | "We use /v2/ for the new API" | "Every version is a support surface. We version at the field level when possible and only cut a new path version for breaking contract changes" |
| Pagination | "We use limit/offset" | "Offset pagination is incorrect for any dataset with concurrent writes. Cursor-based for feeds, keyset for sorted tables — the choice depends on the write pattern" |
| Idempotency | "We deduplicate on the server" | "Clients generate idempotency keys. The server stores them with a TTL. Without this, every retry is a potential double-write" |
| GraphQL adoption | "GraphQL lets clients get what they need" | "GraphQL shifts query complexity to the server. Without depth limits and query cost analysis, a single client can bring down your data layer" |
| Error handling | "Return a 500 on errors" | "Structured error responses with machine-readable codes, human-readable messages, and correlation IDs. The client should be able to programmatically handle every error type." |
Common Interview Traps
- Choosing gRPC "because it's faster" without discussing tradeoffs. It is faster, and it costs you browser support, human-readable debugging, and broad proxy compatibility.
- Ignoring idempotency on write endpoints. If you design a POST endpoint without an idempotency strategy, you've designed a bug.
- Treating pagination as a UX concern. It's a correctness and scalability concern. Offset pagination breaks under concurrent writes.
- Defaulting to GraphQL without discussing operational cost. Unbounded query depth, N+1 resolution, and cache invalidation complexity are real production risks.
- Exposing internal IDs in external APIs. Auto-increment IDs leak information (total count, creation order). Use UUIDs or hashids externally.
- Missing error contract. Returning bare HTTP status codes without structured error bodies makes client error handling impossible. Define: { "code": "INVENTORY_EXHAUSTED", "message": "...", "details": {...} }.
- No rate limiting on write endpoints. Every mutating endpoint needs per-client rate limiting. Without it, one misbehaving client overwhelms your write path.
- Tight coupling via shared types. Services sharing protobuf definitions is fine; services importing each other's domain models is a coupling trap.
Practice Drill
Staff-Caliber Answer Shape
- REST for the external API. Third-party integrations expect REST — it's discoverable, well-documented, and proxy-friendly. Resource-oriented: GET /products, POST /products, GET /products/{id}.
- GraphQL as an optional BFF layer for first-party clients. Web and iOS need different data shapes. A GraphQL layer lets each client request exactly what it needs.
- gRPC for internal services. Product catalog, search, and pricing communicate via gRPC for type safety and performance.
- Versioning: URL path for public REST (/v1/products). Field-level deprecation for GraphQL (@deprecated). Protobuf backward compatibility for gRPC.
- Idempotency: POST /orders requires an Idempotency-Key header. Server stores keys for 48 hours. Retries return the cached response.
The Staff move: Use different API styles at different boundaries. Don't force one style everywhere.
Where This Appears
These playbooks apply API design concepts to complete system design problems with full Staff-level walkthroughs, evaluator-grade rubrics, and practice drills.
- API Gateway — Gateway as the single entry point for external traffic: rate limiting, authentication, protocol translation (REST → gRPC), and request routing to internal services
- Service Discovery — How services find each other's API endpoints: DNS-based vs registry-based discovery, health checking, and the contract between service producers and consumers
- Feed Generation — Feed API design: cursor-based pagination for infinite scroll, fan-out-on-write vs fan-out-on-read as an API latency tradeoff, and real-time update delivery via streaming APIs
- Search & Indexing — Search API design: query DSLs, faceted search response shapes, typeahead/autocomplete endpoints, and how pagination works differently for search results than for CRUD lists
Related Technologies: API Gateway