Design with API Gateways — Staff-Level Technology Guide
The 60-Second Pitch
An API gateway is a reverse proxy that sits between clients and backend services, handling cross-cutting concerns: routing, authentication, rate limiting, request/response transformation, observability, and protocol translation. Every system design interview that involves an API layer implicitly involves an API gateway — even if you never say the words "API gateway." When you say "the client hits our API, we authenticate the request, rate limit by user, and route to the appropriate microservice," you are describing an API gateway.
The Staff-level insight: the API gateway is not a product — it is an architectural pattern. Kong, Envoy, NGINX, and AWS API Gateway are implementations. The interview value is not in knowing which product to choose — it is in understanding what cross-cutting concerns live at the edge versus in the services, and why centralizing them at the gateway reduces duplication, improves consistency, and creates a single enforcement point for security and traffic management. The Staff move is to say: "We put auth, rate limiting, and request logging at the gateway so that every service gets these guarantees without implementing them individually."
What an API Gateway Does
The Cross-Cutting Concerns Problem
In a microservices architecture, every service needs authentication, rate limiting, logging, metrics, TLS termination, and request validation. Without a gateway, each service implements these independently — duplicating code, introducing inconsistencies, and creating security gaps when one team forgets to validate JWT tokens or enforce rate limits.
The API gateway solves this by centralizing cross-cutting concerns at the edge:
Core Responsibilities
Routing: Map incoming request paths to backend services. /api/users/* → User Service, /api/orders/* → Order Service. In interviews, this is the first thing you state: "the gateway routes based on path prefix to the appropriate service."
Authentication & Authorization: Validate JWT tokens, OAuth2 access tokens, or API keys at the edge. Reject unauthorized requests before they reach backend services. The gateway extracts the authenticated identity (user ID, scopes, roles) and passes it to services via headers — services trust the gateway and skip their own auth validation.
Rate Limiting: Enforce request quotas per user, per API key, per IP, or per endpoint. "Each user gets 100 requests per minute. Partners get 1,000. Unauthenticated requests get 10." The gateway maintains rate limit counters (in-memory, Redis, or a distributed store) and returns 429 Too Many Requests when limits are exceeded.
Request/Response Transformation: Modify requests before forwarding (add headers, rewrite paths, strip sensitive data) and modify responses before returning (add CORS headers, transform XML to JSON, redact internal error details). This is the "adapter" function — the gateway presents a clean external API while backend services use internal conventions.
Protocol Translation: Accept HTTP/REST from clients and forward as gRPC to internal services. Accept WebSocket connections and bridge to message queues. Accept GraphQL queries and fan out to multiple REST services. The gateway bridges protocol boundaries between external and internal worlds.
Observability: Emit request logs, latency metrics, and distributed traces for every request. Because all traffic flows through the gateway, it is the single best place to measure API performance, error rates, and traffic patterns. Gateway metrics feed dashboards and alerting — "p99 latency on /api/search exceeded 500ms" is a gateway-level alert.
TLS Termination: Handle HTTPS at the edge. Backend services communicate over plain HTTP (or mutual TLS) within the private network. This simplifies certificate management — one set of certificates at the gateway instead of one per service.
The Technology Landscape
Kong
Kong is an API gateway built on NGINX and OpenResty (LuaJIT). It extends NGINX with a plugin system that adds authentication, rate limiting, logging, and transformation capabilities. Kong stores its configuration in PostgreSQL or Cassandra (or declaratively in a YAML file for DB-less mode).
Architecture: Kong runs as a set of NGINX worker processes with Lua plugins. Each request flows through a pipeline of configured plugins: pre-auth → auth → rate-limit → request-transform → proxy → response-transform → logging. Plugins are written in Lua (or Go/Python/JavaScript via plugin server protocol).
Strengths:
- Rich plugin ecosystem (100+ official and community plugins)
- Declarative configuration via kong.yaml (GitOps-friendly)
- DB-less mode eliminates the database dependency for simpler deployments
- Enterprise features: developer portal, analytics, RBAC, OIDC
Weaknesses:
- Lua plugin development is a niche skill
- PostgreSQL/Cassandra dependency adds operational overhead in DB-backed mode
- Configuration propagation across nodes can have brief inconsistency windows
- Less suited for service-to-service (east-west) traffic than Envoy
When to choose Kong: You need a full-featured API gateway with a rich plugin ecosystem for north-south traffic (client → service). Your team prefers declarative YAML configuration. You want enterprise API management features (developer portals, API analytics, usage plans).
Envoy
Envoy is a high-performance C++ proxy designed for cloud-native architectures. Originally built at Lyft, it is now the data plane for most service meshes (Istio, AWS App Mesh). Envoy handles both north-south (edge) and east-west (service-to-service) traffic.
Architecture: Envoy runs as a sidecar alongside each service instance or as a standalone edge proxy. Configuration is managed dynamically via the xDS (discovery service) API — a set of gRPC APIs that push configuration changes to Envoy without restarts. This makes Envoy ideal for environments where routes and backends change frequently (Kubernetes pod scaling, blue-green deployments).
Key features:
- L7 protocol support — HTTP/1.1, HTTP/2, gRPC, WebSocket, MongoDB, Redis, Thrift
- Advanced load balancing — round-robin, least connections, ring hash, Maglev, locality-aware
- Circuit breaking — per-upstream connection limits, request limits, retry budgets
- Observability built-in — automatic metrics (Prometheus format), distributed tracing (Zipkin, Jaeger, OpenTelemetry), access logging
- Hot restart — binary upgrades without dropping connections
Strengths:
- Highest raw performance (C++, asynchronous event-driven)
- xDS API enables dynamic configuration from a control plane (Istio, custom)
- Native gRPC and HTTP/2 support
- Designed for both edge and service mesh
Weaknesses:
- Configuration is complex (YAML or xDS API, not human-friendly)
- No built-in plugin marketplace — extensibility via Lua filters or WASM
- Requires a control plane (Istio, custom) for management at scale
- Steeper learning curve than Kong or NGINX
When to choose Envoy: You are building a service mesh (Istio). You need a high-performance proxy for both edge and service-to-service traffic. Your services use gRPC or HTTP/2 heavily. You want dynamic configuration via API without proxy restarts.
NGINX
NGINX is the battle-tested reverse proxy and load balancer that powers a significant portion of the internet. In the API gateway context, NGINX serves as a high-performance router and TLS terminator, extended with Lua scripting (via OpenResty) for dynamic routing and custom logic.
Architecture: NGINX uses an event-driven, non-blocking architecture with a master process and multiple worker processes. Each worker handles thousands of concurrent connections using epoll/kqueue. Configuration is file-based (nginx.conf) and applied via reload (nginx -s reload), which gracefully drains old workers and starts new ones.
Strengths:
- Proven at massive scale (millions of concurrent connections)
- Simple, well-understood configuration language
- Extremely efficient for static content serving and reverse proxying
- Mature ecosystem, extensive documentation, large community
- NGINX Plus (commercial) adds dynamic reconfiguration, health checks, and dashboard
Weaknesses:
- Static configuration (requires reload for changes) — no native dynamic discovery
- Limited built-in API management features (core NGINX does ship basic rate limiting via limit_req and delegated auth via auth_request, but JWT/OAuth validation and per-consumer quotas require OpenResty scripting or NGINX Plus)
- Lua/OpenResty extensions require specialized knowledge
- Less cloud-native than Envoy (no xDS, no native service mesh integration)
When to choose NGINX: You need a simple, proven reverse proxy for routing and TLS termination. Your routing rules change infrequently. You want the highest raw performance with the simplest operational model. Your team already has NGINX expertise.
AWS API Gateway
AWS API Gateway is a fully managed service that handles API creation, publishing, monitoring, and security. It integrates natively with Lambda, IAM, Cognito, WAF, and CloudWatch.
Two flavors:
- REST API — full-featured: request validation, request/response transformation, API keys, usage plans, caching, WAF integration. Higher latency (~15-30ms overhead) and cost.
- HTTP API — lightweight: routing, JWT auth, CORS. Lower latency (~5-10ms overhead) and ~70% cheaper. Use this for most new workloads.
Strengths:
- Zero infrastructure to manage — no servers, no scaling, no patching
- Native Lambda integration (synchronous invoke, no networking to configure)
- Usage plans and API keys for partner/developer management
- Built-in throttling, caching (REST API), and WAF integration
Weaknesses:
- Vendor lock-in (AWS only, no portable configuration)
- Limited customization compared to self-hosted gateways
- 29-second timeout for Lambda integration (hard limit on REST API)
- Cold start latency for Lambda-backed endpoints
- Pricing at high volume can exceed self-hosted alternatives
When to choose AWS API Gateway: You are building serverless-first on AWS. Your backends are Lambda functions. You want zero operational overhead. Your traffic volume makes the per-request pricing acceptable.
Head-to-Head Comparison
| Dimension | Kong | Envoy | NGINX | AWS API Gateway |
|---|---|---|---|---|
| Language | Lua (on NGINX/OpenResty) | C++ | C | Managed service |
| Config model | Declarative YAML / Admin API | xDS API (dynamic) | File-based (static) | Console / CloudFormation |
| Plugin system | Lua, Go, Python, JS | Lua filters, WASM | Lua (OpenResty) | Lambda authorizers |
| Service mesh | Not designed for it | Native (Istio data plane) | Not designed for it | N/A |
| Protocol support | HTTP, gRPC, WebSocket | HTTP/1.1, HTTP/2, gRPC, TCP, UDP | HTTP, TCP, UDP | HTTP, WebSocket |
| Rate limiting | Built-in plugin | Local filter or external service | Built-in limit_req (basic) | Built-in |
| Auth | JWT, OAuth2, OIDC plugins | External auth filter (ext_authz) | auth_request, basic auth, custom Lua | IAM, Cognito, JWT |
| Observability | Prometheus, Datadog plugins | Native Prometheus, tracing | Access logs, stub_status | CloudWatch native |
| Performance | High (NGINX core) | Highest (C++) | Highest | Medium (managed) |
| Operational cost | Medium (manage nodes + DB) | Medium-High (needs control plane) | Low (simple config) | Zero |
| Best for | API management, north-south | Service mesh, east-west + edge | Simple reverse proxy | Serverless on AWS |
Gateway Patterns
Edge Gateway vs Internal Gateway
Edge gateway (north-south): Sits at the boundary between external clients and your internal network. Handles: external authentication, public rate limiting, TLS termination, WAF (Web Application Firewall), CORS, API versioning, and request validation. This is the "front door" — every external request passes through it.
Internal gateway (east-west): Handles service-to-service communication within your network. Concerns are different: service discovery, load balancing, retries with backoff, circuit breaking, mutual TLS (mTLS), and request tracing. In a service mesh architecture (Istio/Linkerd), the east-west proxy is a sidecar (Envoy) deployed alongside each service instance — not a centralized gateway.
The Staff distinction: The edge gateway is about security and traffic management (keeping bad requests out). The internal gateway/mesh is about reliability and observability (keeping services connected). They solve different problems and often use different technologies. In interviews, say: "We have an edge gateway for external traffic — auth, rate limiting, TLS — and a service mesh for internal traffic — retries, circuit breaking, mTLS."
Sidecar Proxy vs Centralized Gateway
Centralized gateway: All service-to-service traffic routes through a shared gateway cluster. Simpler to deploy and manage (one fleet of proxy nodes), but the gateway becomes a bottleneck and a single point of failure for internal traffic. Adds an extra network hop to every service call.
Sidecar proxy (service mesh): Each service instance has its own proxy (Envoy sidecar) that handles outbound and inbound traffic. No centralized bottleneck — the proxy fleet scales with the service fleet. Observability is per-service by default. The tradeoff: more proxies to manage (though Istio/Linkerd automate this), higher resource overhead (each sidecar consumes CPU and memory), and operational complexity of the mesh control plane.
When to use a service mesh: 20+ microservices with complex service-to-service communication patterns, a need for mTLS everywhere, per-service traffic policies (canary deployments, circuit breaking), and a team with the capacity to operate it. For fewer than 10 services, a centralized gateway with client-side retry logic is simpler and sufficient.
The service mesh tax: A sidecar proxy per pod means: 50-100MB of memory per pod, 1-5ms added latency per hop (two sidecars: egress + ingress), a control plane (Istio, ~3 pods) that must be operated and upgraded, and debugging complexity (is the error in the sidecar or the service?). Teams frequently underestimate this cost. The Staff insight is to acknowledge the mesh's value — mTLS, traffic shifting, per-service rate limiting — while honestly assessing whether your organization can operate it. The worst outcome is a half-configured service mesh that adds latency and debugging friction without delivering its promised benefits.
Backend for Frontend (BFF)
The BFF pattern uses a separate gateway (or gateway configuration) for each client type: web, mobile, and partner API. Each BFF aggregates and transforms backend responses for its specific client's needs.
Why BFF matters: A mobile app needs a single API call that returns a compact response. A web app needs richer data with pagination. A partner API needs a stable, versioned interface. One-size-fits-all API design leads to over-fetching (mobile gets data it does not use) or under-fetching (web needs multiple calls). BFFs tailor the API to the client.
Implementation: Three separate gateway routes (or three separate gateway instances) — /mobile/v1/*, /web/v1/*, /partner/v1/* — each with different aggregation logic, response shaping, and rate limits. In practice, BFFs are often thin Node.js or Go services behind the edge gateway rather than gateway-level plugins — the aggregation logic is too complex for gateway configuration.
BFF vs GraphQL: GraphQL solves a similar problem (clients request exactly the data they need) without requiring per-client backends. The tradeoff: GraphQL adds query parsing complexity and potential performance pitfalls (unbounded queries, N+1 problems) at the API layer, while BFF keeps the backend API simple (REST) and puts the aggregation logic in a purpose-built service. For small teams, GraphQL is simpler (one API, flexible queries). For large organizations with dedicated platform teams, BFFs give more control over performance and deployment independence per client.
The Gateway Request Pipeline
Understanding the middleware chain is critical for reasoning about latency, failure modes, and where to place responsibilities.
Total gateway overhead: For a well-configured gateway, the added latency is 3-15ms per request (TLS + auth + rate limit + routing). This is acceptable for external APIs (where network latency from the client already dominates). For internal service-to-service calls at sub-millisecond latency requirements, keep the gateway pipeline minimal (routing + observability only, no auth — use mTLS instead).
Pipeline ordering matters: Auth before rate limiting prevents unauthenticated requests from consuming rate limit capacity. WAF before auth prevents malformed requests from reaching the auth layer. Rate limiting before proxying protects backend services from overload. Getting the order wrong creates security holes or performance problems.
The authentication-rate-limit ordering debate: There is a nuance here. If you rate limit before auth, unauthenticated flood traffic consumes your rate limit budget but protects the (expensive) auth layer. If you auth before rate limit, you burn auth resources on flood traffic but rate limit accurately by authenticated identity. The Staff answer depends on context: for API-key-based auth (fast, local lookup), auth first is fine. For OAuth token introspection (network call to IdP), rate limit by IP first to protect the auth layer, then auth, then rate limit by user identity.
Response Caching at the Gateway
For read-heavy endpoints that can tolerate staleness, the gateway can cache responses and serve them without hitting the backend. Product catalog pages, search results, and public content are ideal candidates. Gateway caching uses standard HTTP cache semantics (Cache-Control, ETag, Vary) or custom TTL configuration per route.
When gateway caching wins: The endpoint is read-heavy (>100:1 read-to-write ratio), the response can tolerate staleness (30-60 seconds), and the cache hit rate is >70%. A 60-second cache on a popular search query can reduce backend load by 95%. The tradeoff is stale data — updates are not visible until the cache expires or is invalidated.
When NOT to cache at the gateway: User-specific responses (personalized content differs per user, making cache keys too granular), write-heavy endpoints, or responses where staleness causes correctness issues (account balances, inventory counts).
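The core mechanism is a TTL cache keyed on the request. A minimal sketch (the injectable clock is for testability; real gateways additionally honor Cache-Control and Vary headers rather than a bare TTL):

```python
import time

class GatewayCache:
    """Per-route TTL response cache keyed on (method, path, query)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (response, cached_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry and now - entry[1] < self.ttl:
            return entry[0]  # cache hit: the backend never sees this request
        return None          # miss or expired: proxy to backend, then put()

    def put(self, key, response, now=None):
        now = time.monotonic() if now is None else now
        self.store[key] = (response, now)
```

Note the cache key deliberately excludes user identity, which is exactly why personalized responses do not belong here: including the user in the key fragments the cache into one entry per user and the hit rate collapses.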
Scaling
Horizontal Scaling of Gateway Nodes
API gateways scale horizontally — add more gateway nodes behind a Layer 4 load balancer (NLB in AWS, MetalLB in Kubernetes). Each gateway node is stateless (configuration is loaded at startup or pushed via API), so adding nodes is straightforward. The L4 load balancer distributes TCP connections across gateway nodes; each gateway node handles its share of requests independently.
Capacity planning heuristic: A single NGINX/Kong/Envoy node on a 4-core machine handles 10,000-50,000 requests per second for typical HTTP API workloads (small request/response bodies, sub-100ms backend latency). At 100,000 RPS, deploy 3-5 gateway nodes for redundancy and headroom. At 1M RPS, deploy 20-30 nodes.
What degrades gateway throughput: TLS termination is CPU-intensive (especially with RSA-2048 — use ECDSA P-256 for 2-5x better TLS performance). Complex Lua/WASM plugins add per-request overhead. Large request/response bodies consume memory and increase GC pressure. WebSocket connections hold resources for the duration of the connection, reducing the effective RPS capacity. When capacity planning, benchmark with your actual plugin chain and payload sizes — not raw proxy benchmarks that skip auth and rate limiting.
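The capacity arithmetic is worth being able to do out loud. A helper that encodes the heuristic above, with assumed defaults (50K RPS per node from the upper end of the range, 30% headroom, a three-node floor for AZ redundancy) that you would replace with your own benchmark numbers:

```python
import math

def gateway_nodes(peak_rps: int, per_node_rps: int = 50_000,
                  headroom: float = 0.3, min_nodes: int = 3) -> int:
    """Nodes needed at peak load, keeping `headroom` fraction of each
    node's capacity spare and never fewer than `min_nodes` for redundancy.
    Defaults are assumptions from the heuristic above, not benchmarks."""
    usable_per_node = per_node_rps * (1 - headroom)
    return max(min_nodes, math.ceil(peak_rps / usable_per_node))
```

With these assumptions, 100K RPS needs the 3-node floor and 1M RPS lands around 29 nodes, consistent with the ranges quoted above.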
Connection Pooling & Keep-Alive
The gateway maintains persistent connections to backend services (connection pooling) and to clients (HTTP keep-alive). Without connection pooling, every request incurs TCP handshake + TLS negotiation overhead (~10-30ms). With connection pooling, subsequent requests reuse existing connections — only the first request pays the setup cost.
Key settings:
- Upstream keepalive — maintain N idle connections per upstream service. Set high enough to avoid connection churn under load, low enough to not exhaust file descriptors. Typical: 64-256 per upstream.
- Client keepalive timeout — how long to keep idle client connections open. Too short = frequent reconnections. Too long = wasted file descriptors. Typical: 60-120 seconds.
- Max connections per upstream — circuit breaker for connection exhaustion. If a backend service is slow, the gateway limits how many connections it opens rather than overwhelming the service.
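The third setting, the per-upstream connection cap, is the one that prevents a slow backend from consuming the gateway. A minimal fail-fast sketch (illustrative; real gateways like Envoy express this as cluster circuit-breaker config rather than application code):

```python
import threading

class UpstreamLimiter:
    """Caps in-flight connections to one upstream; excess requests fail
    fast (503) instead of queuing behind a slow backend."""

    def __init__(self, max_connections: int):
        self.sem = threading.BoundedSemaphore(max_connections)

    def acquire(self) -> bool:
        # Non-blocking: if the pool is exhausted, reject immediately.
        return self.sem.acquire(blocking=False)

    def release(self):
        # Called when the proxied request completes or times out.
        self.sem.release()
```

The design choice embedded here is "fail fast over queue": a quick 503 lets the client retry elsewhere, while queued connections pile up exactly when the backend is least able to serve them.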
Health Checks
The gateway must know which backend instances are healthy. Two approaches:
Passive health checks: The gateway monitors responses from backend instances. If an instance returns 5xx errors or times out N times consecutively, it is marked unhealthy and removed from the load balancing pool. No additional traffic is generated, but detection is slower (must wait for real requests to fail).
Active health checks: The gateway periodically sends probe requests (HTTP GET /health) to every backend instance. Unhealthy instances are detected proactively, before real traffic is affected. The cost is additional probe traffic, but for critical services, proactive detection is worth it.
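Passive checking reduces to a consecutive-failure counter per instance. A sketch of the ejection logic (threshold and reset-on-success behavior mirror the description above; real implementations also re-admit ejected instances after a cool-down):

```python
class PassiveHealthChecker:
    """Eject an upstream instance after `threshold` consecutive failures;
    a single success resets its counter and re-admits it."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = {}   # instance -> consecutive failure count
        self.ejected = set()

    def record(self, instance: str, ok: bool):
        if ok:
            self.failures[instance] = 0
            self.ejected.discard(instance)
        else:
            self.failures[instance] = self.failures.get(instance, 0) + 1
            if self.failures[instance] >= self.threshold:
                self.ejected.add(instance)

    def healthy(self, instances):
        # Load balancing pool = configured instances minus ejected ones.
        return [i for i in instances if i not in self.ejected]
```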
Failure Modes & Recovery
1. Gateway as Single Point of Failure
Symptoms: All API traffic fails simultaneously. Clients receive connection refused or timeout errors for every endpoint.
Root cause: If all gateway nodes fail (a bad configuration deployment, resource exhaustion, infrastructure failure), the entire API surface is down. The gateway is the most critical piece of infrastructure after DNS.
Fix: Deploy gateway nodes across multiple availability zones. Use a health-checked L4 load balancer that removes failed gateway nodes. Maintain at least three gateway nodes (N+1 redundancy is sufficient because gateways are stateless; there is no quorum to preserve, only capacity headroom). Test gateway failure regularly — kill a gateway node and verify traffic shifts seamlessly. Keep gateway configuration deployments separate from application deployments with canary rollout.
2. Configuration Deployment Errors
Symptoms: Specific routes return 404 or 502. Some requests fail while others succeed. Gateway returns errors immediately after a config change.
Root cause: A bad configuration update — invalid upstream address, broken regex in a route, missing TLS certificate, or a plugin that crashes on certain request patterns. Configuration errors are the number one cause of gateway outages.
Fix: Validate configuration before deployment (nginx -t, Kong's declarative config validation, Envoy's config dump validation). Deploy config changes with canary rollout — apply to one gateway node, observe error rates for 5 minutes, then roll to the rest. Maintain a fast rollback mechanism — "revert to last known good config" should be a single command. Store gateway configuration in version control and use CI/CD for changes.
3. Plugin/Filter Latency Overhead
Symptoms: Gateway p99 latency increases. The added latency is in the gateway itself, not the backend (measurable via gateway internal metrics). CPU usage on gateway nodes increases.
Root cause: A plugin or filter that performs expensive operations per request: regex matching on request bodies, synchronous external calls (OAuth token introspection to an IdP), complex Lua/WASM logic, or logging that blocks on I/O.
Fix: Profile plugin execution time (Kong's latency metrics, Envoy's filter_chain_duration). Replace synchronous auth calls with JWT validation (local verification, no network call). Use async logging (buffer and flush, never block the request path). Cache expensive computations (OAuth token introspection results cached for the token's TTL). Remove unused plugins — every plugin in the chain adds latency, even if minimal.
4. Connection Exhaustion
Symptoms: Gateway returns 502 Bad Gateway or 503 Service Unavailable. Backend services are healthy when accessed directly. Gateway logs show upstream connection refused or no free connections.
Root cause: The gateway has exhausted its connection pool to a specific upstream. This happens when a backend service slows down (increasing connection hold time) while the request rate stays constant — connections accumulate faster than they are released.
Fix: Configure connection limits per upstream (circuit breaker). Set aggressive timeouts on upstream connections (connect timeout: 1-3s, read timeout: 5-30s depending on endpoint). Increase connection pool size if the backend can handle more concurrent connections. Add circuit breaking — if a backend is consistently slow, stop sending it traffic and return a fast failure (503) instead of queuing connections.
5. TLS Certificate Expiry
Symptoms: All HTTPS clients receive certificate errors. API traffic drops to zero as clients reject the expired certificate.
Root cause: TLS certificates have expiry dates (typically 90 days for Let's Encrypt, 1 year for commercial CAs). If certificate renewal fails silently, the certificate expires and all TLS connections fail.
Fix: Automate certificate renewal (cert-manager in Kubernetes, certbot for standalone). Monitor certificate expiry — alert 30 days before expiry and escalate at 7 days. Use certificate management solutions that handle rotation without gateway restarts. Test certificate renewal in staging before relying on it in production.
Interview Application — Staff-Level Plays
Which Playbooks Use API Gateways
| Playbook | How API Gateways Are Used | Key Pattern |
|---|---|---|
| API Gateway | Primary architectural component | TLS termination, auth, rate limiting, routing as cross-cutting concerns |
| Rate Limiting | Gateway-level enforcement point | Token bucket per user/IP at the edge, before requests reach services |
| Service Discovery | Dynamic upstream routing | xDS (Envoy) or DNS-based discovery for backend service fleet |
| Auth at Scale | Centralized authentication enforcement | JWT validation at gateway, claims extraction into headers for services |
Every System Design Question Has a Gateway
You do not need a dedicated "Design an API Gateway" question to use this knowledge. Every question with an API layer benefits from mentioning the gateway:
- "Design a URL shortener" → "The API gateway handles rate limiting (prevent abuse of the create endpoint), authentication (track who creates links), and routing (/api/shorten → Shortener Service, /:code → Redirect Service)."
- "Design a chat application" → "The gateway terminates WebSocket connections, authenticates the JWT on connect, and routes to the messaging service. Rate limiting on message send prevents spam."
- "Design a payment system" → "The gateway enforces strict authentication (OAuth2 with scope validation), rate limiting (prevent duplicate charge attempts), and WAF rules (block suspicious patterns). TLS termination with certificate pinning for mobile clients."
- "Design a search engine" → "The gateway routes /search to the Search Service, /suggest to the Autocomplete Service. Response caching at the gateway for popular queries (60-second TTL) reduces backend load by 80%."
L5 vs L6 Responses
| Scenario | L5 Answer | L6/Staff Answer |
|---|---|---|
| "Where does auth happen?" | "Each service validates the JWT" | "The API gateway validates the JWT and extracts claims (user_id, scopes) into headers. Services trust the gateway — they read X-User-Id from the header. This eliminates duplicated auth logic across 20 services and creates a single enforcement point." |
| "How do you rate limit?" | "Use Redis for rate limiting" | "Rate limiting at the gateway with a token bucket per user, backed by Redis for distributed counting across gateway nodes. Different limits per endpoint: 100 RPM for writes, 1000 RPM for reads, 10 RPM for unauthenticated. Return 429 with Retry-After header." |
| "How do you handle API versioning?" | "URL path versioning: /v1/, /v2/" | "Path-based versioning at the gateway: /v1/* routes to the stable service fleet, /v2/* routes to the new version. The gateway handles version routing so services do not need version awareness. During migration, the gateway can fan out to both versions and compare responses (shadow testing)." |
| "What about observability?" | "Add logging to each service" | "The gateway emits structured access logs, latency metrics (p50/p95/p99 per endpoint), and injects trace IDs (OpenTelemetry) into every request. Because all traffic flows through the gateway, we get global API metrics without instrumenting individual services. Services add service-specific spans to the trace." |
The Staff Gateway Checklist
When discussing the API layer in any system design interview:
- Name the gateway role: "An API gateway sits between clients and services, handling cross-cutting concerns."
- List what lives at the gateway: "TLS termination, JWT validation, rate limiting, request routing, and access logging."
- Explain what does NOT live at the gateway: "Business logic, data validation beyond basic schema checks, and service-specific authorization (the gateway handles authn, services handle authz)."
- Address the SPOF risk: "The gateway fleet is horizontally scaled across AZs behind an L4 load balancer. Configuration changes are canaried."
- Mention the latency budget: "The gateway adds 5-10ms overhead. For internal service-to-service calls, we use a lightweight sidecar or direct communication with client-side load balancing."
- Match technology to context: "Envoy for Kubernetes/service mesh, Kong for API management, NGINX for simple routing, AWS API Gateway for serverless."
Quick Reference Card
Gateway role: Routing, auth, rate limiting, TLS, observability, transformation
Edge gateway: North-south (client → service), security + traffic management
Service mesh: East-west (service → service), reliability + observability
Pipeline order: TLS → WAF → Auth → Rate Limit → Transform → Route → Log
Kong: Lua plugins, PostgreSQL/Cassandra, declarative YAML, API management
Envoy: C++, xDS dynamic config, service mesh data plane, gRPC native
NGINX: File-based config, battle-tested, simple reverse proxy
AWS API GW: Managed, Lambda-native, usage plans, zero ops
Scaling: Stateless horizontal scaling, L4 LB in front, 10-50K RPS per node
Connection pool: 64-256 keepalive connections per upstream
Health checks: Active probes + passive monitoring, remove unhealthy upstreams
Failure modes: SPOF (multi-AZ), bad config (canary), plugin latency (profile), connection exhaustion (limits + timeouts), TLS expiry (automate)
Interview rule: Every question with an API has a gateway. Name the concerns, not the product.