API Gateway Analysis Note #

This note captures the full step-by-step analysis for an API gateway: route config, auth/policy config, backend health, effective gateway state, and versioned snapshot propagation to serving nodes.

Step 1 — Normalize #

Assume the baseline prompt is:

  • design an API gateway
  • clients send API requests to one endpoint
  • gateway authenticates, authorizes, rate-limits, and routes to backend services
  • policies can change over time
  • system scales across nodes

| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| Client request is authenticated and routed | Client | read source | S1: GatewayRoutingState (read source target) | C1 |
| Admin updates route config | Admin | overwrite state | S1: RouteConfig (update target) | C1 |
| Admin updates auth/policy config | Admin | overwrite state | S1: PolicyConfig (update target) | C1 |
| System records backend health | System | overwrite state | S1: HealthState (update target) | C1 |
| System updates effective gateway config snapshot | System | state transition | S1: GatewayRoutingState (update target) | C1 |
| System propagates config snapshot to serving nodes | System | async process | S1: ConfigSnapshot (hidden write target) | C1 |
| Client reads gateway status/metrics | Client | read projection | S1: GatewayStatusView (read projection target) | R2 |

Notes on normalization:

Important choices:

  • request routing is read source
    • the hot path mostly reads current routing state and makes a routing decision
  • route config and policy config are overwrite state
  • backend health is overwrite state
    • current health is the main truth
  • effective gateway state update is a control-plane recompute transition
  • config propagation is async process
    • control-plane to data-plane dissemination

This is clearly a:

  • Control Plane + Data Plane system

Step 2 — Critical Path Selection #

| Requirement | Priority class | Why |
|---|---|---|
| Request auth + routing | C1 | wrong policy/routing breaks correctness and security |
| Update route config | C1 | changes future request routing |
| Update auth/policy config | C1 | changes future enforcement |
| Record backend health | C1 | bad health state can route traffic to failing backend |
| Update effective gateway state | C1 | control-plane to data-plane correctness bridge |
| Propagate config snapshot | C1 | stale serving nodes can enforce wrong policy |
| Status/metrics reads | R2 | operational only |

Critical paths:

  • P1 request handling
  • P2 update route config
  • P3 update policy config
  • P4 record health
  • P5 update effective gateway state
  • P6 propagate config snapshot

Step 3 — Primary State Extraction #

| Candidate object label | Candidate source | Needed for C1/R1? | Decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| RouteConfig | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | route_id or listener_id |
| PolicyConfig | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | policy_scope |
| HealthState | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | instance | backend_id |
| GatewayRoutingState | hidden write target | Yes | keep as candidate | process | Yes | service | overwrite | instance | listener_id or service_id |
| ConfigSnapshot | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | instance | gateway_node_id |
| GatewayStatusView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | gateway cluster |

Minimal primary set:

  • RouteConfig
  • PolicyConfig
  • HealthState
  • GatewayRoutingState
  • ConfigSnapshot

Step 4 — Hard Invariants #

| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 request handling | HARD | eligibility | route_request is valid only if the selected route and backend are eligible under current policy, auth result, and health state for the request scope. |
| P2 update route config | HARD | ordering | Route config revisions are ordered by monotonic config version within route scope. |
| P3 update policy config | HARD | ordering | Policy config revisions are ordered by monotonic config version within policy scope. |
| P4 record health | HARD | ordering | Health updates are ordered by monotonic observation revision/timestamp within backend scope. |
| P5 update effective gateway state | HARD | accounting | Effective gateway state equals a function of route config, policy config, and health state. |
| P6 propagate config snapshot | HARD | freshness | Serving-node config snapshot reflects authoritative gateway state within the configured propagation bound. |

What matters most:

1. Route only under valid policy and health #

The core safety property is:

  • gateway must not send requests using stale/invalid route, auth, or backend-eligibility state beyond the allowed bound
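
The eligibility invariant can be sketched as a single guard on the hot path. This is a minimal illustration, not the note's implementation: the `Route`, `Snapshot`, and `Request` types, the role-based policy, and the prefix matching are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    path_prefix: str
    backend_id: str
    required_role: str  # hypothetical policy: caller must hold this role


@dataclass(frozen=True)
class Snapshot:
    version: int
    routes: tuple            # tuple of Route
    healthy_backends: frozenset


@dataclass(frozen=True)
class Request:
    path: str
    authenticated: bool
    roles: frozenset = frozenset()


def select_backend(snapshot: Snapshot, request: Request) -> Optional[str]:
    """Return a backend id only if auth, policy, and health all allow it."""
    if not request.authenticated:
        return None
    for route in snapshot.routes:
        if not request.path.startswith(route.path_prefix):
            continue
        if route.required_role not in request.roles:
            continue  # policy rejects this caller for this route
        if route.backend_id not in snapshot.healthy_backends:
            continue  # never route to a backend marked unhealthy
        return route.backend_id
    return None
```

A `None` result corresponds to a policy rejection or "no eligible backend"; how that maps to an HTTP status is left open here.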

2. Effective gateway state is derived from authoritative inputs #

Route config, policy config, and health feed the hot-path serving state.
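
As a rough sketch, the derivation can be written as a pure function of the three inputs, so that rerunning it on the same inputs always yields the same effective state. The `Versioned` wrapper, the dictionary shapes, and the choice to drop routes whose backend is unhealthy are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    version: int
    value: dict


def recompute_effective_state(route: Versioned, policy: Versioned, health: Versioned) -> dict:
    """Derive serving state purely from the three authoritative inputs."""
    healthy = {backend for backend, ok in health.value.items() if ok}
    routes = {
        prefix: backend
        for prefix, backend in route.value.items()
        if backend in healthy  # assumption: unhealthy backends are filtered out here
    }
    return {
        # Recording the input versions makes the output traceable to its sources.
        "inputs": (route.version, policy.version, health.version),
        "routes": routes,
        "policy": dict(policy.value),
    }
```

Because the function has no hidden state, a crashed recompute worker can simply rerun it from the stored inputs.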

3. Propagation lag is bounded #

This is the key control-plane/data-plane correctness interface.
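
One way a serving node could make the bound checkable is to track when it last confirmed that its snapshot was current. The sketch below assumes a hypothetical `SnapshotFreshness` helper and passes the clock value in explicitly; what the node does once the bound is exceeded (pull a refresh, alert, or fail closed) is a policy decision the note does not fix.

```python
from dataclasses import dataclass


@dataclass
class SnapshotFreshness:
    max_lag_seconds: float    # configured propagation bound (assumed value)
    last_confirmed_at: float  # when the node last confirmed it was current

    def confirm(self, now: float) -> None:
        """Call when a new snapshot or an 'already current' heartbeat arrives."""
        self.last_confirmed_at = now

    def is_fresh(self, now: float) -> bool:
        """True while the node is still inside the propagation bound."""
        return (now - self.last_confirmed_at) <= self.max_lag_seconds
```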

Step 5 — Execution Context #

For the API gateway baseline:

| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical gateway system with many serving nodes |
| Write coordination scope | per object scope | correctness is per route/policy/backend/listener scope |
| Read consistency target | bounded stale allowed | hot path usually uses recent snapshots, not synchronous control-plane reads |
| Holder model | none | no lease-like per-request ownership is central |
| Compensation acceptable? | No | wrong auth/routing decisions are not compensable |

Derived:

  • bounded_staleness_allowed = true
  • exclusive_claim_required = false
  • guarded_by_current_state = true

This pushes us toward:

  • authoritative control plane
  • versioned snapshots to data-plane nodes
  • hot path reads from local or near-local routing snapshot

Step 6 — Deterministic Mechanism Selection #

| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P2 update route config | overwrite current value | CAS on version | config version |
| P3 update policy config | overwrite current value | CAS on version | config version |
| P4 record health | overwrite current value | CAS on version or monotonic overwrite | health revision/timestamp |
| P5 update effective gateway state | overwrite current value | single writer control-plane recompute | routing version |
| P6 propagate config snapshot | overwrite current value | single writer snapshot publication | config version |

Hot request path P1 is a read path, not a write-shape path.
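
The "CAS on version" mechanism for the config write paths can be illustrated with a small in-memory store. This is a sketch under stated assumptions: `ConfigStore` and `VersionConflict` are hypothetical names, and a lock-protected dictionary stands in for a strongly consistent store.

```python
import threading


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class ConfigStore:
    """In-memory stand-in for a strongly consistent config store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Overwrite only if the stored version still matches; bump the version."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(
                    f"{key}: expected {expected_version}, found {current_version}"
                )
            new_version = current_version + 1
            self._data[key] = (new_version, value)
            return new_version
```

Two admins editing the same route both read version `n`; the first `put` succeeds and the second raises `VersionConflict`, so the stale writer must re-read before retrying.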

Step 7 — Read Model / Source of Truth #

| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 route config | RouteConfig | read source directly | authoritative config store |
| C2 policy config | PolicyConfig | read source directly | authoritative config store |
| C3 health state | HealthState | read source directly | authoritative health store |
| C4 effective gateway state | GatewayRoutingState | read source directly | recompute from route + policy + health |
| C5 serving-node snapshot | ConfigSnapshot | materialized view | rebuild from latest gateway state |
| C6 status/metrics | derived | materialized view | recompute from primary state |

Important point:

For the hot path:

  • data-plane node reads local ConfigSnapshot
  • not control-plane state per request

So the serving read path is:

  • local materialized view / config snapshot
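
A minimal sketch of that read path follows, assuming a hypothetical `DataPlaneNode` class and longest-prefix route matching (neither is specified in this note). The point it illustrates is that `handle_request` touches only node-local state; version-ordering checks on install are deliberately omitted.

```python
from types import MappingProxyType


class DataPlaneNode:
    def __init__(self, version, routes):
        # The snapshot is an immutable (version, read-only mapping) pair.
        self._snapshot = (version, MappingProxyType(dict(routes)))

    def install(self, version, routes):
        # Build the new snapshot fully, then swap a single reference so
        # in-flight requests keep a consistent view of the old one.
        self._snapshot = (version, MappingProxyType(dict(routes)))

    def handle_request(self, path):
        version, routes = self._snapshot  # one local read, no control-plane call
        # Longest-prefix match over the local route table.
        match = max((p for p in routes if path.startswith(p)), key=len, default=None)
        backend = routes[match] if match is not None else None
        return backend, version
```

Returning the snapshot version alongside the decision makes it possible to log which config a given request was served under.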

Step 8 — Failure Handling #

| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| route/policy update | retry with config version | stale update loses CAS | committed config survives crash if persisted | snapshot propagation may lag | n/a |
| health update | retry with monotonic observation revision | latest valid observation wins | committed health survives crash if persisted | snapshot propagation may lag | n/a |
| effective-state recompute | retry safe from primary inputs | single recompute/version wins | recompute reruns after crash | snapshot propagation may lag | n/a |
| snapshot propagation | retry with versioned snapshot | older snapshot loses to newer version | node keeps last good snapshot until refresh | failed push retried or pulled | n/a |
| request handling | retries are application-level | serving nodes can handle concurrently with same snapshot version | node crash drops in-flight requests only | n/a | stale snapshot bounded by version/TTL refresh |

What matters most:

1. Versioned snapshot propagation #

Serving nodes must reject older snapshots and move monotonically forward by config version.
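
The monotonic rule can be sketched as a guard on the node's install step. `ConfigSnapshot` here is a simplified stand-in with only a version and a route table, and `SnapshotHolder` is a hypothetical name; the sketch assumes versions are totally ordered integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSnapshot:
    version: int
    routes: dict


class SnapshotHolder:
    def __init__(self, initial: ConfigSnapshot):
        self.current = initial

    def offer(self, incoming: ConfigSnapshot) -> bool:
        """Install the incoming snapshot only if strictly newer; report whether applied."""
        if incoming.version <= self.current.version:
            # Duplicate or out-of-order delivery: keep the last good snapshot.
            return False
        self.current = incoming
        return True
```

With this guard, a retried or delayed push of version 4 arriving after version 5 is a harmless no-op, which is what makes propagation retries safe.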

2. Bounded stale routing #

The hot path is eventually consistent with the control plane: serving nodes catch up within the propagation bound rather than synchronizing on every control-plane change.

3. Health flapping #

Health-state overwrite policy must handle noisy observations without causing excessive routing churn.
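
One common dampening approach, shown here only as an illustration, is to require several consecutive disagreeing observations before flipping the published state. The `DampenedHealth` class and the default threshold of 3 are assumptions, not values from this note.

```python
class DampenedHealth:
    def __init__(self, threshold: int = 3, healthy: bool = True):
        self.threshold = threshold
        self.healthy = healthy
        self._streak = 0  # consecutive observations disagreeing with current state

    def observe(self, ok: bool) -> bool:
        """Record one observation; return True if the published state flipped."""
        if ok == self.healthy:
            self._streak = 0  # agreement resets the streak
            return False
        self._streak += 1
        if self._streak < self.threshold:
            return False      # not enough evidence yet, no routing churn
        self.healthy = ok
        self._streak = 0
        return True
```

Only a `True` return needs to trigger a recompute of the effective gateway state, so a single noisy probe never reaches the data plane.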

Step 9 — Scale Adjustments #

| Hotspot | Type | First response |
|---|---|---|
| very high request volume | read hotspot | add more gateway nodes; keep hot path local to snapshots |
| config churn | fan-out hotspot | incremental snapshot propagation; batch updates |
| health-flap storms | contention hotspot | dampen health transitions and recompute cadence |
| large route/policy tables | read hotspot | shard config by listener/service and compress snapshots |
| metrics/status load | read hotspot | derived views only |

What scales well:

The system scales well if:

  • the hot path is local snapshot read + policy/routing evaluation
  • control plane is narrow and versioned
  • propagation is incremental
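
Incremental propagation can be sketched as shipping a delta between two route tables instead of the full table. The `diff_routes` and `apply_delta` helpers and the delta shape are hypothetical; a real system would also carry the base and target versions so a node can detect that it missed a delta and fall back to a full snapshot.

```python
def diff_routes(old: dict, new: dict) -> dict:
    """Compute the delta a node needs to move from `old` to `new`."""
    return {
        "upsert": {k: v for k, v in new.items() if k not in old or old[k] != v},
        "delete": sorted(k for k in old if k not in new),
    }


def apply_delta(old: dict, delta: dict) -> dict:
    """Return a new table; the input table is left untouched."""
    removed = set(delta["delete"])
    merged = {k: v for k, v in old.items() if k not in removed}
    merged.update(delta["upsert"])
    return merged
```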

What fails first:

Usually:

  • health-flap storms
  • config propagation bursts
  • giant route/policy tables

Canonical design conclusion:

  • primary truth:
    • RouteConfig
    • PolicyConfig
    • HealthState
    • GatewayRoutingState
    • ConfigSnapshot
  • hot path:
    • local snapshot read + auth/policy check + backend selection
  • control plane:
    • versioned config + recomputed effective state + snapshot publication

Concrete Substrate #

  • control plane in Go/Java
  • authoritative config/health store in etcd or similar strongly consistent store
  • snapshot propagation via watch streams/pub-sub
  • data-plane fleet using Envoy-like or custom proxy processes
  • local snapshot cache per node for hot path

Operation Layer #

1. HandleRequest(listener, request) #

  • entry point: data-plane node
  • authoritative decider: local ConfigSnapshot
  • transition: none on source truth
  • response: proxied backend response or policy rejection

2. PutRouteConfig(route_id, config, expected_version?) #

  • entry point: control-plane API
  • authoritative decider: config store
  • transition: overwrite RouteConfig, bump version

3. PutPolicy(policy_scope, config, expected_version?) #

  • same shape as route config

4. ReportHealth(backend_id, health, observation_revision) #

  • entry point: control plane
  • authoritative decider: health-state owner
  • transition: overwrite HealthState
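
A minimal sketch of the monotonic-overwrite variant of this operation, assuming an integer `observation_revision` per backend. The `HealthStore` class is hypothetical, and treating unknown backends as unhealthy is an assumption chosen here to fail safe.

```python
class HealthStore:
    def __init__(self):
        self._state = {}  # backend_id -> (observation_revision, healthy)

    def report_health(self, backend_id: str, healthy: bool, observation_revision: int) -> bool:
        """Overwrite only with a strictly newer observation; report whether applied."""
        current = self._state.get(backend_id)
        if current is not None and observation_revision <= current[0]:
            return False  # late or duplicate report, ignored
        self._state[backend_id] = (observation_revision, healthy)
        return True

    def is_healthy(self, backend_id: str) -> bool:
        entry = self._state.get(backend_id)
        return entry is not None and entry[1]
```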

5. internal recompute #

  • recompute GatewayRoutingState from config + policy + health

6. snapshot propagation #

  • push/pull latest ConfigSnapshot(version) to data-plane nodes

Entry Point vs Decider vs Responder #

| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| request handling | data-plane node | local config snapshot | data-plane node | API gateway |
| route/policy update | control-plane API | config store owner | control-plane node | API gateway |
| health update | control-plane API | health-state owner | control-plane node | API gateway |
| snapshot propagation | serving node / control plane | snapshot publisher | control/data-plane | API gateway |

Concrete HLD #

Main components:

  • control-plane API
    • route config
    • policy config
    • health ingestion
  • control-plane state store
    • route, policy, health, effective routing versions
  • routing recompute worker
    • derives effective gateway state
  • snapshot distribution layer
    • pushes/pulls versioned config to serving nodes
  • data-plane gateway fleet
    • handles client traffic using local snapshots

Short interview version #

“I’d design the API gateway as a control-plane/data-plane system. Control plane owns route config, auth/policy config, backend health, and effective gateway state. Data-plane nodes don’t query control plane per request; they use versioned local snapshots to authenticate, enforce policy, and route to healthy backends. Config and health are updated in the control plane, effective state is recomputed there, and snapshots are propagated incrementally to the serving fleet.”