Routing / Load Balancing #
routing = choose path or destination
load balancing = choose among equivalent destinations
under capacity/health policy
It answers:
where should this request/packet/message/query go NOW?
Role in the catalog: the traffic face, the block with the highest import ratio, because nearly every classic “type” is an already-adjudicated concern arriving in a routing costume:
consistent-hash routing → scheduler.md's deterministic topology
+ index_structures.md's owner codomain
(route ≠ ownership: adjudicated there)
service discovery → a CACHE: the endpoint set is a cached view
of a moving fleet, invalidated by a watch
stream — cache.md's ladder, wholesale
query/data routing → index_structures.md's routing row + silent omission*
("query misses a shard, partial result
accepted as complete" IS silent
incompleteness, in scatter/gather costume)
failover routing → replication.md's promotion*, traffic side
proxy/gateway + xDS → boundary.md's control/data-plane split
(+ the xDS blueprint notes)
endpoint LB → scheduler.md's traffic row
traffic splitting → boundary.md's deployment boundary,
riding the router
What remains native is narrow and real: decision granularity, the equivalence assumption, and one star with one beautiful recipe.
Central tension:
local fast decisions vs global optimal placement and freshness
Design Axes (the core module) #
Axis 1 — Decision Granularity (the structural cleave) #
per-packet (IP/BGP)
per-connection (L4: NLB, IPVS, Maglev)
per-request (L7: Envoy routes, gateways)
per-session (sticky/affinity)
per-key (consistent hash, partitioners)
The deep-lesson row “connection-level balance ≠ request-level balance,” promoted to structure:
the granularity you balance at BOUNDS the imbalance you cannot fix.
one hot gRPC connection carries a thousand unbalanced requests —
HTTP/2 multiplexing is what forces the L4→L7 move.
and granularity is sticky DOWNWARD: choosing per-connection forecloses
per-request corrections until the connection dies.
Interrogation:
At what granularity is the choice made — and at what granularity does
load actually vary?
What imbalance is invisible below the chosen granularity?
Long-lived connections: what rebalances them, ever? (drain, max-age,
GOAWAY — or nothing?)
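A minimal sketch of why connection-level balance does not imply request-level balance. The workload is invented for illustration: 40 connections spread round-robin over 4 backends, two of them hot multiplexed connections. The function name and all numbers are assumptions, not taken from any real balancer.

```python
from typing import List, Tuple


def requests_per_backend(conn_requests: List[int],
                         num_backends: int) -> Tuple[List[int], List[int]]:
    """Assign connections round-robin, then count what each backend carries.

    conn_requests[i] is the number of requests sent over connection i.
    Returns (connections per backend, requests per backend).
    """
    conns = [0] * num_backends
    reqs = [0] * num_backends
    for i, request_count in enumerate(conn_requests):
        backend = i % num_backends  # perfectly even at connection granularity
        conns[backend] += 1
        reqs[backend] += request_count
    return conns, reqs


# Hypothetical workload: 38 quiet connections and 2 hot ones.
conn_requests = [10] * 40
conn_requests[5] = 1000   # lands on backend 1
conn_requests[21] = 1000  # 21 % 4 == 1, so also backend 1

conns, reqs = requests_per_backend(conn_requests, 4)
print("connections per backend:", conns)
print("requests per backend:   ", reqs)
```

The connection counts come out even while one backend carries most of the requests. Nothing at connection granularity can see or repair that; only a per-request decision, or recycling the hot connections, moves the load.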
Axis 2 — The Route Key #
destination name DNS records
request attributes host/path/header/method — with header spoofing as
policy.md's principal-substrate failure: a routing
decision on an unauthenticated header is an authz
decision made on a lie
data key consistent hash → owner (axis 4's "unique" case)
session identity cookie, connection, user hash
client locality geo, zone, region
Interrogation:
What key chooses the route — and who can forge it?
Is the key stable under retries and reconnects? (a retry that re-keys
defeats affinity AND dedupe downstream)
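A small sketch of the two interrogation questions, under stated assumptions: the request is a plain dict, `verified_principal` stands for whatever identity an authentication layer has already established, and the field names (`session_id`, `attempt_id`, `x-user`) are hypothetical. The hash is plain modulo placement, not a consistent-hash ring; it is only there to show key stability.

```python
import hashlib
from typing import List, Optional


def pick_backend(route_key: str, backends: List[str]) -> str:
    """Map a route key to a backend using a hash that is stable across processes."""
    digest = hashlib.sha256(route_key.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def derive_route_key(request: dict, verified_principal: Optional[str]) -> str:
    """Build the key only from fields that are authenticated or retry-stable.

    The client-supplied "x-user" header is ignored because it is forgeable,
    and "attempt_id" is ignored because it changes on every retry.
    """
    if verified_principal is not None:
        return "user:" + verified_principal
    # Unauthenticated fallback: acceptable only when the backends are
    # interchangeable, since the session id is still client-controlled.
    return "session:" + request["session_id"]


backends = ["b0", "b1", "b2"]
first = {"session_id": "s-123", "attempt_id": "a-1", "headers": {"x-user": "admin"}}
retry = {"session_id": "s-123", "attempt_id": "a-2", "headers": {"x-user": "admin"}}

key_first = derive_route_key(first, "alice")
key_retry = derive_route_key(retry, "alice")
print(key_first, "->", pick_backend(key_first, backends))
print(key_retry, "->", pick_backend(key_retry, backends))
```

Python's built-in `hash()` is salted per process for strings, which is why the sketch uses `hashlib`: a key that hashes differently on each router is not a route key.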
Axis 3 — Routing-State Freshness (all arrows) #
The entire right-hand column of classic failure modes — stale endpoints, DNS TTL slowness, stale routing tables, rejected config — is cache.md’s freshness ladder applied to routing state specifically:
DNS TTL the TTL rung, honestly labeled
(failover speed is bounded by resolver caches
you do not control)
watch-driven endpoints invalidation rung (EndpointSlice, EDS)
xDS version/nonce ACK the version-pinned rung
(checkpoint_replay.md's control-plane row)
last-good config boundary.md's control/data-plane doctrine:
a rejected route table must not take down
serving — stale-but-valid beats fresh-but-broken
Interrogation:
Which rung does each layer of routing state sit on — named?
"Ready" signal: who asserts it, and what does it actually test?
(endpoint ready ≠ endpoint correct — the deep lesson's row)
Traffic to terminating pods: does drain precede removal, or race it?
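The last-good rung can be sketched as a toy data plane that validates a candidate route table before swapping it in and reports the outcome with a version. This is a simplified illustration in the spirit of a version-pinned ACK/NACK exchange, not the xDS wire protocol; `RouteTable`, `DataPlane`, and the validation rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RouteTable:
    version: str
    routes: Dict[str, List[str]]  # path prefix -> endpoint addresses


def validate(table: RouteTable) -> None:
    """Raise ValueError if the table would break serving."""
    if not table.routes:
        raise ValueError("empty route table")
    for prefix, endpoints in table.routes.items():
        if not prefix.startswith("/"):
            raise ValueError(f"bad prefix {prefix!r}")
        if not endpoints:
            raise ValueError(f"route {prefix!r} has no endpoints")


class DataPlane:
    def __init__(self, initial: RouteTable) -> None:
        validate(initial)
        self.active = initial  # the last-good table, always valid

    def apply(self, candidate: RouteTable) -> Tuple[str, str]:
        """Return ("ACK", new_version) or ("NACK", last_good_version)."""
        try:
            validate(candidate)
        except ValueError:
            # Rejecting an update must not disturb what is being served.
            return ("NACK", self.active.version)
        self.active = candidate
        return ("ACK", candidate.version)


dp = DataPlane(RouteTable("v1", {"/": ["10.0.0.1:80"]}))
print(dp.apply(RouteTable("v2", {"/api": []})))           # rejected
print(dp.apply(RouteTable("v3", {"/": ["10.0.0.2:80"]})))  # accepted
print("serving:", dp.active.version)
```

The design choice is that `self.active` is only ever assigned a table that passed validation, so stale-but-valid is the worst state the data plane can be in.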
Axis 4 — The Equivalence Assumption #
interchangeable pure LB — any endpoint will do
weighted canary/traffic-split — deployment machinery riding
the router (small samples hide failures; sticky
sessions bias the weights)
preferred locality — nearby first, spill on overload
(the spill is the hard part: zonal preference that
never spills melts the local zone while remote idles;
regional failover that spills all at once stampedes —
capacity.md's correlated peak, self-inflicted)
UNIQUE key routing — "balancing" stops being the word and
CORRECTNESS starts. the deep lesson's "consistent
hash vs fairness": key routing cannot balance —
the key distribution decides, and a hot key is not
a routing bug (remedies live upstream: key salting,
splitting the hot key — a data-model change, not an
LB knob)
Interrogation:
Are the destinations actually equivalent — or did unique-by-key sneak in?
Locality: what is the spill policy, numerically?
Splits: is the sample size enough to see the failure you're canarying for?
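Two of these questions ask for numbers, so here is a hedged sketch of what "numerically" could look like. `spill_weights` is one possible gradual spill policy (keep local traffic up to a target utilization, send only the excess to remote zones in proportion to their headroom). `canary_sample_size` assumes independent requests that each fail with a fixed probability. Both functions, the 0.8 target, and the zone names are assumptions for illustration.

```python
import math
from typing import Dict, Tuple


def spill_weights(local_load: float, local_capacity: float,
                  remote_headroom: Dict[str, float],
                  target_util: float = 0.8) -> Tuple[float, Dict[str, float]]:
    """Return (fraction kept local, {zone: fraction spilled to that zone})."""
    budget = local_capacity * target_util
    total_headroom = sum(h for h in remote_headroom.values() if h > 0)
    if local_load <= budget or total_headroom <= 0:
        return 1.0, {}
    # Spill only the excess, and never more than the remote zones can absorb.
    spilled = min(local_load - budget, total_headroom)
    remote = {zone: spilled * h / total_headroom / local_load
              for zone, h in remote_headroom.items() if h > 0}
    return 1.0 - spilled / local_load, remote


def canary_sample_size(failure_rate: float, miss_probability: float) -> int:
    """Requests needed so that P(no failure observed) <= miss_probability."""
    return math.ceil(math.log(miss_probability) / math.log(1.0 - failure_rate))


local, remote = spill_weights(1200, 1000, {"zone-b": 300, "zone-c": 100})
print("keep local:", round(local, 3), "spill:", remote)
print("requests to see a 1% failure with 95% confidence:",
      canary_sample_size(0.01, 0.05))
```

Because the spill is proportional to the excess, a small overload moves a small slice of traffic rather than the whole zone at once. The sample-size function also shows why a 1% canary on a low-traffic service can run for a long time without ever exercising the failure it exists to catch.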
Axis 5 — The Feedback Loop (where the star lives) #
How the router learns: health checks, outlier ejection, latency signals (EWMA), load reports. Every signal lags reality — and that lag has a structure of its own:
Technical Bottleneck: Herding on a Stale Signal* #
every router balances on an observation that lags reality.
scheduler.md's view-vs-reality* — but with a property the scheduler
version lacks: the routers' decisions are CORRELATED.
Many independent balancers reading the same stale signal converge on the same target, manufacturing the very hotspot the signal denied. capacity.md’s statistical bet, inverted: the bet assumed uncorrelated peaks; stale-signal herding MANUFACTURES correlation. Two signature forms:
the fast-failure trap the fastest-LOOKING endpoint is often the one
FAILING fast — errors return quicker than
successes, so least-latency routing pours
traffic into the sick backend
the recovery stampede a newly-added or newly-recovered endpoint is
simultaneously discovered idle by every router,
and flattened by its own attractiveness
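The fast-failure trap is easy to reproduce in a few lines. The sketch below assumes a least-latency picker over per-endpoint EWMAs, with invented numbers: a healthy backend answering in 50 ms and a sick one returning errors in 2 ms. The `error_penalty_ms` knob is one hypothetical way to make errors count as slow; real balancers may weight errors differently.

```python
from typing import Dict, Optional


class LatencyEwma:
    """Exponentially weighted moving average of observed latency."""

    def __init__(self, alpha: float = 0.3,
                 error_penalty_ms: Optional[float] = None) -> None:
        self.alpha = alpha
        self.error_penalty_ms = error_penalty_ms
        self.value_ms: Optional[float] = None

    def observe(self, latency_ms: float, ok: bool) -> None:
        sample = latency_ms
        if not ok and self.error_penalty_ms is not None:
            # Treat a failed request as if it had been very slow.
            sample = max(latency_ms, self.error_penalty_ms)
        if self.value_ms is None:
            self.value_ms = sample
        else:
            self.value_ms = self.alpha * sample + (1 - self.alpha) * self.value_ms


def least_latency(scores: Dict[str, LatencyEwma]) -> str:
    """Pick the endpoint whose EWMA looks fastest."""
    return min(scores, key=lambda name: scores[name].value_ms)


def run(error_penalty_ms: Optional[float]) -> str:
    scores = {
        "healthy": LatencyEwma(error_penalty_ms=error_penalty_ms),
        "sick": LatencyEwma(error_penalty_ms=error_penalty_ms),
    }
    for _ in range(20):
        scores["healthy"].observe(50.0, ok=True)  # slow but correct
        scores["sick"].observe(2.0, ok=False)     # fast because it is failing
    return least_latency(scores)


print("no penalty   ->", run(None))
print("with penalty ->", run(1000.0))
```

Without the penalty the picker prefers the failing backend, because the only thing it measures is how quickly an answer came back.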
Known recipes (the block’s crown):
power of two choices sample two random endpoints, pick the less
loaded. near-optimal balance with almost no
state — and crucially, it DECORRELATES the
herd, because everyone samples differently.
the flagship, and one of the few genuinely
beautiful results in the catalog: exponential
improvement over random, purchased with one
extra sample.
deterministic subsetting each client sees few backends — the herd size
is bounded by construction
EWMA with error penalty latency signal decays toward PESSIMISM on
errors, defusing the fast-failure trap
bounded outlier ejection eject the sick, but never eject your way to
zero capacity (max-ejection-percent)
slow start / warm-up joining endpoints ramp their weight —
the stampede recipe
retry budgets → backpressure.md: retries amplify every
imbalance the signal missed
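The decorrelation claim for power of two choices can be sketched as a toy simulation. Assumptions: 100 backends, one shared load snapshot that stays stale for the whole window, and 1000 requests routed against it. Every number and function name here is illustrative.

```python
import random
from collections import Counter
from typing import List


def route_least_loaded(snapshot: List[int], num_requests: int) -> Counter:
    """Every router reads the same stale snapshot and picks its global minimum."""
    target = min(range(len(snapshot)), key=lambda i: snapshot[i])
    return Counter({target: num_requests})


def route_p2c(snapshot: List[int], num_requests: int,
              rng: random.Random) -> Counter:
    """Each request samples two random backends and takes the less loaded one."""
    received: Counter = Counter()
    n = len(snapshot)
    for _ in range(num_requests):
        a, b = rng.randrange(n), rng.randrange(n)
        received[a if snapshot[a] <= snapshot[b] else b] += 1
    return received


rng = random.Random(42)
snapshot = list(range(100))  # distinct stale load values, one per backend
rng.shuffle(snapshot)

herd = route_least_loaded(snapshot, 1000)
p2c = route_p2c(snapshot, 1000, rng)
print("least-loaded: busiest backend received", max(herd.values()))
print("two choices:  busiest backend received", max(p2c.values()))
```

Both policies read the same stale signal. Global least-loaded sends the entire window to one backend, while two random choices spreads it, because the apparently idle backend only wins on the requests that happen to sample it.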
A strong design says explicitly:
the decision granularity and what varies beneath it (axis 1),
the route key and who can forge it (axis 2),
the freshness rung of every layer of routing state (axis 3),
whether destinations are equivalent, weighted, preferred, or unique (axis 4),
and how the feedback loop is decorrelated —
because a fleet of routers agreeing on stale news is a stampede
with a control plane.
Routing As Protocol (the crossing-point spec — keep) #
Envoy instantiation (home turf — the full request path):
listener accepts downstream connection
filter chain selected
HTTP route matches host/path/headers (axis 2)
route selects cluster (axis 4: weights, subsets)
cluster LB selects endpoint (axis 5: policy + signals)
request sent upstream
health / outlier / retry / circuit-breaker apply
(star* recipes + backpressure.md)
telemetry emitted (the signal for the next decision)
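The route-matching step of this path can be sketched in isolation. The sketch assumes first-match-wins ordering over prefix routes and uses a hypothetical `Route` tuple; it is not Envoy's configuration model, only an illustration of how an earlier, broader route can shadow a later, more specific one.

```python
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    host: str         # exact host, or "*" for any host
    path_prefix: str
    cluster: str


def match_route(routes: List[Route], host: str, path: str) -> Optional[Route]:
    """Return the first route whose host and path prefix match."""
    for route in routes:
        if route.host in ("*", host) and path.startswith(route.path_prefix):
            return route
    return None


def shadowed(routes: List[Route]) -> List[Route]:
    """Routes that can never match because an earlier route covers them."""
    dead = []
    for i, later in enumerate(routes):
        for earlier in routes[:i]:
            host_covered = earlier.host in ("*", later.host)
            path_covered = later.path_prefix.startswith(earlier.path_prefix)
            if host_covered and path_covered:
                dead.append(later)
                break
    return dead


routes = [
    Route("*", "/api", "api"),
    Route("*", "/api/v2", "api-v2"),      # never reached: "/api" matches first
    Route("shop.example", "/", "shop"),
]

print(match_route(routes, "shop.example", "/api/v2/items"))
print("shadowed:", shadowed(routes))
```

A check like `shadowed` belongs at config-validation time, before the table is accepted, since the symptom at runtime is only that traffic quietly goes to the wrong cluster.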
Service-discovery instantiation:
endpoint controller publishes endpoint set (the cached view is born)
resolver/proxy watches (invalidation rung)
client resolves name → LB selects (axes 2, 5)
readiness updates change FUTURE routing (the lag is structural)
drain precedes removal (or terminating pods eat traffic)
Named Configurations (lookup table) #
Vector = {granularity, key, state rung, equivalence, feedback}. Rows marked → are owned elsewhere.
| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| Network routing | per-packet, prefix, protocol convergence, path metrics, BGP updates | BGP path-vector | route leak; blackhole; loops; slow convergence |
| DNS routing | per-resolution, name, TTL rung, weighted/geo, none | DNS + TTL caching | stale cache; failover bounded by TTLs you don’t control |
| Service discovery | —, name→set, watch rung, —, readiness | EndpointSlice; EDS | ready signal lies; churn overloads control plane; terminating-pod traffic |
| L4 LB | per-connection, 5-tuple hash, conntrack, interchangeable, health | Maglev; IPVS | hot connection invisible; conntrack exhaustion; slow drain |
| L7 routing | per-request, host/path/header, xDS version rung, weighted, — | Envoy listener→route→cluster | precedence surprise; header spoofing (policy.md); shadowed route |
| Endpoint LB → scheduler.md traffic row | per-request, —, —, interchangeable, the star’s home | Envoy LB policies; P2C | fast-failure trap*; recovery stampede*; retry amplification |
| Consistent hash → scheduler/index | per-key, data key, membership, unique, none | Dynamo ring; Redis slots | hot key (not an LB bug); churn movement; route ≠ ownership |
| Traffic splitting | per-request, weights, config rung, weighted, rollout gates | Istio VirtualService | sticky bias; sample too small; long-lived conns miss rollback |
| Locality routing | per-request, client zone, —, preferred + spill, zone load | Envoy locality LB | no-spill meltdown; all-at-once spill stampede; residency (boundary.md) |
| Failover routing → replication.md promotion* | coarse, health, DNS/GLB rung, priority, health gates | global LB failover | false failover; split-brain active/active; cold standby undersized |
| Session affinity | per-session, session ID, binding table, sticky, — | sticky sessions; hash-by-user | backend dies = session dies; hot user; affinity fights balance |
| Proxy/gateway → xDS notes | per-request, full config, version-pinned + last-good, —, — | Envoy + xDS | config rejected; skew across proxies; control-plane dependence |
| Query/data routing → index_structures.md + silent omission* | per-query, segment metadata, routing table rung, unique-per-shard, — | Pinot scatter/gather; ES shard routing | omitted shard; partial accepted as complete*; hot shard |
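The last row's signature failure, a partial result accepted as complete, is cheap to make visible at the gather step. The sketch below assumes the router knows the set of shards a query should have reached; `GatherResult` and `gather` are hypothetical names, not the API of any query engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class GatherResult:
    rows: List[dict]
    complete: bool
    missing_shards: Set[str]


def gather(expected_shards: Set[str],
           responses: Dict[str, List[dict]]) -> GatherResult:
    """Merge per-shard responses and record which expected shards never answered."""
    missing = expected_shards - set(responses)
    rows = [row
            for shard in sorted(responses)
            if shard in expected_shards
            for row in responses[shard]]
    return GatherResult(rows=rows, complete=not missing, missing_shards=missing)


result = gather(
    expected_shards={"s0", "s1", "s2"},
    responses={"s0": [{"n": 1}], "s2": [{"n": 3}]},  # s1 never replied
)
print(result)
```

Carrying `complete` and `missing_shards` alongside the rows turns the omission into something the caller must decide about, instead of something it cannot see.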
Vocabulary #
route path next hop endpoint backend cluster VIP listener
granularity multiplexing drain max-age GOAWAY
route key affinity stickiness locality spill
weight subset canary rollout
health readiness outlier ejection max-ejection-percent
EWMA load report signal lag herd stampede
power of two choices subsetting slow start
scatter/gather partial result
TTL watch version/nonce last-good config
Deep Lesson #
Routing bugs come from confusing pairs on different axes:
name resolution vs health (axis 3: DNS answers "where," never "how healthy")
routing vs authorization (axis 2: a spoofable key is not a principal — policy.md)
load balancing vs capacity management (the router spreads load; it cannot create capacity — capacity.md)
retry vs recovery (→ retry_idempotency + backpressure: amplification)
endpoint ready vs endpoint correct (axis 3: readiness tests that the endpoint responds, not that it responds correctly)
connection balance vs request balance (axis 1: granularity bounds the fix)
consistent hash vs fairness (axis 4: unique destinations cannot be balanced)
local routing vs resilient routing (axis 4: preference needs a spill policy)
Design procedure: pick the granularity that matches where load varies, authenticate the route key, name the freshness rung of every table, declare the equivalence assumption per route — and decorrelate the feedback loop, preferably with two random choices, because the cheapest fix in this block is also its most elegant. The named types are recognition shortcuts; here, most are arrows.