Routing / Load Balancing

routing        = choose path or destination
load balancing = choose among equivalent destinations
                 under capacity/health policy

It answers:

where should this request/packet/message/query go NOW?

Role in the catalog: the traffic face, and the block with the highest import ratio, because nearly every classic “type” is an already-adjudicated concern arriving in a routing costume:

consistent-hash routing   → scheduler.md's deterministic topology
                            + index_structures.md's owner codomain
                            (route ≠ ownership: adjudicated there)
service discovery         → a CACHE: the endpoint set is a cached view
                            of a moving fleet, invalidated by a watch
                            stream — cache.md's ladder, wholesale
query/data routing        → index_structures.md's routing row + silent omission*
                            ("query misses a shard, partial result
                            accepted as complete" IS silent
                            incompleteness, in scatter/gather costume)
failover routing          → replication.md's promotion*, traffic side
proxy/gateway + xDS       → boundary.md's control/data-plane split
                            (+ the xDS blueprint notes)
endpoint LB               → scheduler.md's traffic row
traffic splitting         → boundary.md's deployment boundary,
                            riding the router

What remains native is narrow and real: decision granularity, the equivalence assumption, and one star with one beautiful recipe.

Central tension:

local fast decisions  vs  global optimal placement and freshness

Design Axes (the core module)

Axis 1 — Decision Granularity (the structural cleave)

per-packet      (IP/BGP)
per-connection  (L4: NLB, IPVS, Maglev)
per-request     (L7: Envoy routes, gateways)
per-session     (sticky/affinity)
per-key         (consistent hash, partitioners)

The deep-lesson row “connection-level balance ≠ request-level balance,” promoted to structure:

the granularity you balance at BOUNDS the imbalance you cannot fix.
one hot gRPC connection carries a thousand unbalanced requests —
HTTP/2 multiplexing is what forces the L4→L7 move.
and granularity is sticky DOWNWARD: choosing per-connection forecloses
per-request corrections until the connection dies.
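A toy illustration of the bound, assuming round-robin as the policy at both granularities and hypothetical request counts (one hot multiplexed connection among seven quiet ones):

```python
from collections import Counter
from itertools import cycle

def per_connection(conn_requests: list[int], backends: int) -> Counter:
    """Round-robin whole connections: every request rides its connection's backend."""
    load = Counter()
    for i, n in enumerate(conn_requests):
        load[i % backends] += n
    return load

def per_request(conn_requests: list[int], backends: int) -> Counter:
    """Round-robin individual requests, as an L7 proxy can over HTTP/2."""
    load = Counter()
    rr = cycle(range(backends))
    for n in conn_requests:
        for _ in range(n):
            load[next(rr)] += 1
    return load

conns = [1000] + [10] * 7  # hypothetical: one hot connection, seven quiet ones
print(max(per_connection(conns, 4).values()))  # 1010 on one backend, 20 on each other
print(max(per_request(conns, 4).values()))     # 268: within one request of even
```

No per-connection policy, however clever, fixes the first result: the 1000 requests arrive inside one connection, so whichever backend receives the connection receives all of them.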

Interrogation:

At what granularity is the choice made — and at what granularity does
  load actually vary?
What imbalance is invisible below the chosen granularity?
Long-lived connections: what rebalances them, ever? (drain, max-age,
  GOAWAY — or nothing?)
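A toy model of the last question, assuming a least-connections L4 balancer and a fourth backend that joins after 30 long-lived connections are already placed. Without a max-age the new backend never sees a connection; with one, reconnects spread the fleet:

```python
import random
from typing import Optional

def simulate(max_age: Optional[int], ticks: int = 100, seed: int = 1) -> dict[int, int]:
    """Connections per backend after `ticks`, given an optional connection max-age."""
    rng = random.Random(seed)
    # 30 long-lived connections spread over backends 0..2, with staggered ages
    # so that reconnects do not all happen on the same tick.
    conns = [{"backend": i % 3, "age": rng.randrange(max_age or 1)} for i in range(30)]
    backends = [0, 1, 2, 3]  # backend 3 joins at t=0 with no connections
    counts = {b: 0 for b in backends}
    for c in conns:
        counts[c["backend"]] += 1
    for _ in range(ticks):
        for c in conns:
            c["age"] += 1
            if max_age is not None and c["age"] >= max_age:
                # Connection hits max-age and closes (think GOAWAY); the client
                # reconnects and the least-connections balancer places it.
                counts[c["backend"]] -= 1
                c["backend"] = min(backends, key=lambda b: counts[b])
                counts[c["backend"]] += 1
                c["age"] = 0
    return counts

print(simulate(max_age=None))  # {0: 10, 1: 10, 2: 10, 3: 0}
print(simulate(max_age=20))    # every backend within one connection of the others
```

The max-age value trades reconnect cost against how long a new or recovered backend sits idle; jittering it keeps the reconnects from synchronizing into their own stampede.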

Axis 2 — The Route Key

destination name      DNS records
request attributes    host/path/header/method — with header spoofing as
                      policy.md's principal-substrate failure: a routing
                      decision on an unauthenticated header is an authz
                      decision made on a lie
data key              consistent hash → owner (axis 4's "unique" case)
session identity      cookie, connection, user hash
client locality       geo, zone, region

Interrogation:

What key chooses the route — and who can forge it?
Is the key stable under retries and reconnects? (a retry that re-keys
  defeats affinity AND dedupe downstream)
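A sketch of the second question, assuming rendezvous (highest-random-weight) hashing as the key-to-backend function and a hypothetical client bug that mints a fresh request ID per attempt and routes on it. A stable key keeps every retry on one backend; a per-attempt key scatters them:

```python
import hashlib
import uuid

def hrw_pick(key: str, backends: list[str]) -> str:
    """Rendezvous hashing: the backend with the highest hash(key, backend) wins."""
    def score(b: str) -> int:
        digest = hashlib.sha256(f"{key}|{b}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(backends, key=score)

backends = ["b0", "b1", "b2", "b3"]

def route_attempt(session_id: str, rekey_on_retry: bool) -> str:
    # Hypothetical bug: routing on a per-attempt ID instead of the session.
    key = str(uuid.uuid4()) if rekey_on_retry else session_id
    return hrw_pick(key, backends)

stable = {route_attempt("session-42", rekey_on_retry=False) for _ in range(50)}
rekeyed = {route_attempt("session-42", rekey_on_retry=True) for _ in range(50)}
print(len(stable))   # 1: every attempt lands on the same backend
print(len(rekeyed))  # almost surely > 1: attempts wander off their affinity
```

The stable key should itself come from an authenticated identity rather than a client-supplied header, or the first question returns.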

Axis 3 — Routing-State Freshness (all arrows)

Every classic routing-state failure mode (stale endpoints, DNS TTLs that slow failover, stale routing tables, rejected config) is cache.md’s freshness ladder applied to routing state specifically:

DNS TTL                    the TTL rung, honestly labeled
                           (failover speed is bounded by resolver caches
                           you do not control)
watch-driven endpoints     invalidation rung (EndpointSlice, EDS)
xDS version/nonce ACK      the version-pinned rung
                           (checkpoint_replay.md's control-plane row)
last-good config           boundary.md's control/data-plane doctrine:
                           a rejected route table must not take down
                           serving — stale-but-valid beats fresh-but-broken
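A minimal sketch of the last-good rung, loosely modeled on xDS ACK/NACK semantics (a NACK reports the version still being served, echoes the nonce of the rejected response, and carries an error). The class, its validation rules, and the longest-prefix lookup are hypothetical simplifications, not Envoy's implementation:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RouteTableHolder:
    """Serves the last-good route table; rejected updates never replace it (hypothetical)."""
    known_clusters: set[str]
    active: dict[str, str] = field(default_factory=dict)  # path prefix -> cluster
    active_version: str = ""

    def validate(self, table: dict[str, str]) -> Optional[str]:
        for prefix, cluster in table.items():
            if not prefix.startswith("/"):
                return f"bad prefix {prefix!r}"
            if cluster not in self.known_clusters:
                return f"unknown cluster {cluster!r}"
        return None

    def on_update(self, version: str, nonce: str, table: dict[str, str]) -> dict[str, str]:
        error = self.validate(table)
        if error is None:
            self.active, self.active_version = dict(table), version
            return {"version": version, "nonce": nonce}  # ACK: now serving `version`
        # NACK: keep serving the old table and report its version, not the rejected one
        return {"version": self.active_version, "nonce": nonce, "error": error}

    def route(self, path: str) -> Optional[str]:
        # Longest-prefix match over the active (last-good) table.
        matches = [p for p in self.active if path.startswith(p)]
        return self.active[max(matches, key=len)] if matches else None

holder = RouteTableHolder(known_clusters={"api", "web"})
print(holder.on_update("v1", "n1", {"/api": "api", "/": "web"}))  # ACK of v1
print(holder.on_update("v2", "n2", {"/api": "api-v2"}))           # NACK, still on v1
print(holder.route("/api/users"))                                 # api (v1 keeps serving)
```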

Interrogation:

Which rung does each layer of routing state sit on — named?
"Ready" signal: who asserts it, and what does it actually test?
  (endpoint ready ≠ endpoint correct — the deep lesson's row)
Traffic to terminating pods: does drain precede removal, or race it?

Axis 4 — The Equivalence Assumption

interchangeable   pure LB — any endpoint will do
weighted          canary/traffic-split — deployment machinery riding
                  the router (small samples hide failures; sticky
                  sessions bias the weights)
preferred         locality — nearby first, spill on overload
                  (the spill is the hard part: zonal preference that
                  never spills melts the local zone while remote idles;
                  regional failover that spills all at once stampedes —
                  capacity.md's correlated peak, self-inflicted)
UNIQUE            key routing — "balancing" stops being the word and
                  CORRECTNESS starts. the deep lesson's "consistent
                  hash vs fairness": key routing cannot balance —
                  the key distribution decides, and a hot key is not
                  a routing bug (remedies live upstream: key salting,
                  splitting the hot key — a data-model change, not an
                  LB knob)
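The spill policy can be made numeric. A minimal sketch under a hypothetical policy (not any particular proxy's algorithm): a zone keeps its traffic local up to its local capacity, and only the shortfall spills, spread across remote zones in proportion to their headroom. Spilling the shortfall rather than everything is what is meant to avoid the all-at-once stampede:

```python
def spill_plan(demand: dict[str, float], capacity: dict[str, float], zone: str) -> dict[str, float]:
    """Fraction of `zone`'s traffic to send to each zone (hypothetical policy)."""
    if demand[zone] <= capacity[zone]:
        return {zone: 1.0}  # under capacity: no spill at all
    headroom = {z: max(0.0, capacity[z] - demand[z]) for z in capacity if z != zone}
    total = sum(headroom.values())
    if total == 0:
        return {zone: 1.0}  # nowhere to spill: a capacity problem, not a routing one
    local_share = capacity[zone] / demand[zone]
    spill = 1.0 - local_share  # only the shortfall leaves the zone
    plan = {zone: local_share}
    plan.update({z: spill * h / total for z, h in headroom.items()})
    return plan

demand = {"a": 150.0, "b": 80.0, "c": 60.0}      # hypothetical requests/s per zone
capacity = {"a": 100.0, "b": 100.0, "c": 100.0}  # hypothetical sustainable requests/s
print({z: round(f, 3) for z, f in spill_plan(demand, capacity, "a").items()})
# {'a': 0.667, 'b': 0.111, 'c': 0.222}
print(spill_plan(demand, capacity, "b"))  # {'b': 1.0}
```

The sketch computes a static plan and does not cap the spill at remote headroom; if the shortfall exceeds total headroom, the remote zones overload too, and a plan that jumps between values over time can still stampede.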

Interrogation:

Are the destinations actually equivalent — or did unique-by-key sneak in?
Locality: what is the spill policy, numerically?
Splits: is the sample size enough to see the failure you're canarying for?
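For the last question the arithmetic is short, if requests are assumed independent (sticky sessions break that assumption and shrink the effective sample). The chance that n canary requests include at least one failure, at per-request failure rate p, is 1 - (1 - p)^n:

```python
import math

def detection_probability(failure_rate: float, requests: int) -> float:
    """P(at least one failure among `requests` independent canary requests)."""
    return 1 - (1 - failure_rate) ** requests

def requests_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that detection_probability(failure_rate, n) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))

# Hypothetical: a regression failing 1 request in 1,000, canaried on 100 requests.
print(round(detection_probability(0.001, 100), 3))  # 0.095
print(requests_needed(0.001))                        # 2995
```

Seeing one failure is a weaker bar than telling the canary apart from the baseline's own error rate, which needs a larger sample still.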

Axis 5 — The Feedback Loop (where the star lives)

How the router learns: health checks, outlier ejection, latency signals (EWMA), load reports. Every signal lags reality — and that lag has a structure of its own:

Technical Bottleneck: Herding on a Stale Signal*

every router balances on an observation that lags reality.
scheduler.md's view-vs-reality* — but with a property the scheduler
version lacks: the routers' decisions are CORRELATED.

Many independent balancers reading the same stale signal converge on the same target, manufacturing the very hotspot the signal denied. capacity.md’s statistical bet, inverted: the bet assumed uncorrelated peaks; stale-signal herding MANUFACTURES correlation. Two signature forms:

the fast-failure trap    the fastest-LOOKING endpoint is often the one
                         FAILING fast — errors return quicker than
                         successes, so least-latency routing pours
                         traffic into the sick backend
the recovery stampede    a newly-added or newly-recovered endpoint is
                         simultaneously discovered idle by every router,
                         and flattened by its own attractiveness

Known recipes (the block’s crown):

power of two choices     sample two random endpoints, pick the less
                         loaded. near-optimal balance with almost no
                         state — and crucially, it DECORRELATES the
                         herd, because everyone samples differently.
                         the flagship, and one of the few genuinely
                         beautiful results in the catalog: exponential
                         improvement over random, purchased with one
                         extra sample.
deterministic subsetting each client sees few backends — the herd size
                         is bounded by construction
EWMA with error penalty  latency signal decays toward PESSIMISM on
                         errors, defusing the fast-failure trap
bounded outlier ejection eject the sick, but never eject your way to
                         zero capacity (max-ejection-percent)
slow start / warm-up     joining endpoints ramp their weight —
                         the stampede recipe
retry budgets            → backpressure.md: retries amplify every
                         imbalance the signal missed
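A toy of the herd and its decorrelation, assuming 10 routers that each send one request per tick and all read the same load snapshot, refreshed every 10 ticks. Least-loaded selection sends every request in a window to one backend; power of two choices on the same stale snapshot spreads them. The model has no service times, so it shows only the decorrelation half of the claim, not P2C's balance advantage over pure random choice:

```python
import random

def peak_burst(policy: str, backends: int = 10, routers: int = 10,
               ticks: int = 100, refresh_every: int = 10, seed: int = 7) -> int:
    """Most requests any single backend received within one staleness window."""
    rng = random.Random(seed)
    load = [0] * backends
    peak = 0
    for _ in range(0, ticks, refresh_every):
        snapshot = list(load)  # every router reads this same, soon-stale view
        burst = [0] * backends
        for _ in range(refresh_every * routers):  # one request per router per tick
            if policy == "least-loaded":
                b = min(range(backends), key=lambda k: snapshot[k])
            else:  # "p2c": sample two distinct backends, keep the less loaded one
                i, j = rng.sample(range(backends), 2)
                b = i if snapshot[i] <= snapshot[j] else j
            load[b] += 1
            burst[b] += 1
        peak = max(peak, max(burst))
    return peak

print(peak_burst("least-loaded"))  # 100: the whole window lands on one backend
print(peak_burst("p2c"))           # a small fraction of that
```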

A strong design says explicitly:

the decision granularity and what varies beneath it (axis 1),
the route key and who can forge it (axis 2),
the freshness rung of every layer of routing state (axis 3),
whether destinations are equivalent, weighted, preferred, or unique (axis 4),
and how the feedback loop is decorrelated —
because a fleet of routers agreeing on stale news is a stampede
with a control plane.

Routing As Protocol (the crossing-point spec — keep)

Envoy instantiation (home turf — the full request path):

listener accepts downstream connection
filter chain selected
HTTP route matches host/path/headers        (axis 2)
route selects cluster                        (axis 4: weights, subsets)
cluster LB selects endpoint                  (axis 5: policy + signals)
request sent upstream
health / outlier / retry / circuit-breaker apply
                                             (star* recipes + backpressure.md)
telemetry emitted                            (the signal for the next decision)
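A toy of the route, cluster, and endpoint steps, with hypothetical hosts, clusters, and addresses; the data structures are illustrative, not Envoy's configuration schema. It keeps the one Envoy behavior behind the "shadowed route" failure: routes within a virtual host are tried in order and the first match wins:

```python
import random
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    clusters: dict[str, int]  # cluster name -> traffic weight

# Hypothetical config for one virtual host.
VIRTUAL_HOSTS = {
    "shop.example.com": [
        Route("/api", {"api-v1": 90, "api-v2": 10}),  # listed first...
        Route("/api/v2", {"api-v2": 100}),            # ...so this route is shadowed
        Route("/", {"web": 100}),
    ],
}
CLUSTERS = {
    "api-v1": ["10.0.0.1", "10.0.0.2"],
    "api-v2": ["10.0.1.1"],
    "web": ["10.0.2.1"],
}

def handle(host: str, path: str, rng: random.Random) -> tuple[str, str]:
    routes = VIRTUAL_HOSTS[host]                                  # virtual host by Host header
    route = next(r for r in routes if path.startswith(r.prefix))  # first match wins (axis 2)
    names, weights = zip(*route.clusters.items())
    cluster = rng.choices(names, weights=weights)[0]              # weighted split (axis 4)
    endpoint = rng.choice(CLUSTERS[cluster])                      # stand-in for cluster LB (axis 5)
    return cluster, endpoint

rng = random.Random(0)
hits = [handle("shop.example.com", "/api/v2/items", rng)[0] for _ in range(1000)]
print(hits.count("api-v2"))  # roughly 100 of 1000: the /api/v2 route never matches
```

Moving /api/v2 above /api is the fix; a config check that flags any prefix route made unreachable by an earlier, shorter prefix catches the whole class.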

Service-discovery instantiation:

endpoint controller publishes endpoint set  (the cached view is born)
resolver/proxy watches                       (invalidation rung)
client resolves name → LB selects            (axes 2, 5)
readiness updates change FUTURE routing      (the lag is structural)
drain precedes removal                       (or terminating pods eat traffic)
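A toy timeline for the last line, assuming the balancer's cached endpoint set learns of a removal a fixed number of ticks after it is published (the watch lag), and that "drain first" means the pod keeps serving for a grace period after being marked not-ready (in Kubernetes this is commonly done with a preStop delay). All numbers are hypothetical:

```python
def failed_requests(drain_first: bool, watch_lag: int = 3, grace: int = 5,
                    rps: int = 10, ticks: int = 20, terminate_at: int = 5) -> int:
    """Requests routed to pod-a after it stopped serving (toy, two pods)."""
    # The LB keeps routing to pod-a until its cached view catches up.
    removed_from_lb_at = terminate_at + watch_lag
    # Race: the process exits at once. Drain: it serves through the grace period.
    stops_serving_at = terminate_at + grace if drain_first else terminate_at
    failures = 0
    for t in range(ticks):
        pod_a_in_view = t < removed_from_lb_at
        for r in range(rps):
            # Alternate pods while pod-a is still in the LB's view.
            target = "pod-a" if pod_a_in_view and r % 2 == 0 else "pod-b"
            if target == "pod-a" and t >= stops_serving_at:
                failures += 1
    return failures

print(failed_requests(drain_first=False))  # 15: three ticks of traffic into a stopped pod
print(failed_requests(drain_first=True))   # 0, but only because grace (5) >= watch_lag (3)
```

Drain works only when the grace period outlasts the propagation lag; shrink `grace` below `watch_lag` and failures return.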

Named Configurations (lookup table)

Vector = {granularity, key, state rung, equivalence, feedback}. Rows marked → are owned elsewhere.

Name	Vector	Canonical study object	Signature failure
Network routing	per-packet, prefix, protocol convergence, path metrics, BGP updates	BGP path-vector	route leak; blackhole; loops; slow convergence
DNS routing	per-resolution, name, TTL rung, weighted/geo, none	DNS + TTL caching	stale cache; failover bounded by TTLs you don’t control
Service discovery	—, name→set, watch rung, —, readiness	EndpointSlice; EDS	ready signal lies; churn overloads control plane; terminating-pod traffic
L4 LB	per-connection, 5-tuple hash, conntrack, interchangeable, health	Maglev; IPVS	hot connection invisible; conntrack exhaustion; slow drain
L7 routing	per-request, host/path/header, xDS version rung, weighted, —	Envoy listener→route→cluster	precedence surprise; header spoofing (→ policy.md); shadowed route
Endpoint LB → scheduler.md traffic row	per-request, —, —, interchangeable, the star’s home	Envoy LB policies; P2C	fast-failure trap; recovery stampede; retry amplification
Consistent hash → scheduler/index	per-key, data key, membership, unique, none	Dynamo ring; Redis slots	hot key (not an LB bug); churn movement; route ≠ ownership
Traffic splitting	per-request, weights, config rung, weighted, rollout gates	Istio VirtualService	sticky bias; sample too small; long-lived conns miss rollback
Locality routing	per-request, client zone, —, preferred + spill, zone load	Envoy locality LB	no-spill meltdown; all-at-once spill stampede; residency (boundary.md)
Failover routing → replication.md promotion*	coarse, health, DNS/GLB rung, priority, health gates	global LB failover	false failover; split-brain active/active; cold standby undersized
Session affinity	per-session, session ID, binding table, sticky, —	sticky sessions; hash-by-user	backend dies = session dies; hot user; affinity fights balance
Proxy/gateway → xDS notes	per-request, full config, version-pinned + last-good, —, —	Envoy + xDS	config rejected; skew across proxies; control-plane dependence
Query/data routing → index_structures.md + silent omission*	per-query, segment metadata, routing table rung, unique-per-shard, —	Pinot scatter/gather; ES shard routing	omitted shard; partial accepted as complete*; hot shard

Vocabulary

route  path  next hop  endpoint  backend  cluster  VIP  listener
granularity  multiplexing  drain  max-age  GOAWAY
route key  affinity  stickiness  locality  spill
weight  subset  canary  rollout
health  readiness  outlier ejection  max-ejection-percent
EWMA  load report  signal lag  herd  stampede
power of two choices  subsetting  slow start
scatter/gather  partial result
TTL  watch  version/nonce  last-good config

Deep Lesson

Routing bugs come from confusing pairs on different axes:

name resolution      vs  health                (axis 3: DNS answers "where," never "how")
routing              vs  authorization         (axis 2: a spoofable key is not a principal — policy.md)
load balancing       vs  capacity management   (the router spreads load; it cannot create capacity — capacity.md)
retry                vs  recovery              (→ retry_idempotency + backpressure: amplification)
endpoint ready       vs  endpoint correct      (axis 3: readiness tests liveness, not truth)
connection balance   vs  request balance       (axis 1: granularity bounds the fix)
consistent hash      vs  fairness              (axis 4: unique destinations cannot be balanced)
local routing        vs  resilient routing     (axis 4: preference needs a spill policy)

Design procedure: pick the granularity that matches where load varies, authenticate the route key, name the freshness rung of every table, declare the equivalence assumption per route — and decorrelate the feedback loop, preferably with two random choices, because the cheapest fix in this block is also its most elegant. The named types are recognition shortcuts; here, most are arrows.