Flow Control / Backpressure #

how does the system prevent demand from overwhelming capacity?

flow control  = regulate how fast work enters or moves
backpressure  = propagate downstream saturation upstream

Role in the catalog: tier-two protocol block — the dynamics layer. queue.md and scheduler.md own the statics (structures, single-gate decisions); this file owns the load-control loop across a chain of them.

Governing math (axis 1’s foundation):

Little's law:  L = λW
(in-flight = arrival rate × time-in-system)
rate limits and concurrency limits are coupled through latency —
limit one, and latency variance moves the other.

Central tension:

protect the system  vs  serve as much useful work as possible
(and the operative word is useful — see bottleneck*)

Design Axes (the core module) #

Axis 1 — What Quantity Is Limited #

arrival rate:     per-time admission        (token/leaky bucket, per-user caps)
outstanding work: in-flight, however denominated —
                  requests (semaphore, pool, max in-flight)
                  bytes    (TCP window, HTTP/2 WINDOW_UPDATE, prefetch)
                  — the same control, two currencies
waiting depth:    queue occupancy           (bounded queue — owned by queue.md axis 5)
memory / cost:    buffer.memory, payload budgets, $-quotas

The Little’s-law consequence, derivable not memorized:

a rate limit calibrated at 10ms latency admits 100× the in-flight load
when latency degrades to 1s (λ fixed, W up 100×, so L up 100×).
rate limits assume a latency; concurrency limits hold L fixed, so the
admitted rate falls as W rises. under latency variance, limit outstanding work.
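The arithmetic behind that claim, as a minimal sketch. The limit values (1000 req/s, 10 in flight) and the function names are illustrative assumptions, not figures from this catalog.

```python
def in_flight(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's law: L = lambda * W."""
    return arrival_rate_per_s * latency_s


def admitted_rate(concurrency_limit: int, latency_s: float) -> float:
    """Little's law rearranged: lambda = L / W."""
    return concurrency_limit / latency_s


RATE_LIMIT = 1000.0      # hypothetical rate limit, requests per second
CONCURRENCY_LIMIT = 10   # hypothetical concurrency limit, requests in flight

# A rate limit fixes lambda, so in-flight load scales with latency.
healthy = in_flight(RATE_LIMIT, 0.010)   # calibrated at 10 ms
degraded = in_flight(RATE_LIMIT, 1.0)    # latency degrades to 1 s
print("in-flight under rate limit:", healthy, "->", degraded)

# A concurrency limit fixes L, so the admitted rate falls as latency rises.
print("rate under concurrency limit:",
      admitted_rate(CONCURRENCY_LIMIT, 0.010), "->",
      admitted_rate(CONCURRENCY_LIMIT, 1.0))
```

The rate limit lets outstanding work grow by the same factor as latency; the concurrency limit trades admitted rate for a fixed amount of outstanding work.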

Interrogation:

What resource is actually scarce — CPU, memory, connections, downstream capacity?
Is the limit denominated in the same currency as the scarcity?
What latency did the rate limit silently assume?
Distributed limit: is the counter consistent, and does the key match the tenant?
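For the arrival-rate row of axis 1, a token bucket is the usual study object. The sketch below is single-process and not thread-safe; the class name, parameters, and injectable clock are assumptions made for illustration. It does not address the distributed-counter question above.

```python
import time


class TokenBucket:
    """Admit at most `rate_per_s` on average, with bursts up to `burst`."""

    def __init__(self, rate_per_s: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = float(burst)
        self.tokens = float(burst)   # start full so an initial burst is allowed
        self.clock = clock           # injectable so the bucket is testable
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                 # caller decides: block, queue, or reject
```

Note what the bucket does not know: how long each admitted request will stay in flight. That is the assumed-latency trap in code form.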

Axis 2 — Overload Response (what happens to the excess) #

An escalation ladder, not rival strategies — mature systems do all of these at different load levels:

slow the producer:  block / withhold credit      (lossless, needs obedient upstream)
queue it:           absorb burst                 (bounded! — else latency hides overload)
reorder it:         priority lanes, preemption   (scheduler.md axis 2 — reference, don't restate)
degrade it:         stale cache, shed features,
                    cheaper path                 (convert expensive work to cheap work)
reject it:          429/503 + Retry-After        (make overload the caller's problem, politely)
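One way to make the ladder explicit is a table from load level to the highest rung engaged. This is a sketch only: the utilization thresholds are invented for illustration, and a real system would pick its own signal (queue age, in-flight count) and engagement points.

```python
from enum import Enum


class Rung(Enum):
    ACCEPT = "accept"
    SLOW_PRODUCER = "slow the producer"
    QUEUE = "queue it (bounded)"
    REORDER = "reorder it"
    DEGRADE = "degrade it"
    REJECT = "reject it"


# (upper utilization bound, rung). Hypothetical thresholds, not recommendations.
LADDER = [
    (0.60, Rung.ACCEPT),
    (0.70, Rung.SLOW_PRODUCER),
    (0.80, Rung.QUEUE),
    (0.90, Rung.REORDER),
    (0.97, Rung.DEGRADE),
]


def overload_response(utilization: float) -> Rung:
    """Return the highest rung engaged at this load level."""
    for upper, rung in LADDER:
        if utilization < upper:
            return rung
    return Rung.REJECT   # past every threshold: refuse, visibly
```

Writing the table down answers the first interrogation question directly: each rung has a stated engagement point that can be reviewed and tested.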

Interrogation:

At what load level does each rung engage?
Is queueing bounded, and by time as well as depth?
What can be degraded before anything is refused?
Is a drop visible to the caller, or silent data loss?
Shed by priority: is the priority honest? (scheduler.md: abuse, inversion)

Axis 3 — Signal Mechanism (how upstream learns) #

Ordered by enforceability:

explicit credit:   receiver grants; sender cannot exceed
                   (TCP window, Reactive Streams request(n), prefetch count)
implicit blocking: bounded buffer; producer stalls on write
                   (bounded channels, Kafka producer on buffer.memory)
out-of-band rejection: 429/503 + Retry-After; ADVISORY, the caller may ignore it
inferred:          no signal; sender deduces from latency/loss/errors
                   (TCP congestion control, adaptive concurrency,
                    circuit breaker inferring dependency health)

The structural fact this axis carries:

credit and blocking are enforceable.
rejection is advisory.
inference is disciplined guesswork.
"backpressure ignored by producer" is only possible on the advisory rung —
if ignoring the signal is unacceptable, choose an enforceable mechanism.
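The two enforceable rungs can be sketched side by side. `CreditGate` and `offer` are hypothetical names; `queue.Queue` with `maxsize` is the standard-library bounded buffer.

```python
import queue


class CreditGate:
    """Explicit credit: the receiver grants, and the sender cannot exceed it."""

    def __init__(self):
        self.credits = 0

    def grant(self, n: int) -> None:
        if n <= 0:
            raise ValueError("credit grants must be positive")
        self.credits += n

    def try_send(self, item, deliver) -> bool:
        if self.credits == 0:
            return False          # no credit: the send does not happen
        self.credits -= 1
        deliver(item)
        return True


def offer(buffer: queue.Queue, item) -> bool:
    """Implicit blocking, non-blocking variant: a full bounded buffer refuses."""
    try:
        buffer.put_nowait(item)   # buffer.put(item) would stall the producer
        return True
    except queue.Full:
        return False
```

In both cases the producer has no way to push past the limit. With a 429 the producer can simply send again.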

Interrogation:

Who receives the signal? CAN they obey it? What happens if they don't?
Is there a hidden unbounded buffer that absorbs the signal? (see axis 4)
For inferred: what does the sender actually observe, and how stale is it?
Blocking: can the stall propagate into a deadlock? (cycles in the flow graph)

Axis 4 — Placement & Direction (the native structural content) #

ingress:  protect yourself from callers
          (admission control, rate limits at the gate, request queues)
egress:   protect the dependency from YOU — regulate self-generated demand
          (circuit breaker, retry budget, per-dependency pools, outbound limits)

Retry budgets and circuit breakers are the two egress natives: both cap demand amplification toward a struggling downstream — retries are load you manufactured precisely when capacity fell.
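A retry budget in its simplest form caps retries at a fraction of original requests. The sketch below uses lifetime counters for brevity; a production version would count over a sliding or decaying window. The class name and the 0.25 ratio are assumptions.

```python
class RetryBudget:
    """Allow retries only while retries <= ratio * originals."""

    def __init__(self, ratio: float = 0.25):
        self.ratio = ratio        # hypothetical value; tune per dependency
        self.originals = 0
        self.retries = 0

    def record_original(self) -> None:
        self.originals += 1

    def try_retry(self) -> bool:
        # Check the budget as it would stand after this retry.
        if self.retries + 1 <= self.ratio * self.originals:
            self.retries += 1
            return True
        return False              # budget spent: fail fast instead of amplifying
```

The scope of the counter matters: a per-request budget still allows a fleet-wide storm, which is the "wrong budget scope" failure.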

Propagation — the chain property. A single gate is scheduler/queue territory; a pipeline’s backpressure is end-to-end or it is nothing:

the signal chain is only as strong as its weakest hop.
one hidden unbounded buffer anywhere silently absorbs the signal
and converts overload into latency + memory instead of upstream slowdown.
this is the thesis of the block, not a footnote.
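A toy tick-based simulation makes the absorption visible. The model and its numbers are assumptions: a producer offering 10 items per tick into one buffer drained at 4 per tick.

```python
def simulate(ticks, produce_per_tick, consume_per_tick, capacity=None):
    """Return (final_depth, accepted, stalled) for one producer and one buffer.

    capacity=None models a hidden unbounded buffer.
    """
    depth = accepted = stalled = 0
    for _ in range(ticks):
        for _ in range(produce_per_tick):
            if capacity is not None and depth >= capacity:
                stalled += 1      # the producer feels the backpressure
            else:
                depth += 1
                accepted += 1
        depth -= min(depth, consume_per_tick)
    return depth, accepted, stalled


print(simulate(100, 10, 4))               # unbounded: depth grows, no stalls
print(simulate(100, 10, 4, capacity=20))  # bounded: depth capped, stalls appear
```

With no capacity the producer never stalls and the backlog grows by 6 items every tick. With a capacity the backlog stays bounded and the stall count is the signal that reaches upstream.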

Interrogation:

Where does demand enter; where is saturation first observed; how many hops between?
Walk every hop: is its buffer bounded? (one "no" breaks the chain)
Ingress and egress both covered, or only the front door?
Cyclic topology: what prevents credit-wait deadlock?
Bulkheads (boundary.md #9): which failure classes share a pool?

Axis 5 — Static vs Adaptive (a modifier on any limit) #

static:    fixed limit; simple, predictable, wrong twice a day
adaptive:  feedback-controlled (adaptive concurrency, congestion control,
           queue-depth autoscaling, latency-based throttling)

Adaptive imports control theory wholesale — its failure modes are a controls curriculum:

oscillation           overreaction to a lagging metric
feedback delay        acting on a world that has moved (view-vs-reality*, scheduler.md)
bad setpoint          stable convergence to the wrong target
controller fights     HPA vs adaptive concurrency vs manual limits:
                      several loops, one plant, no coordination
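A minimal adaptive limit, assuming an additive-increase / multiplicative-decrease (AIMD) rule driven by observed latency. The class, its defaults, and the choice of AIMD are illustrative; no specific library's algorithm is implied.

```python
class AimdLimit:
    """Concurrency limit adjusted from latency samples."""

    def __init__(self, initial=10, min_limit=1, max_limit=100,
                 target_latency_s=0.1, backoff=0.5):
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.target_latency_s = target_latency_s   # the setpoint
        self.backoff = backoff

    def on_sample(self, latency_s: float) -> int:
        if latency_s > self.target_latency_s:
            # Over the setpoint: cut hard (multiplicative decrease).
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        else:
            # Under the setpoint: probe gently (additive increase).
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit
```

Each failure mode above maps onto a line here. A wrong `target_latency_s` is the bad setpoint, a stale `latency_s` is feedback delay, and reacting to every single sample with no smoothing invites oscillation.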

Interrogation:

What is the measured signal, the setpoint, and the actuation delay?
Why is the loop stable — argued, not assumed? (hysteresis? damping?)
What other controller acts on the same resource?
Autoscaling ≠ overload protection: what survives the minutes before capacity arrives?

Technical Bottleneck: Goodput Under Overload* #

the quantity that matters is not admitted work — it is admitted work
that completes usefully. throughput can stay high while goodput goes to zero.

The catastrophic form — congestion collapse / metastable failure — is capacity spent on already-dead work, self-sustaining:

requests queue past the caller's timeout, then get served anyway —
  the caller gave up; you are doing archaeology
retries amplify demand exactly when capacity fell
synchronized probes/backoffs arrive as a thundering herd
the system stays saturated at zero goodput even after the trigger clears

Three of the confused pairs listed under Deep Lesson (“retry vs recovery,” “queueing vs capacity,” “throughput vs health”) are all this one bottleneck.

Known recipes (bounded, composable, none universal):

deadline propagation      carry the caller's remaining budget on every hop,
                          so dead work is droppable anywhere in the chain —
                          the flagship recipe, and the block's deepest idea
admission timeout         bound queues by TIME, not just depth;
                          never dequeue work older than its deadline
retry budget + jitter     retries ≤ fraction of originals; desynchronize
LIFO under overload       newest work has the freshest deadline —
                          fairness sacrificed for goodput, deliberately
circuit breaking          stop paying for a dependency that can't deliver
degraded modes            convert expensive work to cheap work before refusing any
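Three of these recipes fit in one small queue: deadlines carried with the work, expiry checked at dequeue, and LIFO order under overload. All names are hypothetical. Deadlines are absolute times on one shared clock, which only holds within a single process; across hosts a remaining-time budget is the safer thing to send.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline: float   # absolute time on the queue's clock


class DeadlineQueue:
    def __init__(self, max_depth: int, clock=time.monotonic):
        self.items = deque()
        self.max_depth = max_depth
        self.clock = clock
        self.dropped_expired = 0

    def offer(self, job: Job) -> bool:
        if len(self.items) >= self.max_depth:
            return False                      # bounded by depth
        self.items.append(job)
        return True

    def take(self, overloaded: bool = False):
        while self.items:
            # LIFO under overload: newest work has the freshest deadline.
            job = self.items.pop() if overloaded else self.items.popleft()
            if job.deadline > self.clock():
                return job
            self.dropped_expired += 1         # bounded by time: skip dead work
        return None


def remaining_budget(job: Job, clock) -> float:
    """Budget to pass to the next hop so it can drop dead work too."""
    return max(0.0, job.deadline - clock())
```

A worker would call `take()`, then pass `remaining_budget(job, clock)` as the timeout on every downstream call it makes for that job.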

A strong design says explicitly:

what resource is scarce,
where each limit is enforced and in what currency,
how callers learn to slow down (and whether the signal is enforceable),
what gets queued, degraded, or shed — and how dead work is detected,
and how fairness is preserved under stress.

Flow Control As Protocol (the crossing-point spec — keep) #

measure capacity or load
admit / delay / reject work
grant credit or permit
consume credit while work is in flight
release credit on completion
adjust limits from feedback
communicate retry/backoff to the caller
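The middle steps of that sequence (admit or reject, hold a permit while in flight, release on completion, tell the caller when to retry) can be sketched with a standard-library semaphore. `InFlightLimiter`, `Overloaded`, and the fixed 1-second hint are assumptions; measuring load and adjusting the limit are left out.

```python
import threading


class Overloaded(Exception):
    """Rejection that carries a retry hint for the caller."""

    def __init__(self, retry_after_s: float):
        super().__init__(f"overloaded; retry after {retry_after_s}s")
        self.retry_after_s = retry_after_s


class InFlightLimiter:
    def __init__(self, limit: int, retry_after_s: float = 1.0):
        self._permits = threading.BoundedSemaphore(limit)
        self._retry_after_s = retry_after_s

    def run(self, work, *args):
        # Admit or reject: a non-blocking acquire makes overload visible at once.
        if not self._permits.acquire(blocking=False):
            raise Overloaded(self._retry_after_s)
        try:
            return work(*args)        # the permit is held while work is in flight
        finally:
            self._permits.release()   # release on completion, success or failure
```

Switching `acquire(blocking=False)` to a blocking acquire turns the same limiter from rejection into implicit blocking.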

Instantiations:

HTTP:             429 / 503 + Retry-After              (advisory rejection)
TCP / HTTP/2:     window advertisement / WINDOW_UPDATE  (explicit credit)
Reactive Streams: subscriber requests N                 (explicit credit)
broker:           prefetch count, fetch max bytes,
                  producer buffer limits                (credit + blocking)
worker pool:      semaphore acquire/release             (implicit blocking)
gRPC deadline:    remaining budget on every hop         (deadline propagation)
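On the caller's side of the HTTP row, obeying an advisory rejection might look like the sketch below. The policy is an assumption: wait the server's Retry-After, then add full-jitter exponential backoff on top so callers given the same hint do not return together. Only the delta-seconds form of Retry-After is handled; the header also allows an HTTP-date.

```python
import random


def retry_delay(attempt: int, retry_after_s=None,
                base_s: float = 0.1, cap_s: float = 30.0, rng=random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Full jitter: uniform over [0, min(cap, base * 2^attempt)].
    ceiling = min(cap_s, base_s * (2 ** attempt))
    jitter = rng.uniform(0.0, ceiling)
    if retry_after_s is None:
        return jitter
    # Never come back sooner than the server asked.
    return retry_after_s + jitter
```

Passing a seeded `random.Random` as `rng` makes the delays reproducible in tests.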

Named Configurations (lookup table) #

Vector = {quantity, response, signal, placement, static/adaptive}. Rows marked → are owned elsewhere; kept here for recognition only.

Name	Vector	Canonical study object	Signature failure
Rate limiter	rate, reject, out-of-band, ingress, static	token bucket + 429/Retry-After	wrong key; distributed counter skew; assumed-latency trap
Concurrency limiter	in-flight (requests), block or reject, implicit, either, static	connection pool; max in-flight	limit ≠ bottleneck; long requests hold slots; queued work expires
Flow-control window	in-flight (bytes), slow producer, explicit credit, per-hop, static	TCP and HTTP/2 windows	HoL blocking; window exhaustion; stream unfairness
Bounded queue → queue.md	depth, queue-then-reject, implicit, per-hop, static	socket backlog	too big = hidden latency; too small = churn
Backpressure chain	in-flight, slow producer, credit/blocking end-to-end, chain, static	Reactive Streams; Flink	one unbounded buffer breaks it; slow-consumer global stall; deadlock
Load shedding	any, degrade/reject ladder, out-of-band, ingress, adaptive-ish	priority shed + stale-serve	shed wrong class; flapping; silent drops
Retry budget	rate of retries, reject, out-of-band, egress, static	retry-token bucket	storm; retrying non-idempotent; synchronized retries; wrong budget scope
Circuit breaker	in-flight toward dep, reject, inferred, egress, adaptive	open/half-open/closed (→ state_machine.md)	false open; flapping; synchronized probes
Bulkhead → boundary.md	pool partition, isolate, structural, either, static	per-dependency pools	stranded capacity; shared downstream still collapses
Fair share → scheduler.md	capacity split, reorder, structural, ingress, static	WFQ/DRR, quotas	gaming; bad weights; idle capacity wasted
Priority/preemption → scheduler.md	order under scarcity, reorder, structural, ingress, static	PriorityClass	starvation; inversion; preemption storm
Adaptive control	(modifier), any, inferred feedback, any, adaptive	TCP congestion control; adaptive concurrency	oscillation; metric lag; controller fights
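The circuit-breaker row, as a minimal state machine. Thresholds, names, and the single-probe policy are assumptions. The sketch has no timeout for a probe that never reports back, and it does not jitter the cooldown, so it can still produce synchronized probes.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.cooldown_s:
                self.state = "half-open"   # admit exactly one probe
                return True
            return False
        if self.state == "half-open":
            return False                   # a probe is already in flight
        return True

    def record(self, success: bool) -> None:
        if success:
            self.state = "closed"
            self.failures = 0
            return
        self.failures += 1
        # A failed probe reopens at once; otherwise open at the threshold.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

Callers check `allow()` before each outbound call and report the outcome with `record()`.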

Vocabulary #

capacity  demand  arrival rate  service rate  utilization  saturation
goodput  congestion collapse  metastable failure
in-flight  queue depth  backlog  latency
token  credit  permit  window  request(n)  prefetch
limit  quota  reservation  budget
backoff  jitter  hedging  Retry-After  429  503
deadline propagation  admission timeout  LIFO-under-overload
shed  degraded mode  bulkhead
setpoint  feedback delay  hysteresis  oscillation

Deep Lesson #

Flow-control bugs come from confusing pairs on different axes:

rate limit        vs  concurrency limit   (axis 1: Little's law couples them via latency)
queueing          vs  capacity            (axis 2: a buffer stores overload, doesn't serve it)
retry             vs  recovery            (bottleneck*: retries are self-made load)
latency           vs  load                (axis 5: the signal lags the cause)
throughput        vs  health              (bottleneck*: goodput is the real metric)
global limit      vs  tenant fairness     (scheduler.md: one gate ≠ fair shares)
autoscaling       vs  overload protection (axis 5: capacity arrives in minutes; collapse in seconds)

Design procedure: name the scarce resource and its currency, place limits at ingress AND egress, walk the chain for unbounded buffers, choose an enforceable signal where obedience matters, propagate deadlines, and argue the stability of every adaptive loop. The named types are recognition shortcuts, not the design space.