Flow Control / Backpressure #
how does the system prevent demand from overwhelming capacity?
flow control = regulate how fast work enters or moves
backpressure = propagate downstream saturation upstream
Role in the catalog: tier-two protocol block — the dynamics layer. queue.md and scheduler.md own the statics (structures, single-gate decisions); this file owns the load-control loop across a chain of them.
Governing math (axis 1’s foundation):
Little's law: L = λW
(in-flight = arrival rate × time-in-system)
rate limits and concurrency limits are coupled through latency —
limit one, and latency variance moves the other.
Central tension:
protect the system vs serve as much useful work as possible
(and the operative word is useful — see bottleneck*)
Design Axes (the core module) #
Axis 1 — What Quantity Is Limited #
arrival rate: per-time admission (token/leaky bucket, per-user caps)
outstanding work: in-flight, however denominated —
requests (semaphore, pool, max in-flight)
bytes (TCP window, HTTP/2 WINDOW_UPDATE, prefetch)
— the same control, two currencies
waiting depth: queue occupancy (bounded queue — owned by queue.md axis 5)
memory / cost: buffer.memory, payload budgets, $-quotas
The Little’s-law consequence, derivable not memorized:
a rate limit calibrated at 10ms latency admits 100× the in-flight load
when latency degrades to 1s. rate limits assume a latency; concurrency
limits self-adjust. under latency variance, limit outstanding work.
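The 100× figure follows directly from L = λW. A minimal sketch of the arithmetic, assuming a hypothetical rate limit of 1000 requests/s and a hypothetical concurrency cap of 10 (neither number comes from this catalog):

```python
def in_flight(arrival_rate_per_s: float, latency_s: float) -> float:
    """Little's law: L = lambda * W (in-flight = arrival rate x time-in-system)."""
    return arrival_rate_per_s * latency_s


def sustained_rate(concurrency_limit: int, latency_s: float) -> float:
    """Little's law rearranged: lambda = L / W, the rate a concurrency cap lets through."""
    return concurrency_limit / latency_s


RATE_LIMIT_PER_S = 1000.0  # hypothetical rate limit, requests per second
CONCURRENCY_LIMIT = 10     # hypothetical cap on requests in flight

for latency_s in (0.010, 1.0):  # healthy (10 ms) vs degraded (1 s)
    print(
        f"latency={latency_s}s  "
        f"rate-limited in-flight={in_flight(RATE_LIMIT_PER_S, latency_s):.0f}  "
        f"concurrency-limited rate={sustained_rate(CONCURRENCY_LIMIT, latency_s):.0f}/s"
    )
```

Under the rate limit, in-flight work grows from 10 to 1000 as latency degrades. Under the concurrency cap, in-flight work stays at 10 and the admitted rate falls from 1000/s to 10/s, which is the self-adjustment described above.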
Interrogation:
What resource is actually scarce — CPU, memory, connections, downstream capacity?
Is the limit denominated in the same currency as the scarcity?
What latency did the rate limit silently assume?
Distributed limit: is the counter consistent, and does the key match the tenant?
Axis 2 — Overload Response (what happens to the excess) #
An escalation ladder, not rival strategies — mature systems do all of these at different load levels:
slow the producer: block / withhold credit (lossless, needs obedient upstream)
queue it: absorb burst (bounded! — else latency hides overload)
reorder it: priority lanes, preemption (scheduler.md axis 2 — reference, don't restate)
degrade it: stale cache, shed features,
cheaper path (convert expensive work to cheap work)
reject it: 429/503 + Retry-After (make overload the caller's problem, politely)
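One way to read the ladder is as a single decision taken per arriving request. The sketch below is illustrative only: it covers the queue, degrade, and reject rungs, leaves slowing the producer and reordering to their own mechanisms, and every name and threshold in it is hypothetical.

```python
from enum import Enum


class Rung(Enum):
    ADMIT = "admit"
    QUEUE = "queue"
    DEGRADE = "degrade"
    REJECT = "reject"


def choose_rung(in_flight: int, limit: int,
                queue_depth: int, queue_cap: int,
                can_degrade: bool) -> Rung:
    """Pick the mildest rung that still protects the system."""
    if in_flight < limit:
        return Rung.ADMIT      # capacity available: no overload response needed
    if queue_depth < queue_cap:
        return Rung.QUEUE      # absorb the burst, but only into a bounded queue
    if can_degrade:
        return Rung.DEGRADE    # serve a cheaper answer (e.g. stale cache)
    return Rung.REJECT         # 429/503 + Retry-After
```

The ordering encodes the design choice the ladder implies: each rung engages only after the milder one above it is exhausted.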
Interrogation:
At what load level does each rung engage?
Is queueing bounded, and by time as well as depth?
What can be degraded before anything is refused?
Is a drop visible to the caller, or silent data loss?
Shed by priority: is the priority honest? (scheduler.md: abuse, inversion)
Axis 3 — Signal Mechanism (how upstream learns) #
Ordered by enforceability:
explicit credit: receiver grants; sender cannot exceed
(TCP window, Reactive Streams demand(n), prefetch count)
implicit blocking: bounded buffer; producer stalls on write
(bounded channels, Kafka producer on buffer.memory)
out-of-band rejection: 429/503 + Retry-After (ADVISORY; caller may or may not obey)
inferred: no signal; sender deduces from latency/loss/errors
(TCP congestion control, adaptive concurrency,
circuit breaker inferring dependency health)
The structural fact this axis carries:
credit and blocking are enforceable.
rejection is advisory.
inference is disciplined guesswork.
"backpressure ignored by producer" is only possible on the advisory rung —
if ignoring the signal is unacceptable, choose an enforceable mechanism.
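What makes explicit credit enforceable is that sending is impossible without a grant. A minimal single-threaded sketch, assuming a hypothetical `CreditedSender` class (real protocols such as TCP windows or Reactive Streams also define what the receiver does when a sender exceeds its credit; that part is omitted here):

```python
class CreditedSender:
    """Sender that can only emit while it holds receiver-granted credit."""

    def __init__(self) -> None:
        self.credit = 0       # nothing may be sent until the receiver grants
        self.sent: list = []

    def grant(self, n: int) -> None:
        """Called on behalf of the receiver: allow n more items."""
        if n <= 0:
            raise ValueError("credit grant must be positive")
        self.credit += n

    def try_send(self, item) -> bool:
        """Send one item if credit remains; otherwise refuse without buffering."""
        if self.credit == 0:
            return False      # the producer must wait for the next grant
        self.credit -= 1
        self.sent.append(item)
        return True


sender = CreditedSender()
sender.grant(2)
results = [sender.try_send(i) for i in range(3)]  # third send has no credit
```

Here `try_send` refuses instead of buffering, so the sketch contains no hidden queue that could absorb the signal.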
Interrogation:
Who receives the signal? CAN they obey it? What happens if they don't?
Is there a hidden unbounded buffer that absorbs the signal? (see axis 4)
For inferred: what does the sender actually observe, and how stale is it?
Blocking: can the stall propagate into a deadlock? (cycles in the flow graph)
Axis 4 — Placement & Direction (the native structural content) #
ingress: protect yourself from callers
(admission control, rate limits at the gate, request queues)
egress: protect the dependency from YOU — regulate self-generated demand
(circuit breaker, retry budget, per-dependency pools, outbound limits)
Retry budgets and circuit breakers are the two egress natives: both cap demand amplification toward a struggling downstream — retries are load you manufactured precisely when capacity fell.
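A retry budget can be sketched as a counter that allows retries only up to a fixed fraction of original requests. This is an illustrative sketch with a hypothetical `RetryBudget` class. Production implementations usually track a sliding time window, which is omitted here, and the 10% default is an assumption, not a recommendation.

```python
class RetryBudget:
    """Allow retries only while retries <= retry_percent% of original requests."""

    def __init__(self, retry_percent: int = 10) -> None:
        self.retry_percent = retry_percent  # hypothetical default
        self.originals = 0
        self.retries = 0

    def record_request(self) -> None:
        """Count one original (non-retry) request."""
        self.originals += 1

    def try_retry(self) -> bool:
        """Spend budget for one retry, or refuse if it would exceed the cap."""
        # Integer arithmetic avoids floating-point drift in the ratio check.
        if (self.retries + 1) * 100 <= self.originals * self.retry_percent:
            self.retries += 1
            return True
        return False


budget = RetryBudget(retry_percent=10)
for _ in range(20):
    budget.record_request()
allowed = [budget.try_retry() for _ in range(3)]  # 20 originals fund 2 retries
```

When the budget is exhausted the caller fails fast instead of retrying, which caps the amplification toward the struggling dependency.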
Propagation — the chain property. A single gate is scheduler/queue territory; a pipeline’s backpressure is end-to-end or it is nothing:
the signal chain is only as strong as its weakest hop.
one hidden unbounded buffer anywhere silently absorbs the signal
and converts overload into latency + memory instead of upstream slowdown.
this is the thesis of the block, not a footnote.
Interrogation:
Where does demand enter; where is saturation first observed; how many hops between?
Walk every hop: is its buffer bounded? (one "no" breaks the chain)
Ingress and egress both covered, or only the front door?
Cyclic topology: what prevents credit-wait deadlock?
Bulkheads (boundary.md #9): which failure classes share a pool?
Axis 5 — Static vs Adaptive (a modifier on any limit) #
static: fixed limit; simple, predictable, wrong twice a day
adaptive: feedback-controlled (adaptive concurrency, congestion control,
queue-depth autoscaling, latency-based throttling)
Adaptive imports control theory wholesale — its failure modes are a controls curriculum:
oscillation: overreaction to a lagging metric
feedback delay: acting on a world that has moved (view-vs-reality*, scheduler.md)
bad setpoint: stable convergence to the wrong target
controller fights: HPA vs adaptive concurrency vs manual limits;
two loops, one plant, no coordination
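The simplest adaptive limit is additive-increase / multiplicative-decrease (AIMD), the scheme behind classic TCP congestion control. The sketch below applies it to a concurrency limit. The class name, the latency target, and all defaults are hypothetical, and it is not a description of any particular library's adaptive-concurrency algorithm.

```python
class AimdLimit:
    """Concurrency limit that grows by 1 on good samples and halves on bad ones."""

    def __init__(self, initial: int = 10, min_limit: int = 1,
                 max_limit: int = 100, backoff: float = 0.5,
                 target_latency_s: float = 0.1) -> None:
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.backoff = backoff                    # multiplicative decrease factor
        self.target_latency_s = target_latency_s  # the setpoint

    def on_sample(self, latency_s: float, ok: bool = True) -> int:
        """Feed one completed request; return the updated limit."""
        if not ok or latency_s > self.target_latency_s:
            # Overload inferred: back off multiplicatively, never below the floor.
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        else:
            # Healthy: probe for more capacity additively, never above the ceiling.
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit
```

The asymmetry (slow probe up, fast retreat) is one common damping argument. It does not remove the feedback-delay problem, because each sample describes a request that was admitted earlier.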
Interrogation:
What is the measured signal, the setpoint, and the actuation delay?
Why is the loop stable — argued, not assumed? (hysteresis? damping?)
What other controller acts on the same resource?
Autoscaling ≠ overload protection: what survives the minutes before capacity arrives?
Technical Bottleneck: Goodput Under Overload* #
the quantity that matters is not admitted work — it is admitted work
that completes usefully. throughput can stay high while goodput goes to zero.
The catastrophic form — congestion collapse / metastable failure — is capacity spent on already-dead work, self-sustaining:
requests queue past the caller's timeout, then get served anyway —
the caller gave up; you are doing archaeology
retries amplify demand exactly when capacity fell
synchronized probes/backoffs arrive as a thundering herd
the system stays saturated at zero goodput even after the trigger clears
The doc’s confusions “retry vs recovery,” “queueing vs capacity,” “throughput vs health” are all this one bottleneck.
Known recipes (bounded, composable, none universal):
deadline propagation: carry the caller's remaining budget on every hop,
so dead work is droppable anywhere in the chain;
the flagship recipe, and the block's deepest idea
admission timeout: bound queues by TIME, not just depth;
never dequeue work older than its deadline
retry budget + jitter: retries ≤ fraction of originals; desynchronize
LIFO under overload: newest work has the freshest deadline;
fairness sacrificed for goodput, deliberately
circuit breaking: stop paying for a dependency that can't deliver
degraded modes: convert expensive work to cheap work before refusing any
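Three of these recipes compose naturally in one queue: bound by depth and by time, optionally serve newest-first, and hand the remaining budget to the next hop. A minimal single-threaded sketch, assuming a hypothetical `DeadlineQueue` class with absolute deadlines on a monotonic clock:

```python
import collections
import time


class DeadlineQueue:
    """Queue bounded by depth AND time: expired work is dropped, never served."""

    def __init__(self, max_depth: int, clock=time.monotonic) -> None:
        self._items = collections.deque()
        self.max_depth = max_depth
        self.clock = clock            # injectable so the sketch is testable
        self.dropped_expired = 0

    def offer(self, item, deadline: float) -> bool:
        """Enqueue with an absolute deadline; refuse when the queue is full."""
        if len(self._items) >= self.max_depth:
            return False              # visible rejection, not a silent drop
        self._items.append((deadline, item))
        return True

    def take(self, lifo: bool = False):
        """Return (item, remaining_budget_s), skipping dead work; None if empty."""
        while self._items:
            deadline, item = self._items.pop() if lifo else self._items.popleft()
            remaining = deadline - self.clock()
            if remaining > 0:
                return item, remaining  # pass the remaining budget to the next hop
            self.dropped_expired += 1   # the caller already gave up: skip it
        return None
```

Passing `lifo=True` under overload serves the newest entry first, trading fairness for goodput as described above.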
A strong design says explicitly:
what resource is scarce,
where each limit is enforced and in what currency,
how callers learn to slow down (and whether the signal is enforceable),
what gets queued, degraded, or shed — and how dead work is detected,
and how fairness is preserved under stress.
Flow Control As Protocol (the crossing-point spec — keep) #
measure capacity or load
admit / delay / reject work
grant credit or permit
consume credit while work is in flight
release credit on completion
adjust limits from feedback
communicate retry/backoff to the caller
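The steps above can be mapped onto one small object. This is an illustrative, single-threaded sketch. The `Limiter` class and its fixed `retry_after_s` hint are hypothetical, and a real limiter would need locking and a feedback source for `adjust`.

```python
class Limiter:
    """Non-blocking concurrency limiter that walks the protocol steps."""

    def __init__(self, limit: int, retry_after_s: float = 1.0) -> None:
        self.limit = limit
        self.in_flight = 0
        self.retry_after_s = retry_after_s  # hint returned on rejection

    def try_acquire(self):
        """Return (admitted, retry_after_s); retry_after_s is None when admitted."""
        if self.in_flight < self.limit:     # measure load, then admit
            self.in_flight += 1             # grant a permit and consume it
            return True, None
        return False, self.retry_after_s    # reject and communicate backoff

    def release(self) -> None:
        """Release the permit when the work completes."""
        if self.in_flight == 0:
            raise RuntimeError("release without matching acquire")
        self.in_flight -= 1

    def adjust(self, new_limit: int) -> None:
        """Adjust the limit from feedback; in-flight work is never revoked."""
        self.limit = max(1, new_limit)
```

A caller pairs every successful `try_acquire` with exactly one `release`, typically in a `try`/`finally`, so a failed request still returns its permit.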
Instantiations:
HTTP: 429 / 503 + Retry-After (advisory rejection)
TCP / HTTP/2: window advertisement / WINDOW_UPDATE (explicit credit)
Reactive Streams: subscriber requests N (explicit credit)
broker: prefetch count, fetch max bytes,
producer buffer limits (credit + blocking)
worker pool: semaphore acquire/release (implicit blocking)
gRPC deadline: remaining budget on every hop (deadline propagation)
Named Configurations (lookup table) #
Vector = {quantity, response, signal, placement, static/adaptive}. Rows marked → are owned elsewhere; kept here for recognition only.
| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| Rate limiter | rate, reject, out-of-band, ingress, static | token bucket + 429/Retry-After | wrong key; distributed counter skew; assumed-latency trap |
| Concurrency limiter | in-flight (requests), block or reject, implicit, either, static | connection pool; max in-flight | limit ≠ bottleneck; long requests hold slots; queued work expires |
| Flow-control window | in-flight (bytes), slow producer, explicit credit, per-hop, static | TCP and HTTP/2 windows | HoL blocking; window exhaustion; stream unfairness |
| Bounded queue → queue.md | depth, queue-then-reject, implicit, per-hop, static | socket backlog | too big = hidden latency; too small = churn |
| Backpressure chain | in-flight, slow producer, credit/blocking end-to-end, chain, static | Reactive Streams; Flink | one unbounded buffer breaks it; slow-consumer global stall; deadlock |
| Load shedding | any, degrade/reject ladder, out-of-band, ingress, adaptive-ish | priority shed + stale-serve | shed wrong class; flapping; silent drops |
| Retry budget | rate of retries, reject, out-of-band, egress, static | retry-token bucket | storm; retrying non-idempotent; synchronized retries; wrong budget scope |
| Circuit breaker | in-flight toward dep, reject, inferred, egress, adaptive | open/half-open/closed (state_machine.md) | false open; flapping; synchronized probes |
| Bulkhead → boundary.md | pool partition, isolate, structural, either, static | per-dependency pools | stranded capacity; shared downstream still collapses |
| Fair share → scheduler.md | capacity split, reorder, structural, ingress, static | WFQ/DRR, quotas | gaming; bad weights; idle capacity wasted |
| Priority/preemption → scheduler.md | order under scarcity, reorder, structural, ingress, static | PriorityClass | starvation; inversion; preemption storm |
| Adaptive control | (modifier), any, inferred feedback, any, adaptive | TCP congestion control; adaptive concurrency | oscillation; metric lag; controller fights |
Vocabulary #
capacity, demand, arrival rate, service rate, utilization, saturation
goodput, congestion collapse, metastable failure
in-flight, queue depth, backlog, latency
token, credit, permit, window, demand(n), prefetch
limit, quota, reservation, budget
backoff, jitter, hedging, Retry-After, 429, 503
deadline propagation, admission timeout, LIFO-under-overload
shed, degraded mode, bulkhead
setpoint, feedback delay, hysteresis, oscillation
Deep Lesson #
Flow-control bugs come from confusing pairs on different axes:
rate limit vs concurrency limit (axis 1: Little's law couples them via latency)
queueing vs capacity (axis 2: a buffer stores overload, doesn't serve it)
retry vs recovery (bottleneck*: retries are self-made load)
latency vs load (axis 5: the signal lags the cause)
throughput vs health (bottleneck*: goodput is the real metric)
global limit vs tenant fairness (scheduler.md: one gate ≠ fair shares)
autoscaling vs overload protection (axis 5: capacity arrives in minutes; collapse in seconds)
Design procedure: name the scarce resource and its currency, place limits at ingress AND egress, walk the chain for unbounded buffers, choose an enforceable signal where obedience matters, propagate deadlines, and argue the stability of every adaptive loop. The named types are recognition shortcuts, not the design space.