Capacity / Economics #
capacity = how much useful work resources can support
economics = how scarce resources are allocated, paid for, and traded off
The block’s founding claim:
many technically valid systems are still bad designs —
this block is where that judgment is rendered.
Role in the catalog: the evaluation layer. Not a component (like queue, scheduler) and not a protocol (like flow control, recovery) — the layer that prices the others. Scheduler allocates the ledger; backpressure defends useful work against dead work; boundary’s economic boundary says who pays; cache trades one ledger column (compute) for another (memory + staleness risk). This is where VSA stops being a lens and becomes the content.
Central tension:
performance and availability vs cost and waste
The Capacity Ledger (the core object) #
Capacity exists in stages; everything in this block is an operation on this ledger, a measurement of a gap between stages, or a policy about a gap:
physical → committed → allocated → used → useful
physical: exists
committed: reserved, quota'd, promised
allocated: handed to a workload
used: actually consumed
useful: produced value; survived amplification and wasn't dead work
The gaps ARE the named phenomena:
physical − committed = headroom (deliberate slack)
committed − physical = overcommit exposure (the statistical bet, axis 3)
committed − used = stranded waste (reservation leak, idle quota)
used − useful = amplification + dead work
(goodput* from backpressure.md, as accounting)
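The gap arithmetic above can be sketched as a small record with one method. This is a minimal illustration, not a prescribed schema: the `Ledger` class, its field names, and the sample numbers are all hypothetical, and it assumes one resource measured in one unit at every stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ledger:
    """One resource in one unit (e.g. CPU cores), measured at each stage."""
    physical: float
    committed: float
    allocated: float
    used: float
    useful: float

    def gaps(self) -> dict[str, float]:
        return {
            # Deliberate slack; zero once committed exceeds physical.
            "headroom": max(self.physical - self.committed, 0.0),
            # The size of the statistical bet; zero when not overcommitted.
            "overcommit_exposure": max(self.committed - self.physical, 0.0),
            # Promised but not consumed: reservation leak, idle quota.
            "stranded": self.committed - self.used,
            # Consumed but produced no value.
            "amplification_and_dead_work": self.used - self.useful,
        }


# Hypothetical numbers: 100 cores exist, 120 are promised, 60 are burned,
# and only 45 of those produce value.
ledger = Ledger(physical=100, committed=120, allocated=90, used=60, useful=45)
ledger_gaps = ledger.gaps()
```

With these numbers the ledger shows no headroom, 20 cores of overcommit exposure, 60 stranded, and 15 lost to amplification and dead work. Headroom and overcommit exposure are the same gap with opposite signs, so at most one of them is nonzero.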
Operations on the ledger:
reservation: create a committed entry before use (requests, RIs, holds)
quota: cap a tenant's committed ceiling
overcommit: deliberately let committed exceed physical
metering: observe used; attribution assigns it an owner
efficiency: raise useful/used (fight amplification)
And one subtlety the scalar ledger hides — shape:
fragmentation: free ≠ usable, because capacity is a vector
(CPU×mem×GPU per node) and demand has shape.
aggregate free capacity that fits nothing is a ledger lie.
(scheduler.md bin-packing is the mechanism side)
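A small sketch of the ledger lie, assuming a two-dimensional shape (CPU and memory) and made-up node sizes. `Shape`, `fits`, and `placeable` are illustrative names, not part of any scheduler API.

```python
from typing import NamedTuple


class Shape(NamedTuple):
    cpu: float      # cores
    mem_gib: float  # GiB


def aggregate_free(nodes: list[Shape]) -> Shape:
    """The scalar view: sum each dimension across the fleet."""
    return Shape(sum(n.cpu for n in nodes), sum(n.mem_gib for n in nodes))


def fits(free: Shape, demand: Shape) -> bool:
    """A demand fits only if every dimension fits at once."""
    return free.cpu >= demand.cpu and free.mem_gib >= demand.mem_gib


def placeable(nodes: list[Shape], demand: Shape) -> bool:
    """The vector view: some single node must hold the whole shape."""
    return any(fits(n, demand) for n in nodes)


# Hypothetical fleet: two memory-rich nodes and two CPU-rich nodes.
free_per_node = [Shape(2, 30), Shape(2, 30), Shape(14, 2), Shape(14, 2)]
job = Shape(8, 16)

total = aggregate_free(free_per_node)            # 32 cores, 64 GiB free
scalar_says_yes = fits(total, job)               # aggregate check passes
vector_says_yes = placeable(free_per_node, job)  # no single node fits
```

The aggregate shows four times the CPU and memory the job needs, yet the job cannot be placed anywhere.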
Interrogation (the ledger walk):
What resource is scarce, and in what unit at each stage?
For each adjacent pair of stages: how big is the gap, who owns it,
and is it deliberate (headroom) or waste (stranded)?
Is the free capacity the right SHAPE for the demand that's coming?
What fraction of used is useful — what are the amplification factors?
Design Axes #
Axis 1 — Time Horizon of the Capacity Decision #
planning: months–quarters (forecasting, fleet sizing, shard counts)
reservation: days–hours (requests, RIs, holds)
elasticity: minutes (HPA/VPA/cluster autoscaler, serverless)
admission: milliseconds (→ scheduler.md, backpressure.md)
Each horizon has its own correction speed and signature failure:
planning: planned for average, not peak; wrong bottleneck; arrives late
reservation: leaked, stranded, or fragmented commitments
elasticity: metric lag, oscillation, scale-down kills warm capacity,
autoscaler fights admission (two controllers, one plant —
backpressure.md axis 5)
admission: owned elsewhere; the last line, not the plan
The seam question — the doc’s best question — lives between horizons:
what happens WHILE WAITING for more capacity?
elasticity arrives in minutes; collapse arrives in seconds.
autoscaling is not overload protection (backpressure.md);
something at the admission horizon must survive the gap.
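The seam can be sized with back-of-envelope arithmetic. The sketch below assumes constant arrival and service rates during the wait, which real traffic will not honor. The function names and numbers are hypothetical.

```python
def seam_backlog(arrival_rps: float, capacity_rps: float, lag_seconds: float) -> float:
    """Requests that pile up while new capacity is still on its way."""
    return max(arrival_rps - capacity_rps, 0.0) * lag_seconds


def drain_seconds(backlog: float, new_capacity_rps: float, arrival_rps: float) -> float:
    """Time to clear the backlog once capacity lands, if arrivals hold steady."""
    spare = new_capacity_rps - arrival_rps
    return float("inf") if spare <= 0 else backlog / spare


# Hypothetical: 1500 rps arrives at a fleet sized for 1000 rps,
# and the autoscaler needs 3 minutes to double the fleet.
backlog = seam_backlog(arrival_rps=1500, capacity_rps=1000, lag_seconds=180)
drain = drain_seconds(backlog, new_capacity_rps=2000, arrival_rps=1500)
```

Here 90,000 requests accumulate before the new capacity arrives, and clearing them takes another 180 seconds. Unless something at the admission horizon sheds or bounds that backlog, every queued request waits through the entire gap.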
Interrogation:
At which horizon is this decision being made, and what is its correction latency?
What covers the seam between this horizon and the faster one below it?
Peak-to-average ratio per horizon: what does the forecast assume about bursts?
Axis 2 — Who Bears Scarcity (allocation regime) #
quota: hard division; simple, strands idle capacity → scheduler.md
fair share: weighted, with borrowing of idle → scheduler.md
priority/preempt: importance decides who suffers → scheduler.md
price: the native economic tier — spot markets,
off-peak pricing, chargeback-as-incentive.
the most scalable allocator: tenants schedule
THEMSELVES when scarcity has a price.
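To make the difference between hard quota and fair share with borrowing concrete, here is a minimal sketch of weighted max-min fairness (water-filling). It is one common way to realize "fair share", offered as an assumption rather than the regime scheduler.md prescribes. `fair_share` and the tenant names are hypothetical.

```python
def fair_share(capacity: float, demands: dict[str, float],
               weights: dict[str, float]) -> dict[str, float]:
    """Weighted max-min fair split: idle share is lent to tenants that want more."""
    alloc = {t: 0.0 for t in demands}
    active = {t for t, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[t] for t in active)
        # Offer each still-hungry tenant its weighted slice of what is left.
        offer = {t: remaining * weights[t] / total_w for t in active}
        satisfied = {t for t in active if demands[t] - alloc[t] <= offer[t]}
        if not satisfied:
            # Everyone wants more than their slice: hand out the slices and stop.
            for t in active:
                alloc[t] += offer[t]
            break
        # Tenants whose demand fits take only what they need; the rest is re-offered.
        for t in satisfied:
            remaining -= demands[t] - alloc[t]
            alloc[t] = demands[t]
        active -= satisfied
    return alloc


shares = fair_share(
    capacity=100,
    demands={"a": 10, "b": 60, "c": 80},
    weights={"a": 1, "b": 1, "c": 1},
)
```

Tenant `a` needs only 10, so the remaining 90 is split evenly between `b` and `c` at 45 each. Under a hard three-way quota, `a` would hold about 33 and strand roughly 23 of it.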
Interrogation:
When demand exceeds supply, who is denied — and did they agree to that in advance?
Can idle allocation be borrowed, and what reclaims it when the owner returns?
Is there a price signal, or only administrative division?
Does the allocation regime survive adversarial tenants? (quota gaming, priority abuse)
Axis 3 — The Statistical Bet (overcommit family) #
CPU overcommit, memory overcommit, thin provisioning, oversubscribed network, serverless pooling — one wager:
peaks are uncorrelated, so committed may exceed physical
and multiplexing gain is pocketed.
this bet is the entire economic engine of cloud computing.
its collateral is demand diversity.
The bet’s failure is the correlated peak:
regional failover doubles a region's load
everyone's midnight cron
Black Friday
retry storms
note: backpressure failures MANUFACTURE correlation;
a goodput collapse is also a correlated-peak event (blocks coupled)
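The wager can be measured as multiplexing gain: the sum of each tenant's peak divided by the peak of the summed demand. The sketch below uses tiny made-up demand series. `multiplexing_gain` is a hypothetical helper and the numbers are not measurements.

```python
def multiplexing_gain(series: list[list[float]]) -> float:
    """Sum of per-tenant peaks divided by the peak of the combined demand.

    A gain of 1.0 means there is nothing to pocket: every tenant peaks together.
    """
    sum_of_peaks = sum(max(s) for s in series)
    peak_of_sum = max(sum(slot) for slot in zip(*series))
    return sum_of_peaks / peak_of_sum


# Four tenants, four time slots. Each tenant peaks at 8 and idles at 2.
diverse = [
    [8, 2, 2, 2],
    [2, 8, 2, 2],
    [2, 2, 8, 2],
    [2, 2, 2, 8],
]
# Same tenants after an event that lines up every peak (failover, cron, retries).
correlated = [[8, 2, 2, 2]] * 4

gain_diverse = multiplexing_gain(diverse)        # 32 / 14
gain_correlated = multiplexing_gain(correlated)  # 32 / 32
```

With diverse demand, 14 units of physical capacity back 32 units of promised peak. Once the peaks line up, the same promises need all 32, and the reclaim policy decides who goes without.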
Interrogation:
What is the overcommit ratio, and what demand-correlation does it assume?
What events correlate the peaks? (failover, cron, launches, retries)
When the bet loses, what is the reclaim policy — who is preempted,
OOM-killed, throttled — and is that ordering deliberate?
Is the bet priced? (spot discounts are the market paying you to be reclaimable)
Axis 4 — The Cost Feedback Loop #
metering → attribution → unit cost → budget action
A control loop (backpressure.md axis 5, denominated in currency):
open loop: attribution only — dashboards, chargeback reports
closed loop: budget actions — throttle, degrade, hard-stop, alert
Failure modes are control failures plus one native economist:
untagged spend (unmetered = unattributable = unmanaged)
shared cost allocated unfairly; teams game the split
alerts fire after the money is gone (feedback delay)
hard stop causes an outage the budget was meant to prevent
GOODHART: teams optimize the attributed metric, not the real cost
Interrogation:
Is every scarce resource metered, in the same unit it is billed?
Is the loop open (visibility) or closed (enforcement) — chosen deliberately?
What is the budget's blast radius when it triggers? (degrade before stop)
What does the attribution metric incentivize that real cost does not?
Axis 5 — Unit Economics (the design-time judgment) #
unit cost = cost per useful operation
= (resource cost × amplification) / useful work
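A worked instance of the formula, under one reading of its terms: `resource_cost` is what the workload would cost with no amplification, and `amplification` is the factor by which actual consumption exceeds that. Both the reading and the numbers are assumptions.

```python
def unit_cost(resource_cost: float, amplification: float, useful_ops: float) -> float:
    """Cost per useful operation = (resource cost x amplification) / useful work."""
    if useful_ops <= 0:
        raise ValueError("no useful work: unit cost is undefined")
    return resource_cost * amplification / useful_ops


# Hypothetical month: 1e9 logical writes that would cost 1000 in raw write IO.
naive = unit_cost(resource_cost=1000, amplification=1, useful_ops=1e9)
# Same workload on a storage engine with 10x write amplification.
actual = unit_cost(resource_cost=1000, amplification=10, useful_ops=1e9)
```

The design still works technically at 10x amplification. Whether it works economically depends on whether the business can carry ten times the per-write cost.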
This is the axis that renders the block’s founding judgment:
does the design scale economically, or only technically?
The classic traps:
write/read/space amplification ignored (LSM: the canonical triangle)
cheap storage, expensive queries (cost moved, not removed)
free tier hides marginal cost until scale
egress surprises
tail queries dominate the bill (P99 cost, not average cost)
efficiency that damages correctness/latency (optimized the wrong column)
index saves reads, costs writes+space (amplification traded, not erased)
Interrogation:
Cost per request / query / GB / tenant — computed, not vibes?
What are the amplification factors, and which knob trades them?
What does the COST distribution's tail look like, not just latency's?
If demand 10×es, which cost is linear, which is super-linear, which steps?
Off-Axis Seat: Denomination Mismatch #
demand arrives in user units (requests, queries, tenants, GB)
capacity is sold in supplier units (nodes, shards, IOPS, partitions)
the conversion model between them is always somewhat wrong.
This is where “wrong bottleneck,” “wrong denominator,” and “latency blamed on CPU while waiting on locks/IO” live. The USE method (utilization/saturation/errors per resource) and the RED method (rate/errors/duration per service) are the two audit walks across the conversion. Saturation, meaning queue length and wait time at a resource, is the signal that the conversion missed, because the true bottleneck announces itself as a queue (queue.md: backlog is the observable).
utilization is not health: high utilization near saturation destroys
latency; low utilization may be purchased headroom. the denominator
decides the meaning.
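Why high utilization destroys latency can be seen in the simplest queueing model. The sketch below assumes an M/M/1 queue (one server, Poisson arrivals, exponential service times), which is a textbook simplification and not a claim about any particular system. In that model, mean response time is service time divided by (1 − utilization).

```python
def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time_s / (1.0 - utilization)


SERVICE_TIME_S = 0.010  # hypothetical 10 ms of work per request

# Response time at increasing utilization of the same resource.
latency_by_utilization = {
    rho: mm1_response_time(SERVICE_TIME_S, rho)
    for rho in (0.50, 0.90, 0.99)
}
```

Under this model, going from 50% to 90% utilization multiplies mean latency by five, and going from 90% to 99% multiplies it by ten again. The "idle" half of the resource at 50% is what buys the 20 ms response time.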
Technical Bottleneck: The Peak* #
capacity must be provisioned for peak demand
but is paid for continuously —
and demand is bursty.
Peak-to-average ratio is the fundamental economic quantity of the block. Essential, no general solution: the peak cannot be designed away — only attacked, and every attack is the statistical bet (axis 3) in some form:
overcommit: pool uncorrelated peaks
elasticity: rent the peak instead of owning it (minutes of lag as the price)
spot/off-peak price: demand shapes itself
batch in valleys: move deferrable work off-peak
degradation ladder: shave the peak's cost instead of serving it fully
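A minimal sketch of the peak-to-average ratio and the own-versus-rent comparison behind the elasticity attack. The demand series, the unit price, and the 1.5x rental premium are invented for illustration. The model also ignores scaling lag, which is the seam that admission has to cover.

```python
def peak_to_average(demand: list[float]) -> float:
    """Peak demand divided by mean demand over the same window."""
    return max(demand) / (sum(demand) / len(demand))


def owned_cost(demand: list[float], unit_price: float) -> float:
    """Own the peak: pay for peak-sized capacity in every interval."""
    return max(demand) * len(demand) * unit_price


def rented_cost(demand: list[float], unit_price: float, premium: float) -> float:
    """Rent the peak: pay only for what is used, at a marked-up price."""
    return sum(demand) * unit_price * premium


demand = [2, 2, 2, 10, 2, 2]  # hypothetical units of capacity per interval

p2a = peak_to_average(demand)
own = owned_cost(demand, unit_price=1.0)
rent = rented_cost(demand, unit_price=1.0, premium=1.5)
```

At a ratio of 3, owning the peak costs 60 while renting costs 30, even at a 50% premium. In this simple model renting wins whenever the premium is below the peak-to-average ratio, which is why the ratio is the quantity to measure first.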
Since the peak cannot be removed, the real design decision is:
who suffers when the bet loses —
reclaim/preemption ordering, OOM policy, SLO tiers,
degradation ladders (backpressure.md axis 2).
choosing deliberately is the design; discovering it in an incident is the bug.
A strong design says explicitly:
what is scarce and in what unit at each ledger stage,
how it is measured (and that the denominator is right),
who gets how much, under which regime, at which horizon,
what it costs per USEFUL operation including amplification,
and who suffers, in what order, when demand exceeds the bet.
Capacity As Protocol (the crossing-point spec — keep) #
measure demand and usage
model available capacity (with shape, not just totals)
reserve or allocate
admit, queue, or reject (→ scheduler.md, backpressure.md)
observe utilization and saturation
scale capacity or shed demand (seam coverage between horizons)
attribute cost
enforce budgets/quotas
optimize efficiency over time
Kubernetes instantiation:
Pod declares requests/limits (reservation entry on the ledger)
scheduler checks allocatable, binds (allocation; shape-aware bin-pack)
cgroups enforce limits (used bounded by committed)
metrics observe usage (metering)
HPA/CA adjust replicas/nodes (elasticity horizon)
ResourceQuota caps namespaces (quota regime)
requests≠limits gap (overcommit bet, per-node)
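The per-node bet in the last line can be put in numbers. The sketch below uses plain tuples rather than any Kubernetes client API, and the pod names and sizes are hypothetical. It relies on one documented behavior: the scheduler places pods by their requests, so limits on a node may sum to more than the node can supply.

```python
# (name, cpu_request, cpu_limit) in cores, for pods bound to one node.
pods = [
    ("api", 2.0, 4.0),
    ("worker", 2.0, 6.0),
    ("batch", 2.0, 6.0),
]
ALLOCATABLE_CPU = 8.0  # hypothetical node allocatable, in cores

committed = sum(request for _, request, _ in pods)  # what scheduling accounts for
limit_sum = sum(limit for _, _, limit in pods)      # what pods may try to use

headroom_by_request = ALLOCATABLE_CPU - committed
overcommit_ratio = limit_sum / ALLOCATABLE_CPU
exposure = max(limit_sum - ALLOCATABLE_CPU, 0.0)    # shortfall if all pods burst
```

This node looks safe by requests, with 2 cores of headroom, while the limits promise twice what exists. If all three pods burst together, 8 cores of demand go unserved. For CPU that means throttling. For memory the same gap would end in OOM kills.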
Cloud cost instantiation:
usage emits metering events → tags attribute → pricing converts to cost
→ budgets detect thresholds → policy alerts, throttles, or stops
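The pipeline above can be sketched end to end. Everything here is illustrative: the event format, the `team` tag, the prices (in integer cents to avoid float drift), and the 75% / 90% / 100% thresholds are assumptions, not any provider's billing API.

```python
from collections import defaultdict

# Hypothetical price list, in cents per unit.
PRICE_CENTS = {"cpu_core_hour": 4, "gib_egress": 9}


def attribute(events: list[dict]) -> dict[str, int]:
    """Metering events -> cost per owner. Untagged spend stays visible."""
    cost: dict[str, int] = defaultdict(int)
    for event in events:
        owner = event.get("tags", {}).get("team", "UNTAGGED")
        cost[owner] += event["quantity"] * PRICE_CENTS[event["resource"]]
    return dict(cost)


def budget_action(spend_cents: int, budget_cents: int) -> str:
    """Closed-loop policy as a ladder: alert, then throttle, then stop."""
    ratio = spend_cents / budget_cents
    if ratio >= 1.0:
        return "stop"
    if ratio >= 0.9:
        return "throttle"
    if ratio >= 0.75:
        return "alert"
    return "ok"


events = [
    {"resource": "cpu_core_hour", "quantity": 1000, "tags": {"team": "search"}},
    {"resource": "gib_egress", "quantity": 500, "tags": {"team": "ml"}},
    {"resource": "cpu_core_hour", "quantity": 100, "tags": {}},
]
costs = attribute(events)
actions = {
    "search": budget_action(costs["search"], budget_cents=5000),
    "ml": budget_action(costs["ml"], budget_cents=4000),
}
```

Stopping at `attribute` gives the open loop: a report. Acting on `budget_action` closes it. The `UNTAGGED` bucket is kept deliberately, because spend without an owner never reaches a budget.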
Named Configurations (lookup table) #
Vector = {ledger operation, horizon, bearer regime, bet exposure, loop}. Rows marked → are owned elsewhere; kept for recognition.
| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| Capacity planning | forecast physical, months, —, assumes P2A ratio, open | SRE load forecasting | average-planned; wrong bottleneck; arrives late |
| Reservation | commit entry, days–hours, owner-holds, strands if idle, — | k8s requests; RIs; inventory hold | leak; stranded; fragmented commitments |
| Quota → scheduler.md | commit ceiling, standing, hard division, none, closed | ResourceQuota; API quotas | wrong key; undercount; bypass path; unmetered shared resource |
| Overcommit | committed > physical, standing, reclaim-order decides, the bet itself, — | requests vs limits; thin provisioning | correlated peak; OOM/reclaim storm; noisy neighbor |
| Elasticity | grow/shrink physical-in-use, minutes, —, rents the peak, closed | HPA + Cluster Autoscaler | metric lag; oscillation; kills warm capacity; fights admission |
| Utilization/saturation | measure used & queued, continuous, —, —, open | USE / RED methods | wrong denominator; averages hide hotspots; utilization ≠ health |
| Cost attribution | assign used to owner, continuous, —, —, open loop | OpenCost; tags/labels | untagged spend; unfair shared split; Goodhart |
| Unit economics | useful-denominated cost, design-time, —, —, judgment | LSM amplification triangle; egress pricing | amplification ignored; tail cost; cheap-storage-dear-queries |
| Budget enforcement | cap spend, near-real-time, budget-owner, —, closed loop | query cost limits; billing guardrails | hard stop = outage; alert after the money; attacker burns budget |
| Fragmentation | shape vs scalar ledger, continuous, —, —, — | k8s bin-packing leftovers; GPU stranding | “insufficient capacity” amid aggregate free; large jobs can’t place |
| Price allocation | market regime, continuous, self-scheduling tenants, priced bet, closed | spot instances; off-peak pricing | price signal ignored; spot eviction unhandled |
| Efficiency | raise useful/used, ongoing, —, —, — | cache hit ratio; compression; tiering | cost optimized against latency/correctness; amplification traded not erased |
Vocabulary #
capacity demand headroom peak-to-average
physical committed allocated used useful (the ledger)
reservation quota limit request overcommit reclaim preemption
utilization saturation bottleneck denominator
fragmentation shape bin-packing stranded
multiplexing gain correlated peak demand diversity
metering attribution chargeback tag unit cost
amplification (read/write/space) egress tail cost
budget guardrail Goodhart
spot off-peak elasticity cooldown
Deep Lesson #
Capacity bugs come from confusing pairs — mostly adjacent ledger stages:
average load vs peak load (bottleneck*: the P2A ratio IS the problem)
allocated vs used (ledger gap: stranded waste)
free vs usable (shape: fragmentation lies in scalars)
utilization vs health (denomination: saturation is the truth-teller)
autoscaling vs overload protection (axis 1 seam: minutes vs seconds)
quota vs fairness (axis 2: division ≠ justice → scheduler.md)
cheap storage vs cheap queries (axis 5: cost moved, not removed)
cost attribution vs cost control (axis 4: open loop vs closed loop)
Design procedure: walk the ledger stage by stage and name each gap’s owner, pick the horizon and cover its seam, choose who bears scarcity in advance, size and price the statistical bet, close the cost loop against Goodhart, and render the unit-economics judgment before anyone writes code. The named items are recognition shortcuts, not the design space.