
Capacity / Economics #

capacity  = how much useful work resources can support
economics = how scarce resources are allocated, paid for, and traded off

The block’s founding claim:

many technically valid systems are still bad designs —
this block is where that judgment is rendered.

Role in the catalog: the evaluation layer. Not a component (like queue, scheduler) and not a protocol (like flow control, recovery) — the layer that prices the others. Scheduler allocates the ledger; backpressure defends useful work against dead work; boundary’s economic boundary says who pays; cache trades one ledger column (compute) for another (memory + staleness risk). This is where VSA stops being a lens and becomes the content.

Central tension:

performance and availability  vs  cost and waste

The Capacity Ledger (the core object) #

Capacity exists in stages; everything in this block is an operation on this ledger, a measurement of a gap between stages, or a policy about a gap:

physical  →  committed  →  allocated  →  used  →  useful
(exists)     (reserved,     (handed to    (actually   (produced value;
              quota'd,       a workload)   consumed)    survived amplification
              promised)                                  and wasn't dead work)

The gaps ARE the named phenomena:

physical − committed   = headroom            (deliberate slack)
committed − physical   = overcommit exposure (the statistical bet, axis 3)
committed − used       = stranded waste      (reservation leak, idle quota)
used − useful          = amplification + dead work
                         (goodput* from backpressure.md, as accounting)
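The gap arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming a single scalar resource in one unit; the `Ledger` class and its field names are hypothetical and simply mirror the stage names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ledger:
    """One resource, one unit (e.g. CPU cores), recorded at each ledger stage."""
    physical: float
    committed: float
    allocated: float  # recorded for completeness; no named gap uses it here
    used: float
    useful: float

    def gaps(self) -> dict[str, float]:
        return {
            # Headroom and overcommit are the same gap with opposite sign.
            "headroom": max(self.physical - self.committed, 0.0),
            "overcommit_exposure": max(self.committed - self.physical, 0.0),
            "stranded": self.committed - self.used,
            "amplification_and_dead_work": self.used - self.useful,
        }


ledger = Ledger(physical=100, committed=80, allocated=70, used=50, useful=40)
gaps = ledger.gaps()
print(gaps)
```

With these made-up numbers the ledger shows 20 units of headroom, 30 stranded, and 10 lost to amplification or dead work; each figure still needs an owner, which the sketch does not model.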

Operations on the ledger:

reservation:  create a committed entry before use (requests, RIs, holds)
quota:        cap a tenant's committed ceiling
overcommit:   deliberately let committed exceed physical
metering:     observe used; attribution assigns it an owner
efficiency:   raise useful/used (fight amplification)

And one subtlety the scalar ledger hides — shape:

fragmentation: free ≠ usable, because capacity is a vector
               (CPU×mem×GPU per node) and demand has shape.
               aggregate free capacity that fits nothing is a ledger lie.
               (scheduler.md bin-packing is the mechanism side)
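A small sketch of the "ledger lie", assuming a hypothetical two-dimensional shape (CPU cores, memory GiB) and made-up node values: the scalar totals say the demand fits, yet no single node can place it.

```python
Vec = tuple[float, float]  # (cpu_cores, mem_gib); a hypothetical 2-D shape


def fits(free: Vec, demand: Vec) -> bool:
    """True when every dimension of the demand fits in the free vector."""
    return all(f >= d for f, d in zip(free, demand))


def aggregate_free(nodes: list[Vec]) -> Vec:
    """Sum free capacity per dimension across nodes (the scalar-ledger view)."""
    return tuple(sum(dim) for dim in zip(*nodes))


def placeable(nodes: list[Vec], demand: Vec) -> bool:
    """True when at least one node can hold the whole demand."""
    return any(fits(node, demand) for node in nodes)


nodes = [(2.0, 30.0), (6.0, 4.0), (1.0, 16.0)]  # free capacity per node
demand = (4.0, 16.0)

total = aggregate_free(nodes)
print(total, fits(total, demand), placeable(nodes, demand))
```

The aggregate view reports 9 cores and 50 GiB free, which covers the demand on paper, while the per-node check finds nowhere to put it.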

Interrogation (the ledger walk):

What resource is scarce, and in what unit at each stage?
For each adjacent pair of stages: how big is the gap, who owns it,
  and is it deliberate (headroom) or waste (stranded)?
Is the free capacity the right SHAPE for the demand that's coming?
What fraction of used is useful — what are the amplification factors?

Design Axes #

Axis 1 — Time Horizon of the Capacity Decision #

planning:     months–quarters  forecasting, fleet sizing, shard counts
reservation:  days–hours       requests, RIs, holds
elasticity:   minutes          HPA/VPA/cluster autoscaler, serverless
admission:    milliseconds     → scheduler.md, backpressure.md

Each horizon has its own correction speed and signature failure:

planning:     planned for average, not peak; wrong bottleneck; arrives late
reservation:  leaked, stranded, or fragmented commitments
elasticity:   metric lag, oscillation, scale-down kills warm capacity,
              autoscaler fights admission (two controllers, one plant —
              backpressure.md axis 5)
admission:    owned elsewhere; the last line, not the plan

The seam question — the doc’s best question — lives between horizons:

what happens WHILE WAITING for more capacity?
elasticity arrives in minutes; collapse arrives in seconds.
autoscaling is not overload protection (backpressure.md);
something at the admission horizon must survive the gap.
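The size of the seam can be estimated with simple arithmetic. The sketch below assumes a constant arrival rate and a fixed provisioning lag, both simplifications; `seam_backlog` and its parameters are hypothetical names.

```python
def seam_backlog(arrival_rps: float, capacity_rps: float, lag_s: float) -> float:
    """Requests arriving beyond capacity while new capacity is still provisioning."""
    return max(arrival_rps - capacity_rps, 0.0) * lag_s


# 1500 req/s against 1000 req/s of capacity, with a 3-minute scale-up lag.
excess = seam_backlog(arrival_rps=1500, capacity_rps=1000, lag_s=180)
print(excess)
```

Under these assumptions, 90,000 requests must be queued, shed, or degraded by something at the admission horizon before the autoscaler delivers anything.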

Interrogation:

At which horizon is this decision being made, and what is its correction latency?
What covers the seam between this horizon and the faster one below it?
Peak-to-average ratio per horizon: what does the forecast assume about bursts?

Axis 2 — Who Bears Scarcity (allocation regime) #

quota:            hard division; simple, strands idle capacity   → scheduler.md
fair share:       weighted, with borrowing of idle               → scheduler.md
priority/preempt: importance decides who suffers                 → scheduler.md
price:            the native economic tier — spot markets,
                  off-peak pricing, chargeback-as-incentive.
                  the most scalable allocator: tenants schedule
                  THEMSELVES when scarcity has a price.
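To make the first two regimes concrete, here is a minimal sketch contrasting hard quota with weighted fair share via progressive filling. The function names and tenant data are hypothetical; a real scheduler would also handle reclaim when an owner returns.

```python
def hard_quota(capacity: float, demand: dict[str, float]) -> dict[str, float]:
    """Equal hard division: idle quota is not lent to anyone."""
    cap = capacity / len(demand)
    return {t: min(d, cap) for t, d in demand.items()}


def fair_share(capacity: float, demand: dict[str, float],
               weight: dict[str, float]) -> dict[str, float]:
    """Weighted max-min fair share: unused slices are redistributed."""
    alloc = {t: 0.0 for t in demand}
    active = {t for t in demand if demand[t] > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(weight[t] for t in active)
        # Offer each still-hungry tenant its weighted slice of what is left.
        offer = {t: remaining * weight[t] / total_w for t in active}
        satisfied = {t for t in active if demand[t] <= offer[t]}
        if not satisfied:
            for t in active:
                alloc[t] = offer[t]
            break
        for t in satisfied:
            alloc[t] = demand[t]
            remaining -= demand[t]
        active -= satisfied
    return alloc


demand = {"A": 20, "B": 60, "C": 60}
quota = hard_quota(100, demand)
shares = fair_share(100, demand, {"A": 1, "B": 1, "C": 1})
stranded = 100 - sum(quota.values())
print(quota, shares, stranded)
```

With these numbers, hard quota strands roughly 13 units that tenant A never uses, while fair share lends them to B and C.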

Interrogation:

When demand exceeds supply, who is denied — and did they agree to that in advance?
Can idle allocation be borrowed, and what reclaims it when the owner returns?
Is there a price signal, or only administrative division?
Does the allocation regime survive adversarial tenants? (quota gaming, priority abuse)

Axis 3 — The Statistical Bet (overcommit family) #

CPU overcommit, memory overcommit, thin provisioning, oversubscribed network, serverless pooling — one wager:

peaks are uncorrelated, so committed may exceed physical
and multiplexing gain is pocketed.
this bet is the entire economic engine of cloud computing.
its collateral is demand diversity.

The bet’s failure is the correlated peak:

regional failover doubles a region's load
everyone's midnight cron
Black Friday
retry storms

note: backpressure failures MANUFACTURE correlation;
a goodput collapse is also a correlated-peak event (blocks coupled)
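The wager and its failure can be put in numbers. The sketch below assumes each tenant's demand has the same mean and standard deviation and that capacity is sized at mean plus `k` standard deviations; these are modeling assumptions, not a claim about any real fleet. It relies only on the fact that the standard deviation of a sum grows as sqrt(n) for independent demands and as n for perfectly correlated ones.

```python
import math


def provision(n: int, mean: float, std: float, k: float, correlated: bool) -> float:
    """Capacity needed to cover aggregate demand at mean + k standard deviations."""
    agg_std = n * std if correlated else math.sqrt(n) * std
    return n * mean + k * agg_std


n, mean, std, k = 100, 1.0, 0.5, 3.0
independent = provision(n, mean, std, k, correlated=False)
correlated = provision(n, mean, std, k, correlated=True)
multiplexing_gain = correlated / independent
print(independent, correlated, multiplexing_gain)
```

Here 100 tenants need 115 units if their peaks are independent and 250 if they move together; the difference is the multiplexing gain that a correlated peak takes back.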

Interrogation:

What is the overcommit ratio, and what demand-correlation does it assume?
What events correlate the peaks? (failover, cron, launches, retries)
When the bet loses, what is the reclaim policy — who is preempted,
  OOM-killed, throttled — and is that ordering deliberate?
Is the bet priced? (spot discounts are the market paying you to be reclaimable)

Axis 4 — The Cost Feedback Loop #

metering → attribution → unit cost → budget action

A control loop (backpressure.md axis 5, denominated in currency):

open loop:   attribution only — dashboards, chargeback reports
closed loop: budget actions — throttle, degrade, hard-stop, alert
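A closed loop needs an explicit mapping from spend to action. The following is a minimal sketch of such a ladder; the thresholds and action names are hypothetical and would be chosen per budget.

```python
def budget_action(spend: float, budget: float) -> str:
    """Map spend-to-budget ratio to an escalating action (illustrative thresholds)."""
    frac = spend / budget
    if frac >= 1.2:
        return "hard_stop"  # last resort: this is itself an outage
    if frac >= 1.0:
        return "degrade"
    if frac >= 0.9:
        return "throttle"
    if frac >= 0.7:
        return "alert"
    return "none"


for spend in (50, 75, 95, 110, 130):
    print(spend, budget_action(spend, budget=100))
```

Placing `hard_stop` above `degrade` is a design choice in this sketch: the loop tries cheaper actions before the one that takes the service down.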

Failure modes are control-loop failures plus one that is native to economics (Goodhart):

untagged spend (unmetered = unattributable = unmanaged)
shared cost allocated unfairly; teams game the split
alerts fire after the money is gone (feedback delay)
hard stop causes an outage the budget was meant to prevent
GOODHART: teams optimize the attributed metric, not the real cost

Interrogation:

Is every scarce resource metered, in the same unit it is billed?
Is the loop open (visibility) or closed (enforcement) — chosen deliberately?
What is the budget's blast radius when it triggers? (degrade before stop)
What does the attribution metric incentivize that real cost does not?

Axis 5 — Unit Economics (the design-time judgment) #

unit cost = cost per useful operation
          = (resource cost × amplification) / useful work
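A worked instance of the formula, under one stated interpretation: "resource cost" is read here as the cost the useful work would incur with no amplification, so multiplying by the amplification factor gives the actual bill. The function name and the figures are hypothetical.

```python
import math


def unit_cost(nominal_cost: float, amplification: float, useful_ops: float) -> float:
    """Cost per useful operation = (resource cost x amplification) / useful work."""
    return nominal_cost * amplification / useful_ops


# One million useful writes that would cost $2.00 at amplification 1,
# on a storage engine with an assumed write amplification of 10.
naive = unit_cost(nominal_cost=2.0, amplification=1.0, useful_ops=1_000_000)
actual = unit_cost(nominal_cost=2.0, amplification=10.0, useful_ops=1_000_000)
print(naive, actual, math.isclose(actual, 10 * naive))
```

The design that looked like $2 per million writes is paying $20 per million once amplification is counted.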

This is the axis that renders the block’s founding judgment:

does the design scale economically, or only technically?

The classic traps:

write/read/space amplification ignored     (LSM: the canonical triangle)
cheap storage, expensive queries           (cost moved, not removed)
free tier hides marginal cost until scale  (egress surprises)
tail queries dominate the bill             (P99 cost, not average cost)
efficiency that damages correctness/latency (optimized the wrong column)
index saves reads, costs writes+space      (amplification traded, not erased)

Interrogation:

Cost per request / query / GB / tenant — computed, not vibes?
What are the amplification factors, and which knob trades them?
What does the COST distribution's tail look like, not just latency's?
If demand 10×es, which cost is linear, which is super-linear, which steps?

Off-Axis Seat: Denomination Mismatch #

demand arrives in user units      (requests, queries, tenants, GB)
capacity is sold in supplier units (nodes, shards, IOPS, partitions)
the conversion model between them is always somewhat wrong.

This is where “wrong bottleneck,” “wrong denominator,” and “latency blamed on CPU while waiting on locks/IO” live. The USE method (utilization/saturation/errors per resource) and the RED method (rate/errors/duration per service) are the two audit walks across the conversion. Saturation (queue length and wait time at a resource) is the signal that the conversion missed, because the true bottleneck announces itself as a queue (queue.md: backlog is the observable).

utilization is not health: high utilization near saturation destroys
latency; low utilization may be purchased headroom. the denominator
decides the meaning.
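One way to see why high utilization destroys latency is the textbook M/M/1 queue, where mean time in system is S / (1 − ρ) for service time S and utilization ρ. This is a sketch under that model's assumptions (Poisson arrivals, exponential service, one server); real resources will deviate, but the shape of the curve is the point.

```python
import math


def mm1_latency(service_time_s: float, utilization: float) -> float:
    """Mean time in system for an M/M/1 queue: S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)


for rho in (0.5, 0.9, 0.99):
    print(rho, mm1_latency(0.010, rho))
```

With a 10 ms service time, mean latency is about 20 ms at 50% utilization, about 100 ms at 90%, and about 1 s at 99%.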

Technical Bottleneck: The Peak* #

capacity must be provisioned for peak demand
but is paid for continuously —
and demand is bursty.

Peak-to-average ratio is the fundamental economic quantity of the block. The problem is essential and has no general solution: the peak cannot be designed away, only attacked, and every attack is the statistical bet (axis 3) in some form:

overcommit          pool uncorrelated peaks
elasticity          rent the peak instead of owning it (minutes of lag as the price)
spot/off-peak price demand shapes itself
batch in valleys    move deferrable work off-peak
degradation ladder  shave the peak's cost instead of serving it fully
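The peak-to-average ratio and what it costs can be computed directly from a load series. The sketch assumes capacity is provisioned exactly at the observed peak and paid for in every interval; the sample values are made up.

```python
def peak_to_average(samples: list[float]) -> float:
    """Ratio of the highest observed load to the mean load."""
    return max(samples) / (sum(samples) / len(samples))


def idle_fraction(samples: list[float]) -> float:
    """Share of paid capacity left unused when provisioning exactly for peak."""
    return 1.0 - 1.0 / peak_to_average(samples)


load = [20, 30, 100, 50]  # e.g. requests/s per interval
p2a = peak_to_average(load)
idle = idle_fraction(load)
print(p2a, idle)
```

A ratio of 2 means half of the owned capacity is idle on average; each attack in the list above is a way to stop paying for that half.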

Since the peak cannot be removed, the real design decision is:

who suffers when the bet loses —
reclaim/preemption ordering, OOM policy, SLO tiers,
degradation ladders (backpressure.md axis 2).
choosing deliberately is the design; discovering it in an incident is the bug.

A strong design says explicitly:

what is scarce and in what unit at each ledger stage,
how it is measured (and that the denominator is right),
who gets how much, under which regime, at which horizon,
what it costs per USEFUL operation including amplification,
and who suffers, in what order, when demand exceeds the bet.

Capacity As Protocol (the crossing-point spec — keep) #

measure demand and usage
model available capacity (with shape, not just totals)
reserve or allocate
admit, queue, or reject          (→ scheduler.md, backpressure.md)
observe utilization and saturation
scale capacity or shed demand    (seam coverage between horizons)
attribute cost
enforce budgets/quotas
optimize efficiency over time

Kubernetes instantiation:

Pod declares requests/limits          (reservation entry on the ledger)
scheduler checks allocatable, binds   (allocation; shape-aware bin-pack)
cgroups enforce limits                (used bounded by committed)
metrics observe usage                 (metering)
HPA/CA adjust replicas/nodes          (elasticity horizon)
ResourceQuota caps namespaces         (quota regime)
requests≠limits gap                   (overcommit bet, per-node)
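The per-node bet in the last row can be read off the request and limit totals. The sketch below uses plain dictionaries with hypothetical values, not the Kubernetes API; it relies only on the fact that the scheduler places pods by requests, so the sum of limits on a node may exceed its allocatable capacity.

```python
def node_cpu_ledger(allocatable: float, pods: list[dict]) -> dict[str, float]:
    """Summarize committed (requests) and worst-case (limits) CPU on one node."""
    requested = sum(p["cpu_request"] for p in pods)
    limited = sum(p["cpu_limit"] for p in pods)
    return {
        "committed_fraction": requested / allocatable,
        # Above 1.0 means the node cannot honor every limit at once.
        "limit_overcommit_ratio": limited / allocatable,
    }


pods = [
    {"cpu_request": 1.0, "cpu_limit": 2.0},
    {"cpu_request": 1.0, "cpu_limit": 2.0},
    {"cpu_request": 1.5, "cpu_limit": 3.0},
]
node = node_cpu_ledger(allocatable=4.0, pods=pods)
print(node)
```

This node is 87.5% committed by requests but exposed to 1.75 times its capacity if every pod bursts to its limit at the same moment.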

Cloud cost instantiation:

usage emits metering events → tags attribute → pricing converts to cost
→ budgets detect thresholds → policy alerts, throttles, or stops
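The first three steps of that pipeline can be sketched as follows. The event fields, the `team` tag, and the prices are all hypothetical; the point is that untagged usage needs its own bucket rather than disappearing.

```python
from collections import defaultdict


def attribute(events: list[dict], price_per_unit: dict[str, float]) -> dict[str, float]:
    """Convert metering events to cost and assign each to an owner by tag."""
    cost: dict[str, float] = defaultdict(float)
    for event in events:
        owner = event.get("team") or "UNTAGGED"  # unmetered owner stays visible
        cost[owner] += event["units"] * price_per_unit[event["resource"]]
    return dict(cost)


events = [
    {"resource": "cpu_hours", "units": 10, "team": "search"},
    {"resource": "gb_egress", "units": 100, "team": "search"},
    {"resource": "cpu_hours", "units": 4},  # no tag
]
prices = {"cpu_hours": 0.5, "gb_egress": 0.25}
costs = attribute(events, prices)
print(costs)
```

Budget detection and policy would then run against `costs`; the size of the `UNTAGGED` bucket is a direct measure of how much of the loop is still open.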

Named Configurations (lookup table) #

Vector = {ledger operation, horizon, bearer regime, bet exposure, loop}. Rows marked → are owned elsewhere; kept for recognition.

| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| Capacity planning | forecast physical, months, —, assumes P2A ratio, open | SRE load forecasting | average-planned; wrong bottleneck; arrives late |
| Reservation | commit entry, days–hours, owner-holds, strands if idle, — | k8s requests; RIs; inventory hold | leak; stranded; fragmented commitments |
| Quota → scheduler.md | commit ceiling, standing, hard division, none, closed | ResourceQuota; API quotas | wrong key; undercount; bypass path; unmetered shared resource |
| Overcommit | committed > physical, standing, reclaim-order decides, the bet itself, — | requests vs limits; thin provisioning | correlated peak; OOM/reclaim storm; noisy neighbor |
| Elasticity | grow/shrink physical-in-use, minutes, —, rents the peak, closed | HPA + Cluster Autoscaler | metric lag; oscillation; kills warm capacity; fights admission |
| Utilization/saturation | measure used & queued, continuous, —, —, open | USE / RED methods | wrong denominator; averages hide hotspots; utilization ≠ health |
| Cost attribution | assign used to owner, continuous, —, —, open loop | OpenCost; tags/labels | untagged spend; unfair shared split; Goodhart |
| Unit economics | useful-denominated cost, design-time, —, —, judgment | LSM amplification triangle; egress pricing | amplification ignored; tail cost; cheap-storage-dear-queries |
| Budget enforcement | cap spend, near-real-time, budget-owner, —, closed loop | query cost limits; billing guardrails | hard stop = outage; alert after the money; attacker burns budget |
| Fragmentation | shape vs scalar ledger, continuous, —, —, — | k8s bin-packing leftovers; GPU stranding | “insufficient capacity” amid aggregate free; large jobs can’t place |
| Price allocation | market regime, continuous, self-scheduling tenants, priced bet, closed | spot instances; off-peak pricing | price signal ignored; spot eviction unhandled |
| Efficiency | raise useful/used, ongoing, —, —, — | cache hit ratio; compression; tiering | cost optimized against latency/correctness; amplification traded not erased |

Vocabulary #

capacity  demand  headroom  peak-to-average
physical  committed  allocated  used  useful  (the ledger)
reservation  quota  limit  request  overcommit  reclaim  preemption
utilization  saturation  bottleneck  denominator
fragmentation  shape  bin-packing  stranded
multiplexing gain  correlated peak  demand diversity
metering  attribution  chargeback  tag  unit cost
amplification (read/write/space)  egress  tail cost
budget  guardrail  Goodhart
spot  off-peak  elasticity  cooldown

Deep Lesson #

Capacity bugs come from confusing pairs — mostly adjacent ledger stages:

average load        vs  peak load           (bottleneck*: the P2A ratio IS the problem)
allocated           vs  used                (ledger gap: stranded waste)
free                vs  usable              (shape: fragmentation lies in scalars)
utilization         vs  health              (denomination: saturation is the truth-teller)
autoscaling         vs  overload protection (axis 1 seam: minutes vs seconds)
quota               vs  fairness            (axis 2: division ≠ justice → scheduler.md)
cheap storage       vs  cheap queries       (axis 5: cost moved, not removed)
cost attribution    vs  cost control        (axis 4: open loop vs closed loop)

Design procedure: walk the ledger stage by stage and name each gap’s owner, pick the horizon and cover its seam, choose who bears scarcity in advance, size and price the statistical bet, close the cost loop against Goodhart, and render the unit-economics judgment before anyone writes code. The named items are recognition shortcuts, not the design space.