Cache #

cache = derived local copy of truth, held under a freshness contract

It looks like a performance trick. It is actually the deliberate manufacture of staleness, sold for latency:

every other block fights the gap between recorded claim and moving world.
a cache CREATES that gap on purpose and manages the proceeds.

Role in the catalog: the convergence block. Cache is where four other blocks’ machinery meets in one artifact: staleness (scheduler.md), revocation (policy.md), the commit point* (queue.md), and tenant scope (boundary.md). “Cache invalidation and naming things” are not new problems; they are revocation and key-completeness, respectively.

Central tension:

latency / cost / origin-protection  vs  freshness / correctness

Design Axes (the core module) #

Axis 1 — Freshness Contract (the structural cleave — a strength ladder) #

How stale may the copy be, and what enforces the bound? Ordered by strength:

TTL:              staleness bounded by a timer, disconnected from actual change —
                  honest name: "a staleness bound we hope is acceptable"
                  (convention-strength freshness)
invalidation:     source pushes change notice; best-effort unless delivery
                  is guaranteed (and it almost never is — see bottleneck*)
validator:        revalidate cheaply on use; staleness converted into a
                  cheap round trip (ETag/If-None-Match, Last-Modified)
lease/coherence:  staleness PROVABLY bounded — server promises not to
                  change without notice, or the lease expires
                  (NFS leases, Chubby, CPU MESI; protocol-strength)
version-pinned:   reads carry a floor version; serve only at-or-after it
                  (informer resourceVersion; and this is policy.md's ZOOKIE —
                  Zanzibar's ACL cache is a version-pinned cache; same
                  recipe, authority as the payload)
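A minimal sketch of two rungs of the ladder side by side, assuming each entry records the source version it was read at. The names `PinnedCache`, `Entry`, and `min_version` are hypothetical and chosen only for illustration; the injected `clock` exists so the timer can be driven by hand.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Entry:
    value: Any
    version: int      # source version the value was read at
    stored_at: float  # clock reading when the entry was filled


class PinnedCache:
    """Toy cache enforcing a TTL bound plus an optional version floor."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> Entry

    def fill(self, key: str, value: Any, version: int) -> None:
        self.entries[key] = Entry(value, version, self.clock())

    def get(self, key: str, min_version: int = 0) -> Optional[Entry]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        if self.clock() - entry.stored_at > self.ttl:
            return None  # TTL rung: too old by the timer, whatever changed
        if entry.version < min_version:
            return None  # version-pinned rung: older than the caller's floor
        return entry
```

A `None` result means "go to the source". The TTL check rejects by wall time alone; the floor check rejects only when the caller can name a version it must not read behind.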

Interrogation:

How stale may this value be — as a number, per key class, not a shrug?
What ENFORCES that bound: hope (TTL), delivery (invalidation),
  proof (lease), or a pinned floor (version)?
Can stale be served deliberately on source failure? (availability purchase)
Freshness ≠ consistency: two fresh caches can still disagree —
  is that acceptable here?

Axis 2 — Write Path #

read-only:      cache never accepts writes; simplest contract
write-through:  cache + backing store before ack — durability precedes ack;
                pays write latency and a partial-failure window between the two
write-behind:   ack precedes durability — and notice what this IS:
                the cache becomes HISTORY-AUTHORITATIVE for the dirty window
                (checkpoint_replay.md axis 1, flipped). It inherits the whole
                recovery kit: flush queue, dirty set, ordering, replay —
                and the commit point* (queue.md): crash before flush = loss,
                replayed flush = duplicate effect.
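The write-behind dirty window can be sketched as follows, assuming a dict-like backing store. `WriteBehindCache` is a hypothetical name; the flush here is an overwrite by key, which is idempotent, so a replayed flush is harmless in this toy. A flush that appends or increments would not have that property.

```python
class WriteBehindCache:
    """Toy write-behind cache: ack first, persist later."""

    def __init__(self, store):
        self.store = store  # backing store; assumed dict-like
        self.values = {}    # everything the cache can serve
        self.dirty = {}     # key -> latest value not yet flushed

    def put(self, key, value):
        self.values[key] = value
        self.dirty[key] = value  # later writes to a key replace earlier ones
        return "ack"             # ack precedes durability

    def get(self, key):
        return self.values.get(key)

    def flush(self):
        """Persist the dirty set; returns the number of keys written."""
        flushed = 0
        for key in list(self.dirty):
            self.store[key] = self.dirty[key]  # idempotent overwrite
            del self.dirty[key]                # clean only after the store write
            flushed += 1
        return flushed
```

Everything in `dirty` exists only in the cache. If the process dies before `flush()`, those acknowledged writes are gone, which is the durability boundary the caller needs to know about.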

Sub-axis — who owns the miss path (orthogonal to write policy):

cache-aside:   application reads source and fills; explicit, flexible,
               N invalidation code paths to keep consistent
read-through:  cache owns loading; clean abstraction, but the loader is now
               a hidden dependency and stampedes concentrate inside it

Interrogation:

Do writes traverse the cache at all? (if not, invalidation is mandatory)
Write-through: what happens when cache write succeeds and store write fails?
Write-behind: what is the durability boundary, and does the caller know
  their ack is a promise, not a fact?
Who fills on miss, and is the fill single-flighted?
Classic race: read-miss loads old value, concurrent write invalidates,
  slow read fills stale AFTER the invalidation — what prevents the
  stale overwrite? (fill with version compare, or tombstone the key)
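One way to close the stale-overwrite race, sketched under the assumption that the source hands out a monotonically increasing version per key and that invalidations carry the version of the write that caused them. `VersionedCache` and `TOMBSTONE` are hypothetical names.

```python
TOMBSTONE = object()  # marks "invalidated at this version, value unknown"


class VersionedCache:
    """Toy cache where fills must prove they are not older than the last
    invalidation seen for the key."""

    def __init__(self):
        self.entries = {}  # key -> (version, value or TOMBSTONE)

    def invalidate(self, key, version):
        current = self.entries.get(key)
        if current is None or current[0] < version:
            self.entries[key] = (version, TOMBSTONE)

    def fill(self, key, version, value):
        """Store value read at `version`; returns False if the fill is stale."""
        current = self.entries.get(key)
        if current is not None and current[0] > version:
            return False  # a newer invalidation or value already landed
        self.entries[key] = (version, value)
        return True

    def get(self, key):
        current = self.entries.get(key)
        if current is None or current[1] is TOMBSTONE:
            return None  # miss: caller loads from the source
        return current[1]
```

In the race above, the invalidation leaves a tombstone at the new version, so the slow read's fill at the old version is rejected. A real cache would also need to expire tombstones, since they occupy space without serving hits.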

Axis 3 — Placement / Tiers #

in-process:     fastest; per-instance inconsistency; dies with the process
shared remote:  Redis/Memcached — consistent across instances, but now a
                network hop AND an availability dependency: "cache outage
                takes down app" means it stopped being a cache and became
                a tier of record without the durability to justify it
edge/CDN:       near users, protects origin; copies you cannot enumerate
client:         browser/app cache; copies you cannot even reach

Tiers compose; each tier carries its own axis-1 contract, and purge must walk all of them.

Interrogation:

If this cache vanishes, does the system degrade or die?
  (a cache the system cannot survive without is not a cache)
Can the origin absorb a cold-start miss storm? (→ backpressure.md;
  warming plan, single-flight, stampede protection)
Which tiers exist between source and reader, and does invalidation
  reach every one? Which copies can you not enumerate?
Hot key on the shared tier: is one shard melting while the others idle?
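Single-flight, as asked about above, can be sketched with the standard `threading` module. This is an illustrative in-process version, not a reference implementation; `SingleFlight`, `_Call`, and the `waiters` counter are hypothetical names. It only coalesces concurrent loads and does not itself cache the result.

```python
import threading


class _Call:
    """One in-flight load shared by a leader and its followers."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.waiters = 0  # followers parked on this load (for observability)


class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> _Call

    def do(self, key, loader):
        """Run loader() once per key per burst; concurrent callers share it."""
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
            else:
                call.waiters += 1
        if leader:
            try:
                call.result = loader()
            except Exception as exc:
                call.error = exc  # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

With this in the miss path, a cold start sends one load per key to the origin instead of one per waiting request. Across processes the same idea needs a shared lock or lease, which this sketch does not attempt.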

Axis 4 — Key Completeness (the security axis) #

The key must carry every input that changes the value:

tenant, user, authorization result, Vary headers, locale, schema/API version

Failures here are boundary violations, not staleness:

personalized response cached publicly
authz context omitted -> one user's authority served to everyone
  (a cached authorized response is a cached DECISION — policy.md's
   decision cache, with the principal missing from the key)
cross-tenant leak through a shared key space (boundary.md tenant-scope motif)
too much context in the key -> cardinality explosion, hit rate dies

Interrogation:

Enumerate the key: what determines this value? Is ALL of it in the key?
Whose eyes was this value computed for? Is that identity in the key?
What is the cardinality cost of the full key — and which context can be
  dropped only because the value provably doesn't depend on it?
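A hedged sketch of key construction that makes the inputs explicit. `cache_key` and its parameter names are hypothetical; the point is that every value-determining input is a required argument, so omitting the principal is a call-site error rather than a silent leak.

```python
import hashlib
import json


def cache_key(resource, *, tenant, principal, locale, api_version, vary=None):
    """Build a key from every input assumed to change the value.

    `vary` holds the request headers the response varies on. Header names
    are lowercased and sorted so equivalent requests map to the same key.
    """
    parts = {
        "resource": resource,
        "tenant": tenant,
        "principal": principal,
        "locale": locale,
        "api_version": api_version,
        "vary": sorted((name.lower(), value)
                       for name, value in (vary or {}).items()),
    }
    canonical = json.dumps(parts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Dropping a field from `parts` is the cardinality lever, and it is only safe when the value provably does not depend on that field. For a response identical across users, for example, a caller might pass a fixed `principal="public"` rather than remove the argument.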

Axis 5 — What Can Be Cached #

values:    the ordinary case
absence:   negative cache (NXDOMAIN, 404, missing-key) — its own geometry:
             newly-created object invisible until the negative entry dies
             temporary failure cached as permanent absence
             denial cached past the grant — revocation*'s mirror image:
               a GRANT that cannot propagate
           discipline: short TTLs, classify errors before caching them
errors:    cache only errors that are facts (404), never errors that are
           weather (timeout, 503)
derived:   memoized computation, query results (fingerprint + params +
           source versions in the key), materialized views —
           → owned by checkpoint_replay.md: a materialized view is the
             snapshot+changelog composite read-side; projection lag,
             double-apply, rebuild cursors all live there
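The "facts versus weather" discipline can be sketched as a classifier that runs before anything is stored. `cacheable_ttl`, `FACT_STATUSES`, and the TTL defaults are hypothetical; treating only 404 as a fact follows the text above, and a `None` status stands in for a timeout.

```python
FACT_STATUSES = {404}  # absence the origin actually asserted


def cacheable_ttl(status, positive_ttl=300.0, negative_ttl=5.0):
    """Return a TTL in seconds, or None when the outcome must not be cached.

    `status` is an HTTP-style status code, or None for a timeout.
    """
    if status is None:
        return None          # timeout: weather, never cached
    if 200 <= status < 300:
        return positive_ttl  # ordinary value
    if status in FACT_STATUSES:
        return negative_ttl  # absence: cache, but briefly
    return None              # 5xx and everything else: weather
```

The short `negative_ttl` bounds how long a newly created object stays invisible, and returning `None` for a 503 keeps a temporary failure from hardening into cached absence.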

Technical Bottleneck: Invalidation as Distributed Delivery* #

the source must inform every copy — including copies it may not know
about — exactly the copies affected, before anyone acts on the old value,
across failures, usually with no acknowledgment channel.

Essential, no general solution. Count the failure modes that are this one problem:

DB updated, cache not invalidated        purge misses an edge copy
missed invalidation (lost message)       server forgets a client (state lost)
invalidation raced by a stale fill       tier N purged, tier N+1 still serving

Known recipes (the axis-1 ladder, plus delivery machinery):

climb the ladder          when best-effort invalidation isn't enough,
                          buy leases or version pins — proof over hope
validators as backstop    even if the push is lost, next use revalidates cheaply
stale-while-revalidate    serve stale, refresh async — availability bought
                          with explicitly bounded staleness
single-flight             one loader per key per miss storm; the rest wait
surrogate keys            invalidate by dependency ("everything derived from
                          product 42"), not by enumerating URLs
tombstone + version fill  kill the stale-overwrite race: fills must prove
                          they are newer than the invalidation they follow
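Surrogate keys from the recipe list can be sketched as a reverse index from dependency tag to cache keys. `SurrogateIndex` and the tag strings are hypothetical; real CDNs expose this through their own purge APIs, which are not modeled here.

```python
from collections import defaultdict


class SurrogateIndex:
    """Toy cache that can purge by dependency tag instead of by key."""

    def __init__(self):
        self.entries = {}               # key -> value
        self.by_tag = defaultdict(set)  # tag -> keys derived from it

    def put(self, key, value, tags):
        self.entries[key] = value
        for tag in tags:
            self.by_tag[tag].add(key)

    def get(self, key):
        return self.entries.get(key)

    def purge_tag(self, tag):
        """Drop every entry tagged with `tag`; returns how many were removed."""
        removed = 0
        for key in self.by_tag.pop(tag, set()):
            if self.entries.pop(key, None) is not None:
                removed += 1
        return removed
```

A write to product 42 then becomes `purge_tag("product:42")`, and the writer does not need to know which pages or queries embedded that product. The index covers only this one tier; every other tier still needs its own purge.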

Cross-reference: this bottleneck and policy.md’s revocation* are the same problem — propagating a change of truth to distributed copies faster than anyone acts on the old copy. The zookie solves both because it reframes the question from “did the purge arrive” to “how stale may this read be.”

A strong design says explicitly:

what truth is copied,
how the copy is keyed (with whose identity),
how stale it may be and what enforces the bound,
how it is invalidated or revalidated across every tier,
and what happens when the cache is wrong, cold, or gone.

Cache As Protocol (the crossing-point spec — keep) #

lookup
miss -> load (single-flight) -> fill (version-checked)
serve
refresh / revalidate
invalidate / purge (all tiers)
evict (capacity, not correctness — never confuse with invalidation)
write-through / write-behind, if writable
observe hit rate, staleness age, origin load

HTTP/CDN instantiation (the most complete freshness protocol in production):

Cache-Control (the contract)      ETag / If-None-Match (validator)
Last-Modified / If-Modified-Since Vary (key completeness, axis 4)
Age (staleness made visible)      stale-while-revalidate / stale-if-error
purge / surrogate keys
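To make the contract concrete, here is a deliberately simplified sketch of how a cache might decide what to do with a stored response from its `Cache-Control` header and current age. `parse_cache_control` and `classify` are hypothetical helpers; the sketch ignores `s-maxage`, `Expires`, `must-revalidate`, and heuristic freshness, all of which a real HTTP cache must handle.

```python
def parse_cache_control(header):
    """Parse 'max-age=60, no-cache' into {'max-age': '60', 'no-cache': None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"') or None
    return directives


def classify(cache_control, age_seconds):
    """Decide how a stored response of the given age may be used."""
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        return "must-not-store"
    if "no-cache" in d:
        return "revalidate"  # stored, but validated before every use
    max_age = int(d.get("max-age") or 0)
    if age_seconds < max_age:
        return "fresh"
    swr = int(d.get("stale-while-revalidate") or 0)
    if age_seconds < max_age + swr:
        return "serve-stale-and-revalidate"
    return "revalidate"  # conditional request with If-None-Match
```

The "revalidate" outcome is where the validator rung applies: the cache sends the stored `ETag` in `If-None-Match`, and a 304 response lets it keep serving the stored body.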

Named Configurations (lookup table) #

Vector = {freshness, write path, placement, key discipline, content}. Rows marked → are owned elsewhere; kept for recognition.

Name	Vector	Canonical study object	Signature failure
In-process cache	TTL, read-only/aside, in-process, app-keyed, values+negatives	LRU map; DNS resolver	per-instance inconsistency; stampede on expiry; memory blowup
Distributed cache	TTL+invalidation, aside, shared remote, app-keyed, values	Redis/Memcached	outage = app outage; hot key; herd; stale replica
Cache-aside	any, app-owned miss path, any, app-keyed, values	GET/miss/fill pattern	forgotten invalidation path; stale-overwrite race
Read-through	any, cache-owned miss, any, —, values	loading cache	loader stampede; error cached as value; hidden dependency
Write-through	strong-ish, write-through, shared, —, values	cache+DB dual write	partial failure between the two writes; write latency
Write-behind	weak during dirty window, write-behind, in-process/shared, —, values	page cache + fsync	loss on crash; flush reorder; duplicate effects (commit point*)
Edge/CDN	TTL+validator+purge, read-only, edge tiers, Vary-keyed, values+negatives	Cache-Control/ETag/surrogate keys	purge misses a tier; wrong key leaks users (axis 4); origin stampede
Metadata cache	version-pinned, read-only, in-process, resource-keyed, config/routing	informer cache; xDS config (see xDS notes)	too-old version → relist; acting on stale routing
Negative cache	short TTL, read-only, any, keyed, absence	NXDOMAIN; 404 cache	new object invisible; weather cached as fact; denial outlives grant
Query/result cache	TTL+dependency versions, read-only, shared, fingerprint+authz-keyed, derived	normalized query cache	authz omitted from key; cardinality kills hit rate; source drift
Materialized view → checkpoint_replay.md	version-pinned via changelog, rebuild path, —, keyed, derived	snapshot+changelog composite	projection lag; double-apply; partial backfill
Coherent/lease cache	lease/coherence, read-only-ish, client+server, keyed, values	NFS leases; Chubby; MESI	missed invalidation; lease-expiry ambiguity (state_machine.md ignorance*); forgotten client
Write-coalescing → queue.md	—, write-behind+latest-wins, in-process, keyed, dirty set	coalescing row, queue.md	lost intermediate that mattered; flush storm

Vocabulary #

key  value  source of truth  hit  miss  fill
freshness  staleness  TTL  expiry  Age
invalidation  purge  revalidation  validator  ETag  Vary
lease  version  pin  zookie  surrogate key
eviction (capacity)  vs  invalidation (correctness)
negative entry  tombstone  dirty entry  flush  durability boundary
single-flight  stampede  stale-while-revalidate  stale-if-error
cache-aside  read-through  write-through  write-behind
warming  cold start  hit rate  cardinality

Deep Lesson #

Cache bugs come from confusing pairs on different axes:

cache             vs  source of truth      (axis 2: write-behind quietly inverts this)
TTL               vs  correctness          (axis 1: a timer is hope, not proof)
key               vs  full context         (axis 4: the security axis)
invalidation      vs  deletion/eviction    (correctness signal vs capacity policy)
freshness         vs  consistency          (two fresh copies can disagree)
local cache       vs  global truth         (axis 3: per-instance worlds)
negative cache    vs  permanent absence    (axis 5: absence has a shelf life)
write-behind      vs  durable write        (commit point*: an ack is a promise)

Design procedure: name the source of truth, enumerate the key (with the principal in it), choose a rung on the freshness ladder and say what enforces it, walk every tier for purge reach, single-flight the misses, and decide out loud whether the system survives the cache’s absence. The named types are recognition shortcuts, not the design space.