Cache #
cache = derived local copy of truth, held under a freshness contract
It looks like a performance trick. It is actually the deliberate manufacture of staleness, sold for latency:
every other block fights the gap between recorded claim and moving world.
a cache CREATES that gap on purpose and manages the proceeds.
Role in the catalog: the convergence block. Cache is where four other blocks’ machinery meets in one artifact: staleness (scheduler.md), revocation (policy.md), the commit point* (queue.md), and tenant scope (boundary.md). “Cache invalidation and naming things” are not new problems; they are revocation and key completeness, respectively.
Central tension:
latency / cost / origin-protection vs freshness / correctness
Design Axes (the core module) #
Axis 1 — Freshness Contract (the structural cleave — a strength ladder) #
How stale may the copy be, and what enforces the bound? Ordered by strength:
TTL: staleness bounded by a timer, disconnected from actual change —
honest name: "a staleness bound we hope is acceptable"
(convention-strength freshness)
invalidation: source pushes change notice; best-effort unless delivery
is guaranteed (and it almost never is — see bottleneck*)
validator: revalidate cheaply on use; staleness converted into a
cheap round trip (ETag/If-None-Match, Last-Modified)
lease/coherence: staleness PROVABLY bounded — server promises not to
change without notice, or the lease expires
(NFS leases, Chubby, CPU MESI; protocol-strength)
version-pinned: reads carry a floor version; serve only at-or-after it
(informer resourceVersion; and this is policy.md's ZOOKIE —
Zanzibar's ACL cache is a version-pinned cache; same
recipe, authority as the payload)
Interrogation:
How stale may this value be — as a number, per key class, not a shrug?
What ENFORCES that bound: hope (TTL), delivery (invalidation),
proof (lease), or a pinned floor (version)?
Can stale be served deliberately on source failure? (availability purchase)
Freshness ≠ consistency: two fresh caches can still disagree —
is that acceptable here?
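The top rung of the ladder can be sketched as follows. This is a minimal illustration, not any particular system's API: `VersionedCache`, `Entry`, and the `load` callable (assumed to return a `(value, version)` pair from the source) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Entry:
    value: str
    version: int  # source version the value was read at


class VersionedCache:
    """Hypothetical version-pinned cache: each read names a floor version."""

    def __init__(self, load: Callable[[str], Tuple[str, int]]):
        # Assumption: load(key) returns (value, version) from the source.
        self._load = load
        self._entries: Dict[str, Entry] = {}

    def get(self, key: str, floor: int = 0) -> Entry:
        entry = self._entries.get(key)
        if entry is None or entry.version < floor:
            # Missing, or older than the caller's floor: go back to the source.
            value, version = self._load(key)
            entry = Entry(value, version)
            self._entries[key] = entry
        # A real implementation would also check that the reloaded version
        # meets the floor (a lagging replica may not); omitted here.
        return entry
```

A caller that has just written at version 7 passes `floor=7` on its next read; a caller with no ordering requirement passes nothing and accepts whatever the cache holds.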
Axis 2 — Write Path #
read-only: cache never accepts writes; simplest contract
write-through: cache + backing store before ack — durability precedes ack;
pays write latency and a partial-failure window between the two
write-behind: ack precedes durability — and notice what this IS:
the cache becomes HISTORY-AUTHORITATIVE for the dirty window
(checkpoint_replay.md axis 1, flipped). it inherits the whole
recovery kit: flush queue, dirty set, ordering, replay —
and the commit point* (queue.md): crash before flush = loss,
replayed flush = duplicate effect.
Sub-axis — who owns the miss path (orthogonal to write policy):
cache-aside: application reads source and fills; explicit, flexible,
N invalidation code paths to keep consistent
read-through: cache owns loading; clean abstraction, but the loader is now
a hidden dependency and stampedes concentrate inside it
Interrogation:
Do writes traverse the cache at all? (if not, invalidation is mandatory)
Write-through: what happens when cache write succeeds and store write fails?
Write-behind: what is the durability boundary, and does the caller know
their ack is a promise, not a fact?
Who fills on miss, and is the fill single-flighted?
Classic race: read-miss loads old value, concurrent write invalidates,
slow read fills stale AFTER the invalidation — what prevents the
stale overwrite? (fill with version compare, or tombstone the key)
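One way to close the stale-overwrite race is sketched below, assuming the source hands out monotonically increasing versions and that both invalidations and fills carry one. `VersionCheckedCache` and its methods are hypothetical names; tombstone expiry is left out.

```python
import threading
from typing import Dict, Optional, Tuple


class VersionCheckedCache:
    """Hypothetical cache in which a fill loses to a newer invalidation."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # key -> (version, value); a value of None marks a tombstone.
        self._slots: Dict[str, Tuple[int, Optional[str]]] = {}

    def invalidate(self, key: str, version: int) -> None:
        with self._lock:
            current = self._slots.get(key)
            if current is None or current[0] < version:
                # Leave a tombstone so later fills can be compared against it.
                self._slots[key] = (version, None)

    def fill(self, key: str, version: int, value: str) -> bool:
        with self._lock:
            current = self._slots.get(key)
            if current is not None and current[0] > version:
                return False  # the fill is older than what the slot has seen
            self._slots[key] = (version, value)
            return True

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            current = self._slots.get(key)
            return None if current is None else current[1]
```

The slow reader that loaded version 1 before a write at version 2 finds the tombstone and its fill is refused; the next miss reloads the current value.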
Axis 3 — Placement / Tiers #
in-process: fastest; per-instance inconsistency; dies with the process
shared remote: Redis/Memcached — consistent across instances, but now a
network hop AND an availability dependency: "cache outage
takes down app" means it stopped being a cache and became
a tier of record without the durability to justify it
edge/CDN: near users, protects origin; copies you cannot enumerate
client: browser/app cache; copies you cannot even reach
Tiers compose; each tier carries its own axis-1 contract, and purge must walk all of them.
Interrogation:
If this cache vanishes, does the system degrade or die?
(a cache the system cannot survive without is not a cache)
Can the origin absorb a cold-start miss storm? (→ backpressure.md;
warming plan, single-flight, stampede protection)
Which tiers exist between source and reader, and does invalidation
reach every one? Which copies can you not enumerate?
Hot key on the shared tier: one shard melting while others idle
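Single-flight, named above as stampede protection, can be sketched in a few lines. This is an in-process illustration only; `SingleFlight` and `do` are hypothetical names, and a shared tier would need a distributed equivalent (for example a short-lived lock key).

```python
import threading
from concurrent.futures import Future
from typing import Callable, Dict


class SingleFlight:
    """Hypothetical helper: at most one in-flight load per key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, Future] = {}

    def do(self, key: str, load: Callable[[], str]) -> str:
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut
        if not leader:
            # Followers block until the leader publishes a result or error.
            return fut.result()
        try:
            fut.set_result(load())
        except Exception as exc:
            fut.set_exception(exc)  # followers see the same failure
        finally:
            with self._lock:
                del self._inflight[key]
        return fut.result()
```

Note what this does not do: it does not cache. It only collapses concurrent misses for one key into one origin call; the fill into the cache happens in the caller.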
Axis 4 — Key Completeness (the security axis) #
The key must carry every input that changes the value:
tenant, user, authorization result, Vary headers, locale, schema/API version
Failures here are boundary violations, not staleness:
personalized response cached publicly
authz context omitted -> one user's authority served to everyone
(a cached authorized response is a cached DECISION — policy.md's
decision cache, with the principal missing from the key)
cross-tenant leak through a shared key space (boundary.md tenant-scope motif)
too much context in the key -> cardinality explosion, hit rate dies
Interrogation:
Enumerate the key: what determines this value? Is ALL of it in the key?
Whose eyes was this value computed for? Is that identity in the key?
What is the cardinality cost of the full key — and which context can be
dropped only because the value provably doesn't depend on it?
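Enumerating the key can be made mechanical. The sketch below assumes a hypothetical `RequestContext` and a `per_principal` flag that the caller sets to `False` only for values that provably do not depend on who is asking; the field names are illustrative.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    tenant: str
    principal: str       # whose eyes the value was computed for
    locale: str
    schema_version: int


def cache_key(resource: str, ctx: RequestContext, *, per_principal: bool) -> str:
    """Build a key from every input assumed to change the value."""
    parts = {
        "resource": resource,
        "tenant": ctx.tenant,          # always present: tenant scope
        "locale": ctx.locale,
        "schema": ctx.schema_version,
    }
    if per_principal:
        parts["principal"] = ctx.principal
    # Canonical JSON keeps the key stable regardless of dict ordering.
    blob = json.dumps(parts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Making `per_principal` a required keyword forces the decision to be written down at each call site, which is where the cardinality trade-off is actually made.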
Axis 5 — What Can Be Cached #
values: the ordinary case
absence: negative cache (NXDOMAIN, 404, missing-key) — its own geometry:
newly-created object invisible until the negative entry dies
temporary failure cached as permanent absence
denial cached past the grant — revocation*'s mirror image:
a GRANT that cannot propagate
discipline: short TTLs, classify errors before caching them
errors: cache only errors that are facts (404), never errors that are
weather (timeout, 503)
derived: memoized computation, query results (fingerprint + params +
source versions in the key), materialized views —
→ owned by checkpoint_replay.md: a materialized view is the
snapshot+changelog composite read-side; projection lag,
double-apply, rebuild cursors all live there
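The negative-cache discipline (short TTLs, classify errors before caching them) might look like the sketch below. `NotFound`, `Unavailable`, and `NegativeCache` are hypothetical names; positive values are deliberately not cached here, to keep the absence path visible.

```python
import time
from typing import Callable, Dict, Optional


class NotFound(Exception):
    """A fact: the source says the object does not exist."""


class Unavailable(Exception):
    """Weather: timeout, 503, or any failure that says nothing about the object."""


class NegativeCache:
    def __init__(self, load: Callable[[str], str], negative_ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._absent: Dict[str, float] = {}  # key -> expiry time

    def get(self, key: str) -> Optional[str]:
        expiry = self._absent.get(key)
        if expiry is not None:
            if self._clock() < expiry:
                return None              # cached absence, still within its TTL
            del self._absent[key]        # the negative entry has died
        try:
            return self._load(key)
        except NotFound:
            # Only facts are cached, and only briefly.
            self._absent[key] = self._clock() + self._negative_ttl
            return None
        # Unavailable (and anything else) propagates and is never cached.
```

The `negative_ttl` is exactly the window during which a newly created object stays invisible, so it should be chosen as a number someone is willing to defend.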
Technical Bottleneck: Invalidation as Distributed Delivery* #
the source must inform every copy — including copies it may not know
about — exactly the copies affected, before anyone acts on the old value,
across failures, usually with no acknowledgment channel.
The problem is essential and has no general solution. Count the failure modes that are all this one problem:
DB updated, cache not invalidated
purge misses an edge copy
missed invalidation (lost message)
server forgets a client (state lost)
invalidation raced by a stale fill
tier N purged, tier N+1 still serving
Known recipes (the axis-1 ladder, plus delivery machinery):
climb the ladder: when best-effort invalidation isn't enough, buy leases or version pins (proof over hope)
validators as backstop: even if the push is lost, next use revalidates cheaply
stale-while-revalidate: serve stale, refresh async; availability bought with explicitly bounded staleness
single-flight: one loader per key per miss storm; the rest wait
surrogate keys: invalidate by dependency ("everything derived from product 42"), not by enumerating URLs
tombstone + version fill: kill the stale-overwrite race; fills must prove they are newer than the invalidation they follow
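Of the recipes above, surrogate keys are the easiest to show in miniature. The sketch assumes a hypothetical `TaggedCache` in which every entry is stored with the dependency tags it was derived from; tag names such as `product:42` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Hypothetical cache that can purge by dependency tag."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}
        self._keys_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, value: str, tags: Iterable[str]) -> None:
        self._values[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)

    def purge_tag(self, tag: str) -> int:
        """Drop every entry derived from `tag`; return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if self._values.pop(key, None) is not None:
                removed += 1
        return removed
```

The writer only needs to know what changed ("product 42"), not which URLs or queries were built from it. The cost is the tag index, which must be maintained at fill time and replicated to every tier that accepts purges.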
Cross-reference: this bottleneck and policy.md’s revocation* are the same problem — propagating a change of truth to distributed copies faster than anyone acts on the old copy. The zookie solves both because it reframes the question from “did the purge arrive” to “how stale may this read be.”
A strong design says explicitly:
what truth is copied,
how the copy is keyed (with whose identity),
how stale it may be and what enforces the bound,
how it is invalidated or revalidated across every tier,
and what happens when the cache is wrong, cold, or gone.
Cache As Protocol (the crossing-point spec — keep) #
lookup
miss -> load (single-flight) -> fill (version-checked)
serve
refresh / revalidate
invalidate / purge (all tiers)
evict (capacity, not correctness — never confuse with invalidation)
write-through / write-behind, if writable
observe hit rate, staleness age, origin load
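A compressed version of the lookup, load, fill, serve path, with staleness age returned to the caller and a deliberate serve-stale-on-failure window, might look like this. `TTLCache` and its parameters are hypothetical; single-flight and version-checked fill are omitted for brevity.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical TTL cache with a stale-if-error grace window."""

    def __init__(self, load: Callable[[str], str], ttl: float,
                 stale_if_error: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl
        self._stale_if_error = stale_if_error
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, filled_at)

    def get(self, key: str) -> Tuple[str, float]:
        """Return (value, age_in_seconds) so staleness is observable."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0], now - hit[1]          # fresh hit
        try:
            value = self._load(key)              # miss or expired: go to source
        except Exception:
            if hit is not None and now - hit[1] < self._ttl + self._stale_if_error:
                return hit[0], now - hit[1]      # availability purchase: serve stale
            raise
        self._entries[key] = (value, now)        # fill
        return value, 0.0
```

Returning the age alongside the value plays the role that the `Age` header plays in HTTP: the caller can see how old the copy is instead of assuming it is fresh.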
HTTP/CDN instantiation (the most complete freshness protocol in production):
Cache-Control (the contract)
ETag / If-None-Match (validator)
Last-Modified / If-Modified-Since
Vary (key completeness, axis 4)
Age (staleness made visible)
stale-while-revalidate / stale-if-error
purge / surrogate keys
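The validator exchange can be simulated without a network. The sketch below is a simplification under stated assumptions: `make_etag`, `origin_respond`, and `revalidate` are hypothetical helpers, the ETag is just a truncated content hash, and only a single strong tag is compared (real `If-None-Match` handling also covers tag lists and weak comparison).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumption: the validator is derived from the content itself.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(body: bytes, if_none_match: Optional[str]
                   ) -> Tuple[int, Dict[str, str], bytes]:
    """Origin side: answer 304 with no body when the client's tag still matches."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        return 304, headers, b""
    return 200, headers, body


def revalidate(cached_body: bytes, cached_etag: str,
               origin_body: bytes) -> Tuple[bytes, str]:
    """Cache side: spend one cheap round trip instead of trusting the timer."""
    status, headers, body = origin_respond(origin_body, cached_etag)
    if status == 304:
        return cached_body, cached_etag      # copy confirmed; reuse it
    return body, headers["ETag"]             # copy replaced
```

The 304 path transfers headers only, which is what "staleness converted into a cheap round trip" means in practice.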
Named Configurations (lookup table) #
Vector = {freshness, write path, placement, key discipline, content}. Rows marked → are owned elsewhere; kept for recognition.
| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| In-process cache | TTL, read-only/aside, in-process, app-keyed, values+negatives | LRU map; DNS resolver | per-instance inconsistency; stampede on expiry; memory blowup |
| Distributed cache | TTL+invalidation, aside, shared remote, app-keyed, values | Redis/Memcached | outage = app outage; hot key; herd; stale replica |
| Cache-aside | any, app-owned miss path, any, app-keyed, values | GET/miss/fill pattern | forgotten invalidation path; stale-overwrite race |
| Read-through | any, cache-owned miss, any, —, values | loading cache | loader stampede; error cached as value; hidden dependency |
| Write-through | strong-ish, write-through, shared, —, values | cache+DB dual write | partial failure between the two writes; write latency |
| Write-behind | weak during dirty window, write-behind, in-process/shared, —, values | page cache + fsync | loss on crash; flush reorder; duplicate effects (commit point*) |
| Edge/CDN | TTL+validator+purge, read-only, edge tiers, Vary-keyed, values+negatives | Cache-Control/ETag/surrogate keys | purge misses a tier; wrong key leaks users (axis 4); origin stampede |
| Metadata cache | version-pinned, read-only, in-process, resource-keyed, config/routing | informer cache; xDS config (see xDS notes) | too-old version → relist; acting on stale routing |
| Negative cache | short TTL, read-only, any, keyed, absence | NXDOMAIN; 404 cache | new object invisible; weather cached as fact; denial outlives grant |
| Query/result cache | TTL+dependency versions, read-only, shared, fingerprint+authz-keyed, derived | normalized query cache | authz omitted from key; cardinality kills hit rate; source drift |
| Materialized view → checkpoint_replay.md | version-pinned via changelog, rebuild path, —, keyed, derived | snapshot+changelog composite | projection lag; double-apply; partial backfill |
| Coherent/lease cache | lease/coherence, read-only-ish, client+server, keyed, values | NFS leases; Chubby; MESI | missed invalidation; lease-expiry ambiguity (state_machine.md ignorance*); forgotten client |
| Write-coalescing → queue.md | —, write-behind+latest-wins, in-process, keyed, dirty set | coalescing row, queue.md | lost intermediate that mattered; flush storm |
Vocabulary #
key, value, source of truth, hit, miss, fill
freshness, staleness, TTL, expiry, Age
invalidation, purge, revalidation, validator, ETag, Vary
lease, version pin, zookie, surrogate key
eviction (capacity) vs invalidation (correctness)
negative entry, tombstone, dirty entry, flush, durability boundary
single-flight, stampede, stale-while-revalidate, stale-if-error
cache-aside, read-through, write-through, write-behind
warming, cold start, hit rate, cardinality
Deep Lesson #
Cache bugs come from confusing pairs on different axes:
cache vs source of truth (axis 2: write-behind quietly inverts this)
TTL vs correctness (axis 1: a timer is hope, not proof)
key vs full context (axis 4: the security axis)
invalidation vs deletion/eviction (correctness signal vs capacity policy)
freshness vs consistency (two fresh copies can disagree)
local cache vs global truth (axis 3: per-instance worlds)
negative cache vs permanent absence (axis 5: absence has a shelf life)
write-behind vs durable write (commit point*: an ack is a promise)
Design procedure: name the source of truth, enumerate the key (with the principal in it), choose a rung on the freshness ladder and say what enforces it, walk every tier for purge reach, single-flight the misses, and decide out loud whether the system survives the cache’s absence. The named types are recognition shortcuts, not the design space.