
Snapshot / Read View

snapshot  = a coherent version of state
read view = the rules deciding what a reader may see

It answers:

which state is visible to THIS read, while everything moves?

Role in the catalog: the visibility protocol, the reader’s side of the story. checkpoint_replay.md owns how state is captured; GC owns when it dies; this block owns what a reader is ENTITLED TO SEE in between. Every other file’s pin registry, publication protocol, and freshness ladder silently assumed a reader holding a coherent view; this block defines one. It completes a symmetry the catalog has so far held only half of: checkpoint_replay produces the coordinate; this block consumes it.

Central tension:

coherent reads and time travel  vs  storage, metadata, and GC cost

Design Axes (the core module)

Axis 1 — The Visibility Coordinate (the structural cleave)

What SINGLE VALUE determines what this read sees?

read timestamp / version    MVCC, FDB read version, Spanner timestamp
manifest / snapshot pointer Iceberg, Delta, OCI image manifest
segment set                 Lucene IndexReader over committed segments
log index                   Raft snapshot's last-included index
config version + nonce      xDS (checkpoint_replay's control-plane row)
projection offset           read models (materialized.md)

This is checkpoint_replay.md’s binding coordinate*, READ-SIDE — the same coordinate, consumed instead of produced. And the deep-lesson row lives here:

a read timestamp is a POSITION IN A VERSION ORDER,
not a moment on anyone's wall clock.

Interrogation:

Name the coordinate. Is it one value? (if a view needs two independently
  fetched values, the binding coordinate* has already come loose)
Who assigns it — the reader, a coordinator, a pointer swap?
What version order is it a position in, and who arbitrates that order?
  (usually: log.md's ladder, or quorum.md's commit index)
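A minimal sketch of "a position in a version order, not a moment on a wall clock", assuming a single in-process store whose commit counter is the only arbiter of order (`VersionedStore` and its methods are hypothetical names). The reader acquires one value, the read version, and every lookup is resolved against it; when the read happens in real time plays no part.

```python
import threading


class VersionedStore:
    """Hypothetical store: one logical counter orders every commit."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0            # current position in the version order
        self._history = {}           # key -> [(commit version, value), ...]

    def commit(self, writes: dict) -> int:
        # The store arbitrates the order; no writer's wall clock is consulted.
        with self._lock:
            self._version += 1
            for key, value in writes.items():
                self._history.setdefault(key, []).append((self._version, value))
            return self._version

    def read_version(self) -> int:
        # The single coordinate a reader acquires: the newest committed position.
        with self._lock:
            return self._version

    def get(self, key, at: int):
        # Newest value committed at or before position `at`; None if there is none.
        with self._lock:
            versions = self._history.get(key, [])
            visible = [value for ver, value in versions if ver <= at]
        return visible[-1] if visible else None
```

Two reads holding the same read version see the same state, however far apart in time they run. That is the whole point of the coordinate being a position.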

Axis 2 — Materialization of the View

metadata-only:  the snapshot is a LIST of immutable units
                (manifests, segment sets) — the cheapest and most
                elegant: immutability makes coherence FREE
logical:        versions interleaved in shared storage; visibility
                computed per-row against the coordinate
                (MVCC: read timestamp vs active-transaction set)
physical:       actual blocks preserved (copy-on-write, backups) —
                the dearest, and the only kind that survives the
                source's destruction

Cost gradient: physical > logical > metadata. The industry’s drift toward immutable-segment architectures (Iceberg, Lucene, LSM) is the discovery that metadata-only views are nearly free once data units stop mutating — replication.md’s immutable-object lesson, read-side: you can’t tear a view of facts.
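A minimal sketch of a metadata-only view, assuming data lives in immutable segments keyed by ID (`SegmentTable` and its methods are hypothetical). A snapshot is only the tuple of segment IDs that were live when it was taken. Writers add segments and never edit one, so the old tuple keeps describing a real state without copying any data.

```python
class SegmentTable:
    """Hypothetical table: rows live in immutable segments; state = tuple of IDs."""

    def __init__(self) -> None:
        self._segments = {}          # segment id -> tuple of rows (never mutated)
        self._live = ()              # current segment set, replaced wholesale
        self._next_id = 0

    def append(self, rows) -> None:
        # Writers add a new segment instead of modifying an existing one.
        seg_id = self._next_id
        self._next_id += 1
        self._segments[seg_id] = tuple(rows)
        self._live = self._live + (seg_id,)

    def snapshot(self) -> tuple:
        # Cost is proportional to the number of segments, not the data size.
        return self._live

    def scan(self, view: tuple) -> list:
        # Read exactly the units the view lists, and nothing else.
        return [row for seg_id in view for row in self._segments[seg_id]]
```

The sketch never deletes a segment. Deciding when an unreferenced segment may go is the pinning question of axis 3.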

Interrogation:

Which materialization — and is the choice priced? (COW's write
  amplification and snapshot-chain depth; MVCC's version bloat;
  manifest's metadata growth)
Does the view survive the source? (only physical does — snapshot ≠
  backup is the deep lesson's row: a manifest pointing at live storage
  restores nothing after the storage burns)

Axis 3 — Lifetime and Pinning (the holder’s side of GC’s treaty)

GC’s pin registry, seen from the pin-HOLDER’s chair. The treaty line: GC adjudicates; this block defines what a well-behaved pin looks like — bounded, declared, released.

The two-sided failure is the axis:

held too long:      blocks vacuum, stalls compaction, inflates storage —
                    the eternal-transaction pin (GC's registry failure,
                    caused from this side)
released unclearly: the reader's data dies mid-read —
                    "segment merge deletes data during query,"
                    "old versions GC'd while reader needs them"

Interrogation:

When is the view acquired, and what EXPLICITLY releases it?
Is the pin declared to the registry, or merely conventional? (GC cannot
  protect a reader it doesn't know about: that reader's data dies mid-read)
What bounds the hold — a timeout, a query lifetime, a session?
  (unbounded pins are how time travel becomes infinite retention)
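A minimal sketch of a well-behaved pin, assuming a single-threaded, in-process registry that GC consults before reclaiming old versions (`PinRegistry`, `hold`, `is_live`, and `gc_horizon` are hypothetical names). The pin is declared when it is acquired. The context manager releases it explicitly, even if the read raises. A deadline bounds it, and once the deadline passes GC stops honoring it. The deadline uses a monotonic clock because it bounds a hold duration; it is not a position in the version order.

```python
import time
from contextlib import contextmanager


class PinRegistry:
    """Hypothetical registry that GC consults before reclaiming old versions."""

    def __init__(self, max_hold_s: float) -> None:
        self.max_hold_s = max_hold_s
        self._pins = {}              # pin id -> (pinned version, monotonic deadline)
        self._next_id = 0

    @contextmanager
    def hold(self, version: int):
        pin_id = self._next_id
        self._next_id += 1
        # Declared: the registry knows exactly which version this reader needs.
        self._pins[pin_id] = (version, time.monotonic() + self.max_hold_s)
        try:
            yield pin_id
        finally:
            # Released explicitly, even if the read raises.
            self._pins.pop(pin_id, None)

    def is_live(self, pin_id: int) -> bool:
        # A reader checks this before trusting data read under the pin.
        return pin_id in self._pins

    def gc_horizon(self, head: int) -> int:
        """Oldest version GC must keep; first revokes pins past their bound."""
        now = time.monotonic()
        for pin_id, (_, deadline) in list(self._pins.items()):
            if deadline <= now:
                del self._pins[pin_id]   # bounded hold: an overdue pin stops blocking GC
        return min((v for v, _ in self._pins.values()), default=head)
```

Revoking an overdue pin trades that reader's view (it must restart) for bounded retention. The alternative is an eternal-transaction pin that stalls vacuum for everyone.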

Axis 4 — The Freshness Contract (deliberate staleness)

The block’s founding trade, and the deep lesson’s first row promoted:

LATEST is not CONSISTENT.
a snapshot is DELIBERATELY stale by a bounded, named amount,
in exchange for coherence.

This is cache.md’s ladder with the direction reversed — the cache apologizes for staleness; the snapshot SELLS it. Sub-questions:

can this read see writes made after acquisition?  never — that's the point
can this SESSION see its own writes?              read-your-writes rides on
                                                  view refresh policy, not on
                                                  the view itself
what does "latest" mean to the caller?            usually: "the newest
                                                  COHERENT view" — which is
                                                  older than the newest write,
                                                  by exactly the publication lag

Interrogation:

Is the staleness bound NAMED to the caller (snapshot age, offset lag)?
When does a session's view refresh — per query, per transaction, never?
Does anyone believe they're reading "now"? (disabuse them in the API)
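A minimal sketch of naming the staleness to the caller, assuming writes land at a head position and coherent views are published later by an atomic step (`Publisher`, `ReadResult`, and `lag_versions` are hypothetical names). "Latest" returns the newest published view. The result carries both positions, so the publication lag is part of the answer rather than something the caller has to guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadResult:
    value: object
    view_version: int        # position the answer is coherent at
    head_version: int        # newest write position when the view was acquired

    @property
    def lag_versions(self) -> int:
        # The staleness sold, by name and amount.
        return self.head_version - self.view_version


class Publisher:
    """Hypothetical: writes land at `head`; coherent views are published later."""

    def __init__(self) -> None:
        self.head = 0
        self.published = 0
        self.values = {0: None}

    def write(self, value) -> None:
        self.head += 1
        self.values[self.head] = value

    def publish(self) -> None:
        # Atomic publication: the newest write becomes the newest coherent view.
        self.published = self.head

    def read_latest(self) -> ReadResult:
        # "Latest" means the newest COHERENT view, not the newest write.
        return ReadResult(self.values[self.published], self.published, self.head)
```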

Axis 5 — What Rides Inside the View

data only
data + schema      time travel across schema evolution — an old snapshot
                   read under a new schema is misinterpretation, not
                   history ("old schema cannot be interpreted")
data + config      the xDS case: a coherent view must include the RULES
                   for reading it (routes without their clusters is a
                   torn view of config)

And the collision seated here:

time travel vs the right to be forgotten —
GC's legal-hold decree running AGAINST the pin registry.
a snapshot preserving deleted PII is a compliance event
wearing a feature's clothes. (boundary.md's data residency,
meeting axis 3's pins head-on.)
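A minimal sketch of the data + schema rider, assuming rows are stored as positional tuples and each snapshot records the schema ID that was current when it was committed (`SCHEMAS`, `Snapshot`, `decode`, and `read_as_of` are hypothetical). If an old snapshot is decoded with today's schema, its fields come back under the wrong labels and nothing raises. If it is decoded with the schema it carries, the result is what was written.

```python
from dataclasses import dataclass

# Hypothetical schema history: schema id -> ordered column names.
SCHEMAS = {
    1: ("id", "name"),
    2: ("id", "email", "name"),    # v2 inserted a column in the middle
}


@dataclass(frozen=True)
class Snapshot:
    version: int
    schema_id: int                 # the schema rides inside the view
    rows: tuple                    # positional tuples, as written


def decode(snap: Snapshot, columns: tuple) -> list:
    # zip() pairs columns and values by position, so a wrong schema
    # silently mislabels (and here truncates) every row.
    return [dict(zip(columns, row)) for row in snap.rows]


def read_as_of(snap: Snapshot) -> list:
    # Time travel decodes with the schema-at-time, never the current one.
    return decode(snap, SCHEMAS[snap.schema_id])
```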

Technical Bottleneck: Torn Visibility*

The one failure every configuration shares:

a reader observing a MIXTURE of versions that was never a state.

Instances: manifests mixed across snapshots; a query straddling a segment merge; a partially applied config; mixed projection generations; a distributed snapshot cutting through an in-flight message. Its epistemics are distinct:

a STALE view was once true.
a TORN view was NEVER true — an answer describing a world that never
existed, with no timestamp to interrogate, because no timestamp HAS
that state. silent omission*'s sibling: the other lie without a clock.
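A minimal sketch of how a torn view is produced, assuming a toy pair of balances whose invariant is that they always sum to 100 (`Bank`, `torn_read`, and `one_door_read` are hypothetical). The torn read fetches each balance separately, and a transfer commits between the two fetches. It returns a pair that no committed state ever held. The one-door read takes the root once and derives both values from it.

```python
class Bank:
    def __init__(self) -> None:
        # Each committed state is an immutable tuple; `root` is the one door.
        self.root = (100, 0)                   # (a, b); invariant: a + b == 100

    def transfer(self, amount: int) -> None:
        a, b = self.root
        self.root = (a - amount, b + amount)   # swap in a whole new state

    def fetch_a(self) -> int:
        return self.root[0]

    def fetch_b(self) -> int:
        return self.root[1]


def torn_read(bank: Bank, interleave) -> tuple:
    a = bank.fetch_a()       # first part, from one version
    interleave()             # a writer commits between the fetches
    b = bank.fetch_b()       # second part, from a different version
    return (a, b)


def one_door_read(bank: Bank, interleave) -> tuple:
    view = bank.root         # acquire the root exactly once...
    interleave()
    return view              # ...and derive everything from it
```

The torn result here is `(100, 30)`. It sums to 130 and corresponds to no version, so no timestamp exists to ask about it.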

Known recipes — the catalog’s atomic-publication machinery, consumed read-side:

ONE DOOR (flagship)     acquire the view through exactly one root —
                        the manifest pointer, the searcher, the read
                        version — and derive EVERYTHING from it.
                        never assemble a view from separately-fetched
                        parts. (GC's atomic publish is what makes the
                        single door exist; this block is why it must.)
immutability below      units under the pointer never mutate — tearing
                        requires mutation, so facts can't tear
barriers                where no single pointer exists, MANUFACTURE the
                        coordinate (Chandy-Lamport, Flink — the
                        distributed cut, checkpoint_replay's scope axis)
epoch-checked assembly  when a view must span fetches, every part
                        carries the generation, and mismatch aborts
                        the read (lease_fencing's token, read-side)
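A minimal sketch of epoch-checked assembly, assuming each part of the view is served together with the generation it belongs to (`Part`, `TornRead`, and `assemble` are hypothetical; the bounded retry is an illustrative choice). When generations disagree, the reader discards that combination and fetches again. It never returns a mixture.

```python
from dataclasses import dataclass


class TornRead(Exception):
    """No coherent generation could be assembled within the attempt budget."""


@dataclass(frozen=True)
class Part:
    generation: int
    payload: object


def assemble(fetchers, max_attempts: int = 3) -> tuple:
    """Fetch every part; abort and retry whenever generations disagree."""
    for _ in range(max_attempts):
        parts = [fetch() for fetch in fetchers]
        generations = {p.generation for p in parts}
        if len(generations) == 1:
            return generations.pop(), [p.payload for p in parts]
        # Mismatch: this combination was never a state, so it is discarded.
    raise TornRead(f"no coherent generation after {max_attempts} attempts")
```

For example, routes fetched at generation 1 next to clusters at generation 2 are rejected. A retry that sees both at generation 2 is accepted.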

The one-line discipline:

a coherent view is entered through exactly one door.

A strong design says explicitly:

the visibility coordinate, singular (axis 1),
its materialization and what that costs (axis 2),
the pin's bound, declaration, and release (axis 3),
the staleness sold, by name and amount (axis 4),
what rides inside — schema, config — and what deletion law
collides with retention (axis 5),
and the one door every reader enters through.

Snapshot As Protocol (the crossing-point spec — keep)

select visibility coordinate (through the one door)
resolve visible state units
pin / declare to the registry
read against that view — ignoring newer state BY DESIGN
release the view, explicitly
GC proceeds once no live view needs the old data (the treaty)

MVCC instantiation:

reader obtains read timestamp
row visibility checked against snapshot's active-transaction set
writers create NEW versions (never mutate visible ones)
old versions retained while any snapshot needs them (the pin)
vacuum removes obsolete versions later (GC, adjudicating)
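A minimal sketch of the row-visibility check. It is loosely modeled on PostgreSQL's snapshot fields (`xmin`, `xmax`, and the in-progress list `xip`) but simplified: no subtransactions, no hint bits, the commit log is a plain set, and a transaction's view of its own writes is omitted. The names `Snapshot`, `RowVersion`, `committed_before`, and `visible` are illustrative, not PostgreSQL's API. A row version is visible if its creator belongs to the snapshot's committed past and its deleter, if any, does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    xmin: int              # every xid below this had finished at snapshot time
    xmax: int              # every xid at or above this had not yet started
    xip: frozenset         # xids in [xmin, xmax) still running at snapshot time


@dataclass(frozen=True)
class RowVersion:
    value: object
    created_by: int                    # inserting transaction id
    deleted_by: Optional[int] = None   # deleting/updating transaction id, if any


def committed_before(xid: int, snap: Snapshot, committed: set) -> bool:
    """True if xid's effects belong to the snapshot's committed past."""
    if xid < snap.xmin:
        return xid in committed        # fast path: finished before the snapshot
    if xid >= snap.xmax:
        return False                   # started after the snapshot
    if xid in snap.xip:
        return False                   # still running: invisible by design
    return xid in committed            # finished in the window: visible if committed


def visible(row: RowVersion, snap: Snapshot, committed: set) -> bool:
    if not committed_before(row.created_by, snap, committed):
        return False
    return row.deleted_by is None or not committed_before(row.deleted_by, snap, committed)
```

Writers never overwrite a version in place; they stamp `deleted_by` and add a new version. A snapshot that predates the deleter keeps seeing the old version until vacuum can prove no snapshot needs it.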

Manifest instantiation:

reader loads current snapshot POINTER (one door)
reads manifests + data files it references — nothing else
writer creates new files + new metadata; commit swaps the pointer
  ATOMICALLY (GC's publication protocol)
old snapshots live until expiration (visibility proof of death)
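A minimal sketch of the manifest protocol, assuming an in-memory catalog whose only mutable cell is the current-snapshot pointer, updated by compare-and-swap under a lock (`Catalog`, `SnapshotMeta`, `commit`, and `expire` are hypothetical; Iceberg's real catalog APIs differ). A writer that loses the race gets a conflict instead of a torn commit. Expiration returns only the files that no surviving snapshot references.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnapshotMeta:
    snapshot_id: int
    parent_id: Optional[int]
    files: frozenset                 # immutable data files this snapshot references


class CommitConflict(Exception):
    """The pointer moved after the writer read its base snapshot."""


class Catalog:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshots = {0: SnapshotMeta(0, None, frozenset())}
        self._current = 0            # the single mutable cell: the pointer

    def current(self) -> SnapshotMeta:
        # One door: load the pointer once, then read only what it references.
        with self._lock:
            return self._snapshots[self._current]

    def commit(self, base: SnapshotMeta, added=frozenset(), removed=frozenset()) -> SnapshotMeta:
        # New metadata is built off to the side; nothing visible is mutated.
        files = frozenset((base.files - removed) | added)
        with self._lock:
            if self._current != base.snapshot_id:
                raise CommitConflict("pointer moved; rebase on current() and retry")
            new = SnapshotMeta(max(self._snapshots) + 1, base.snapshot_id, files)
            self._snapshots[new.snapshot_id] = new
            self._current = new.snapshot_id      # compare-and-swap succeeded: publish
            return new

    def expire(self, keep_ids) -> set:
        """Drop unkept snapshots; return files no surviving snapshot references."""
        with self._lock:
            keep = set(keep_ids) | {self._current}
            dropped = [s for sid, s in self._snapshots.items() if sid not in keep]
            for snap in dropped:
                del self._snapshots[snap.snapshot_id]
            still_live = set().union(*(s.files for s in self._snapshots.values()))
            return set().union(*(s.files for s in dropped)) - still_live
```

A file is deletable only after `expire` proves nothing references it. Deleting on a schedule without that proof is the "GC eats referenced file" failure.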

Searcher instantiation:

reader opens IndexReader over committed segments (one door)
writers create new segments; commit publishes a new SET
merges rewrite old segments — deleted only after readers release
  (the pin, per-searcher)
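A minimal sketch of per-searcher pins, assuming segments are reference-counted. The published commit holds one reference, and each open searcher holds another (`SegmentIndex`, `open_searcher`, `close_searcher`, and `merge` are hypothetical; Lucene manages this internally through its own reader and writer APIs). A merge publishes a new set. Merged-away segments stay on disk until the last searcher that can see them is closed.

```python
from collections import Counter


class SegmentIndex:
    def __init__(self) -> None:
        self.files = {}              # segment name -> immutable docs
        self.refs = Counter()        # segment name -> live references
        self.committed = ()          # the published segment set
        self._n = 0

    def _new_segment(self, docs) -> str:
        name = f"seg{self._n}"
        self._n += 1
        self.files[name] = tuple(docs)
        return name

    def _decref(self, name: str) -> None:
        self.refs[name] -= 1
        if self.refs[name] == 0:
            del self.refs[name]
            del self.files[name]     # deleted only once nobody holds it

    def _publish(self, new_set) -> None:
        for name in new_set:
            self.refs[name] += 1     # the new commit point takes its references first
        old, self.committed = self.committed, tuple(new_set)
        for name in old:
            self._decref(name)       # then the old commit point lets go

    def add(self, docs) -> None:
        self._publish(self.committed + (self._new_segment(docs),))

    def merge(self) -> None:
        docs = [d for name in self.committed for d in self.files[name]]
        self._publish((self._new_segment(docs),))

    def open_searcher(self) -> tuple:
        for name in self.committed:
            self.refs[name] += 1     # per-searcher pin
        return self.committed

    def close_searcher(self, view) -> None:
        for name in view:
            self._decref(name)

    def search(self, view) -> list:
        return [d for name in view for d in self.files[name]]
```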

Named Configurations (lookup table)

Vector = {coordinate, materialization, pin discipline, staleness contract, riders}. Rows marked → are owned elsewhere.

| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| MVCC read view | read timestamp, logical, active-txn set, per-txn coherent, data | Postgres MVCC; FDB read version | long reader blocks vacuum (axis 3); write skew (SI’s known hole); “expected latest, got snapshot” |
| Manifest snapshot | pointer, metadata-only, snapshot refs, per-pointer, data+schema | Iceberg snapshot model | mixed manifests*; GC eats referenced file (registry breach); commit race → torn metadata |
| Copy-on-write | snapshot root, physical, refcounts, —, data | ZFS/EBS snapshots | refcount bug kills live block; chain depth; space surprise (pins are invisible rent) |
| Log snapshot → checkpoint_replay.md | last-included index, state image, log-tail treaty, —, data | Raft snapshot install | snapshot/index mismatch (binding coordinate*); truncation before durable |
| Distributed snapshot → checkpoint_replay.md | barrier ID, per-actor + channels, coordinated, —, data+in-flight | Chandy-Lamport; Flink barriers | cut through a message*; alignment backpressure |
| Backup/restore | restore point + log position, physical, retention policy, —, data+schema | base backup + WAL | not restorable (untested = hope); missing log tail; corrupt-found-late; residency on restore |
| Config snapshot → checkpoint_replay + xDS notes | version+nonce, metadata, last-good, applied-vs-current, config rides | Envoy xDS ACK/NACK | partial config applied*; bad config poisons plane (last-good is the recipe) |
| Query snapshot | segment set, metadata, per-searcher pins, per-query coherent, data+schema | Lucene IndexReader | query straddles a merge*; schema change mid-query; stale routing on one server |
| Time travel | historical pointer, metadata over retained files, long pins, deliberately old, data+schema+law | Iceberg time travel; Git checkout | history expired; old schema unreadable; deletion-law collision (axis 5) |
| Read-model snapshot → materialized.md | projection offset, —, —, lag named, data | KStreams store + changelog offset | mixed generations*; “consumer assumes fresh” (axis 4’s disabusal, skipped) |

Vocabulary

snapshot  read view  visibility  coordinate  version order
read timestamp  active transaction set  manifest  pointer  root
segment set  searcher  refcount  copy-on-write  chain
pin  declare  release  bounded hold  vacuum
one door  torn view  mixed generations
deliberate staleness  view refresh  read-your-writes
time travel  AS OF  schema-at-time  deletion collision
last-good config  applied vs current

Deep Lesson

Snapshot bugs come from confusing pairs on different axes:

latest              vs  consistent          (axis 4: the founding trade)
snapshot            vs  backup              (axis 2: only physical survives the source)
manifest pointer    vs  data durability     (axis 2 + GC's treaty: a pointer is a promise the registry must keep)
read timestamp      vs  wall-clock time     (axis 1: a position, not a moment)
cache snapshot      vs  authoritative state (cache.md: the view is honest about being a copy)
time travel         vs  infinite retention  (axis 3: pins are rent; axis 5: and sometimes illegal)
old version         vs  garbage             (GC's proof of death: a pinned version is neither)

Design procedure: name the one coordinate and the one door, choose the materialization with its bill, bound and declare every pin, sell the staleness by name, list what rides inside the view and what law collides with keeping it — and never let a reader assemble reality from two separately-fetched parts, because the world they’d see never happened. The named types are recognition shortcuts, not the design space.