Log #
log = append-only sequence of records
It answers:
what happened, in what order, and from where can I replay?
Role in the catalog: the sixth component block — the storage-side substrate, and state_machine.md’s twin: a replicated state machine is literally this block composed with that one (log + deterministic apply).
The queue.md seam, declared up front. queue.md’s axis 1 cleaved remove-on-ack from retained-log; this block IS the retained-log arm, seen from inside. Ownership split:
queue.md owns the consumption discipline (claim, visibility, ack, redelivery): who may take work, and when is it done
log.md owns the structure itself (append, ordering, durability ladder, retention): what happened, and what is safe to read
The consumer offset sits ON the seam, three facets, three owners: position → this file; commit-as-progress-claim → checkpoint_replay.md; redelivery consequence → queue.md.
Central tension:
preserve history vs bound storage/cost and read amplification
Design Axes (the core module) #
Axis 1 — Record Semantics (what a record MEANS) #
The doc’s own first Big Question, promoted to the structural cleave:
command/intent: "do this" — WAL, operation log, redo log.
replay = RE-EXECUTE. idempotency is mandatory,
determinism is the contract (state_machine.md axis 2).
fact/event: "this happened" — event log, CDC, audit, ledger.
replay = RE-DERIVE. facts are not re-executed;
projections are rebuilt from them.
observation: "I saw this" — trace/diagnostic logs. best-effort,
sampled, no replay contract, no commit semantics.
EVICTED: shares only the word "log" with this block;
belongs to a future observability block.
The deep-lesson row “event vs command” is this axis, and it decides what replay even means before any machinery is chosen.
Interrogation:
Command or fact? (if you can't answer, neither can your replay logic)
Commands: is re-execution idempotent and deterministic — proven?
Facts: emitted before or after the transaction that made them true?
("event emitted before commit" = a fact about a world that never happened;
the transactional-outbox recipe exists for exactly this)
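The difference between the two replay contracts can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the `Command` and `Fact` shapes, the `op_id` idempotency key, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    op_id: str      # hypothetical idempotency key
    account: str
    delta: int


@dataclass(frozen=True)
class Fact:
    account: str
    balance_after: int


def replay_commands(log, state=None, seen=None):
    """Replay = RE-EXECUTE: each command runs again, guarded by its key."""
    state = dict(state or {})
    seen = set(seen or ())
    for cmd in log:
        if cmd.op_id in seen:
            continue  # already executed; re-running would double-apply
        state[cmd.account] = state.get(cmd.account, 0) + cmd.delta
        seen.add(cmd.op_id)
    return state, seen


def rebuild_projection(log):
    """Replay = RE-DERIVE: facts are folded into a view, never executed."""
    projection = {}
    for fact in log:
        projection[fact.account] = fact.balance_after
    return projection


commands = [
    Command("op-1", "alice", 50),
    Command("op-2", "alice", -20),
    Command("op-1", "alice", 50),   # duplicate, as a replay would produce
]
facts = [Fact("alice", 50), Fact("alice", 30)]
```

Without the `seen` guard the duplicated `op-1` would be applied twice, which is why idempotency is mandatory on the command arm. The fact arm needs no guard: folding the same facts again yields the same projection.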
Axis 2 — The Position Ladder (the spine) #
Every record climbs, and every rung has a named coordinate:
appended: assigned a position (offset, LSN, index)
durable: survives the writer's crash (flush position, fsync)
committed: survives leader change (high watermark, commit index; quorum-replicated)
applied: reflected in derived state (applied index, consumer offset)
The deep lesson’s best rows are rung confusions:
append ≠ commit (the tail is provisional — see bottleneck*)
commit ≠ apply (committed-but-unapplied is the replay window)
delivery ≠ processing (queue.md's territory, at the applied rung)
offset ≠ business progress (the coordinate is not the meaning)
Interrogation:
Name the four coordinates in THIS system. (if two rungs share a
coordinate, find out which failure that hides)
When is the ack sent — at which rung? (ack-before-durable is a
deliberate choice or a bug; know which)
Who is allowed to read below the committed rung, and why?
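To make "name the four coordinates" concrete, here is a toy single-node model that keeps one coordinate per rung. The class and method names are hypothetical. It also assumes commit may never pass the local flush position, which is a simplification: in a quorum system the commit point is decided by replicas, not by the local flush.

```python
class LadderLog:
    """Toy log tracking the four ladder coordinates.

    Each coordinate is an exclusive upper bound: records[0:coordinate]
    have reached that rung.
    """

    def __init__(self):
        self.records = []
        self.durable = 0     # rung 2: flushed
        self.committed = 0   # rung 3: safe to expose
        self.applied = 0     # rung 4: reflected in derived state

    @property
    def appended(self):
        # rung 1: every record in the list has a position
        return len(self.records)

    def append(self, record):
        self.records.append(record)
        return self.appended - 1  # the assigned position

    def flush(self):
        # Stands in for fsync in this sketch.
        self.durable = self.appended

    def commit(self, upto):
        if upto > self.durable:
            raise ValueError("cannot commit past the durable rung")
        self.committed = max(self.committed, upto)

    def readable(self):
        # Readers only ever see the committed prefix.
        return self.records[: self.committed]

    def apply_next(self, apply_fn):
        if self.applied >= self.committed:
            return False  # nothing committed is waiting to be applied
        apply_fn(self.records[self.applied])
        self.applied += 1
        return True
```

In this model the invariant `applied <= committed <= durable <= appended` holds after every call, and the gap between `committed` and `applied` is the replay window.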
Axis 3 — Ordering Scope #
total: one sequence, one arbiter (Raft log, single WAL)
per-partition: total within, none across (Kafka; the workhorse)
per-key: within a key's residence (partition + key routing —
and only until a repartition)
per-transaction: commit-order batches (CDC transaction boundaries)
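The per-key row's caveat ("only until a repartition") follows directly from how key routing usually works. The sketch below assumes a simple hash-modulo router built on `zlib.crc32`; this is an illustrative choice, not any particular broker's partitioner, and both function names are hypothetical.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Route a key to a partition with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_n: int, new_n: int):
    """Keys whose partition changes when the partition count changes."""
    return [
        k for k in keys
        if partition_for(k, old_n) != partition_for(k, new_n)
    ]


keys = [f"user-{i}" for i in range(100)]
moved = moved_keys(keys, old_n=4, new_n=8)
```

Every key in `moved` now has its older records in one partition and its newer records in another, so no single partition holds that key's full order any more. A consumer that relied on per-key order has to be told about the repartition explicitly.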
Interrogation:
What ordering does the CONSUMER's correctness actually require?
(buying total order for per-key needs is the classic overpay)
What silently breaks the scope? (partition count change re-maps keys;
cross-partition consumers see interleavings)
"Partition order vs global order" — is anyone assuming the latter?
Axis 4 — Readership Model #
single recoverer: WAL — read only at crash, by the writer's successor
converging replicas: replication log — all must apply identically,
in order (determinism inherited from axis 1)
independent consumers: streams — each tracks its own cursor;
the log doesn't know or care who reads
adversarial auditor: audit log — append-only AGAINST THE OPERATOR;
the threat model includes the admin
(a boundary.md trust property: mutation of history
is the attack, so immutability needs enforcement
stronger than convention — WORM storage, external
anchoring, signed batches)
CDC = the replication-log readership extended ACROSS a trust boundary to external consumers — which is why its signature failures are schema-contract failures, not replication failures.
Interrogation:
Who reads, and does the log know them? (registered replicas vs anonymous cursors)
Auditor case: what stops the operator from rewriting history?
CDC: who owns the schema contract, and what breaks when it evolves?
Snapshot + stream handoff: gap or overlap at the seam? (Debezium's
hardest problem — the same gap/overlap dial as lease_fencing.md handoff)
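One common way to set the gap/overlap dial is to choose overlap deliberately and filter by position. The sketch below assumes dense integer positions so that a gap is detectable by arithmetic. Real CDC positions such as LSNs are generally not dense, so treat that as an assumption of the example; the function name is hypothetical.

```python
def handoff(snapshot_state, snapshot_position, stream):
    """Continue from a snapshot into a change stream.

    stream yields (position, key, value) and is expected to start at or
    before snapshot_position, so the seam overlaps instead of gapping.
    """
    state = dict(snapshot_state)
    last = snapshot_position
    for position, key, value in stream:
        if position <= last:
            continue  # overlap: already reflected in the snapshot
        if position != last + 1:
            raise RuntimeError(f"gap: expected {last + 1}, got {position}")
        state[key] = value
        last = position
    return state, last


state, position = handoff(
    {"a": 1},
    5,
    [(4, "a", 0), (5, "a", 1), (6, "b", 2), (7, "a", 3)],
)
```

Overlap is the safer side of the dial only because the filter makes re-reading harmless. A stream that starts after the snapshot position fails loudly here instead of silently losing changes.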
Axis 5 — Retention Authority (imports, fully deferred) #
full history: keep everything; pay forever
compacted: latest-per-key — equivalence rung owned by
garbage_collection.md Part II
bounded retention: the checkpoint_replay.md treaty — retention floor =
oldest checkpoint anyone might restore from;
GC executes, the pin registry adjudicates
Nothing native here; the log is the object of those two blocks' protocols. Likewise: workflow history → checkpoint_replay.md’s event-sourced row; control-plane revision log → its control-plane row.
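Seen from the log's side, the treaty reduces to a small calculation: the floor is the oldest position any pin holder might still need. A minimal sketch follows, with hypothetical names. It assumes that an empty pin registry means "keep everything", which is a conservative policy choice the source does not state.

```python
def retention_floor(pins):
    """pins maps holder -> oldest position that holder may restore from.

    With no pins registered this sketch keeps everything (floor 0).
    """
    return min(pins.values(), default=0)


def deletable_segments(segments, floor):
    """segments are (first_position, last_position) pairs.

    A segment may go only if it lies entirely below the floor.
    """
    return [seg for seg in segments if seg[1] < floor]


pins = {"checkpoint-a": 120, "consumer-b": 80}
segments = [(0, 49), (50, 99), (100, 149)]
floor = retention_floor(pins)
```

Here the segment `(50, 99)` survives even though most of it is below the floor, because position 80 and later is still pinned.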
Special Seat: The Ledger #
The one genuinely novel resident — a log with a semantic invariant ON THE RECORDS THEMSELVES:
entries must balance (double-entry: debits = credits, per transaction)
idempotency keys scoped to BUSINESS operations, not transport retries
history is never mutated — corrections are new compensating entries
A ledger polices its content, not just its order: append-time invariant checking is part of the protocol, and “unbalanced entry” is a rejected append, not a detected corruption. (TigerBeetle, double-entry models.) It is also the purest fact-log: axis 1’s “facts are corrected by new facts, never edited” is the ledger’s whole discipline.
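A rejected append, as opposed to a detected corruption, might look like the following. This is an illustrative sketch and not TigerBeetle's API: the class, the exception, and the sign convention (positive = debit, negative = credit, integer minor units) are all assumptions.

```python
class UnbalancedEntry(ValueError):
    """Raised when a transaction's postings do not sum to zero."""


class Ledger:
    def __init__(self):
        self._entries = []   # append-only list of (key, postings)
        self._by_key = {}    # idempotency key -> position

    def append(self, idempotency_key, postings):
        """postings: list of (account, amount) pairs for one transaction."""
        if idempotency_key in self._by_key:
            # Same business operation seen again: return the original
            # position and append nothing.
            return self._by_key[idempotency_key]
        if sum(amount for _, amount in postings) != 0:
            raise UnbalancedEntry(f"{idempotency_key}: postings do not balance")
        position = len(self._entries)
        self._entries.append((idempotency_key, tuple(postings)))
        self._by_key[idempotency_key] = position
        return position

    def balance(self, account):
        return sum(
            amount
            for _, postings in self._entries
            for acct, amount in postings
            if acct == account
        )

    def __len__(self):
        return len(self._entries)
```

A correction is a new balanced entry with its own key, for example `ledger.append("transfer-1-reversal", [("cash", -100), ("revenue", 100)])`. Nothing in the class offers a way to edit or delete an existing entry.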
Technical Bottleneck: The Uncommitted Suffix* #
the log's brand is immutability — but its TAIL is provisional.
appended-but-uncommitted records can be truncated on leader change,
lost on crash, or ambiguous in status.
the log lies about its own tail.
A staleness-family member with an inversion: not a stale copy of truth, but a PREMATURE claim of it. Every classic failure in the source doc is a reader or writer trusting a rung the record hadn’t reached:
follower applies uncommitted entry (read below the committed rung)
leader loses acknowledged write (ack sent before the committed rung)
ack before durable (same, one rung lower)
committed position ambiguous (no fence between the log's two natures)
partial record after crash (the appended rung's own torn edge)
Known recipes:
high watermark: expose only the committed prefix; the fence between the log's immutable body and provisional tail (flagship: consumers that read only up to the HW never see a truncation)
quorum-commit-then-ack: the ack IS the rung-3 certificate
flush-before-ack: rung-2 discipline for single-node durability
epoch-stamped entries: lease_fencing.md's token, per record; a deposed leader's divergent suffix is identifiable and truncatable, and its late appends are rejectable
torn-write detection: CRC per record; the appended rung's own edge must be findable after a crash
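The torn-write recipe can be sketched as a length-prefixed frame with a CRC, plus a recovery scan that stops at the first record that fails to verify. The frame layout (4-byte little-endian length, 4-byte CRC32 over length and payload) is an assumption made for this example and not any specific system's on-disk format.

```python
import struct
import zlib

_U32 = struct.Struct("<I")
HEADER_SIZE = 2 * _U32.size  # length field + CRC field


def encode_record(payload: bytes) -> bytes:
    length = _U32.pack(len(payload))
    # The CRC covers the length too, so a zero-filled tail does not
    # parse as a run of valid empty records.
    crc = _U32.pack(zlib.crc32(length + payload))
    return length + crc + payload


def scan_valid_prefix(buf: bytes):
    """Return (records, end_offset) for the longest verifiable prefix."""
    records, offset = [], 0
    while offset + HEADER_SIZE <= len(buf):
        (length,) = _U32.unpack_from(buf, offset)
        (crc,) = _U32.unpack_from(buf, offset + _U32.size)
        start = offset + HEADER_SIZE
        end = start + length
        if end > len(buf):
            break  # torn: the payload was cut short
        payload = buf[start:end]
        if zlib.crc32(buf[offset:offset + _U32.size] + payload) != crc:
            break  # torn or corrupt: stop, do not skip ahead
        records.append(payload)
        offset = end
    return records, offset
```

After a crash, `end_offset` is the appended rung's recoverable edge: a recovering writer would truncate the file there before appending again, so that later records never sit behind garbage.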
A strong design says explicitly:
what a record means (command or fact),
what ordering scope readers may assume,
the four coordinates of the position ladder and where the ack sits,
who may read past the high watermark, i.e. below the committed rung (ideally: no one),
and what history is safe to discard (per the treaty, not per a hunch).
Log As Protocol (the crossing-point spec — keep) #
append record → assign position
replicate / flush
ack at a NAMED rung
read/fetch from position (at or below the committed fence)
advance consumer cursor (progress claim → checkpoint_replay.md)
replay after checkpoint
retain / compact / delete (→ GC, under the treaty)
handle truncation and leader change (epoch discipline)
Kafka instantiation:
Produce → leader appends (rung 1)
replicas fetch → high watermark advances (rung 3)
consumer Fetches ≤ HW; commits offset separately (rung 4, its own ladder)
retention/compaction rewrite old segments (GC's publication protocol)
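The "replicas fetch → high watermark advances" step can be sketched as a minimum over the in-sync replicas' log end offsets, with consumer fetches clamped to that fence. The function names are hypothetical. The sketch treats the high watermark as an exclusive upper bound, meaning the first offset not yet exposed; that is a convention chosen for this example.

```python
def high_watermark(log_end_offsets, in_sync):
    """Smallest log end offset among in-sync replicas.

    log_end_offsets maps replica -> next offset it would write.
    """
    return min(log_end_offsets[replica] for replica in in_sync)


def fetch(log, from_offset, hw, max_records=100):
    """Serve records from from_offset, never past the high watermark."""
    end = min(hw, from_offset + max_records)
    return log[from_offset:end]


log = list(range(10))  # the leader's log; offsets 0..9 are appended
offsets = {"leader": 10, "follower-1": 8, "follower-2": 5}
hw = high_watermark(offsets, in_sync={"leader", "follower-1"})
```

With `follower-2` outside the in-sync set, `hw` is 8, so offsets 8 and 9 exist on the leader but stay invisible to consumers until `follower-1` catches up.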
Raft instantiation:
leader appends (rung 1) → AppendEntries → quorum (rung 3: commit index)
state machines apply in order (rung 4: applied index)
divergent suffixes truncated by term comparison (epoch recipe)
snapshots compact the applied prefix (checkpoint_replay + GC)
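The term-comparison step on the follower side can be sketched as follows. It is a simplified reading of the AppendEntries consistency check with 1-based `prev_index` as in the Raft paper. It omits the commit index, persistence, and RPC plumbing, and the function name and the `(term, command)` tuple shape are assumptions.

```python
def append_entries(log, prev_index, prev_term, entries):
    """Apply a leader's entries to a follower log, in place.

    log and entries are lists of (term, command). prev_index is the
    1-based index of the entry preceding entries (0 = start of log).
    Returns False if the follower's log does not match at prev_index.
    """
    if prev_index > len(log):
        return False  # follower is missing entries before this batch
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # histories diverge at or before prev_index
    for i, entry in enumerate(entries):
        slot = prev_index + i  # 0-based slot for this entry
        if slot < len(log):
            if log[slot][0] != entry[0]:
                del log[slot:]      # conflicting suffix: truncate it
                log.append(entry)
            # same term at the same index: already present, keep it
        else:
            log.append(entry)
    return True


follower = [(1, "a"), (1, "b"), (2, "x"), (2, "y")]
ok = append_entries(follower, prev_index=2, prev_term=1, entries=[(3, "c")])
```

After the call, `follower` is `[(1, "a"), (1, "b"), (3, "c")]`: the term-2 suffix from the deposed leader is gone. In a full implementation this truncation only ever touches uncommitted entries, which is the property the commit-index fence protects.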
Named Configurations (lookup table) #
Vector = {semantics, ladder discipline, ordering, readership, retention}. Rows marked → are owned elsewhere.
| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| WAL | command, flush-before-page, total, single recoverer, checkpoint-truncated | PostgreSQL WAL | page-before-WAL; early truncation; non-idempotent replay; torn record |
| Commit log | either, quorum/flush-then-ack, per-partition, replicas or consumers, bounded | Kafka partition; Cassandra commitlog | ack-before-durable; ambiguous commit position* |
| Replication log | command, commit-index fence, total, converging replicas, snapshot-compacted | Raft log | follower applies uncommitted*; divergent-suffix confusion; unbounded lag |
| Event log | fact, outbox-disciplined, per-key, independent consumers, retention treaty | Kafka + Schema Registry | emitted-before-commit; schema drift; consumers lag past retention |
| Message stream | either, HW fence, per-partition, independent cursors, bounded → queue.md seam | Produce/Fetch/OffsetCommit | early offset commit (checkpoint_replay); hot partition; rebalance duplicates (lease_fencing) |
| Audit log | fact, append-only vs the operator, total-ish, adversarial auditor, legal holds | CloudTrail; k8s audit | missing denials; mutable entries; disabled during incident; secrets in payload |
| Workflow history → checkpoint_replay.md | fact+command markers, —, per-workflow, single replayer, continue-as-new | Temporal history | non-deterministic replay; unbounded history |
| CDC log | fact (row changes), transaction boundaries, per-txn, cross-boundary consumers, position-fragile | Debezium over Postgres | schema breaks consumer; lost position; snapshot/stream gap-or-overlap; double apply |
| Compacted log → GC Part II | fact latest-per-key, —, per-key, rebuilders, keyed-latest rung | Kafka compaction | early tombstone removal; needed history gone |
| Control-plane log → checkpoint_replay.md | fact (revisions), —, total per store, watchers, compacted revisions | etcd watch/revision | watch past compaction → relist; stale event applied |
| Ledger | fact + append-time invariant, strict durability, total per account set, reconcilers, forever | double-entry; TigerBeetle | double spend; unbalanced entry; idempotency key mis-scoped; history edited |
| Trace log — evicted | observation, best-effort, none, humans, sampled | OpenTelemetry | (→ future observability block) |
Vocabulary #
append record entry position offset LSN index sequence
flush fsync durable commit apply
high watermark commit index applied index flush position
term epoch divergent suffix truncation torn write
partition segment key routing transaction boundary
cursor consumer offset replay
outbox schema contract compensating entry idempotency key
retention compaction tombstone (→ GC, checkpoint_replay)
Deep Lesson #
Log bugs come from confusing pairs on different axes:
append vs commit (axis 2 + bottleneck*: the tail is provisional)
commit vs apply (axis 2: the replay window lives between them)
delivery vs processing (the queue.md seam, at rung 4)
offset vs business progress (a coordinate is not a meaning)
partition order vs global order (axis 3: scope is bought, not assumed)
event vs command (axis 1: decides what replay IS)
audit log vs source of truth (axis 4: an audit log explains; it does not arbitrate)
retention vs recoverability (axis 5: the treaty — imported, honored)
compaction vs full history (GC Part II: name the equivalence rung)
Design procedure: declare command-or-fact, name the four ladder coordinates and pin the ack to one, state the ordering scope consumers may assume, identify the readership (and whether it includes an adversary), fence the tail with a high watermark — and let queue.md, checkpoint_replay.md, and GC govern their own facets of the seam. The named types are recognition shortcuts, not the design space.