
Log #

log = append-only sequence of records

It answers:

what happened, in what order, and from where can I replay?

Role in the catalog: the sixth component block — the storage-side substrate, and state_machine.md’s twin: a replicated state machine is literally this block composed with that one (log + deterministic apply).

The queue.md seam, declared up front. queue.md’s axis 1 cleaved remove-on-ack from retained-log; this block IS the retained-log arm, seen from inside. Ownership split:

queue.md owns the consumption discipline   (claim, visibility, ack,
                                            redelivery — who may take
                                            work, when is it done)
log.md owns the structure itself           (append, ordering, durability
                                            ladder, retention — what
                                            happened, what's safe to read)

The consumer offset sits ON the seam, three facets, three owners: position → this file; commit-as-progress-claim → checkpoint_replay.md; redelivery consequence → queue.md.

Central tension:

preserve history  vs  bound storage/cost and read amplification

Design Axes (the core module) #

Axis 1 — Record Semantics (what a record MEANS) #

The doc’s own first Big Question, promoted to the structural cleave:

command/intent:  "do this" — WAL, operation log, redo log.
                 replay = RE-EXECUTE. idempotency is mandatory,
                 determinism is the contract (state_machine.md axis 2).
fact/event:      "this happened" — event log, CDC, audit, ledger.
                 replay = RE-DERIVE. facts are not re-executed;
                 projections are rebuilt from them.
observation:     "I saw this" — trace/diagnostic logs. best-effort,
                 sampled, no replay contract, no commit semantics.
                 EVICTED: shares only the word "log" with this block;
                 belongs to a future observability block.

The deep-lesson row “event vs command” is this axis, and it decides what replay even means before any machinery is chosen.
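
To make the two replay meanings concrete, here is a minimal sketch. The names `CommandReplayer` and `rebuild_projection`, and the idea of deduplicating on a command id, are illustrative assumptions and not part of the catalog. A command log is replayed by re-executing behind an idempotency guard; a fact log is replayed by folding the facts into a projection.

```python
from dataclasses import dataclass, field


@dataclass
class CommandReplayer:
    """Replays a command log by RE-EXECUTING; dedupes on a command id."""
    balance: int = 0
    seen: set = field(default_factory=set)

    def apply(self, cmd_id: str, delta: int) -> None:
        if cmd_id in self.seen:   # idempotency guard: replay may repeat a command
            return
        self.seen.add(cmd_id)
        self.balance += delta


def rebuild_projection(facts) -> int:
    """Replays a fact log by RE-DERIVING a projection; nothing is re-executed."""
    balance = 0
    for fact in facts:
        balance += fact["amount"]
    return balance


# Hypothetical command log: c1 appears twice, as it might after a retry.
commands = [("c1", 10), ("c2", -3), ("c1", 10)]
replayer = CommandReplayer()
for cmd_id, delta in commands:
    replayer.apply(cmd_id, delta)

# Hypothetical fact log: each entry records something that already happened.
facts = [{"amount": 10}, {"amount": -3}]
projected = rebuild_projection(facts)
```

Without the guard, the duplicated `c1` would be applied twice. The fact path needs no such guard because it never executes anything; it only recomputes derived state from the recorded facts.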

Interrogation:

Command or fact? (if you can't answer, neither can your replay logic)
Commands: is re-execution idempotent and deterministic — proven?
Facts: emitted before or after the transaction that made them true?
  ("event emitted before commit" = a fact about a world that never happened;
   the transactional-outbox recipe exists for exactly this)
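
A minimal sketch of the transactional-outbox idea, using Python's standard `sqlite3` module. The table names `orders` and `outbox` and the helper `place_order` are hypothetical. The point is only that the state change and the fact describing it commit or roll back together, so no fact is ever published about a transaction that did not happen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, event TEXT)")
conn.commit()


def place_order(order_id: int, fail: bool = False) -> None:
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the order row and its outbox event share one transaction.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        conn.execute("INSERT INTO outbox (event) VALUES (?)",
                     (f"order_placed:{order_id}",))
        if fail:
            raise RuntimeError("simulated failure before commit")


place_order(1)
try:
    place_order(2, fail=True)   # neither the order nor its event survives
except RuntimeError:
    pass

events = [row[0] for row in conn.execute("SELECT event FROM outbox ORDER BY seq")]
orders = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
```

A separate relay process (not shown) would read `outbox` in `seq` order and append each event to the log. That relay can deliver an event more than once, so consumers still need to tolerate duplicates.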

Axis 2 — The Position Ladder (the spine) #

Every record climbs, and every rung has a named coordinate:

appended   assigned a position           (offset, LSN, index)
durable    survives the writer's crash   (flush position, fsync)
committed  survives leader change        (high watermark, commit index,
                                          quorum-replicated)
applied    reflected in derived state    (applied index, consumer offset)

The deep lesson’s best rows are rung confusions:

append ≠ commit     (the tail is provisional — see bottleneck*)
commit ≠ apply      (committed-but-unapplied is the replay window)
delivery ≠ processing (queue.md's territory, at the applied rung)
offset ≠ business progress (the coordinate is not the meaning)

Interrogation:

Name the four coordinates in THIS system. (if two rungs share a
  coordinate, find out which failure that hides)
When is the ack sent — at which rung? (ack-before-durable is a
  deliberate choice or a bug; know which)
Who is allowed to read below the committed rung, and why?
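
The ladder can be sketched as four coordinates over one list. `LadderLog` and its method names are hypothetical, every coordinate is an exclusive upper bound on a prefix, and the rule that commit may not outrun the local flush is an assumption of this sketch (quorum systems define the committed rung differently).

```python
from dataclasses import dataclass, field


@dataclass
class LadderLog:
    """In-memory log; each coordinate bounds a prefix of `records`."""
    records: list = field(default_factory=list)
    flushed: int = 0     # durable rung
    committed: int = 0   # committed rung (the high watermark)
    applied: int = 0     # applied rung

    def append(self, record) -> int:
        self.records.append(record)           # appended rung: position assigned
        return len(self.records) - 1

    def flush(self) -> None:
        self.flushed = len(self.records)      # stand-in for fsync

    def commit(self, upto: int) -> None:
        # Assumption: commit never outruns durability and never moves back.
        self.committed = max(self.committed, min(upto, self.flushed))

    def apply_next(self):
        if self.applied >= self.committed:    # nothing committed left to apply
            return None
        record = self.records[self.applied]
        self.applied += 1
        return record

    def readable(self) -> list:
        return self.records[:self.committed]  # readers never see the tail


log = LadderLog()
for rec in ["a", "b", "c"]:
    log.append(rec)
log.flush()
log.append("d")        # appended, not yet durable
log.commit(upto=4)     # clamped to the flushed position
first = log.apply_next()
```

After this run the four coordinates all differ: four records appended, three flushed and committed, one applied. Record `d` exists in the log but is invisible to `readable()`, which is the "append ≠ commit" row in miniature.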

Axis 3 — Ordering Scope #

total:            one sequence, one arbiter        (Raft log, single WAL)
per-partition:    total within, none across        (Kafka; the workhorse)
per-key:          within a key's residence         (partition + key routing —
                                                    and only until a repartition)
per-transaction:  commit-order batches             (CDC transaction boundaries)

Interrogation:

What ordering does the CONSUMER's correctness actually require?
  (buying total order for per-key needs is the classic overpay)
What silently breaks the scope? (partition count change re-maps keys;
  cross-partition consumers see interleavings)
"Partition order vs global order" — is anyone assuming the latter?

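The re-mapping hazard can be shown with a hash-mod partitioner. This is a sketch under assumptions: `partition_for` is a hypothetical helper, and CRC32 stands in for whatever hash a real system uses.

```python
import zlib


def partition_for(key: str, partitions: int) -> int:
    """Route a key to a partition by hashing (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


keys = [f"user-{i}" for i in range(1000)]

# Keys whose partition changes when the count grows from 4 to 6.
moved = [k for k in keys if partition_for(k, 4) != partition_for(k, 6)]

# With a fixed count, routing is stable, so per-key order holds.
stable = all(partition_for(k, 4) == partition_for(k, 4) for k in keys)
```

Any key in `moved` has older records in one partition and newer records in another, so a consumer can observe them out of order. Per-key ordering therefore holds only for as long as the key stays in one partition.
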
Axis 4 — Readership Model #

single recoverer:      WAL — read only at crash, by the writer's successor
converging replicas:   replication log — all must apply identically,
                       in order (determinism inherited from axis 1)
independent consumers: streams — each tracks its own cursor;
                       the log doesn't know or care who reads
adversarial auditor:   audit log — append-only AGAINST THE OPERATOR;
                       the threat model includes the admin
                       (a boundary.md trust property: mutation of history
                        is the attack, so immutability needs enforcement
                        stronger than convention — WORM storage, external
                        anchoring, signed batches)

CDC = the replication-log readership extended ACROSS a trust boundary to external consumers — which is why its signature failures are schema-contract failures, not replication failures.

Interrogation:

Who reads, and does the log know them? (registered replicas vs anonymous cursors)
Auditor case: what stops the operator from rewriting history?
CDC: who owns the schema contract, and what breaks when it evolves?
Snapshot + stream handoff: gap or overlap at the seam? (Debezium's
  hardest problem — the same gap/overlap dial as lease_fencing.md handoff)
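
One way to make mutation of history detectable is a hash chain, where each entry's hash covers its predecessor's hash. This technique is not named in the list above and is offered only as an illustration; the helpers `entry_hash`, `append`, and `verify` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder hash preceding the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "hash": entry_hash(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited payload breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


audit = []
append(audit, {"actor": "admin", "action": "delete_user", "target": "u42"})
append(audit, {"actor": "alice", "action": "login"})
intact = verify(audit)

audit[0]["payload"]["target"] = "u43"      # the operator rewrites history
tampered_detected = not verify(audit)
```

A chain alone does not stop an operator who can recompute every hash after the edit. That is why the head hash would need to be anchored somewhere the operator cannot write, which is the role external anchoring and signed batches play.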

Axis 5 — Retention Authority (imports, fully deferred) #

full history:      keep everything; pay forever
compacted:         latest-per-key — equivalence rung owned by
                   garbage_collection.md Part II
bounded retention: the checkpoint_replay.md treaty — retention floor =
                   oldest checkpoint anyone might restore from;
                   GC executes, the pin registry adjudicates

Nothing native here; the log is the object of those two blocks' protocols. Likewise: workflow history → checkpoint_replay.md’s event-sourced row; control-plane revision log → its control-plane row.
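
The retention-floor rule reduces to a minimum over pins. In this sketch the pin registry is a plain dict and `retention_floor` is a hypothetical helper; the real registry and its protocol belong to checkpoint_replay.md and garbage_collection.md.

```python
def retention_floor(pins: dict, log_end: int) -> int:
    """Lowest offset that must be kept; offsets below it may be deleted.

    Assumption: with no pins at all, nothing is protected, so the floor
    is the end of the log.
    """
    return min(pins.values(), default=log_end)


# Hypothetical pins: the oldest offset each holder might still restore from.
pins = {"consumer-a": 120, "consumer-b": 75, "backup-snapshot": 90}

floor = retention_floor(pins, log_end=200)
deletable = range(0, floor)          # offsets GC is allowed to remove
unpinned_floor = retention_floor({}, log_end=200)
```

The slowest pin holder sets the floor for everyone. That is where the storage side of the central tension shows up: one stalled consumer can hold the whole log's history in place.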


Special Seat: The Ledger #

The one genuinely novel resident — a log with a semantic invariant ON THE RECORDS THEMSELVES:

entries must balance (double-entry: debits = credits, per transaction)
idempotency keys scoped to BUSINESS operations, not transport retries
history is never mutated — corrections are new compensating entries

A ledger polices its content, not just its order: append-time invariant checking is part of the protocol, and an unbalanced entry is a rejected append, not a corruption detected later. (Examples: TigerBeetle, double-entry models.) It is also the purest fact log of axis 1: facts are corrected by new facts, never edited, and that is the ledger's whole discipline.
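
A minimal sketch of append-time invariant checking. The `Ledger` class, its posting format, and the key names are hypothetical, and amounts are assumed to be integers in minor units.

```python
class UnbalancedEntry(ValueError):
    """Raised when an entry's debits and credits differ."""


class Ledger:
    def __init__(self):
        self.entries = []   # append-only; never mutated in place
        self.by_key = {}    # idempotency key -> position

    def append(self, idempotency_key: str, postings) -> int:
        """postings: list of (account, debit, credit) integer amounts."""
        if idempotency_key in self.by_key:
            # Same BUSINESS operation seen again: return the original position.
            return self.by_key[idempotency_key]
        debits = sum(d for _, d, _ in postings)
        credits = sum(c for _, _, c in postings)
        if debits != credits:
            raise UnbalancedEntry(f"debits {debits} != credits {credits}")
        position = len(self.entries)
        self.entries.append((idempotency_key, tuple(postings)))
        self.by_key[idempotency_key] = position
        return position


ledger = Ledger()
p0 = ledger.append("transfer-1", [("bob", 100, 0), ("alice", 0, 100)])
p0_again = ledger.append("transfer-1", [("bob", 100, 0), ("alice", 0, 100)])

try:
    ledger.append("transfer-2", [("bob", 100, 0), ("alice", 0, 90)])
    rejected = False
except UnbalancedEntry:
    rejected = True

# A correction is a new compensating entry, not an edit of entry 0.
p1 = ledger.append("transfer-1-reversal", [("alice", 100, 0), ("bob", 0, 100)])
```

The unbalanced entry never receives a position, so there is nothing to detect or repair later. The idempotency key here names the business operation (`transfer-1`), so a retried request maps to the same entry regardless of how many times the transport resends it.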


Technical Bottleneck: The Uncommitted Suffix* #

the log's brand is immutability — but its TAIL is provisional.
appended-but-uncommitted records can be truncated on leader change,
lost on crash, or ambiguous in status.
the log lies about its own tail.

A staleness-family member with an inversion: not a stale copy of truth, but a PREMATURE claim of it. Every classic failure in the source doc is a reader or writer trusting a rung the record hadn’t reached:

follower applies uncommitted entry      (read below the committed rung)
leader loses acknowledged write         (ack sent below the committed rung)
ack before durable                      (same, one rung lower: ack sent from the appended rung)
committed position ambiguous            (no fence between the log's two natures)
partial record after crash              (the appended rung's own torn edge)

Known recipes:

high watermark          expose only the committed prefix: the fence
                        between the log's immutable body and provisional
                        tail (flagship: consumers that read only up to
                        the HW never see a truncation)
quorum-commit-then-ack  the ack IS the rung-3 certificate
flush-before-ack        rung-2 discipline for single-node durability
epoch-stamped entries   lease_fencing.md's token, per-record: a deposed
                        leader's divergent suffix is identifiable and
                        truncatable, and its late appends are rejectable
torn-write detection    CRC per record; the appended rung's own edge
                        must be findable after a crash
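
Torn-write detection can be sketched with length-prefixed records and a per-record CRC32. The 8-byte header layout and the `encode` / `recover` helpers are assumptions made for illustration, not any particular system's on-disk format.

```python
import struct
import zlib

HEADER = struct.Struct("<II")   # payload length, CRC32 of payload


def encode(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(data: bytes):
    """Return (records, valid_length): the longest prefix of intact records."""
    records, pos = [], 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break                        # torn or corrupt record: stop here
        records.append(payload)
        pos = start + length
    return records, pos


# Simulate a crash that cut the last record short by four bytes.
log_bytes = (encode(b"put k1=v1")
             + encode(b"put k2=v2")
             + encode(b"put k3=v3")[:-4])

records, valid_len = recover(log_bytes)
```

`valid_len` is the position to truncate to before appending resumes, so the torn edge is discarded instead of being read as data.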

A strong design says explicitly:

what a record means (command or fact),
what ordering scope readers may assume,
the four coordinates of the position ladder and where the ack sits,
who may read past the high watermark, i.e. below the committed rung (ideally: no one),
and what history is safe to discard (per the treaty, not per a hunch).

Log As Protocol (the crossing-point spec — keep) #

append record → assign position
replicate / flush
ack at a NAMED rung
read/fetch from position (at or below the committed fence)
advance consumer cursor (progress claim → checkpoint_replay.md)
replay after checkpoint
retain / compact / delete (→ GC, under the treaty)
handle truncation and leader change (epoch discipline)

Kafka instantiation:

Produce → leader appends (rung 1)
replicas fetch → high watermark advances (rung 3)
consumer Fetches ≤ HW; commits offset separately (rung 4, its own ladder)
retention/compaction rewrite old segments (GC's publication protocol)
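
A simplified model of how the high watermark fences a fetch. It does not use any Kafka client API, and `high_watermark` and `fetch` are hypothetical helpers. It assumes each replica reports a log end offset (the next offset it would write) and treats the high watermark as an exclusive bound.

```python
def high_watermark(log_end_offsets: dict, in_sync: set) -> int:
    """Smallest log end offset among in-sync replicas (exclusive bound)."""
    return min(log_end_offsets[replica] for replica in in_sync)


def fetch(records: list, from_offset: int, hw: int, max_records: int = 100) -> list:
    """Serve records from from_offset, never reaching past the high watermark."""
    upper = min(hw, from_offset + max_records)
    return records[from_offset:upper]


records = ["r0", "r1", "r2", "r3", "r4"]          # the leader's log
leo = {"leader": 5, "f1": 3, "f2": 4}             # per-replica log end offsets

hw = high_watermark(leo, {"leader", "f1", "f2"})
visible = fetch(records, 0, hw)

# If the slowest follower drops out of the in-sync set, the fence advances.
hw_after_shrink = high_watermark(leo, {"leader", "f2"})
```

Records `r3` and `r4` are appended on the leader but invisible to consumers until enough replicas catch up. The consumer's own committed offset is a separate coordinate and is not modeled here.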

Raft instantiation:

leader appends (rung 1) → AppendEntries → quorum (rung 3: commit index)
state machines apply in order (rung 4: applied index)
divergent suffixes truncated by term comparison (epoch recipe)
snapshots compact the applied prefix (checkpoint_replay + GC)
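
The quorum step can be sketched as a function of the replicas' match indexes. `commit_index` and its argument layout are hypothetical, indexes are 1-based, and the rest of Raft (elections, AppendEntries consistency checks) is omitted.

```python
def commit_index(match_index: list, log_terms: list,
                 current_term: int, prev_commit: int) -> int:
    """Highest index stored on a majority, subject to Raft's term rule.

    match_index: last replicated index per node, including the leader.
    log_terms:   log_terms[i - 1] is the term of the entry at index i.
    """
    # The (n // 2)-th largest value is held by a strict majority of nodes.
    quorum_idx = sorted(match_index, reverse=True)[len(match_index) // 2]
    # A leader only commits by counting replicas for entries from its own
    # term; older entries become committed indirectly.
    if quorum_idx > prev_commit and log_terms[quorum_idx - 1] == current_term:
        return quorum_idx
    return prev_commit


terms = [1, 1, 2, 2, 3, 3, 3]        # terms of entries 1..7 on the leader
matches = [7, 7, 5, 4, 3]            # five nodes, leader first

advanced = commit_index(matches, terms, current_term=3, prev_commit=3)
held_back = commit_index(matches, terms, current_term=4, prev_commit=3)
```

In the first call, index 5 is on three of five nodes and belongs to the current term, so the commit index advances. In the second, the same replication state does not advance it, because the entry at index 5 is from an earlier term.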

Named Configurations (lookup table) #

Vector = {semantics, ladder discipline, ordering, readership, retention}. Rows marked → are owned elsewhere.

| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| WAL | command, flush-before-page, total, single recoverer, checkpoint-truncated | PostgreSQL WAL | page-before-WAL; early truncation; non-idempotent replay; torn record |
| Commit log | either, quorum/flush-then-ack, per-partition, replicas or consumers, bounded | Kafka partition; Cassandra commitlog | ack-before-durable; ambiguous commit position* |
| Replication log | command, commit-index fence, total, converging replicas, snapshot-compacted | Raft log | follower applies uncommitted*; divergent-suffix confusion; unbounded lag |
| Event log | fact, outbox-disciplined, per-key, independent consumers, retention treaty | Kafka + Schema Registry | emitted-before-commit; schema drift; consumers lag past retention |
| Message stream | either, HW fence, per-partition, independent cursors, bounded → queue.md seam | Produce/Fetch/OffsetCommit | early offset commit (checkpoint_replay); hot partition; rebalance duplicates (lease_fencing) |
| Audit log | fact, append-only vs the operator, total-ish, adversarial auditor, legal holds | CloudTrail; k8s audit | missing denials; mutable entries; disabled during incident; secrets in payload |
| Workflow history → checkpoint_replay.md | fact+command markers, -, per-workflow, single replayer, continue-as-new | Temporal history | non-deterministic replay; unbounded history |
| CDC log | fact (row changes), transaction boundaries, per-txn, cross-boundary consumers, position-fragile | Debezium over Postgres | schema breaks consumer; lost position; snapshot/stream gap-or-overlap; double apply |
| Compacted log → GC Part II | fact latest-per-key, -, per-key, rebuilders, keyed-latest rung | Kafka compaction | early tombstone removal; needed history gone |
| Control-plane log → checkpoint_replay.md | fact (revisions), -, total per store, watchers, compacted revisions | etcd watch/revision | watch past compaction → relist; stale event applied |
| Ledger | fact + append-time invariant, strict durability, total per account set, reconcilers, forever | double-entry; TigerBeetle | double spend; unbalanced entry; idempotency key mis-scoped; history edited |
| Trace log (evicted) | observation, best-effort, none, humans, sampled | OpenTelemetry | (→ future observability block) |

Vocabulary #

append  record  entry  position  offset  LSN  index  sequence
flush  fsync  durable  commit  apply
high watermark  commit index  applied index  flush position
term  epoch  divergent suffix  truncation  torn write
partition  segment  key routing  transaction boundary
cursor  consumer offset  replay
outbox  schema  contract  compensating entry  idempotency key
retention  compaction  tombstone  (→ GC, checkpoint_replay)

Deep Lesson #

Log bugs come from confusing the two halves of a pair; each pair sits on its own axis or seam:

append            vs  commit              (axis 2 + bottleneck*: the tail is provisional)
commit            vs  apply               (axis 2: the replay window lives between them)
delivery          vs  processing          (the queue.md seam, at rung 4)
offset            vs  business progress   (a coordinate is not a meaning)
partition order   vs  global order        (axis 3: scope is bought, not assumed)
event             vs  command             (axis 1: decides what replay IS)
audit log         vs  source of truth     (axis 4: an audit log explains; it does not arbitrate)
retention         vs  recoverability      (axis 5: the treaty — imported, honored)
compaction        vs  full history        (GC Part II: name the equivalence rung)

Design procedure: declare command-or-fact, name the four ladder coordinates and pin the ack to one, state the ordering scope consumers may assume, identify the readership (and whether it includes an adversary), fence the tail with a high watermark — and let queue.md, checkpoint_replay.md, and GC govern their own facets of the seam. The named types are recognition shortcuts, not the design space.