Consensus / Quorum #
consensus = agree on one value or one ordered sequence of values
quorum = enough participants to make a decision safe
Consensus answers: what is the authoritative decision? Quorum answers: which subset is enough to prove it?
Role in the catalog: the arbiter — the root block. Every other block manages the staleness enemy: bounds it, fences it, names its width. This block is the only one that manufactures facts immune to it — a committed entry is agreed forever; the commit index is the one coordinate in the catalog that never lies. The catalog’s strongest guarantees are purchased here and exported:
terms / epochs → lease_fencing.md's fencing tokens, manufactured
commit index → log.md's rung 3, manufactured
linearizable reads → cache.md's freshness ladder, extended past its top
quorum-anchored authority → lease_fencing.md's top rung — the recipe that
DISSOLVES the paused holder rather than guarding it
membership decisions → who future quorums even are (self-referential seat)
Central tension:
safety under failure vs availability, latency, operational complexity
One protocol at five zoom levels (the source doc’s hidden compression): single-value consensus is the primitive; replicated log is that primitive repeated (Multi-Paxos, Raft); leader election is its internal phase; quorum replication is its commit rule; atomic broadcast is its delivery guarantee viewed from below. Raft appears in all five rows because they ARE Raft, sliced.
Design Axes (the core module) #
Axis 1 — Consensus vs Quorum-Overlap Replication (the structural cleave) #
Not strength levels of one thing — two different contracts:
consensus: ONE total order, one history. divergence impossible by
construction; "committed" is forever.
(Paxos, Raft, Zab, Spanner groups)
quorum overlap: intersection guarantees ONLY. R+W>N ensures reads meet
writes, but concurrent writes yield siblings; divergence
is EXPECTED, and repair (read-repair, anti-entropy) is a
first-class citizen, not a failure path.
(Dynamo N/R/W, Cassandra consistency levels, Riak)
The deep-lesson row “eventual repair vs consensus” is this cleave — listed there as a confusion, used here as the organizing split.
Interrogation:
Does the system promise one history, or intersecting views of many?
Are siblings/conflicts representable, and who resolves them?
Sloppy quorum: which contract did the operator THINK they bought?
Is repair machinery present and load-bearing, or an afterthought?
Axis 2 — What Is Decided #
a value: the primitive (Paxos instance)
a log suffix: ordered commands — the workhorse (Raft, Multi-Paxos)
a coordinator: leader election — consensus about who may propose cheaply
a transaction outcome: commit/abort across groups (Spanner: 2PC where every
participant's vote and the decision are THEMSELVES
consensus-replicated — curing 2PC's Unknown state by
making the coordinator immortal)
membership: the self-referential seat, below
temporary authority: the lease/quorum hybrid — owned by lease_fencing.md;
quorum grants, fence enforces, this block manufactures
the token
Interrogation:
What EXACTLY is agreed — and is anything riding on it that wasn't agreed?
(leader elected ≠ leader's clock trusted; entry committed ≠ entry applied)
Transaction outcomes: where does the decision live if the decider dies?
(2PC baseline: nowhere — that's the caution; Spanner: in a Paxos group)
Axis 3 — Fault Model (what participants may do) #
crash-stop: participants halt cleanly. majority (f+1 of 2f+1) suffices.
crash-recovery: participants halt AND RETURN — so promises must be on disk
before they are spoken. persist-before-ack lives here:
"ack before durable" and "lost accepted state" are this
model violated, not bad luck.
byzantine: participants LIE. not a separate type — the same
intersection rule recomputed for dishonest witnesses:
any two quorums must share an HONEST node, hence
2f+1-of-3f+1, signatures, and quorum certificates.
(PBFT, Tendermint/HotStuff — study only if the threat
model includes equivocation)
Interrogation:
What is persisted before each vote/ack — and is the fsync real?
Can a participant return with amnesia? (a recovered node that forgot its
promise is a Byzantine node by accident)
Does the threat model include lying — or only dying?
Axis 4 — The Read Path (the sixth strength ladder) #
How much does a read’s guarantee cost? Ordered by strength:
follower read: cheapest; stale by replica lag (a cache, honestly labeled)
R-quorum read: intersection with W — fresh IF R+W>N and versions reconcile
lease read: leader serves locally inside a time lease — fast,
and unsafe exactly where clocks lie (axis 1 of
lease_fencing.md: the bottom rung's price)
ReadIndex: leader confirms leadership with a quorum round-trip,
then serves from local state — linearizable without
writing to the log
through-the-log: the read is itself a committed entry — strongest, dearest
Interrogation:
Which rung does each read path buy, and do callers know their rung?
"Old leader serves stale read" — which rung failed, and was it the
lease rung doing exactly what its price sheet said?
Special Seat: Membership (the self-referential case) #
Consensus about who future quorums ARE. The danger is unique: the decision changes the machinery that makes decisions —
old-config quorum and new-config quorum may fail to intersect,
and then two histories can both "commit."
Recipes: joint consensus (Raft: a transitional config where BOTH quorums must agree — the bridge that guarantees intersection across the change), single-node-at-a-time changes (the intersection is arithmetic), learners/non-voting members (catch up BEFORE counting — a promoted-but- lagging voter weakens every future quorum).
The gap/overlap dial, fifth appearance: reconfiguration is ownership handoff for the quorum itself, and joint consensus is the “overlap, fenced” position.
Technical Bottlenecks: the */^o split #
This block’s bottleneck structure is the richest in the catalog, and it splits cleanly along the notation:
Safety = Intersection* #
every safety property reduces to ONE recipe:
any two quorums that matter must share a persistent, honest witness
who carries the truth forward.
One mechanism, five costumes — every safety failure in the classic lists is an intersection violation:
R+W ≤ N read and write quorums miss each other
ack-before-durable the witness FORGOT (crash-recovery violated)
correlated failure the intersection died as one
(capacity.md's statistical bet, here with
safety as the collateral — failure-domain
placement is quorum design)
unsafe reconfiguration old and new quorums fail to intersect
byzantine equivocation the shared witness must be HONEST → 3f+1
Liveness = Termination^o #
FLP: in the pure asynchronous model, no deterministic protocol
guarantees termination. NO WORKABLE RECIPE EXISTS — the catalog's
only ^o.
Everything practical is a BET on partial synchrony, not a solution: randomized timeouts, leader election with backoff, failure detectors. Election storms and dueling-proposer livelock are not bugs to fix — they are the ^o showing through, managed but never solved.
the block's contract, stated honestly:
you will never be wrong (Intersection*),
and you will probably, eventually, decide (Termination^o).
A strong design says explicitly:
what decision needs agreement (axis 2),
which quorums prove it and where they intersect (Intersection*),
what is on disk before any promise is spoken (axis 3),
which read rung each caller buys (axis 4),
how membership changes without breaking intersection,
and what the system does while Termination^o withholds a decision.
Consensus As Protocol (the crossing-point spec — keep) #
Raft instantiation (all five zoom levels in one walkthrough):
candidate increments term (fencing token minted — export)
requests votes; majority elects (leader election phase)
leader appends entries (log.md rung 1)
followers PERSIST, then ack (crash-recovery discipline)
majority replication → commit (Intersection*: quorum replication)
replicas apply in order (log.md rung 4; atomic broadcast fulfilled)
snapshots compact the prefix (checkpoint_replay + GC)
joint consensus changes membership (the self-referential seat)
Dynamo-style instantiation (the other arm of axis 1):
coordinator maps key → N replicas
write succeeds after W acks; read after R responses
R+W>N → intersection; versions reconciled (vector clocks / LWW — chosen!)
read repair + anti-entropy heal divergence (repair is first-class)
Named Configurations (lookup table) #
Vector = {contract, decision object, fault model, read rung}.
| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| Single-value consensus | consensus, a value, crash-recovery, — | Paxos (the root study object) | two values chosen (intersection*); livelock (^o); lost accepted state |
| Replicated log | consensus, log suffix, crash-recovery, varies | Raft | committed entry lost; uncommitted applied ( log.md’s suffix*); divergence mis-repaired |
| Leader election | consensus, coordinator, crash-recovery, — | Raft election | two leaders; old leader acts (→ lease_fencing.md: fence, don’t trust); election storm (^o) |
| Quorum replication | consensus commit rule, entries, crash-recovery, — | AppendEntries majority | ack-before-durable; quorum lost; correlated failure (placement!) |
| Dynamo R/W | overlap, versioned values, crash-stop-ish, R-quorum | N/R/W model | R+W≤N; sibling surprise; sloppy quorum ≠ bought contract; repair lag |
| Linearizable read | consensus, a read’s position, —, ReadIndex | etcd ReadIndex | follower stale read (wrong rung); lease read under clock lie |
| Membership change | consensus, the quorum itself, crash-recovery, — | Raft joint consensus | non-intersecting configs; too many at once; lagging learner promoted |
| Transaction commit | consensus, txn outcome, crash-recovery, — | Spanner 2PC-over-Paxos (2PC alone as caution) | coordinator dies in Prepared → Unknown ( state_machine.md) — cured by replicating the decider |
| Lease/quorum hybrid | consensus grants, temporary authority, —, lease read | → lease_fencing.md (Chubby, Raft leader lease) | downstream skips the token; clock skew; renewal starves under partition |
| Atomic broadcast | consensus, delivery order, crash-recovery, — | Zab | ordered-delivery gap; new leader misses committed prefix |
| Flexible quorums | either, tuned intersection, —, tunable | Flexible Paxos; Cassandra CLs | operator picks non-intersecting sets; topology failure breaks the assumption |
| Byzantine consensus | consensus, any, byzantine, certificates | PBFT; HotStuff | insufficient honest quorum; equivocation; view-change complexity |
Vocabulary #
proposer acceptor learner leader follower candidate
ballot term epoch vote promise quorum majority intersection
commit index applied index committed prefix chosen value
persist-before-ack fsync amnesia
N / R / W sibling read repair anti-entropy sloppy quorum hinted handoff
ReadIndex lease read linearizability
joint consensus learner promotion reconfiguration
quorum certificate equivocation view change
FLP partial synchrony failure detector
Deep Lesson #
Consensus bugs come from confusing pairs on different axes:
leader vs authority forever (→ lease_fencing.md: elected is not fenced)
replication vs commit (log.md rung 2 vs rung 3 — the HW is minted here)
majority ack vs application (rung 3 vs rung 4)
availability vs safety (the tension: quorum loss stops the world, correctly)
quorum read vs latest read (axis 4: name the rung)
membership update vs safe reconfiguration (the self-referential seat: intersection across change)
lease expiry vs stale owner stopped (lease_fencing.md's premise, minted here)
eventual repair vs consensus (axis 1: two contracts, not two strengths)
Design procedure: pick the contract (one history, or intersecting views), name the decision object, put every promise on disk before it is spoken, verify intersection across reads, writes, failures domains, AND reconfigurations — then state what the system does while the catalog’s only ^o withholds its answer. The named types are recognition shortcuts; five of them are one protocol, zoomed.