Skip to main content
  1. concepts/

Consensus / Quorum #

consensus = agree on one value or one ordered sequence of values
quorum    = enough participants to make a decision safe

Consensus answers: what is the authoritative decision? Quorum answers: which subset is enough to prove it?

Role in the catalog: the arbiter — the root block. Every other block manages the staleness enemy: bounds it, fences it, names its width. This block is the only one that manufactures facts immune to it — a committed entry is agreed forever; the commit index is the one coordinate in the catalog that never lies. The catalog’s strongest guarantees are purchased here and exported:

terms / epochs            → lease_fencing.md's fencing tokens, manufactured
commit index              → log.md's rung 3, manufactured
linearizable reads        → cache.md's freshness ladder, extended past its top
quorum-anchored authority → lease_fencing.md's top rung — the recipe that
                            DISSOLVES the paused holder rather than guarding it
membership decisions      → who future quorums even are (self-referential seat)

Central tension:

safety under failure  vs  availability, latency, operational complexity

One protocol at five zoom levels (the source doc’s hidden compression): single-value consensus is the primitive; replicated log is that primitive repeated (Multi-Paxos, Raft); leader election is its internal phase; quorum replication is its commit rule; atomic broadcast is its delivery guarantee viewed from below. Raft appears in all five rows because they ARE Raft, sliced.


Design Axes (the core module) #

Axis 1 — Consensus vs Quorum-Overlap Replication (the structural cleave) #

Not strength levels of one thing — two different contracts:

consensus:        ONE total order, one history. divergence impossible by
                  construction; "committed" is forever.
                  (Paxos, Raft, Zab, Spanner groups)
quorum overlap:   intersection guarantees ONLY. R+W>N ensures reads meet
                  writes, but concurrent writes yield siblings; divergence
                  is EXPECTED, and repair (read-repair, anti-entropy) is a
                  first-class citizen, not a failure path.
                  (Dynamo N/R/W, Cassandra consistency levels, Riak)

The deep-lesson row “eventual repair vs consensus” is this cleave — listed there as a confusion, used here as the organizing split.

Interrogation:

Does the system promise one history, or intersecting views of many?
Are siblings/conflicts representable, and who resolves them?
Sloppy quorum: which contract did the operator THINK they bought?
Is repair machinery present and load-bearing, or an afterthought?

Axis 2 — What Is Decided #

a value:              the primitive (Paxos instance)
a log suffix:         ordered commands — the workhorse (Raft, Multi-Paxos)
a coordinator:        leader election — consensus about who may propose cheaply
a transaction outcome: commit/abort across groups (Spanner: 2PC where every
                      participant's vote and the decision are THEMSELVES
                      consensus-replicated — curing 2PC's Unknown state by
                      making the coordinator immortal)
membership:           the self-referential seat, below
temporary authority:  the lease/quorum hybrid — owned by lease_fencing.md;
                      quorum grants, fence enforces, this block manufactures
                      the token

Interrogation:

What EXACTLY is agreed — and is anything riding on it that wasn't agreed?
  (leader elected ≠ leader's clock trusted; entry committed ≠ entry applied)
Transaction outcomes: where does the decision live if the decider dies?
  (2PC baseline: nowhere — that's the caution; Spanner: in a Paxos group)

Axis 3 — Fault Model (what participants may do) #

crash-stop:       participants halt cleanly. majority (f+1 of 2f+1) suffices.
crash-recovery:   participants halt AND RETURN — so promises must be on disk
                  before they are spoken. persist-before-ack lives here:
                  "ack before durable" and "lost accepted state" are this
                  model violated, not bad luck.
byzantine:        participants LIE. not a separate type — the same
                  intersection rule recomputed for dishonest witnesses:
                  any two quorums must share an HONEST node, hence
                  2f+1-of-3f+1, signatures, and quorum certificates.
                  (PBFT, Tendermint/HotStuff — study only if the threat
                  model includes equivocation)

Interrogation:

What is persisted before each vote/ack — and is the fsync real?
Can a participant return with amnesia? (a recovered node that forgot its
  promise is a Byzantine node by accident)
Does the threat model include lying — or only dying?

Axis 4 — The Read Path (the sixth strength ladder) #

How much does a read’s guarantee cost? Ordered by strength:

follower read:      cheapest; stale by replica lag (a cache, honestly labeled)
R-quorum read:      intersection with W — fresh IF R+W>N and versions reconcile
lease read:         leader serves locally inside a time lease — fast,
                    and unsafe exactly where clocks lie (axis 1 of
                    lease_fencing.md: the bottom rung's price)
ReadIndex:          leader confirms leadership with a quorum round-trip,
                    then serves from local state — linearizable without
                    writing to the log
through-the-log:    the read is itself a committed entry — strongest, dearest

Interrogation:

Which rung does each read path buy, and do callers know their rung?
"Old leader serves stale read" — which rung failed, and was it the
  lease rung doing exactly what its price sheet said?

Special Seat: Membership (the self-referential case) #

Consensus about who future quorums ARE. The danger is unique: the decision changes the machinery that makes decisions —

old-config quorum and new-config quorum may fail to intersect,
and then two histories can both "commit."

Recipes: joint consensus (Raft: a transitional config where BOTH quorums must agree — the bridge that guarantees intersection across the change), single-node-at-a-time changes (the intersection is arithmetic), learners/non-voting members (catch up BEFORE counting — a promoted-but- lagging voter weakens every future quorum).

The gap/overlap dial, fifth appearance: reconfiguration is ownership handoff for the quorum itself, and joint consensus is the “overlap, fenced” position.


Technical Bottlenecks: the */^o split #

This block’s bottleneck structure is the richest in the catalog, and it splits cleanly along the notation:

Safety = Intersection* #

every safety property reduces to ONE recipe:
any two quorums that matter must share a persistent, honest witness
who carries the truth forward.

One mechanism, five costumes — every safety failure in the classic lists is an intersection violation:

R+W ≤ N                    read and write quorums miss each other
ack-before-durable         the witness FORGOT (crash-recovery violated)
correlated failure         the intersection died as one
                           (capacity.md's statistical bet, here with
                            safety as the collateral — failure-domain
                            placement is quorum design)
unsafe reconfiguration     old and new quorums fail to intersect
byzantine equivocation     the shared witness must be HONEST → 3f+1

Liveness = Termination^o #

FLP: in the pure asynchronous model, no deterministic protocol
guarantees termination. NO WORKABLE RECIPE EXISTS — the catalog's
only ^o.

Everything practical is a BET on partial synchrony, not a solution: randomized timeouts, leader election with backoff, failure detectors. Election storms and dueling-proposer livelock are not bugs to fix — they are the ^o showing through, managed but never solved.

the block's contract, stated honestly:
you will never be wrong (Intersection*),
and you will probably, eventually, decide (Termination^o).

A strong design says explicitly:

what decision needs agreement (axis 2),
which quorums prove it and where they intersect (Intersection*),
what is on disk before any promise is spoken (axis 3),
which read rung each caller buys (axis 4),
how membership changes without breaking intersection,
and what the system does while Termination^o withholds a decision.

Consensus As Protocol (the crossing-point spec — keep) #

Raft instantiation (all five zoom levels in one walkthrough):

candidate increments term          (fencing token minted — export)
requests votes; majority elects    (leader election phase)
leader appends entries             (log.md rung 1)
followers PERSIST, then ack        (crash-recovery discipline)
majority replication → commit      (Intersection*: quorum replication)
replicas apply in order            (log.md rung 4; atomic broadcast fulfilled)
snapshots compact the prefix       (checkpoint_replay + GC)
joint consensus changes membership (the self-referential seat)

Dynamo-style instantiation (the other arm of axis 1):

coordinator maps key → N replicas
write succeeds after W acks; read after R responses
R+W>N → intersection; versions reconciled (vector clocks / LWW — chosen!)
read repair + anti-entropy heal divergence (repair is first-class)

Named Configurations (lookup table) #

Vector = {contract, decision object, fault model, read rung}.

NameVectorCanonical study objectSignature failure
Single-value consensusconsensus, a value, crash-recovery, —Paxos (the root study object)two values chosen (intersection*); livelock (^o); lost accepted state
Replicated logconsensus, log suffix, crash-recovery, variesRaftcommitted entry lost; uncommitted applied ( log.md’s suffix*); divergence mis-repaired
Leader electionconsensus, coordinator, crash-recovery, —Raft electiontwo leaders; old leader acts (→ lease_fencing.md: fence, don’t trust); election storm (^o)
Quorum replicationconsensus commit rule, entries, crash-recovery, —AppendEntries majorityack-before-durable; quorum lost; correlated failure (placement!)
Dynamo R/Woverlap, versioned values, crash-stop-ish, R-quorumN/R/W modelR+W≤N; sibling surprise; sloppy quorum ≠ bought contract; repair lag
Linearizable readconsensus, a read’s position, —, ReadIndexetcd ReadIndexfollower stale read (wrong rung); lease read under clock lie
Membership changeconsensus, the quorum itself, crash-recovery, —Raft joint consensusnon-intersecting configs; too many at once; lagging learner promoted
Transaction commitconsensus, txn outcome, crash-recovery, —Spanner 2PC-over-Paxos (2PC alone as caution)coordinator dies in Prepared → Unknown ( state_machine.md) — cured by replicating the decider
Lease/quorum hybridconsensus grants, temporary authority, —, lease readlease_fencing.md (Chubby, Raft leader lease)downstream skips the token; clock skew; renewal starves under partition
Atomic broadcastconsensus, delivery order, crash-recovery, —Zabordered-delivery gap; new leader misses committed prefix
Flexible quorumseither, tuned intersection, —, tunableFlexible Paxos; Cassandra CLsoperator picks non-intersecting sets; topology failure breaks the assumption
Byzantine consensusconsensus, any, byzantine, certificatesPBFT; HotStuffinsufficient honest quorum; equivocation; view-change complexity

Vocabulary #

proposer  acceptor  learner  leader  follower  candidate
ballot  term  epoch  vote  promise  quorum  majority  intersection
commit index  applied index  committed prefix  chosen value
persist-before-ack  fsync  amnesia
N / R / W  sibling  read repair  anti-entropy  sloppy quorum  hinted handoff
ReadIndex  lease read  linearizability
joint consensus  learner promotion  reconfiguration
quorum certificate  equivocation  view change
FLP  partial synchrony  failure detector

Deep Lesson #

Consensus bugs come from confusing pairs on different axes:

leader              vs  authority forever      (→ lease_fencing.md: elected is not fenced)
replication         vs  commit                 (log.md rung 2 vs rung 3 — the HW is minted here)
majority ack        vs  application            (rung 3 vs rung 4)
availability        vs  safety                 (the tension: quorum loss stops the world, correctly)
quorum read         vs  latest read            (axis 4: name the rung)
membership update   vs  safe reconfiguration   (the self-referential seat: intersection across change)
lease expiry        vs  stale owner stopped    (lease_fencing.md's premise, minted here)
eventual repair     vs  consensus              (axis 1: two contracts, not two strengths)

Design procedure: pick the contract (one history, or intersecting views), name the decision object, put every promise on disk before it is spoken, verify intersection across reads, writes, failures domains, AND reconfigurations — then state what the system does while the catalog’s only ^o withholds its answer. The named types are recognition shortcuts; five of them are one protocol, zoomed.