Versioning / Compatibility #
versioning = naming which shape/semantics something has
compatibility = the ability of different versions to coexist safely
It answers:
what must keep working while everything changes?
Role in the catalog: the evolution protocol — the time-axis dual. Every previous block manages coexistence across SPACE (replicas, shards, caches: copies of state at one moment); this block manages coexistence across TIME: old and new versions of code, schema, protocol, and behavior sharing one running system during the only interval that matters — the transition. At scale, the mixed fleet is the permanent condition, not the transition.
Central tension:
evolve quickly vs keep old things working
Design Axes (the core module) #
Axis 1 — What Is Versioned #
the contract API shape, protocol messages (negotiable per session:
Kafka ApiVersions, TLS negotiation — with downgrade
attack as the security face of negotiation)
the data schema, stored bytes — THE DANGEROUS ONE, because data
outlives every binary that wrote it. log.md's "bad
schema immortal in history" was this axis's warning shot.
the behavior semantics under an unchanged shape — where the star*
lives: most compatibility machinery checks shape while
the breakage lives in meaning
the config → owned whole by checkpoint_replay's control-plane row
+ the xDS ACK/NACK notes (version, nonce, applied-vs-proposed,
last-good)
state identity → snapshot.md's coordinate + checkpoint_replay's
binding, as arrows. one native residue: MUTABLE TAG vs
IMMUTABLE DIGEST — a tag is a pointer that moves; a
digest is index_structures.md's content-keying, and mistaking one
for the other is deploying "latest" and calling it
pinned.
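A minimal sketch of the tag-vs-digest residue, assuming a hypothetical in-memory artifact store (`push`, `resolve`, `blobs`, and `tags` are illustrative names, not any registry's API). The digest is the SHA-256 of the content, so it names exactly one byte string; the tag is a dictionary entry that the next push overwrites.

```python
import hashlib

# Hypothetical store: digests address content, tags are movable pointers.
blobs: dict[str, bytes] = {}
tags: dict[str, str] = {}

def push(content: bytes, tag: str) -> str:
    """Store content under its SHA-256 digest and (re)point `tag` at it."""
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    blobs[digest] = content
    tags[tag] = digest          # the tag MOVES; older digests still resolve
    return digest

def resolve(ref: str) -> bytes:
    """A digest resolves to fixed bytes; a tag resolves to whatever it points at now."""
    if ref.startswith("sha256:"):
        return blobs[ref]
    return blobs[tags[ref]]

d1 = push(b"app v1", "latest")
pinned_tag, pinned_digest = "latest", d1   # two ways to "pin" a deploy
push(b"app v2", "latest")                  # someone pushes a new build

print(resolve(pinned_tag))      # b'app v2'  (the "pinned" tag drifted)
print(resolve(pinned_digest))   # b'app v1'  (the digest did not)
```

Both references were "pinned" at the same moment; only the digest still means what it meant then.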
Interrogation:
Which of the five is changing — and does the version stamp travel WITH
the thing it describes? (a schema ID inside the record; a version in
the message; unstamped data is future archaeology)
Is anything versioned by a tag that can move under you?
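A sketch of an in-band stamp, assuming JSON payloads and a framing modeled loosely on Confluent Schema Registry's wire format (one magic byte, then a 4-byte big-endian schema ID, then the payload). `encode`, `decode`, and the `DECODERS` table are hypothetical; the point is that the reader learns the version from the bytes themselves, not from which binary happens to be reading them.

```python
import json
import struct

MAGIC = 0  # framing byte: lets a reader reject bytes that were never stamped

# Hypothetical registry: schema ID -> how to interpret that version's payload.
DECODERS = {
    1: lambda obj: {"name": obj["name"], "email": None},          # v1 had no email
    2: lambda obj: {"name": obj["name"], "email": obj["email"]},
}

def encode(schema_id: int, record: dict) -> bytes:
    # The version stamp travels WITH the bytes: magic + 4-byte big-endian schema ID.
    return struct.pack(">BI", MAGIC, schema_id) + json.dumps(record).encode()

def decode(blob: bytes) -> dict:
    magic, schema_id = struct.unpack(">BI", blob[:5])
    if magic != MAGIC:
        raise ValueError("unstamped or foreign bytes")
    if schema_id not in DECODERS:
        raise ValueError(f"unknown schema id {schema_id}")   # reject clearly
    return DECODERS[schema_id](json.loads(blob[5:]))

old = encode(1, {"name": "ada"})
new = encode(2, {"name": "bob", "email": "bob@example.com"})
print(decode(old))   # {'name': 'ada', 'email': None}
print(decode(new))
```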
Axis 2 — The Direction Pair (the load-bearing native structure) #
Systematically confused, so state it as the asymmetry it is:
backward compatible: NEW reader, OLD data — upgrades work.
this is the direction everyone tests.
forward compatible: OLD reader, NEW data — rollbacks and mixed
fleets work. this is the direction that saves
you at 3am.
Because a rollback is precisely an old reader meeting state the new code already wrote:
"rollback impossible after new state is written"
is a FORWARD-compatibility debt, incurred silently at deploy time,
collected at incident time.
Forward-specific recipes:
unknown-field tolerance Protobuf's field-number discipline exists
for exactly this: old readers skip what they
don't know
defaults absence must mean something: a field the new writer
omits needs a default the old reader can fall back on
write-gating NEVER write the new format until old readers
are extinct (axis 4's sequencing)
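A sketch of the three forward recipes on plain dicts (an assumption standing in for a real wire format; `old_reader`, `write`, `V1_FIELDS`, and the field names are hypothetical). The two prints show why write-gating exists: if the breaking rename ships while v1 readers still run, tolerance and defaults do not fail loudly, they silently hand the old reader a wrong value.

```python
# Hypothetical v1 field table: field -> default the OLD reader falls back on.
V1_FIELDS = {"id": None, "status": "active"}

def old_reader(record: dict) -> dict:
    """A v1 binary reading data a newer binary may have written (forward direction)."""
    # Unknown-field tolerance: skip anything v1 does not know (e.g. "tier", "state").
    known = {k: v for k, v in record.items() if k in V1_FIELDS}
    # Defaults: a field the writer omitted still has to mean something to v1.
    return {k: known.get(k, default) for k, default in V1_FIELDS.items()}

def write(record_id: int, status: str, fleet_min_version: int) -> dict:
    """A v2 writer; `fleet_min_version` is the oldest reader still running (assumed known)."""
    rec = {"id": record_id, "tier": "gold"}   # additive field: old readers skip it
    if fleet_min_version >= 2:
        rec["state"] = status                 # breaking rename: write-gated
    else:
        rec["status"] = status                # keep writing the shape v1 understands
    return rec

print(old_reader(write(7, "suspended", fleet_min_version=1)))
# {'id': 7, 'status': 'suspended'}
print(old_reader(write(7, "suspended", fleet_min_version=2)))
# {'id': 7, 'status': 'active'}  (if v1 readers still exist, the default silently lies)
```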
Interrogation:
For every change: can the OLD binary read what the NEW one writes?
(test the rollback, not just the upgrade)
Old writers with new readers — the mixed-fleet cross — checked too?
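A sketch of testing both directions of the pair, using a deliberately simplified schema model (an assumption, loosely after Avro's resolution rules): each schema maps field name to whether the field declares a default; a reader can resolve a writer's data if every reader field is present in the writer's schema or has a default, and writer-only fields are skipped. `can_read` and `check_change` are hypothetical helpers. Backward is `can_read(new, old)`; forward, the rollback direction, is `can_read(old, new)`.

```python
# Simplified schema: field name -> True if the field declares a default (an assumption).
Schema = dict[str, bool]

def can_read(reader: Schema, writer: Schema) -> bool:
    """Writer-only fields are skipped; each reader field must be written or defaulted."""
    return all(name in writer or has_default for name, has_default in reader.items())

def check_change(old: Schema, new: Schema) -> dict[str, bool]:
    return {
        "backward": can_read(new, old),   # NEW reader, OLD data: the upgrade
        "forward": can_read(old, new),    # OLD reader, NEW data: the rollback
    }

v1 = {"id": False, "email": False}
add_required_phone = {"id": False, "email": False, "phone": False}
drop_email = {"id": False}

print(check_change(v1, add_required_phone))   # {'backward': False, 'forward': True}
print(check_change(v1, drop_email))           # {'backward': True, 'forward': False}
```

The second change passes the test everyone runs and fails the rollback. For the mixed-fleet cross, require both keys to be `True`.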
Axis 3 — The Skew Envelope #
Which version combinations may legally coexist, for how long, in what UPGRADE ORDER:
the envelope min/max supported skew (k8s version-skew policy:
the canonical published envelope)
the order control plane before nodes; brokers before clients;
an envelope without an order is unachievable
the two-dial case Kafka's inter.broker.protocol.version vs
log.message.format.version — the canonical study
object because it versions TWO things independently:
what brokers SPEAK vs what they WRITE.
axis 1's contract/data split, made operational —
and the reason a Kafka upgrade is two rollouts,
not one.
Interrogation:
What is the published envelope, and who enforces it before an upgrade?
Is there an order that keeps every intermediate state inside the
envelope? (if the order is impossible, the envelope is fiction)
Which components version independently — and does the runbook know?
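A sketch of checking an upgrade plan against a published envelope. The envelope here is an assumption, k8s-shaped but not the actual policy numbers: nodes may never run newer than the control plane and may lag it by at most `MAX_SKEW` minor versions. `in_envelope`, `validate_plan`, and the component names are hypothetical. Steps are applied one at a time, so every intermediate state is checked, not only the end state.

```python
MAX_SKEW = 2  # assumed envelope: nodes may lag the control plane by at most 2 minors

def in_envelope(state: dict[str, int]) -> bool:
    cp = state["control-plane"]
    return all(cp - MAX_SKEW <= v <= cp
               for name, v in state.items() if name != "control-plane")

def validate_plan(start: dict[str, int], steps: list[tuple[str, int]]) -> list[str]:
    """Apply steps one at a time; report each intermediate state outside the envelope."""
    state, violations = dict(start), []
    for component, version in steps:
        state[component] = version
        if not in_envelope(state):
            violations.append(f"after {component}->{version}: {state}")
    return violations

start = {"control-plane": 28, "node-a": 27, "node-b": 26}
nodes_first = [("node-a", 29), ("node-b", 29), ("control-plane", 29)]
cp_first = [("control-plane", 29), ("node-a", 29), ("node-b", 29)]
catch_up_first = [("node-b", 28), ("control-plane", 29), ("node-a", 29), ("node-b", 29)]

for name, plan in [("nodes_first", nodes_first), ("cp_first", cp_first),
                   ("catch_up_first", catch_up_first)]:
    print(name, len(validate_plan(start, plan)), "violations")
```

The naive "control plane first" rule still fails here because node-b already sits at the edge of the envelope; the only valid plan starts by catching it up. That is the sense in which an envelope without an order is unachievable.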
Axis 4 — Activation Decoupling #
Deploy ≠ enable ≠ migrate — three separately-dialed moments:
deploy the binary lands (old behavior still active)
enable feature flag / gate flips (behavior changes; deploy is
boundary.md's deployment boundary; the flag is its dimmer)
migrate state moves shape (expand/contract, below)
The deep lesson’s two rows live here:
feature flag ≠ migration flags gate BEHAVIOR; migrations move STATE.
flipping a flag back does not un-write data —
rollback ≠ undo which is retry_idempotency's compensation
lesson, on the time axis: the world (here,
the disk) already saw the middle.
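A sketch of "flipping a flag back does not un-write data", using a hypothetical dict-backed store, a `write_v2` flag, and a v1-only read path (all names illustrative). The flag controls which shape new writes take; it has no power over rows already on disk.

```python
# Hypothetical store and flag; all names are illustrative.
store: dict[str, dict] = {}
flags = {"write_v2": False}

def save(key: str, amount_cents: int) -> None:
    """The flag decides the shape of NEW writes, nothing more."""
    if flags["write_v2"]:
        store[key] = {"v": 2, "amount": {"value": amount_cents, "unit": "cents"}}
    else:
        store[key] = {"v": 1, "amount_cents": amount_cents}

def load_v1_only(key: str) -> int:
    """The old code path: it understands v1 rows only."""
    row = store[key]
    if row["v"] != 1:
        raise ValueError(f"v1 path cannot read row {key!r}: {row}")
    return row["amount_cents"]

save("a", 500)               # flag off: v1 row
flags["write_v2"] = True
save("b", 700)               # flag on: a v2 row reaches the disk
flags["write_v2"] = False    # "roll back" the flag...

print(load_v1_only("a"))     # 500
try:
    load_v1_only("b")        # ...but row "b" is still v2
except ValueError as err:
    print("flag is off, data is not:", err)
```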
Interrogation:
For this change: which of the three moments does it have, and are they
independently reversible?
What flag combinations exist in production, and which were tested?
(2^N combinations; prune dead flags — a flag left forever is an
untested branch with a pager attached)
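A sketch of the 2^N audit, assuming you can list the flags, the combinations your tests exercise, and the combinations actually observed in production (flag names and both sets are hypothetical). The gap that matters is not "untested" in the abstract but "live and untested".

```python
from itertools import product

FLAGS = ("new_checkout", "async_billing", "v2_search")      # hypothetical flags
tested = {(False, False, False), (True, False, False), (True, True, False)}
seen_in_prod = {(True, True, False), (False, True, True)}   # e.g. from request logs

all_combos = set(product((False, True), repeat=len(FLAGS)))
print(len(all_combos), "combinations,", len(all_combos - tested), "untested")
for combo in sorted(seen_in_prod - tested):
    live = [name for name, on in zip(FLAGS, combo) if on]
    print("LIVE BUT UNTESTED:", live or "all flags off")
# 8 combinations, 5 untested
# LIVE BUT UNTESTED: ['async_billing', 'v2_search']
```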
The Protocol Worth Keeping Whole: Expand / Contract #
expand add new field/table/format ALONGSIDE old
dual write write both (deliberately chosen OVERLAP — the dial's
EIGHTH appearance: paid with divergence-checking;
the alternative, hard cutover, is a GAP paid with
downtime)
backfill old rows gain the new shape (materialized.md's cutover
steps, composed)
verify divergence check BEFORE anyone depends on the new shape
switch reads consumers move to the new form
stop old writes
contract remove the old — ONLY after rollback is provably no
longer needed (axis 2's write-gating, as the final gate)
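A sketch of the expand/contract sequence on one in-memory table, splitting a hypothetical `full_name` column into `given` and `family`. The `phase` dict, the helper names, and the naive space-split are illustrative assumptions. The step to notice is the last one: `contract` refuses to run while rollback is still possible, which is axis 2's write-gating acting as the final gate.

```python
# Hypothetical table and migration: full_name (old) -> given + family (new).
rows = {1: {"full_name": "Ada Lovelace"}, 2: {"full_name": "Alan Turing"}}
phase = {"dual_write": False, "read_new": False, "old_writes": True}

def split(full: str) -> tuple[str, str]:
    given, _, family = full.partition(" ")
    return given, family

def write(row_id: int, full_name: str) -> None:
    row = rows.setdefault(row_id, {})
    if phase["old_writes"]:
        row["full_name"] = full_name
    if phase["dual_write"]:                      # the deliberate overlap
        row["given"], row["family"] = split(full_name)

def read(row_id: int) -> str:
    row = rows[row_id]
    return f'{row["given"]} {row["family"]}' if phase["read_new"] else row["full_name"]

def backfill() -> None:
    for row in rows.values():
        if "given" not in row:
            row["given"], row["family"] = split(row["full_name"])

def divergent() -> list[int]:
    """Rows whose new shape disagrees with the old one (the overlap's bill)."""
    return [rid for rid, row in rows.items()
            if "full_name" in row
            and (row.get("given"), row.get("family")) != split(row["full_name"])]

def contract(rollback_possible: bool) -> None:
    if rollback_possible:
        raise RuntimeError("contract refused: an old reader may still come back")
    for row in rows.values():
        row.pop("full_name", None)

phase["dual_write"] = True          # expand + dual write: new fields alongside old
write(3, "Grace Hopper")
backfill()
assert divergent() == []            # verify BEFORE anyone depends on the new shape
phase["read_new"] = True            # switch reads
phase["old_writes"] = False         # stop old writes
contract(rollback_possible=False)   # only once rollback is provably unneeded
print(read(1), "|", read(3))        # Ada Lovelace | Grace Hopper
```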
The harshest instance, seated: workflow code versioning. Temporal’s constraint is expand/contract where the “old reader” is DETERMINISTIC REPLAY OF IMMORTAL HISTORY:
old code paths can never be contracted while any live workflow's
history references them. version markers are expand/contract for
CONTROL FLOW — the branch itself is dual-written into history.
"new code cannot replay old history" is a forward-compatibility
failure against your own past self.
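A sketch of a version marker, using a hypothetical mini replay engine rather than Temporal's SDK (`Ctx`, `get_version`, `activity`, and the event tuples are illustrative; the real Go SDK exposes the idea as `workflow.GetVersion`). On a first execution the marker is written into history; on replay the code takes whatever branch history recorded, and history with no marker takes the old path. The same change made without a marker is what breaks replay.

```python
DEFAULT_VERSION = 0  # assumption: "no marker in history" selects the pre-change path

class NondeterminismError(Exception):
    """Replayed code issued a command that recorded history does not contain."""

class Ctx:
    """Hypothetical mini workflow context over an append-only event history."""

    def __init__(self, history=None):
        self.replaying = history is not None
        self.history = [] if history is None else list(history)
        self.pos = 0

    def _next(self):
        return self.history[self.pos] if self.pos < len(self.history) else None

    def get_version(self, change_id: str, max_supported: int) -> int:
        if not self.replaying:
            # First execution: dual-write the branch decision into history.
            self.history.append(("marker", change_id, max_supported))
            return max_supported
        event = self._next()
        if event is not None and event[:2] == ("marker", change_id):
            self.pos += 1
            return event[2]
        return DEFAULT_VERSION  # this history predates the change

    def activity(self, name: str) -> None:
        if not self.replaying:
            self.history.append(("activity", name))
            return
        if self._next() != ("activity", name):
            raise NondeterminismError(f"history has {self._next()}, code issued {name!r}")
        self.pos += 1

def order_v1(ctx):
    ctx.activity("charge")
    ctx.activity("ship")

def order_v2(ctx):                  # the change, guarded by a marker
    ctx.activity("charge")
    if ctx.get_version("send-receipt", 1) >= 1:
        ctx.activity("email_receipt")
    ctx.activity("ship")

def order_v2_naive(ctx):            # the same change, unguarded
    ctx.activity("charge")
    ctx.activity("email_receipt")
    ctx.activity("ship")

fresh = Ctx()
order_v1(fresh)                     # a workflow that started under v1 code
old_history = fresh.history

order_v2(Ctx(old_history))          # replays: no marker, so the old branch runs
try:
    order_v2_naive(Ctx(old_history))
except NondeterminismError as err:
    print("replay failed:", err)
```

The unguarded path inside `order_v2` can only be contracted once no live history lacks the `send-receipt` marker.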
Technical Bottleneck: The Semantic Gap* #
every compatibility mechanism in this block checks SHAPE:
schema registries validate fields, ApiVersions negotiates messages,
semver's MAJOR bump is self-reported.
NONE of them can check MEANING.
The field that changed units. The event whose business interpretation drifted. The “compatible” schema change that silently altered what null means. The semver lie. Essential — meaning is what consumers actually depend on — and with no mechanical recipe by nature, which places it adjacent to ^o territory: meaning is not machine-checkable. Conventions partially tame it:
tombstone the identifier never reuse a field number/name for new
semantics — Protobuf's `reserved` keyword
is a TOMBSTONE FOR MEANING: GC's grace
period, applied to semantics (flagship)
version the event TYPE when meaning shifts, the name shifts —
OrderShipped_v2 is ugly and honest
contract tests pin BEHAVIOR, not shape: golden inputs and
outputs across versions
deprecation windows long enough for meaning-changes to be
NOTICED, not merely parsed
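A sketch of two of these conventions as checks, with hypothetical names throughout: a tombstone set in the spirit of Protobuf's `reserved` (the real keyword lives in the `.proto` file and `protoc` rejects fields that reuse it; here it is just a set), and a golden-I/O contract test. The second half is the semantic gap in miniature: a shape check passes a change that silently switched units, and only the pinned golden case notices.

```python
# Tombstones: field numbers whose meaning has died are never handed out again.
RESERVED = {3}   # hypothetical: field 3 once meant "price in dollars"

def check_no_reuse(schema: dict[int, str]) -> None:
    reused = RESERVED & set(schema)
    if reused:
        raise ValueError(f"field numbers {sorted(reused)} are tombstoned")

check_no_reuse({1: "id", 2: "sku", 4: "price_cents"})    # passes
try:
    check_no_reuse({1: "id", 3: "discount_pct"})         # same number, new meaning
except ValueError as err:
    print(err)   # field numbers [3] are tombstoned

# Contract test: golden inputs/outputs pin BEHAVIOR (totals in cents) across versions.
GOLDEN = [({"qty": 2, "unit_price": 150}, 300), ({"qty": 0, "unit_price": 999}, 0)]

def total_v1(order: dict) -> int:
    return order["qty"] * order["unit_price"]

def total_v2(order: dict) -> float:
    return order["qty"] * order["unit_price"] / 100   # a "refactor" that switched to dollars

def shape_ok(fn) -> bool:
    return all(isinstance(fn(inp), (int, float)) for inp, _ in GOLDEN)

def contract_ok(fn) -> bool:
    return all(fn(inp) == expected for inp, expected in GOLDEN)

print(shape_ok(total_v2), contract_ok(total_v1), contract_ok(total_v2))   # True True False
```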
The star’s one-liner:
the wire can verify that it can read you;
only discipline verifies that it understood you.
A strong design says explicitly:
which of the five things is versioned, stamped in-band (axis 1),
both directions of compatibility — and that rollback was tested (axis 2),
the skew envelope and the order that keeps you inside it (axis 3),
the three activation moments, dialed independently (axis 4),
the expand/contract stage each migration is in, and what gates
contract,
and for meaning: the tombstoned identifiers and the contract tests —
because shape-checkers cannot catch a lie about semantics.
Versioning As Protocol (the crossing-point spec — keep) #
declare version (stamped with the artifact)
negotiate or select (contract axis; downgrade-attack aware)
read/write under that version's semantics
record version WITH durable state/messages (axis 1's stamp)
tolerate or convert old/new forms (axis 2's directions)
roll out gradually (axis 4's dials; feature gates after fleet
convergence — the Kafka sequencing)
reject incompatible versions CLEARLY (an error, not a mystery)
deprecate and remove after a window (expand/contract's gate)
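A sketch of the negotiate/select and reject-clearly steps, assuming each side advertises an inclusive `(min, max)` version range for one API (the shape of what Kafka's ApiVersions response carries per API key). `negotiate`, `IncompatibleVersion`, and the `floor` parameter are hypothetical; `floor` stands in for downgrade-attack awareness, a local policy minimum that an advertised range cannot argue you below.

```python
class IncompatibleVersion(Exception):
    """Raised with a readable reason: an error, not a mystery."""

def negotiate(client: tuple[int, int], server: tuple[int, int], floor: int = 0) -> int:
    """Select the highest version inside both inclusive (min, max) ranges."""
    low, high = max(client[0], server[0]), min(client[1], server[1])
    if low > high:
        raise IncompatibleVersion(f"no common version: client {client}, server {server}")
    if high < floor:
        # A peer (or a tampering middlebox) advertising only old versions must not
        # silently drag the session below local policy.
        raise IncompatibleVersion(f"best common version {high} is below floor {floor}")
    return high

print(negotiate((0, 5), (2, 7)))   # 5
for client, server, floor in [((0, 2), (3, 7), 0), ((0, 5), (0, 1), 2)]:
    try:
        negotiate(client, server, floor)
    except IncompatibleVersion as err:
        print("rejected:", err)
```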
Named Configurations (lookup table) #
Vector = {what, directions, envelope, activation, semantic exposure}. Rows marked → are owned elsewhere.
| Name | Vector | Canonical study object | Signature failure |
|---|---|---|---|
| API versioning | contract, both needed, deprecation window, per-endpoint, semantics-under-same-shape* | k8s apiVersion + conversion | semantics change without version bump*; version explosion; v1 immortal |
| Schema evolution | data, both + registry-enforced, per-subject rules, —, field-reuse* | Protobuf field numbers; Avro rules | reused field number*; missing default (forward debt); type reinterpretation |
| Protocol negotiation | contract, negotiated per session, feature bits, —, downgrade attack | Kafka ApiVersions; TLS | assumed-unsupported feature; downgrade attack (policy.md’s adversary at the handshake) |
| Config version/ACK → checkpoint_replay + xDS notes | config, —, applied-vs-proposed, last-good, — | xDS version+nonce ACK/NACK | (owned: ACK-for-wrong-nonce, stale applied, poisoned plane) |
| Version identity → snapshot.md, index_structures.md | state identity, —, —, —, tag-vs-digest | Iceberg snapshot; Git commit; OCI digest | mutable tag mistaken for digest; mixed versions (torn*); ABA |
| Rolling upgrade | all five at once, forward is the test, published envelope + order, gates after convergence, — | Kafka IBP vs log format | new writes old-unreadable (forward debt); downgrade impossible; mixed-fleet semantics split |
| Data migration | data, write-gated, —, expand/contract, — | expand/contract pattern | partial backfill; dual-write divergence (the overlap’s bill); contract before rollback-safe |
| Feature flags | behavior, —, —, deploy ≠ enable, flag semantics drift | k8s feature gates | untested combinations; immortal flags; flag-off ≠ data-back (rollback ≠ undo) |
| SemVer | contract promise, self-reported(!), ranges, —, the semver lie* | SemVer + lockfiles | the lie*; transitive breaks; range too broad; lockfile drift |
| Skew envelope | meta, —, the envelope itself, upgrade order, — | k8s version-skew policy | unsupported skew in prod; impossible order; window shorter than reality |
| Event evolution | data (immortal), both; backward is mandatory (replay!), registry, —, meaning drift* | Schema Registry + Avro/Protobuf | old event unreadable by new consumer; “field always present” assumed; meaning changed under same type* |
| Workflow versioning | code vs own history, forward vs your past self, per-workflow, version markers, — | Temporal versioning/patching | non-deterministic replay; contracted path still referenced by live history |
Vocabulary #
version revision generation stamp in-band
backward forward the direction pair rollback-tested
skew envelope upgrade order mixed fleet two-dial
deploy enable migrate feature gate dark launch
expand dual write backfill verify switch contract
tag digest pinned latest
reserved tombstoned identifier field number
semver the semver lie deprecation window
negotiation downgrade attack feature bits
version marker replay determinism
Deep Lesson #
Versioning bugs come from confusing pairs on different axes:
deploy version vs data version (axis 1: the binary rolls back; the bytes don't)
API shape vs semantic behavior (the star*: shape-checkers can't catch meaning)
backward vs forward (axis 2: the one you test vs the one that saves you)
feature flag vs migration (axis 4: behavior dims; state moves)
rollback vs undo (axis 4: retry_idempotency's lesson, on the time axis)
tag vs immutable digest (axis 1: a pointer that moves is not a version)
old reader support vs old writer support (axis 2's cross: the mixed fleet has both)
Design procedure: stamp the version with the thing, test the rollback direction first, publish the envelope and the order that honors it, dial deploy/enable/migrate independently, run expand/contract and gate the contract on rollback-extinction — and tombstone every identifier whose meaning dies, because the wire will happily parse a lie. The named types are recognition shortcuts, not the design space.