Versioning / Compatibility #

versioning    = naming which shape/semantics something has
compatibility = the ability of different versions to coexist safely

It answers:

what must keep working while everything changes?

Role in the catalog: the evolution protocol, the time-axis dual. Every previous block manages coexistence across SPACE (replicas, shards, caches: copies of state at one moment); this block manages coexistence across TIME: old and new versions of code, schema, protocol, and behavior sharing one running system during the only interval that matters, the transition. At scale the transition never finishes: the mixed fleet is the permanent condition, not a passing phase.

Central tension:

evolve quickly  vs  keep old things working

Design Axes (the core module) #

Axis 1 — What Is Versioned #

the contract   API shape, protocol messages (negotiable per session:
               Kafka ApiVersions, TLS negotiation — with downgrade
               attack as the security face of negotiation)
the data       schema, stored bytes — THE DANGEROUS ONE, because data
               outlives every binary that wrote it. log.md's "bad
               schema immortal in history" was this axis's warning shot.
the behavior   semantics under an unchanged shape — where the star*
               lives: most compatibility machinery checks shape while
               the breakage lives in meaning
the config     → checkpoint_replay's control-plane row + the xDS
               ACK/NACK notes, whole (version, nonce, applied-vs-
               proposed, last-good)
state identity → snapshot.md's coordinate + checkpoint_replay's
               binding, as arrows. one native residue: MUTABLE TAG vs
               IMMUTABLE DIGEST — a tag is a pointer that moves; a
               digest is index_structures.md's content-keying, and mistaking one
               for the other is deploying "latest" and calling it
               pinned.

Interrogation:

Which of the five is changing — and does the version stamp travel WITH
  the thing it describes? (a schema ID inside the record; a version in
  the message; unstamped data is future archaeology)
Is anything versioned by a tag that can move under you?
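
A minimal sketch of the two ideas above, the in-band stamp and the tag-versus-digest distinction. The envelope layout, the `stamp` and `digest` helpers, and the `tags` dictionary are illustrative assumptions, not any registry's real format.

```python
import hashlib
import json

def stamp(payload: dict, schema_id: int) -> bytes:
    """Serialize a record with its schema ID carried in-band (hypothetical envelope)."""
    envelope = {"schema_id": schema_id, "payload": payload}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")

def digest(blob: bytes) -> str:
    """Identity derived from content: different bytes give a different digest."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()

# A tag is a mutable pointer to a digest; the digest itself never moves.
tags = {}

v1 = stamp({"amount": 5}, schema_id=1)
v2 = stamp({"amount": 5, "currency": "USD"}, schema_id=2)

tags["latest"] = digest(v1)
pinned = tags["latest"]        # resolve the tag ONCE, then hold the digest
tags["latest"] = digest(v2)    # the tag moves under everyone still using it

read_back = json.loads(v1)     # the stamp travels with the bytes
```

A consumer that stored `pinned` keeps reading the same bytes after the tag moves; a consumer that stored the string `"latest"` does not.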

Axis 2 — The Direction Pair (the load-bearing native structure) #

Systematically confused, so state it as the asymmetry it is:

backward compatible:  NEW reader, OLD data — upgrades work.
                      this is the direction everyone tests.
forward compatible:   OLD reader, NEW data — rollbacks and mixed
                      fleets work. this is the direction that saves
                      you at 3am.

Because a rollback is precisely an old reader meeting state the new code already wrote:

"rollback impossible after new state is written"
is a FORWARD-compatibility debt, incurred silently at deploy time,
collected at incident time.

Forward-specific recipes:

unknown-field tolerance   Protobuf's field-number discipline exists
                          for exactly this: old readers skip what they
                          don't know
defaults on new fields    absence must mean something
write-gating              NEVER write the new format until old readers
                          are extinct (axis 4's sequencing)

Interrogation:

For every change: can the OLD binary read what the NEW one writes?
  (test the rollback, not just the upgrade)
Old writers with new readers — the mixed-fleet cross — checked too?
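
Both directions can be exercised in a few lines. This is a sketch under assumed names (`OrderV1`, `OrderV2`, and the dict-shaped records are hypothetical), showing a tolerant old reader, a defaulting new reader, and the strict old reader that turns a rollback into an outage.

```python
from dataclasses import dataclass

@dataclass
class OrderV1:
    order_id: str
    amount: int

@dataclass
class OrderV2:
    order_id: str
    amount: int
    currency: str = "USD"   # default on the new field: absence must mean something

def read_v1(record: dict) -> OrderV1:
    # OLD reader, tolerant: picks the fields it knows and skips the rest.
    return OrderV1(order_id=record["order_id"], amount=record["amount"])

def read_v1_strict(record: dict) -> OrderV1:
    # OLD reader, strict: any unknown field is a TypeError at rollback time.
    return OrderV1(**record)

def read_v2(record: dict) -> OrderV2:
    # NEW reader: a missing new field falls back to the declared default.
    return OrderV2(record["order_id"], record["amount"], record.get("currency", "USD"))

old_data = {"order_id": "a1", "amount": 5}
new_data = {"order_id": "a2", "amount": 7, "currency": "EUR"}

forward = read_v1(new_data)     # rollback direction: old reader, new data
backward = read_v2(old_data)    # upgrade direction: new reader, old data
```

Note that the tolerant old reader parses the EUR order without complaint and then treats it as it always treated orders; tolerance fixes shape, not meaning.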

Axis 3 — The Skew Envelope #

Which version combinations may legally coexist, for how long, in what UPGRADE ORDER:

the envelope        min/max supported skew (k8s version-skew policy:
                    the canonical published envelope)
the order           control plane before nodes; brokers before clients;
                    an envelope without an order is unachievable
the two-dial case   Kafka's inter.broker.protocol.version vs
                    log.message.format.version — the canonical study
                    object because it versions TWO things independently:
                    what brokers SPEAK vs what they WRITE.
                    axis 1's contract/data split, made operational —
                    and the reason a Kafka upgrade is two rollouts,
                    not one.

Interrogation:

What is the published envelope, and who enforces it before an upgrade?
Is there an order that keeps every intermediate state inside the
  envelope? (if the order is impossible, the envelope is fiction)
Which components version independently — and does the runbook know?
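
The envelope and the order can be checked mechanically before a rollout. The sketch below assumes a made-up rule (nodes may trail the control plane by at most `MAX_SKEW` minor versions and never lead it); the constant and component names are hypothetical and are not the actual Kubernetes policy numbers.

```python
MAX_SKEW = 1  # hypothetical limit, chosen for illustration only

def within_envelope(control_plane: int, node: int) -> bool:
    """Nodes may be older than the control plane by up to MAX_SKEW, never newer."""
    return 0 <= control_plane - node <= MAX_SKEW

def order_is_safe(steps, start) -> bool:
    """Apply upgrade steps in order; every intermediate state must stay legal."""
    state = dict(start)
    for component, version in steps:
        state[component] = version
        if not within_envelope(state["control_plane"], state["node"]):
            return False
    return True

start = {"control_plane": 27, "node": 27}

control_plane_first = [("control_plane", 28), ("node", 28)]
nodes_first = [("node", 28), ("control_plane", 28)]
skipping_a_version = [("control_plane", 29), ("node", 29)]
```

Both `control_plane_first` and `nodes_first` end in the same final state; only one of them stays inside the envelope on the way there, which is why the order belongs in the runbook next to the envelope.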

Axis 4 — Activation Decoupling #

Deploy ≠ enable ≠ migrate — three separately-dialed moments:

deploy    the binary lands (old behavior still active)
enable    feature flag / gate flips (behavior changes; deploy is
          boundary.md's deployment boundary; the flag is its dimmer)
migrate   state moves shape (expand/contract, below)

The deep lesson’s two rows live here:

feature flag ≠ migration   flags gate BEHAVIOR; migrations move STATE.
                           flipping a flag back does not un-write data —
rollback ≠ undo            which is retry_idempotency's compensation
                           lesson, on the time axis: the world (here,
                           the disk) already saw the middle.

Interrogation:

For this change: which of the three moments does it have, and are they
  independently reversible?
What flag combinations exist in production, and which were tested?
  (2^N combinations; prune dead flags — a flag left forever is an
  untested branch with a pager attached)
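
A toy model of the dials, assuming a hypothetical `Service` class with one flag. It shows that reversing deploy and enable leaves the written state where it was, and counts the combinations that N flags create.

```python
from itertools import product

class Service:
    """Toy service with a deploy dial and an enable dial (all names hypothetical)."""

    def __init__(self) -> None:
        self.binary_version = 1
        self.new_format_enabled = False
        self.rows = []   # durable state: survives flag flips and rollbacks

    def deploy(self, version: int) -> None:
        self.binary_version = version

    def write(self, dollars: int) -> None:
        if self.binary_version >= 2 and self.new_format_enabled:
            self.rows.append({"fmt": 2, "cents": dollars * 100})
        else:
            self.rows.append({"fmt": 1, "dollars": dollars})

svc = Service()
svc.deploy(2)                    # deploy: binary lands, old behavior still active
svc.write(1)
svc.new_format_enabled = True    # enable: behavior changes
svc.write(2)
svc.new_format_enabled = False   # flag flipped back
svc.deploy(1)                    # binary rolled back
formats_on_disk = [row["fmt"] for row in svc.rows]

flags = ["new_format", "new_pricing", "new_router"]   # hypothetical flag names
combinations = list(product([False, True], repeat=len(flags)))
```

After both reversals the version-1 binary still has a format-2 row to read, and three flags already mean eight production configurations to account for.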

The Protocol Worth Keeping Whole: Expand / Contract #

expand        add new field/table/format ALONGSIDE old
dual write    write both (deliberately chosen OVERLAP — the dial's
              EIGHTH appearance: paid with divergence-checking;
              the alternative, hard cutover, is a GAP paid with
              downtime)
backfill      old rows gain the new shape (materialized.md's cutover
              steps, composed)
verify        divergence check BEFORE anyone depends on the new shape
switch reads  consumers move to the new form
stop old writes
contract      remove the old — ONLY after rollback is provably no
              longer needed (axis 2's write-gating, as the final gate)
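
The stages above can be treated as a small state machine with two hard gates. This is a sketch, not a migration framework: `Stage`, `divergent_keys`, and `advance` are hypothetical names, and the in-memory dicts stand in for the old and new stores.

```python
from enum import IntEnum

class Stage(IntEnum):
    EXPAND = 1
    DUAL_WRITE = 2
    BACKFILL = 3
    VERIFY = 4
    SWITCH_READS = 5
    STOP_OLD_WRITES = 6
    CONTRACT = 7

def divergent_keys(old: dict, new: dict, convert) -> list:
    """Keys whose new-shape value is missing or disagrees with the converted old value."""
    return sorted(k for k in old if new.get(k) != convert(old[k]))

def advance(stage: Stage, *, divergence: list, rollback_needed: bool) -> Stage:
    """Move to the next stage, refusing to pass either gate early."""
    if stage == Stage.CONTRACT:
        raise ValueError("migration already complete")
    if stage == Stage.VERIFY and divergence:
        raise RuntimeError(f"cannot switch reads: {len(divergence)} divergent keys")
    if stage == Stage.STOP_OLD_WRITES and rollback_needed:
        raise RuntimeError("cannot contract: rollback may still be needed")
    return Stage(stage + 1)

def split_name(full: str) -> tuple:
    return tuple(full.split(" ", 1))

old_names = {"u1": "Ada Lovelace", "u2": "Alan Turing"}
new_names = {"u1": ("Ada", "Lovelace")}          # backfill has not reached u2 yet

pending = divergent_keys(old_names, new_names, split_name)
```

Here `pending` names the row the backfill missed, and `advance` will not leave `VERIFY` until that list is empty.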

The harshest instance, seated: workflow code versioning. Temporal’s constraint is expand/contract where the “old reader” is DETERMINISTIC REPLAY OF IMMORTAL HISTORY:

old code paths can never be contracted while any live workflow's
history references them. version markers are expand/contract for
CONTROL FLOW — the branch itself is dual-written into history.
"new code cannot replay old history" is a forward-compatibility
failure against your own past self.
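
A toy illustration of a version marker, written from scratch rather than against Temporal's SDK: `Ctx`, `get_version`, and the change ID are hypothetical, and history is reduced to a dict of recorded markers.

```python
from typing import Optional

DEFAULT = 0  # version assumed when a history carries no marker for a change

class Ctx:
    """Toy execution context: replays recorded markers or records new ones."""

    def __init__(self, recorded: Optional[dict]) -> None:
        # None means a fresh run; a dict (even empty) means replay of an existing history.
        self.replaying = recorded is not None
        self.markers = dict(recorded or {})

    def get_version(self, change_id: str, max_supported: int) -> int:
        if self.replaying:
            # Replay: follow whatever branch the history took, never the newest one.
            return self.markers.get(change_id, DEFAULT)
        # Fresh run: take the newest branch and write that choice into history.
        self.markers[change_id] = max_supported
        return max_supported

def workflow(ctx: Ctx) -> list:
    if ctx.get_version("notify-change", 1) == DEFAULT:
        return ["charge", "send_email"]   # old path: must stay while old histories live
    return ["charge", "send_push"]        # new path

old_history_replay = workflow(Ctx({}))    # history written before the change existed
fresh = Ctx(None)
fresh_run = workflow(fresh)
fresh_replay = workflow(Ctx(fresh.markers))
```

Deleting the `send_email` branch would make `old_history_replay` produce commands the old history never recorded, which is the contracted-path failure described above.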

Technical Bottleneck: The Semantic Gap* #

every compatibility mechanism in this block checks SHAPE:
schema registries validate fields, ApiVersions negotiates messages,
semver's MAJOR bump is self-reported.
NONE of them can check MEANING.

The field that changed units. The event whose business interpretation drifted. The “compatible” schema change that silently altered what null means. The semver lie. The gap is essential, because meaning is what consumers actually depend on, and by nature it has no mechanical recipe, which places it adjacent to ^o territory: meaning is not machine-checkable. Conventions partially tame it:

tombstone the identifier   never reuse a field number/name for new
                           semantics — Protobuf's `reserved` keyword
                           is a TOMBSTONE FOR MEANING: GC's grace
                           period, applied to semantics (flagship)
version the event TYPE     when meaning shifts, the name shifts —
                           OrderShipped_v2 is ugly and honest
contract tests             pin BEHAVIOR, not shape: golden inputs and
                           outputs across versions
deprecation windows        long enough for meaning-changes to be
                           NOTICED, not merely parsed
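
The difference between a shape check and a contract test fits in one example. Everything here is hypothetical (the `timeout` field, the two producers, the golden pair); the point is only that the unit change passes the first check and fails the second.

```python
def shape_ok(record: dict) -> bool:
    """What a shape-checker sees: the field exists and is an integer."""
    return isinstance(record.get("timeout"), int)

def producer_v1(seconds: int) -> dict:
    return {"timeout": seconds}           # meaning: seconds

def producer_v2(seconds: int) -> dict:
    return {"timeout": seconds * 1000}    # same shape, meaning is now milliseconds

def consumer_deadline(record: dict, now: int) -> int:
    return now + record["timeout"]        # the consumer still assumes seconds

# Golden pairs: (input in seconds, deadline the consumer must compute at now=100).
GOLDEN = [(30, 130), (0, 100)]

def contract_holds(producer) -> bool:
    """Pin behavior end to end: golden inputs must yield golden outputs."""
    return all(
        consumer_deadline(producer(seconds), now=100) == expected
        for seconds, expected in GOLDEN
    )

v2_shape = shape_ok(producer_v2(30))
v1_contract = contract_holds(producer_v1)
v2_contract = contract_holds(producer_v2)
```

A golden set only catches the drifts someone thought to pin, so it narrows the semantic gap rather than closing it.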

The star’s one-liner:

the wire can verify that it can read you;
only discipline verifies that it understood you.

A strong design says explicitly:

which of the five things is versioned, stamped in-band (axis 1),
both directions of compatibility — and that rollback was tested (axis 2),
the skew envelope and the order that keeps you inside it (axis 3),
the three activation moments, dialed independently (axis 4),
the expand/contract stage each migration is in, and what gates
contract,
and for meaning: the tombstoned identifiers and the contract tests —
because shape-checkers cannot catch a lie about semantics.

Versioning As Protocol (the crossing-point spec — keep) #

declare version (stamped with the artifact)
negotiate or select (contract axis; downgrade-attack aware)
read/write under that version's semantics
record version WITH durable state/messages (axis 1's stamp)
tolerate or convert old/new forms (axis 2's directions)
roll out gradually (axis 4's dials; feature gates after fleet
  convergence — the Kafka sequencing)
reject incompatible versions CLEARLY (an error, not a mystery)
deprecate and remove after a window (expand/contract's gate)
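
The negotiate and reject steps can be sketched as range intersection with a policy floor. The function, the exception, and the `(min, max)` tuple convention are assumptions for illustration; real protocols such as Kafka ApiVersions or TLS negotiate per feature and defend the handshake itself.

```python
class IncompatibleVersion(Exception):
    """Raised when no mutually supported version exists."""

def negotiate(client: tuple, server: tuple, floor: int = 0) -> int:
    """Pick the highest version both sides support, never below `floor`.

    `client` and `server` are inclusive (min, max) ranges. `floor` is a local
    policy minimum, so a peer advertising only old versions is refused rather
    than silently downgraded to.
    """
    lowest = max(client[0], server[0], floor)
    highest = min(client[1], server[1])
    if lowest > highest:
        raise IncompatibleVersion(
            f"no common version: client={client} server={server} floor={floor}"
        )
    return highest

chosen = negotiate((1, 3), (2, 5))

try:
    negotiate((1, 3), (1, 5), floor=4)
    refusal = ""
except IncompatibleVersion as exc:
    refusal = str(exc)   # a clear error, carrying both ranges and the policy floor
```

The error message names both ranges and the floor, so the operator reading it knows which side has to move.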

Named Configurations (lookup table) #

Vector = {what, directions, envelope, activation, semantic exposure}. Rows marked → are owned elsewhere.

| Name | Vector | Canonical study object | Signature failure |
| --- | --- | --- | --- |
| API versioning | contract, both needed, deprecation window, per-endpoint, semantics-under-same-shape* | k8s apiVersion + conversion | semantics change without version bump*; version explosion; v1 immortal |
| Schema evolution | data, both + registry-enforced, per-subject rules, —, field-reuse* | Protobuf field numbers; Avro rules | reused field number*; missing default (forward debt); type reinterpretation |
| Protocol negotiation | contract, negotiated per session, feature bits, —, downgrade attack | Kafka ApiVersions; TLS | assumed-unsupported feature; downgrade attack (policy.md’s adversary at the handshake) |
| Config version/ACK → checkpoint_replay + xDS notes | config, —, applied-vs-proposed, last-good, — | xDS version+nonce ACK/NACK | (owned: ACK-for-wrong-nonce, stale applied, poisoned plane) |
| Version identity → snapshot.md, index_structures.md | state identity, —, —, —, tag-vs-digest | Iceberg snapshot; Git commit; OCI digest | mutable tag mistaken for digest; mixed versions (torn*); ABA |
| Rolling upgrade | all five at once, forward is the test, published envelope + order, gates after convergence, — | Kafka IBP vs log format | new writes old-unreadable (forward debt); downgrade impossible; mixed-fleet semantics split |
| Data migration | data, write-gated, —, expand/contract, — | expand/contract pattern | partial backfill; dual-write divergence (the overlap’s bill); contract before rollback-safe |
| Feature flags | behavior, —, —, deploy ≠ enable, flag semantics drift | k8s feature gates | untested combinations; immortal flags; flag-off ≠ data-back (rollback ≠ undo) |
| SemVer | contract promise, self-reported(!), ranges, —, the semver lie* | SemVer + lockfiles | the lie*; transitive breaks; range too broad; lockfile drift |
| Skew envelope | meta, —, the envelope itself, upgrade order, — | k8s version-skew policy | unsupported skew in prod; impossible order; window shorter than reality |
| Event evolution | data (immortal), forward is mandatory (replay!), registry, —, meaning drift* | Schema Registry + Avro/Protobuf | old event unreadable by new consumer; “field always present” assumed; meaning changed under same type* |
| Workflow versioning | code vs own history, forward vs your past self, per-workflow, version markers, — | Temporal versioning/patching | non-deterministic replay; contracted path still referenced by live history |

Vocabulary #

version  revision  generation  stamp  in-band
backward  forward  the direction pair  rollback-tested
skew  envelope  upgrade order  mixed fleet  two-dial
deploy  enable  migrate  feature gate  dark launch
expand  dual write  backfill  verify  switch  contract
tag  digest  pinned  latest
reserved  tombstoned identifier  field number
semver  the semver lie  deprecation window
negotiation  downgrade attack  feature bits
version marker  replay determinism

Deep Lesson #

Versioning bugs come from confusing pairs on different axes:

deploy version       vs  data version         (axis 1: the binary rolls back; the bytes don't)
API shape            vs  semantic behavior    (the star*: shape-checkers can't catch meaning)
backward             vs  forward              (axis 2: the one you test vs the one that saves you)
feature flag         vs  migration            (axis 4: behavior dims; state moves)
rollback             vs  undo                 (axis 4: retry_idempotency's lesson, on the time axis)
tag                  vs  immutable digest     (axis 1: a pointer that moves is not a version)
old reader support   vs  old writer support   (axis 2's cross: the mixed fleet has both)

Design procedure: stamp the version with the thing, test the rollback direction first, publish the envelope and the order that honors it, dial deploy/enable/migrate independently, run expand/contract and gate the contract on rollback-extinction — and tombstone every identifier whose meaning dies, because the wire will happily parse a lie. The named types are recognition shortcuts, not the design space.