Skip to main content
  1. System Design Components/

A08 Failure Workbook #

This workbook turns the default A08 Time-Bounded Exclusive Allocation failure modes into tiny TLA+ bug models.

Use them to internalize:

  • what the failure really is
  • whether it is a safety bug or a liveness bug
  • what invariant should catch it
  • what control prevents it

Baseline #

Checked baseline:

This is the “good” model.

Failure scenarios #

1. Duplicate hold #

Files:

Bug:

  • create hold does not check whether another active hold already exists

Invariant:

  • AtMostOneActiveHold

Insight:

  • this is a pure safety bug
  • the prevention is a guarded exclusive create / reservation
  • if this fails, no amount of later repair can recover cleanly without compensation

2. Expired hold not released #

Files:

Bug:

  • there is no expiry transition at all

Invariant:

  • NoPastDueActiveHold

Insight:

  • this looks like a safety property in the tiny model, but in real systems it is mostly a liveness problem
  • the real question is usually:
    • expired holds must be eventually releasable
  • that means scanners/reapers/checkpoint discipline matter

3. Confirm after expiry #

Files:

Bug:

  • confirm ignores expiry

Invariant:

  • NoLateConfirm

Insight:

  • this is the core time-boundary bug in A08
  • wall clock passing is not enough; commit path must check authoritative expiry truth
  • prevention is the confirm guard on current hold state and expiry boundary

4. Cancel / confirm race #

Files:

Bug:

  • confirm ignores the fact that the hold was already cancelled

Invariant:

  • CancelledHoldCanNeverConfirm

Insight:

  • this is really a stale-transition bug
  • prevention is a guarded state machine transition
  • once a terminal or conflicting transition happens, later transitions must be rejected

5. Stale availability after hold expiry #

Files:

Bug model:

  • truth changes first on expiry
  • availability view catches up later

Invariant:

  • ViewMatchesTruthImmediately

Insight:

  • this is intentionally too strong and should fail
  • this scenario is not primarily a truth corruption bug
  • it is a derived-state lag bug
  • the real system rule is:
    • browse may be stale
    • confirm path must still use truth

Overall lessons #

A08 mixes safety and liveness #

Safety-heavy:

  • duplicate hold
  • confirm after expiry
  • cancel/confirm race

Liveness-heavy:

  • expired hold not released

Derived-state/refinement:

  • stale availability after expiry

The main prevention controls #

  • guarded exclusive hold creation
  • guarded confirm transition
  • guarded cancel/expire transition
  • expiry worker or reaper
  • commit path reads source truth, not stale browse view

Why this archetype is so useful in TLA+ #

Because the bugs are easy to explain badly in prose, but small models make them precise fast.