Design Rules Overlay For Role-Based Failure Mitigation #

Status: Archive candidate. Keep as historical reference; prefer system-design-core-index.md and the core notes for day-to-day use.

Overlay-on-overlay note; keep only as dormant reference.

The Design Rules overlay is useful for failure mitigation, but as a secondary lens rather than the primary mitigation generator.

The primary role-based failure framework should still answer:

  • what failed
  • what local control prevents corruption
  • what truth is authoritative
  • what repair restores correctness

The Design Rules overlay helps answer a different mitigation question:

  • where should the mitigation live?
  • what contract should it publish?
  • what implementation detail should stay hidden?
  • what can be substituted later without changing semantics?

Correct Positioning #

Use the layers in this order:

  1. role-based failure

    • generate the likely failure
  2. base mitigation

    • choose the concrete control: CAS, unique constraint, lease, fencing, outbox, checkpoint, rebuild, or reconciliation
  3. Design Rules overlay

    • place the control in the right module
    • define the published contract
    • keep implementation details hidden
    • identify clean substitution and evolution options

That is the right division of labor.


What The Overlay Adds #

The role-based framework tells you:

  • what is failing
  • what mechanism is needed

The overlay helps you shape that mitigation into:

  • the right authority boundary
  • the right interface contract
  • the right hidden module
  • the right substitution path
  • the right evolution move

So this is a mitigation-structure lens.

It does not replace the primary mitigation vocabulary.


Where It Helps Most #

The overlay is most useful when the mitigation is architectural rather than purely local.

Coordination failures #

Example:

  • failure: stale owner accepted
  • primary mitigation: fencing token

Overlay contribution:

  • authoritative module: lease service / owner record
  • published contract: claim/renew/release + epoch
  • hidden module: heartbeat, expiry detection, reaper
  • substitution options: DB lease -> etcd lease -> Redis lease

With this split, the lease backend can change without touching callers, as long as the claim/renew/release and epoch semantics stay the same.
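
The coordination example can be sketched as follows. This is a minimal illustration under assumptions, not a reference design: `LeaseService`, `InMemoryLeaseService`, and `FencedStore` are hypothetical names, and the in-memory state stands in for a real lease backend. The published contract is claim/renew/release plus the epoch; expiry detection stays hidden inside the implementation, and the protected store validates the epoch on every write.

```python
import time
from typing import Optional, Protocol


class LeaseService(Protocol):
    """Published contract: claim/renew/release, each tied to an epoch."""

    def claim(self, owner: str, ttl: float) -> Optional[int]: ...
    def renew(self, owner: str, epoch: int, ttl: float) -> bool: ...
    def release(self, owner: str, epoch: int) -> bool: ...


class InMemoryLeaseService:
    """Hypothetical in-memory stand-in; a DB or etcd lease could replace it."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._owner = None
        self._epoch = 0
        self._expires_at = 0.0

    def _expired(self) -> bool:
        # Hidden detail: how expiry is detected is not part of the contract.
        return self._owner is None or self._clock() >= self._expires_at

    def claim(self, owner, ttl):
        if not self._expired():
            return None
        self._epoch += 1  # every successful claim gets a strictly larger epoch
        self._owner = owner
        self._expires_at = self._clock() + ttl
        return self._epoch

    def renew(self, owner, epoch, ttl):
        if self._expired() or (owner, epoch) != (self._owner, self._epoch):
            return False
        self._expires_at = self._clock() + ttl
        return True

    def release(self, owner, epoch):
        if (owner, epoch) != (self._owner, self._epoch):
            return False
        self._owner = None
        return True


class FencedStore:
    """The protected resource validates the epoch on every write."""

    def __init__(self):
        self._highest_epoch = 0
        self.value = None

    def write(self, epoch: int, value) -> bool:
        if epoch < self._highest_epoch:
            return False  # stale owner: reject the write
        self._highest_epoch = epoch
        self.value = value
        return True
```

Callers depend only on the `LeaseService` contract, so swapping the backend should not change how the store fences stale writers.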

Derived failures #

Example:

  • failure: projection drift
  • primary mitigation: checkpointed projector + rebuild

Overlay contribution:

  • authoritative module: source truth plus projector checkpoint
  • published contract: changelog subscription + rebuild/replay semantics
  • hidden module: batching, projector scheduling, backfill internals
  • evolution move: split online projector from rebuild lane
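
A minimal sketch of the checkpointed projector described above, assuming the changelog is an in-memory list and the view is a per-key running sum. The `Projector` class, the `Event` shape, and the `batch_size` parameter are hypothetical and used only for illustration. Batching is a hidden detail that must not change the resulting view, and rebuild is a replay from offset 0.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (key, delta); a hypothetical changelog entry shape


@dataclass
class Projector:
    """Applies changelog entries in order and records how far it has read."""

    view: Dict[str, int] = field(default_factory=dict)
    checkpoint: int = 0  # offset of the next changelog entry to apply

    def catch_up(self, changelog: List[Event], batch_size: int = 100) -> None:
        # Batch size is a hidden detail; it must not change the resulting view.
        while self.checkpoint < len(changelog):
            batch = changelog[self.checkpoint:self.checkpoint + batch_size]
            for key, delta in batch:
                self.view[key] = self.view.get(key, 0) + delta
            self.checkpoint += len(batch)  # advance only after the batch is applied

    def rebuild(self, changelog: List[Event]) -> None:
        # Repair path: discard the derived state and replay from offset 0.
        self.view = {}
        self.checkpoint = 0
        self.catch_up(changelog)
```

In a real system the view and the checkpoint would need to be persisted together, or the apply step made idempotent; this sketch leaves that out.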

External Effect failures #

Example:

  • failure: effect succeeded but ack lost
  • primary mitigation: outbox + reconciliation + idempotent receiver

Overlay contribution:

  • authoritative module: effect record / outbox / inbox dedup state
  • published contract: delivery id, idempotency key, ack semantics
  • hidden module: transport choice, retry scheduler, relay internals
  • substitution options: direct relay -> broker -> workflow engine
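
The "effect succeeded but ack lost" case can be sketched like this. It is an illustrative sketch only: `OutboxEntry`, `Receiver`, and `relay` are hypothetical names, the delivery id doubles as the idempotency key for brevity, and the `send` callable stands in for whatever transport is hidden behind the contract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OutboxEntry:
    delivery_id: str  # doubles as the idempotency key in this sketch
    payload: str
    acked: bool = False


class Receiver:
    """Idempotent receiver: inbox dedup state keyed by delivery id."""

    def __init__(self):
        self.seen: Dict[str, str] = {}
        self.applied: List[str] = []

    def deliver(self, delivery_id: str, payload: str) -> str:
        if delivery_id not in self.seen:
            self.applied.append(payload)  # the effect happens once
            self.seen[delivery_id] = "ok"
        return self.seen[delivery_id]  # duplicates get the recorded ack


def relay(outbox: List[OutboxEntry], send: Callable[[str, str], str]) -> None:
    """Re-send every unacked entry; safe to run repeatedly as reconciliation."""
    for entry in outbox:
        if entry.acked:
            continue
        try:
            if send(entry.delivery_id, entry.payload) == "ok":
                entry.acked = True
        except ConnectionError:
            pass  # ack unknown: leave unacked so the next pass retries
```

Because the receiver deduplicates on the delivery id, a second relay pass after a lost ack marks the entry acked without repeating the effect. The transport behind `send` can change without altering that guarantee.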

Immutable failures #

Example:

  • failure: pointer mismatch or incomplete publish
  • primary mitigation: manifest/head publish after content durable

Overlay contribution:

  • authoritative module: manifest/head record
  • published contract: publish/resolve/ref semantics
  • hidden module: blob placement, replication, GC
  • evolution move: split namespace from content store
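
A minimal sketch of publishing the head only after the content is stored, assuming a content-addressed in-memory store. `ContentStore` and `Namespace` are hypothetical names, and the dictionary write stands in for a durable write. The namespace and the content store are already separate classes here, which mirrors the evolution move above.

```python
import hashlib
from typing import Dict, Optional


class ContentStore:
    """Hypothetical in-memory blob store keyed by content hash."""

    def __init__(self):
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data  # stand-in for a durable write
        return digest

    def get(self, digest: str) -> Optional[bytes]:
        return self._blobs.get(digest)


class Namespace:
    """Holds the head record: name -> content digest."""

    def __init__(self, store: ContentStore):
        self._store = store
        self._heads: Dict[str, str] = {}

    def publish(self, name: str, digest: str) -> None:
        # Refuse to move the head to content that is not stored yet.
        if self._store.get(digest) is None:
            raise ValueError(f"content {digest} is not durable; head unchanged")
        self._heads[name] = digest

    def resolve(self, name: str) -> Optional[bytes]:
        digest = self._heads.get(name)
        return None if digest is None else self._store.get(digest)
```

A failed publish leaves the previous head in place, so readers never resolve a pointer to missing content.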

Truth failures #

Example:

  • failure: lost update or invalid concurrent mutation
  • primary mitigation: CAS, unique constraint, or transaction

Overlay contribution:

  • authoritative module: truth store
  • published contract: conditional mutation boundary
  • hidden module: lock/index/storage-engine implementation
  • evolution move: move guard from application logic into the data layer
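
One way to sketch a conditional mutation boundary enforced in the data layer uses Python's standard `sqlite3` module. The `account` table, its columns, and the `conditional_update` helper are assumptions made for illustration. The guard lives in the `WHERE` clause, so the application does not need a separate read-then-check step.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER, version INTEGER)"
    )
    return conn


def conditional_update(
    conn: sqlite3.Connection, account_id: str, expected_version: int, new_balance: int
) -> bool:
    """Compare-and-set: the write applies only if the version is unchanged."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means another writer got there first
```

A caller that receives `False` has lost the race and should re-read the row before deciding whether to retry.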

Where It Helps Less #

The overlay is less helpful for very local mitigations such as:

  • a plain CAS
  • a simple uniqueness constraint
  • a direct version check
  • a small retry/backoff loop

Those are already well handled by:

  • role-based failure phrases
  • bounded mechanism families

The overlay still applies, but the added value is smaller.


Best Rule #

If the mitigation requires reasoning about:

  • module boundaries
  • stable APIs
  • swappable internals
  • core versus periphery
  • long-term evolution

then the Design Rules overlay is worth applying.

If the mitigation is just:

  • add a version check
  • add a unique key
  • add retry with jitter

then the base failure/mechanism framework is usually enough.
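
For contrast, a local mitigation such as retry with jitter fits in a single function with no module boundary to design. The sketch below assumes transient failures surface as `ConnectionError`; the `retry_with_jitter` name and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_jitter(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `op` on ConnectionError using capped exponential backoff with full jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # full jitter: anywhere in [0, cap]
    raise ValueError("attempts must be at least 1")
```

Nothing here needs a published contract or a substitution path, which is why the overlay adds little for this class of fix.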


Quick Examples #

Coordination -> stale owner accepted #

  • base mitigation: fencing token
  • overlay questions:
    • where is epoch truth authoritative?
    • which module validates the epoch?
    • what part of renewal logic stays hidden?

Derived -> projection drift #

  • base mitigation: checkpoint + replay + rebuild
  • overlay questions:
    • what is the projector contract?
    • which rebuild lane stays separate from hot serving?
    • what can be substituted without changing view semantics?

External Effect -> retry ambiguity #

  • base mitigation: outbox + idempotency + reconciliation
  • overlay questions:
    • where is effect truth authoritative?
    • what idempotency contract is visible to receivers?
    • can delivery transport change without changing effect semantics?

Short Conclusion #

The role-based failure framework should stay primary.

The Design Rules overlay is useful for failure mitigation when you need to shape the mitigation as a modular system:

  • who owns the control
  • what contract is exposed
  • what implementation stays hidden
  • what can evolve safely later

So the right framing is:

  • role-based failure generates the failure and base mitigation
  • Design Rules overlay improves the structure of that mitigation