Design Rules Overlay For Role-Based Failure Mitigation

Status: Archive candidate. Keep as historical reference; prefer system-design-core-index.md and the core notes for day-to-day use.

Overlay-on-overlay note; keep only as dormant reference.

The Design Rules overlay is useful for failure mitigation, but as a secondary lens, not as the primary generator of mitigations.

The primary role-based failure framework should still answer:

what failed?
what local control prevents corruption?
what truth is authoritative?
what repair restores correctness?

The Design Rules overlay helps answer a different mitigation question:

where should the mitigation live?
what contract should it publish?
what implementation detail should stay hidden?
what can be substituted later without changing semantics?

Correct Positioning

Use the layers in this order:

role-based failure
- generate the likely failure

base mitigation
- choose the concrete control:
  - CAS
  - unique constraint
  - lease
  - fencing
  - outbox
  - checkpoint
  - rebuild
  - reconciliation

Design Rules overlay
- place the control in the right module
- define the published contract
- keep implementation details hidden
- identify clean substitution and evolution options

That is the right division of labor.

What The Overlay Adds

The role-based framework tells you:

what is failing
what mechanism is needed

The overlay helps you shape that mitigation into:

the right authority boundary
the right interface contract
the right hidden module
the right substitution path
the right evolution move

So this is a mitigation-structure lens.

It does not replace the primary mitigation vocabulary.

Where It Helps Most

The overlay is most useful when the mitigation is architectural rather than purely local.

`Coordination` failures

Example:

failure: stale owner accepted
primary mitigation: fencing token

Overlay contribution:

authoritative module: lease service / owner record
published contract: claim/renew/release + epoch
hidden module: heartbeat, expiry detection, reaper
substitution options: DB lease -> etcd lease -> Redis lease

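This keeps the mitigation clean and evolvable: callers depend only on claim/renew/release and the epoch, so the lease backend can be substituted without changing how stale owners are rejected, as long as every backend still issues monotonically increasing epochs.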
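A minimal in-process sketch of this split, assuming a single lease service and a protected store that both live in memory. `Lease`, `LeaseService`, `FencedStore`, and the injected clock are hypothetical names, not a real library; a real deployment would keep the owner record in a database, etcd, or Redis as listed above. Callers see only claim/renew/release and the epoch, expiry detection stays internal, and the protected store, not the lease service, rejects stale epochs.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Lease:
    owner: str
    epoch: int  # fencing token: strictly increases with every new grant
    expires_at: float


class LeaseService:
    """Authoritative owner record. Published contract: claim/renew/release + epoch.
    Expiry detection is hidden (here, a TTL check against an injected clock)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._current: Optional[Lease] = None
        self._last_epoch = 0

    def _live(self) -> bool:
        return self._current is not None and self._clock() < self._current.expires_at

    def claim(self, owner: str) -> Optional[Lease]:
        if self._live():
            # A live lease blocks other owners; the holder just gets its lease back.
            return self._current if self._current.owner == owner else None
        self._last_epoch += 1
        self._current = Lease(owner, self._last_epoch, self._clock() + self._ttl)
        return self._current

    def renew(self, lease: Lease) -> Optional[Lease]:
        if not self._live() or self._current.epoch != lease.epoch:
            return None  # expired or superseded: caller is no longer the owner
        self._current = Lease(lease.owner, lease.epoch, self._clock() + self._ttl)
        return self._current

    def release(self, lease: Lease) -> None:
        if self._current is not None and self._current.epoch == lease.epoch:
            self._current = None


class FencedStore:
    """Protected resource: validates the epoch on every write."""

    def __init__(self) -> None:
        self._highest_epoch = 0
        self.data: dict = {}

    def write(self, epoch: int, key: str, value: str) -> bool:
        if epoch < self._highest_epoch:
            return False  # stale owner: fenced off
        self._highest_epoch = epoch
        self.data[key] = value
        return True
```

For example, if worker-a's lease expires while it is paused and worker-b claims epoch 2, worker-a's later write carrying epoch 1 returns False and leaves worker-b's value in place.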

`Derived` failures

Example:

failure: projection drift
primary mitigation: checkpointed projector + rebuild

Overlay contribution:

authoritative module: source truth plus projector checkpoint
published contract: changelog subscription + rebuild/replay semantics
hidden module: batching, projector scheduling, backfill internals
evolution move: split online projector from rebuild lane
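A minimal sketch of a checkpointed projector with a separate rebuild lane, assuming an in-memory append-only changelog as source truth and a per-account balance view as the projection. `Event`, `Changelog`, `BalanceProjector`, `catch_up`, and `rebuild` are hypothetical names. In a real store, applying a batch and advancing the checkpoint must commit atomically; this sketch does both in one step in memory.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    offset: int
    account: str
    delta: int


class Changelog:
    """Source truth: append-only, addressed by offset."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, account: str, delta: int) -> None:
        self._events.append(Event(len(self._events), account, delta))

    def read_from(self, offset: int) -> List[Event]:
        return self._events[offset:]


class BalanceProjector:
    """Derived view plus checkpoint. Published contract: catch_up() and rebuild()."""

    def __init__(self, log: Changelog) -> None:
        self._log = log
        self.view: Dict[str, int] = {}
        self.checkpoint = 0  # next offset to apply

    @staticmethod
    def _apply(view: Dict[str, int], events: List[Event]) -> None:
        for e in events:
            view[e.account] = view.get(e.account, 0) + e.delta

    def catch_up(self) -> None:
        # Online lane: apply only what lies past the checkpoint.
        events = self._log.read_from(self.checkpoint)
        self._apply(self.view, events)
        if events:
            # A real store must commit the view update and this advance together.
            self.checkpoint = events[-1].offset + 1

    def rebuild(self) -> None:
        # Rebuild lane: replay all source truth into a fresh view, then swap it in.
        fresh: Dict[str, int] = {}
        events = self._log.read_from(0)
        self._apply(fresh, events)
        self.view = fresh
        self.checkpoint = events[-1].offset + 1 if events else 0
```

If the view drifts (for example, a bad write corrupts one balance), `catch_up` will not notice, because it only applies new offsets; `rebuild` repairs the view from source truth without changing what the view means.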

`External Effect` failures

Example:

failure: effect succeeded but ack lost
primary mitigation: outbox + reconciliation + idempotent receiver

Overlay contribution:

authoritative module: effect record / outbox / inbox dedup state
published contract: delivery id, idempotency key, ack semantics
hidden module: transport choice, retry scheduler, relay internals
substitution options: direct relay -> broker -> workflow engine
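A minimal sketch of outbox plus idempotent receiver, assuming an in-memory sender whose order write and outbox append stand in for one local database transaction. `OrderService`, `relay`, and `Receiver` are hypothetical names, and the `lose_ack` flag simulates "effect succeeded but ack lost". Reconciliation is not shown. The receiver depends only on the delivery id, so the relay could be swapped for a broker without changing effect semantics.

```python
import uuid
from typing import Any, Dict, List, Set


class Receiver:
    """Idempotent receiver: inbox dedup state keyed by delivery id."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.applied: List[Any] = []

    def handle(self, delivery_id: str, payload: Any) -> str:
        if delivery_id in self.seen:
            return "duplicate"  # effect already applied; safe to ack again
        self.seen.add(delivery_id)
        self.applied.append(payload)
        return "applied"


class OrderService:
    """Sender: the state change and its outbox row are written together."""

    def __init__(self) -> None:
        self.orders: Dict[str, int] = {}
        self.outbox: List[Dict[str, Any]] = []

    def place_order(self, order_id: str, amount: int) -> None:
        # Assumed to run as one local transaction in a real database.
        self.orders[order_id] = amount
        self.outbox.append(
            {"delivery_id": str(uuid.uuid4()), "payload": (order_id, amount), "delivered": False}
        )


def relay(service: OrderService, receiver: Receiver, lose_ack: bool = False) -> None:
    """Hidden relay: sends pending rows and marks them delivered only after an ack."""
    for row in service.outbox:
        if row["delivered"]:
            continue
        receiver.handle(row["delivery_id"], row["payload"])
        if lose_ack:
            continue  # effect applied, but the sender never sees the ack
        row["delivered"] = True
```

Running `relay` once with `lose_ack=True` and then again normally delivers the same row twice, but the receiver applies the effect only once.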

`Immutable` failures

Example:

failure: pointer mismatch or incomplete publish
primary mitigation: publish the manifest/head only after content is durable

Overlay contribution:

authoritative module: manifest/head record
published contract: publish/resolve/ref semantics
hidden module: blob placement, replication, GC
evolution move: split namespace from content store
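A minimal sketch of "move the head only after content is durable", assuming an in-memory content store keyed by SHA-256 digest stands in for durable blob storage. `ContentStore`, `Namespace`, `publish`, and `resolve` are hypothetical names. The namespace refuses to point a head at a manifest that references missing content, so a failed publish leaves the previous head resolvable.

```python
import hashlib
from typing import Dict, List, Tuple


class ContentStore:
    """Hidden module: blob placement. Here, a dict keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def has(self, digest: str) -> bool:
        return digest in self._blobs

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class Namespace:
    """Authoritative manifest/head records. Published contract: publish() and resolve()."""

    def __init__(self, store: ContentStore) -> None:
        self._store = store
        self._manifests: Dict[str, Tuple[str, ...]] = {}
        self._heads: Dict[str, str] = {}

    def publish(self, ref: str, digests: List[str]) -> str:
        # Step 1: refuse to reference content that is not already stored.
        missing = [d for d in digests if not self._store.has(d)]
        if missing:
            raise ValueError(f"content not durable yet: {missing}")
        # Step 2: write the immutable manifest, named by the hash of its entries.
        manifest_id = hashlib.sha256("\n".join(digests).encode()).hexdigest()
        self._manifests[manifest_id] = tuple(digests)
        # Step 3: move the head last, so readers never resolve a partial publish.
        self._heads[ref] = manifest_id
        return manifest_id

    def resolve(self, ref: str) -> List[bytes]:
        return [self._store.get(d) for d in self._manifests[self._heads[ref]]]
```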

`Truth` failures

Example:

failure: lost update or invalid concurrent mutation
primary mitigation: CAS, unique constraint, or transaction

Overlay contribution:

authoritative module: truth store
published contract: conditional mutation boundary
hidden module: lock/index/storage-engine implementation
evolution move: move guard from application logic into the data layer
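A minimal sketch of a conditional mutation boundary that lives in the data layer, using Python's built-in `sqlite3` module with an in-memory database. The `account` table, its `version` column, and `conditional_set_balance` are hypothetical; the guard is the `WHERE ... AND version = ?` clause, and the caller learns the outcome from `rowcount`.

```python
import sqlite3
from typing import Optional, Tuple


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL,"
        " version INTEGER NOT NULL)"
    )
    return conn


def read(conn: sqlite3.Connection, account_id: str) -> Optional[Tuple[int, int]]:
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def conditional_set_balance(
    conn: sqlite3.Connection, account_id: str, new_balance: int, expected_version: int
) -> bool:
    """Published contract: succeed only if the row still has expected_version."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE account SET balance = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_balance, account_id, expected_version),
        )
    # 0 rows: the version moved on (or the row does not exist).
    return cur.rowcount == 1
```

Two writers that both read version 0 cannot both succeed: the second UPDATE matches zero rows, which is exactly the lost update the guard exists to prevent.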

Where It Helps Less

The overlay is less helpful for very local mitigations such as:

a plain CAS
a simple uniqueness constraint
a direct version check
a small retry/backoff loop

Those are already well handled by:

role-based failure phrases
bounded mechanism families

The overlay still applies, but the added value is smaller.
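For contrast, a retry-with-jitter loop is the kind of local mitigation where the overlay adds little: it fits in one function with no module boundary to design. A minimal sketch using "full jitter" exponential backoff, assuming the operation is safe to repeat and that `ConnectionError`/`TimeoutError` mark retryable failures; `retry_with_jitter` and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_jitter(
    op: Callable[[], T],
    attempts: int = 5,
    base: float = 0.05,
    cap: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call op(); on a retryable error, sleep a random delay and try again."""
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: uniform delay in [0, min(cap, base * 2**attempt)).
            sleep(rng() * min(cap, base * 2 ** attempt))
    raise ValueError("attempts must be >= 1")
```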

Best Rule

If the mitigation requires reasoning about:

module boundaries
stable APIs
swappable internals
core versus periphery
long-term evolution

then the Design Rules overlay is worth applying.

If the mitigation is just:

add a version check
add a unique key
add retry with jitter

then the base failure/mechanism framework is usually enough.

Quick Examples

`Coordination -> stale owner accepted`

base mitigation: fencing token
overlay questions:
- where is epoch truth authoritative?
- which module validates the epoch?
- what part of renewal logic stays hidden?

`Derived -> projection drift`

base mitigation: checkpoint + replay + rebuild
overlay questions:
- what is the projector contract?
- what rebuild lane is separate from hot serving?
- what can be substituted without changing view semantics?

`External Effect -> retry ambiguity`

base mitigation: outbox + idempotency + reconciliation
overlay questions:
- where is effect truth authoritative?
- what idempotency contract is visible to receivers?
- can delivery transport change without changing effect semantics?

Short Conclusion

The role-based failure framework should stay primary.

The Design Rules overlay is useful for failure mitigation when you need to shape the mitigation as a modular system:

who owns the control
what contract is exposed
what implementation stays hidden
what can evolve safely later

So the right framing is:

role-based failure generates the failure and base mitigation
Design Rules overlay improves the structure of that mitigation
