Infra Archetype Taxonomy Reference #

Use this as the canonical infra taxonomy after the shift from the older six-bucket infra scheme to the normalized I01-I21 set.

For each archetype, this note lists:

entities
source-of-truth entities
write paths
read paths
sequence shape
failure modes with prevention and repair
scaling bottlenecks and mitigation
how the archetype connects to other archetypes

The goal is to keep the archetypes operational and composable, not just nameable.

This note is the second layer after the infra family sheet:

families are for generating candidate paths quickly
archetypes are for making those paths precise in terms of truth, transitions, ownership, repair, and bottlenecks

Use this note when you need to answer:

what is authoritative?
what transition is guarded?
what stale actor or stale revision must be fenced?
what repair loop restores correctness?
what bottleneck is canonical for this shape?
what deployment and trust posture is the clean default for this shape?

Protocol Notation #

For infra interview recall, it is often useful to render an archetype as one or more compact protocols.

Assume the verbs are API or protocol operations, not just vague English labels.

So a protocol step should read like:

actor[operation {guard on truth}] -> next actor

Where:

actor is the component issuing or owning the step
operation is the protocol/API operation
guard on truth is the predicate that makes the step legal

Examples:

claimant[claim {ClaimState.owner_id = null or LeaseState.expiry_at < now}]
consumer[commit_progress {SinkEffectState.applied(event_id) = true}]
agent[apply {incoming.version > AppliedVersionState.local_version}]
worker[complete {ExecutionAttemptState.attempt_id = my_attempt}]
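
One way to make the notation concrete is to treat each step as data plus a guard predicate. This is a minimal sketch, not a prescribed representation: the `Step` class, the flat `Truth` dictionary, and the `claim_step` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical flat view of source-of-truth attributes, keyed by "Object.attr".
Truth = Dict[str, Any]


@dataclass(frozen=True)
class Step:
    actor: str                      # component issuing or owning the step
    operation: str                  # protocol/API operation
    guard: Callable[[Truth], bool]  # predicate on truth that makes the step legal

    def is_legal(self, truth: Truth) -> bool:
        return self.guard(truth)


def claim_step(now: float) -> Step:
    # claimant[claim {ClaimState.owner_id = null or LeaseState.expiry_at < now}]
    return Step(
        actor="claimant",
        operation="claim",
        guard=lambda t: t["ClaimState.owner_id"] is None
        or t["LeaseState.expiry_at"] < now,
    )
```

Reading a step this way keeps the recall question sharp: which truth attributes does the guard read, and who is allowed to change them?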

Branches #

Protocols often branch.

Use:

-> for mainline progression
|-> for guarded branch
=> done|retry|requeue|reclaim|rollback|replay|refresh|republish for compact terminal outcomes

Example:

scheduler[place {CapacityState.free_slots > 0}] ->
  worker[launch {ExecutionLeaseState.owner_id = worker_id}] ->
  worker[complete {ExecutionAttemptState.attempt_id = my_attempt}] => done
  |-> reconciler[reclaim {ExecutionLeaseState.expiry_at < now}] -> scheduler[reassign] => requeue

Recall Rule #

For most infra archetypes, do not try to memorize a full sequence diagram. Memorize:

main protocol
repair/failure protocol
optional control/update protocol

That usually captures the load-bearing mechanics better than prose alone.

Before the protocol, list the source-of-truth objects used by the guards, and nest the critical attributes under each object.

Use this shape:

Source of truth in protocol:
- TruthObjectA
  - critical_attr_1
  - critical_attr_2
- TruthObjectB
  - critical_attr_3

Ownership-Shaped State #

Ownership-bearing archetypes should be read through one of three state shapes.

`owner-only` #

Use when ownership ends only by:

explicit release
explicit overwrite
versioned transition

Canonical fields:

owner_id
version or equivalent guarded revision

`lease-backed ownership` #

Use when ownership is:

soft-state
renewal-based
reclaimable after timeout
protected by fencing

Canonical fields:

owner_id
expiry_at
optional epoch, term, or fencing token

This may appear as:

one combined object
- for example ExecutionLeaseState.owner_id, ExecutionLeaseState.expiry_at
or split objects
- for example ClaimState.owner_id + LeaseState.expiry_at + OwnerEpoch.current

`attempt-scoped execution` #

Use when correctness depends on:

which execution generation is current
stale completion rejection
replay or reassignment after failure

Canonical fields:

attempt_id
usually paired with either:
- owner-only
- or lease-backed ownership

Normalization Rule #

When reclaim or stale-actor fencing depends on timeout, the protocol should expose:

who owns it
when that ownership expires
and, if needed, which generation or epoch is current

So:

owner_id answers who is allowed to act now
expiry_at answers until when that authority is valid
attempt_id or epoch answers which generation is allowed to commit
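
The three questions above can be folded into one commit check. This is a sketch under assumptions: `LeaseView` and `may_commit` are hypothetical names, and the combined-object shape is only one of the two layouts described earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeaseView:
    owner_id: Optional[str]  # who is allowed to act now
    expiry_at: float         # until when that authority is valid
    epoch: int               # which generation is allowed to commit


def may_commit(lease: LeaseView, actor_id: str, actor_epoch: int, now: float) -> bool:
    """All three checks must pass; failing any one of them fences the actor."""
    return (
        lease.owner_id == actor_id
        and now <= lease.expiry_at
        and actor_epoch == lease.epoch
    )
```

Dropping any one of the three checks reopens a specific failure: the wrong actor, an expired holder, or a stale generation.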

Archetype To Family Fit #

Each archetype should feel like a mechanically precise realization of one family center of gravity, even when adjacent families or overlays are present.

Use this compact fit table:

Archetype	Default family fit	Why it fits
`I01 Coordination / Consensus Metadata`	`Coordination / Authority`	authoritative metadata mutation, revisions, watches, and leadership correctness
`I02 Claim / Lease / Exclusive Ownership`	`Coordination / Authority`	ownership and fencing are the primary correctness center
`I03 Due-Time Release + Claimable Run`	`Execution / Resource Management`	due-time eligibility releases work into execution
`I04 Frontier Scan + Claimable Run`	`Execution / Resource Management`	uncovered-work discovery releases work into execution
`I05 Append Log + Consumer Progress`	`Messaging / Log / Stream`	append, consume, progress, replay, retention
`I06 Projection / Index / Search Pipeline`	`Serving State / Storage` or `Observability / Telemetry`	derived read truth and rebuild semantics dominate
`I07 Cache / Origin Projection / Edge Delivery`	`Serving State / Storage`	read acceleration and freshness around origin truth
`I08 Traffic Shaping / Admission Control`	`Routing / Mediation` or `Execution / Resource Management`	demand gating and fairness dominate the hot path
`I09 Sequence / Identifier Generation`	`Coordination / Authority`	uniqueness and monotonic allocation dominate correctness
`I10 Membership / Presence / Registry`	`Membership / Registry`	register, heartbeat, expire, lookup, watch
`I11 Control Plane + Snapshot Distribution`	`Control Plane / Distribution`	versioned config truth, publication, apply, rollback
`I12 Workflow + External Side Effect`	no single family default	workflow correctness and side-effect reconciliation are the center
`I13 Shared Subject Coordination`	no single family default	shared subject sequencing and merge correctness dominate
`I14 Immutable Artifact Namespace + Delivery`	`Artifact / Rollout / Release`	immutable version publication and delivery truth dominate
`I15 Execution Fleet + Worker Substrate`	`Execution / Resource Management`	placement, capacity, execution lease, completion, reconciliation
`I16 Key-Scoped Mutable State / Replicated KV`	`Serving State / Storage`	current serving truth and replication semantics dominate
`I17 Traffic Steering / Request Mediation Plane`	`Routing / Mediation`	routing, backend health, stickiness, retries, drain
`I18 Telemetry / Time-Series Pipeline`	`Observability / Telemetry`	ingest, aggregate, query, alert, retention
`I19 Replicated Chunk / Block / File Storage Substrate`	`Serving State / Storage`	chunk placement, replica repair, namespace and metadata truth
`I20 Computation / Dataflow / DAG Execution`	`Computation / Dataflow`	graph topology, operator state, shuffle, checkpoint, and output visibility
`I21 Trust Boundary / Cryptographic Proof Substrate`	`Identity / Trust / Policy`	principal binding, key material, signed statements, revocation, audit, and attestation across trust boundaries

If an archetype still feels too coarse after this view, add the secondary runtime lens from:

secondary-execution-component-overlay.md

That overlay is most helpful for run, stream, workflow, fleet, telemetry, and storage-substrate systems.

For DB-like replicated-state systems, also use:

transaction-and-replication-overlay-for-db-like-archetypes.md

That one is narrower and is most useful for A07, I16, I01, and I02.

For the post-topology trust boundary pass, use:

security-and-privacy-lens-after-deployment-topology.md

That lens should usually be applied after:

archetype selection
correctness reasoning
scale reasoning
deployment topology choice

not before.

Post-Topology Security / Privacy Lens #

Apply security and privacy after you have already chosen:

serving domain
coordination domain
fault domain
recovery domain

At that point, ask:

who is trusted in each domain?
what can each domain observe?
what can each domain mutate?
what must hold across each boundary:
- confidentiality
- integrity
- authenticity
- unlinkability
- auditability

Default substrate #

For most infra archetypes, the default substrate is:

M1 cryptographic identity binding
M2 authenticated encryption
M4 signed or MAC’d versioned state

Compressed:

identity
secure channel or encrypted payload
signed and versioned control/state objects

Everything else is specialized:

M3 for forward secrecy
M5 for rotating identifiers / unlinkability
M6 for privacy-preserving proof or authorization
M7 for aggregate privacy
M8 for access-pattern privacy
M9 for attested execution on untrusted hosts

Compact template #

After deployment topology, add this block:

principals:
adversary:
trusted domains:
untrusted domains:
sensitive boundaries:
baseline substrate:
- M1 + M2 + M4
specialized move if needed:
defeats:
does not defeat:

Archetype defaults #

Archetype	Default security fit after topology	Common special move
`I01`	signed and versioned quorum state over authenticated channels	Byzantine or attested control only if quorum members are not trusted
`I02`	signed owner / epoch / lease tokens	stronger lease issuer trust story if lease authority itself is suspect
`I03`	signed schedule truth	encrypt runnable payload if scheduler should not read it
`I04`	signed checkpoint and frontier truth	signed completion proof or attestation if workers may lie
`I05`	authenticated producers/consumers, encrypted transport or payload, signed sequence metadata	forward secrecy for E2EE-style logs
`I06`	normal plaintext-index regime over secure channels	searchable encryption only when server plaintext visibility is unacceptable
`I07`	signed cache/config/content metadata over secure transport	signed-content edge if CDN/edge should be untrusted
`I08`	authenticated subject plus signed policy/versioning	anonymous tokens for privacy-preserving rate limits
`I09`	authenticated range allocator plus signed range/version truth	stronger fencing if allocation rights are delegated
`I10`	authenticated members and signed heartbeat/view state	rotating identifiers for privacy-sensitive presence/discovery
`I11`	signed snapshots/config with version/freshness checks	transparency or multi-signer approval for high-trust control planes
`I12`	signed workflow state and side-effect receipts	encrypt payload or add proof if side effects are disputed or sensitive
`I13`	signed operations and versioned merge state	encrypted collaboration when operator should not read content
`I14`	signed artifact publication and signed head advances	transparency log for supply-chain accountability
`I15`	authenticated workers plus signed placement/lease state	TEE attestation for confidential workloads
`I16`	authenticated clients, encrypted values, signed versioned state	oblivious access only for strong access-pattern privacy
`I17`	authenticated hops, mTLS, signed routes/policies	onion-routing composition for unlinkability
`I18`	signed exporters/samples	differential privacy for user-level analytics privacy
`I19`	encrypted chunks and signed metadata/manifests	proof-of-possession/replication or oblivious read only if needed
`I20`	signed graph definitions, authenticated workers, signed checkpoints/output commits	TEE attestation for confidential computation or signed lineage proofs for disputed outputs
`I21`	workload identity, signed claims, revocation state, and tamper-evident audit	transparency logs, threshold signing, or attestation when issuer/operator trust is not enough

`I01 Coordination / Consensus Metadata` #

Entities: LeaseState, LeadershipState, MembershipState, RevisionedConfig
Source of truth: LeaseState, LeadershipState, MembershipState, RevisionedConfig
Ownership mode: lease-backed authority via LeaseState.holder_id + expiry_at, fenced by LeadershipState.term
Write paths: acquire leadership, renew lease, release leadership, mutate metadata, watch registration
Read paths: current leader read, quorum-backed metadata read, watch/revision stream
Sequence: client -> coordination API -> quorum store -> watch stream -> watchers
Failure modes:
- split brain
  - prevention: quorum-backed guarded transition, monotonic term/epoch, fencing
  - repair: revoke stale leader, force new election, resync watchers from revision
- watch gap on reconnect
  - prevention: revision cursor on watch, compaction boundary checks
  - repair: replay from retained revision or reload full snapshot
Scaling bottlenecks: quorum write latency, hot metadata key, watch fanout
- mitigation: shard metadata by coordination domain, narrow hot keys, snapshot-plus-watch fanout tiers
Connections:
- input: admin/control writes, membership changes
- output: I02, I10, I11, I15
- decorator/middleware: coordination layer under other archetypes that need authoritative election or revisioning

Load-Bearing Protocols #

Source of truth in protocol:

RevisionedConfig
- version
- compaction_floor
LeadershipState
- term
LeaseState
- holder_id
- expiry_at

Main protocol:

client[mutate_metadata {RevisionedConfig.version = expected_version}] ->
  quorum_store[commit {quorum reachable}] ->
  watcher[resume_watch {watch_revision >= RevisionedConfig.compaction_floor}] => done

Repair/control protocol:

leader_candidate[acquire_leadership {LeadershipState.term = expected_term and quorum reachable}] ->
  leader[renew_lease {LeaseState.holder_id = me and LeaseState.expiry_at >= now}] ->
  follower[observe_revision {RevisionedConfig.version > local_version}] => done
  |-> watcher[reload_snapshot {watch_revision < RevisionedConfig.compaction_floor or watch_gap_detected = true}] => requeue
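
A minimal in-memory sketch of the guarded mutation and the watch-gap branch, assuming `watch_revision` means the first revision the watcher still needs. The class below is an illustrative stand-in, not a real coordination store, and its method names are assumptions.

```python
from typing import Dict, List, Tuple


class RevisionedConfig:
    """In-memory stand-in for quorum-backed revisioned metadata (illustrative only)."""

    def __init__(self) -> None:
        self.version = 0
        self.compaction_floor = 0  # oldest revision whose event is still retained
        self.data: Dict[str, str] = {}
        self.history: List[Tuple[int, str, str]] = []  # (revision, key, value)

    def mutate(self, expected_version: int, key: str, value: str) -> bool:
        # Guard: RevisionedConfig.version = expected_version
        if self.version != expected_version:
            return False
        self.version += 1
        self.data[key] = value
        self.history.append((self.version, key, value))
        return True

    def compact(self, floor: int) -> None:
        # Drop events below the floor (hypothetical retention policy).
        self.compaction_floor = max(self.compaction_floor, floor)
        self.history = [e for e in self.history if e[0] >= self.compaction_floor]

    def resume_watch(self, watch_revision: int):
        # Branch: cursor fell below the compaction floor, so events are gone.
        if watch_revision < self.compaction_floor:
            return ("snapshot", (self.version, dict(self.data)))
        # Mainline: replay retained events from the cursor onward.
        return ("events", [e for e in self.history if e[0] >= watch_revision])
```

A watcher that receives a snapshot should replace its local view wholesale and resume watching from the snapshot version plus one.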

`I02 Claim / Lease / Exclusive Ownership` #

Entities: ClaimState, LeaseState, OwnerEpoch
Source of truth: ClaimState, LeaseState, OwnerEpoch
Ownership mode: lease-backed ownership split across ClaimState.owner_id, LeaseState.expiry_at, and OwnerEpoch.current
Write paths: claim, renew, release, expire stale claim, reclaim abandoned work
Read paths: current owner read, lease expiry read
Sequence: claimant -> lease service -> lease store -> claimant acts with epoch -> downstream fenced write
Failure modes:
- duplicate claim
  - prevention: guarded claim transition, unique active owner, lease + epoch
  - repair: expire/reap stale claim, reconcile duplicate actors downstream
- stale holder acting after expiry
  - prevention: fencing token / epoch check on every downstream commit
  - repair: reassign ownership, replay or reconcile partial side effects
Scaling bottlenecks: hot claim key, renew storms, lease-manager contention
- mitigation: partition claim domains, batch renewals, longer leases with fencing, local caches for read-only ownership observation
Connections:
- input: I03, I04, I15
- output: protects A08, A12, I15, I17
- decorator/middleware: ownership guard around mutable truth or execution

Load-Bearing Protocols #

Source of truth in protocol:

ClaimState
- owner_id
LeaseState
- expiry_at
OwnerEpoch
- current

Main protocol:

claimant[claim {ClaimState.owner_id = null or LeaseState.expiry_at < now}] ->
  holder[renew {OwnerEpoch.current = my_epoch}] ->
  holder[commit {OwnerEpoch.current = my_epoch}] => done

Repair/fencing protocol:

downstream_writer[reject_commit {OwnerEpoch.current != presented_epoch}] => retry
  |-> reaper[reclaim {LeaseState.expiry_at < now}] => requeue
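
The claim, renew, and fenced-commit guards can be sketched over a single record. This is a simplified illustration: it collapses the three split truth objects into one hypothetical `LeaseRecord` and assumes the epoch is bumped on every ownership change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaseRecord:
    owner_id: Optional[str] = None  # ClaimState.owner_id
    expiry_at: float = 0.0          # LeaseState.expiry_at
    epoch: int = 0                  # OwnerEpoch.current


def claim(rec: LeaseRecord, claimant: str, now: float, ttl: float) -> Optional[int]:
    """Claim if unowned or expired; return the new epoch (fencing token) or None."""
    if rec.owner_id is None or rec.expiry_at < now:
        rec.owner_id = claimant
        rec.expiry_at = now + ttl
        rec.epoch += 1  # assumption: every ownership change bumps the epoch
        return rec.epoch
    return None


def renew(rec: LeaseRecord, holder: str, my_epoch: int, now: float, ttl: float) -> bool:
    """Extend the lease only for the current, unexpired holder."""
    if rec.owner_id == holder and rec.epoch == my_epoch and rec.expiry_at >= now:
        rec.expiry_at = now + ttl
        return True
    return False


def fenced_commit(rec: LeaseRecord, presented_epoch: int) -> bool:
    """Downstream writer accepts a commit only from the current epoch."""
    return rec.epoch == presented_epoch
```

The stale holder in this sketch never learns it lost the lease from its own clock; it learns when the downstream writer rejects its epoch.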

Third-Layer Mechanism Lens #

Canonical unit of work: claim
Most relevant design-space dimensions: authorityDelegation, progressVisibility, recovery, coordAuthority
Common mechanism variants: lease-acquired authority vs claim-acquired ownership, heartbeat/renewal vs pure TTL expiry, centralized lease truth vs consensus-backed lease truth
Dominant invariant families: at-most-one-current-valid-owner, stale-actor-must-not-commit, every-eligible-item-is-owned-or-completed-or-rediscoverable
Canonical failure signatures: split-brain ownership, zombie write, orphaned claim, false expiry, renew storm
Good real-system anchors: Chubby, ZooKeeper recipes, etcd leases, SQS visibility-timeout-style claim loops

`I03 Due-Time Release + Claimable Run` #

Entities: ScheduleState, RunnableState, AttemptState, DueIndexState
Source of truth: ScheduleState, RunnableState
Write paths: schedule job, materialize due job, claim runnable attempt, complete/fail attempt
Read paths: due scan by time bucket, runnable fetch, history/progress read
Sequence: client -> scheduler API -> schedule store -> due scanner -> runnable store/queue -> worker
Failure modes:
- due item materialized twice
  - prevention: checkpoint-after-durable-materialization, idempotent release keyed by logical run
  - repair: dedup runnable records, reconcile duplicate attempts
- due item never materialized
  - prevention: overdue reconciliation sweep, monotonic scan checkpoints
  - repair: rescan overdue buckets, rebuild runnable set from schedule truth
Scaling bottlenecks: due bucket hotspot, bursty wakeups, runnable queue bursts
- mitigation: bucket sharding, jitter, hierarchical timing wheels, downstream fleet buffering
Connections:
- input: user schedules, cron, retries
- output: I15 or I02 for claim/execution
- decorator/middleware: often front door to workflow/orchestration systems

Load-Bearing Protocols #

Source of truth in protocol:

ScheduleState
- version
- next_due_at
DueIndexState
- bucket_start
RunnableState
- status
AttemptState
- attempt_id

Main protocol:

client[schedule {ScheduleState.version = expected_version}] ->
  due_scanner[materialize_runnable {DueIndexState.bucket_start <= now and RunnableState.status = ABSENT}] ->
  worker[claim {RunnableState.status = READY}] ->
  worker[complete {AttemptState.attempt_id = my_attempt}] => done

Repair protocol:

sweeper[rescan_overdue {ScheduleState.next_due_at < now and RunnableState.status = ABSENT}] => requeue
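
A sketch of idempotent release keyed by logical run, plus the overdue sweep. The `(schedule_id, due_at)` key and the class and function names are assumptions chosen for illustration; a missing key plays the role of `RunnableState.status = ABSENT`.

```python
from typing import Dict, Tuple

RunKey = Tuple[str, int]  # (schedule_id, due_at): hypothetical logical-run key


class RunnableStore:
    def __init__(self) -> None:
        self.status: Dict[RunKey, str] = {}  # missing key means ABSENT

    def materialize(self, schedule_id: str, due_at: int) -> bool:
        """Create a READY record once per logical run; repeated calls are no-ops."""
        key = (schedule_id, due_at)
        if key in self.status:
            return False
        self.status[key] = "READY"
        return True


def rescan_overdue(schedules: Dict[str, int], store: RunnableStore, now: int) -> int:
    """Sweep: materialize any schedule whose next_due_at has passed but has no run."""
    repaired = 0
    for schedule_id, next_due_at in schedules.items():
        if next_due_at < now and store.materialize(schedule_id, next_due_at):
            repaired += 1
    return repaired
```

Because the scanner and the sweeper share the same keyed insert, running both over the same bucket cannot produce a second runnable record.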

Third-Layer Mechanism Lens #

Canonical unit of work: run
Most relevant design-space dimensions: lifecycleShape, recovery, progressVisibility, authorityDelegation
Common mechanism variants: due-index scan vs timing-wheel release, broker-mediated runnable queue vs direct fleet handoff, requeue vs retry-in-place after failure
Dominant invariant families: every-eligible-item-is-owned-or-completed-or-rediscoverable, progress-never-regresses, completion-only-after-prerequisite-effect-is-durable
Canonical failure signatures: missed due release, duplicate run materialization, lost checkpoint, retry storm, lateness burst
Good real-system anchors: Quartz, EventBridge Scheduler, Airflow scheduler, hierarchical timing wheels

`I04 Frontier Scan + Claimable Run` #

Entities: FrontierState, ClaimState, CheckpointState, ProgressState
Source of truth: FrontierState, CheckpointState
Ownership mode: lease-backed batch claim via ClaimState.owner_id + expiry_at
Write paths: expand frontier, claim uncovered work, checkpoint covered progress, requeue incomplete work
Read paths: frontier scan, progress read, claim ownership read
Sequence: frontier manager -> frontier store -> workers claim batches -> checkpoint update -> next frontier opens
Failure modes:
- frontier advanced too far
  - prevention: checkpoint only after durable success, guarded frontier advance
  - repair: rescan from last safe checkpoint, rebuild uncovered set
- uncovered work skipped
  - prevention: coverage invariant over checkpoint/frontier, resumable scan discipline
  - repair: anti-entropy rescan, replay unfinished partitions
Scaling bottlenecks: hot checkpoint row, skewed ranges, claim bursts
- mitigation: partition frontier, hierarchical checkpoints, skew-aware shard splitting
Connections:
- input: discovery systems, batch scanners, DAG dependency release
- output: I15, I06, A10
- decorator/middleware: often paired with I05 or I15 for execution

Load-Bearing Protocols #

Source of truth in protocol:

CheckpointState
- covered_cursor
FrontierState
- high_watermark_cursor
ClaimState
- owner_id
- expiry_at
ProgressState
- batch_done

Main protocol:

frontier_manager[claim_range {CheckpointState.covered_cursor < FrontierState.high_watermark_cursor}] ->
  worker[process_batch {ClaimState.owner_id = my_worker}] ->
  worker[checkpoint {ProgressState.batch_done = true}] => done

Repair protocol:

reconciler[requeue_range {ClaimState.expiry_at < now and ProgressState.batch_done = false}] => requeue
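
The "checkpoint only after durable success" rule can be sketched as a cursor that advances over contiguous completed batches only. This assumes batches are numbered consecutively and that `covered_cursor` names the next batch not yet covered; both are illustrative simplifications.

```python
from typing import Set


def advance_checkpoint(covered_cursor: int, done_batches: Set[int], high_watermark: int) -> int:
    """Advance covered_cursor over contiguous completed batches only.

    A completed batch beyond a gap does not move the cursor, so a crash
    followed by a rescan from the checkpoint cannot skip uncovered work.
    """
    cursor = covered_cursor
    while cursor < high_watermark and cursor in done_batches:
        cursor += 1
    return cursor
```

For example, with batches 0, 1, and 3 done, the cursor stops at 2 and batch 3 is simply redone or deduplicated after a restart.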

Third-Layer Mechanism Lens #

Canonical unit of work: batch
Most relevant design-space dimensions: lifecycleShape, progressVisibility, reconciliation, recovery
Common mechanism variants: centralized frontier manager vs partition-local frontier ownership, checkpoint-write vs event-emission progress, periodic resweep vs continuous anti-entropy
Dominant invariant families: every-eligible-item-is-owned-or-completed-or-rediscoverable, progress-never-regresses, drift-eventually-corrected
Canonical failure signatures: uncovered work skipped, frontier advanced too far, lost checkpoint, orphaned claim, duplicate batch claim
Good real-system anchors: Mercator, crawler frontiers, repair sweepers, compaction scanners

`I05 Append Log + Consumer Progress` #

Entities: LogSegment, Offset, ConsumerProgress, PartitionState
Source of truth: LogSegment, ConsumerProgress
Write paths: append record, commit consumer progress, compact/retain segments
Read paths: fetch from offset, replay from offset, lag read
Sequence: producer -> broker/log -> consumer fetch -> effect -> progress commit
Failure modes:
- offset advanced before effect committed
  - prevention: effect-before-offset discipline, idempotent sink, transactional commit where possible
  - repair: replay from safe offset, reconcile sink/effect ambiguity
- duplicate consumption
  - prevention: idempotent consumer, inbox dedup, fenced consumer group progress
  - repair: replay with dedup or downstream reconciliation
Scaling bottlenecks: hot partition, broker I/O, rebalance churn
- mitigation: partition key design, batch I/O, consumer group tuning, tiered storage
Connections:
- input: producers, CDC, event emitters
- output: I06, I12, A14, analytics systems
- decorator/middleware: backbone under many async systems

Load-Bearing Protocols #

Source of truth in protocol:

PartitionState
- leader
LogSegment
- high_watermark
ConsumerProgress
- next_offset
SinkEffectState
- durable

Main protocol:

producer[append {PartitionState.leader = current_leader}] ->
  consumer[fetch {ConsumerProgress.next_offset <= LogSegment.high_watermark}] ->
  consumer[process] ->
  consumer[commit_progress {SinkEffectState.durable = true}] => done

Repair protocol:

consumer[replay {SinkEffectState.durable = false or SinkEffectState.durable = unknown}] => retry
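
Effect-before-offset with an idempotent sink can be sketched as follows. The record shape, the `IdempotentSink` class, and the simulated crash parameter are assumptions for illustration; a real sink would persist its dedup keys durably.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (event_id, amount): hypothetical record shape


class IdempotentSink:
    def __init__(self) -> None:
        self.applied: Dict[str, int] = {}

    def apply(self, event_id: str, amount: int) -> None:
        # Re-applying the same event_id leaves the sink unchanged.
        self.applied.setdefault(event_id, amount)

    def total(self) -> int:
        return sum(self.applied.values())


def consume(log: List[Record], next_offset: int, sink: IdempotentSink,
            crash_before_commit_at: int = -1) -> int:
    """Process from next_offset and return the committed next_offset."""
    offset = next_offset
    while offset < len(log):
        event_id, amount = log[offset]
        sink.apply(event_id, amount)          # effect first
        if offset == crash_before_commit_at:  # simulated crash: effect applied, offset not committed
            return offset
        offset += 1                           # progress commit second
    return offset
```

After the simulated crash, replay re-delivers the uncommitted record, and the sink's dedup key turns the duplicate delivery into a no-op.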

`I06 Projection / Index / Search Pipeline` #

Entities: SourceState, IndexEntryState, ProjectionState, ProjectorCheckpoint
Source of truth: SourceState
Write paths: source mutation, projector apply, delete/tombstone propagation, reindex
Read paths: projection query, search query, freshness read
Sequence: source writer -> source truth -> projector/indexer -> projection/index -> reader
Failure modes:
- stale projection
  - prevention: ordered projector checkpoints, replayable source log, monotonic apply
  - repair: replay or rebuild from source truth
- missing entry / tombstone not propagated
  - prevention: tombstone handling, completeness checks, backfill sweeps
  - repair: reindex affected range or full rebuild
Scaling bottlenecks: fanout on write, query fanout, rebuild cost
- mitigation: partitioned projectors, async pipelines, hierarchical indexes, background rebuild lanes
Connections:
- input: I05, source truth from many product archetypes
- output: A05, A15, monitoring/search surfaces
- decorator/middleware: derived read path for many primary archetypes

Load-Bearing Protocols #

Source of truth in protocol:

SourceState
- version
ProjectorCheckpoint
- next_offset
- gap_detected
ProjectionState
- version

Main protocol:

writer[mutate_source {SourceState.version = expected_version}] ->
  projector[apply {ProjectorCheckpoint.next_offset = expected_offset}] ->
  indexer[publish_projection {ProjectionState.version = source_version}] => done

Repair protocol:

rebuilder[reindex {ProjectorCheckpoint.gap_detected = true}] => replay
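
A sketch of ordered apply with gap detection and rebuild from source truth. The `Change` tuple shape and the `Projector` class are hypothetical; a `None` value stands in for a tombstone.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

Change = Tuple[int, str, Optional[str]]  # (offset, key, value); value None is a tombstone


@dataclass
class Projector:
    next_offset: int = 0
    gap_detected: bool = False
    index: Dict[str, str] = field(default_factory=dict)

    def apply(self, change: Change) -> bool:
        offset, key, value = change
        if offset < self.next_offset:
            return True  # already applied; replay is harmless
        if offset > self.next_offset:
            self.gap_detected = True  # a change was skipped; refuse out-of-order apply
            return False
        if value is None:
            self.index.pop(key, None)  # propagate the tombstone
        else:
            self.index[key] = value
        self.next_offset += 1
        return True


def rebuild(source_log: Iterable[Change]) -> Projector:
    """Repair path: replay the full source log into a fresh projection."""
    fresh = Projector()
    for change in source_log:
        fresh.apply(change)
    return fresh
```

The projection is disposable in this shape: whenever the checkpoint reports a gap, correctness comes back from the source log, not from patching the index.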

`I07 Cache / Origin Projection / Edge Delivery` #

Entities: OriginState, CacheEntry, PurgeVersion, EdgePolicyState
Source of truth: OriginState
Write paths: populate cache, refresh cache, invalidate/purge cache, propagate edge policy
Read paths: edge/cache read, origin fallback read, freshness/version read
Sequence: client -> edge/cache -> cache miss -> origin -> cache fill -> later invalidation/purge
Failure modes:
- stale cache
  - prevention: TTL/versioned purge discipline, origin version checks on fill
  - repair: purge, refresh, force origin read
- cache stampede
  - prevention: single-flight fill, request coalescing, soft TTL with refresh ahead
  - repair: temporary shed/load protect origin, backfill cache gradually
Scaling bottlenecks: hot key, miss storm, invalidation fanout
- mitigation: request collapsing, regional caches, stale-while-revalidate, purge trees
Connections:
- input: I06, I14, any origin truth
- output: read acceleration to product and infra systems
- decorator/middleware: read-path decorator in front of truth or projection

Load-Bearing Protocols #

Source of truth in protocol:

CacheEntry
- expires_at
- miss
- version
OriginState
- version
PurgeVersion
- version

Read/fill protocol:

client[read_cache {CacheEntry.expires_at >= now}] => done
  |-> cache[fill {CacheEntry.miss = true}] ->
     origin[read {OriginState.version >= requested_version}] ->
     cache[publish_entry {PurgeVersion.version = expected_version}] => done

Invalidation protocol:

invalidator[purge {PurgeVersion.version = expected_version}] ->
  edge[drop_entry {CacheEntry.version < PurgeVersion.version}] => refresh
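
The read/fill and purge guards can be sketched with a TTL plus a purge generation. This assumes a single global purge version and an injected clock; `EdgeCache` and `origin_read` are hypothetical names, and request coalescing is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CacheEntry:
    value: str
    expires_at: float
    version: int  # purge generation observed at fill time


class EdgeCache:
    def __init__(self, origin_read: Callable[[str], str], ttl: float) -> None:
        self.origin_read = origin_read  # hypothetical origin fetch function
        self.ttl = ttl
        self.purge_version = 0          # PurgeVersion.version
        self.entries: Dict[str, CacheEntry] = {}

    def purge(self) -> None:
        # Entries filled under an older generation become unusable.
        self.purge_version += 1

    def read(self, key: str, now: float) -> Tuple[str, str]:
        entry = self.entries.get(key)
        fresh = (
            entry is not None
            and entry.expires_at >= now
            and entry.version >= self.purge_version
        )
        if fresh:
            return ("hit", entry.value)
        value = self.origin_read(key)  # miss, expired, or purged: fall back to origin
        self.entries[key] = CacheEntry(value, now + self.ttl, self.purge_version)
        return ("miss", value)
```

Note that an origin change alone does not refresh the entry; staleness ends only at TTL expiry or at the next purge.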

`I08 Traffic Shaping / Admission Control` #

Entities: BudgetState, WindowState, TokenBucketState, ConcurrencyState, PolicyState
Source of truth: BudgetState, WindowState, TokenBucketState, ConcurrencyState, PolicyState
Write paths: evaluate admit/reject/defer, consume tokens or slots, refill/reset, update shaping policy
Read paths: current budget/policy read, decision audit read
Sequence: request -> evaluator -> budget/policy store or local snapshot -> admit/reject/defer -> downstream
Failure modes:
- over-admit under race
  - prevention: atomic budget update, local-fast-path with bounded drift, concurrency caps
  - repair: shed/defer excess load, reset counters from truth
- stale policy apply
  - prevention: versioned policy snapshots, monotonic apply
  - repair: republish policy, invalidate local snapshot, reconcile incorrect decisions if needed
Scaling bottlenecks: hot tenant key, evaluator hot path, policy fanout
- mitigation: local token buckets with periodic sync, shard budgets, route-local admission tiers
Connections:
- input: I11 policies, I17 request flow
- output: protects I17, I15, external dependencies
- decorator/middleware: request-path gate before expensive downstream systems

Load-Bearing Protocols #

Source of truth in protocol:

TokenBucketState
- available_tokens
ConcurrencyState
- available_slots
PolicyState
- version
- denies

Main protocol:

request[evaluate_admission {TokenBucketState.available_tokens > 0 and PolicyState.version >= local_policy_version}] ->
  evaluator[consume_budget {ConcurrencyState.available_slots > 0}] => done

Reject/defer protocol:

evaluator[reject {TokenBucketState.available_tokens = 0 or ConcurrencyState.available_slots = 0 or PolicyState.denies = true}] => retry
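
A single-node token bucket sketch for the `available_tokens` guard, using an injected clock so the behavior is deterministic. Concurrency slots and policy versioning are omitted; the class and parameter names are assumptions.

```python
class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.available_tokens = capacity
        self.last_refill_at = now

    def _refill(self, now: float) -> None:
        # Add tokens for elapsed time, capped at capacity; ignore clock regressions.
        elapsed = max(0.0, now - self.last_refill_at)
        self.available_tokens = min(
            self.capacity, self.available_tokens + elapsed * self.refill_per_sec
        )
        self.last_refill_at = max(self.last_refill_at, now)

    def admit(self, now: float, cost: float = 1.0) -> bool:
        """Check-and-consume in one step so a caller cannot admit without paying."""
        self._refill(now)
        if self.available_tokens >= cost:
            self.available_tokens -= cost
            return True
        return False
```

In a shared evaluator the check and the decrement would need to be atomic against concurrent callers, which is the over-admit race named above.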

`I09 Sequence / Identifier Generation` #

Entities: CounterState, RangeLeaseState, WorkerIdState, EpochState
Source of truth: CounterState, RangeLeaseState, WorkerIdState, EpochState
Ownership mode: lease-backed range ownership via RangeLeaseState.owner_id + expiry_at, fenced by EpochState.current
Write paths: allocate counter/range, claim worker ID, generate ID, renew worker lease
Read paths: generated ID response, generator health read
Sequence: client -> allocator or local generator -> counter/range lease state -> id returned
Failure modes:
- duplicate IDs
  - prevention: unique worker identity or leased ranges, monotonic epoch, guarded local sequence advance
  - repair: fence bad generator, rotate worker IDs, repair duplicates only if downstream can reconcile
- non-monotonic IDs
  - prevention: monotonic local clock discipline or logical sequence fallback
  - repair: epoch bump and restart generation lane
Scaling bottlenecks: central allocator hotspot, worker-id contention
- mitigation: range leasing, local generation, wider worker-id space
Connections:
- input: control-plane worker registration
- output: IDs for many archetypes
- decorator/middleware: identity decorator on write paths

Load-Bearing Protocols #

Source of truth in protocol:

RangeLeaseState
- owner_id
- expiry_at
- max_value
EpochState
- current
WorkerIdState
- conflict

Main protocol:

client[allocate_range {RangeLeaseState.owner_id = null or RangeLeaseState.expiry_at < now}] ->
  generator[issue_id {EpochState.current = my_epoch and LocalCounter.value < RangeLeaseState.max_value}] => done

Repair protocol:

allocator[rotate_worker_id {WorkerIdState.conflict = true}] => retry
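
Range leasing can be sketched as a central allocator handing out disjoint ranges and a local generator issuing from its current range. This illustration omits lease expiry and epoch fencing, and the class names are hypothetical.

```python
from typing import Tuple


class RangeAllocator:
    """Central allocator handing out disjoint [start, end) ranges (illustrative)."""

    def __init__(self, range_size: int) -> None:
        self.range_size = range_size
        self.next_start = 0

    def allocate_range(self) -> Tuple[int, int]:
        start = self.next_start
        self.next_start += self.range_size
        return (start, self.next_start)


class LocalGenerator:
    def __init__(self, allocator: RangeAllocator) -> None:
        self.allocator = allocator
        self.value, self.max_value = allocator.allocate_range()

    def issue_id(self) -> int:
        # Guard: LocalCounter.value < RangeLeaseState.max_value
        if self.value >= self.max_value:
            # Range exhausted: one allocator round trip buys range_size more IDs.
            self.value, self.max_value = self.allocator.allocate_range()
        issued = self.value
        self.value += 1
        return issued
```

IDs from this sketch are unique across generators but only monotonic within one range, which is the usual trade made to take the allocator off the hot path.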

`I10 Membership / Presence / Registry` #

Entities: MemberState, PresenceState, RegistryVersion
Source of truth: MemberState, PresenceState, RegistryVersion
Write paths: register member, heartbeat, expire member, update registry version
Read paths: membership watch, presence query, service lookup
Sequence: node/session -> registry -> heartbeat updates -> watchers/readers consume membership view
Failure modes:
- false death
  - prevention: suspicion before eviction, heartbeat grace, version/incarnation rules
  - repair: rejoin with higher incarnation, anti-entropy membership sync
- ghost member
  - prevention: expiry sweep, heartbeat TTL, watch versioning
  - repair: explicit purge or rebuild registry from active heartbeats
Scaling bottlenecks: heartbeat fan-in, watch fanout, hot group membership
- mitigation: hierarchical membership, piggyback/gossip where appropriate, cached registry reads
Connections:
- input: node health, session state
- output: I17, I15, I01
- decorator/middleware: discovery/presence overlay for routing or assignment

Load-Bearing Protocols #

Source of truth in protocol:

RegistryVersion
- version
MemberState
- member_id
PresenceState
- status
- last_heartbeat_at

Main protocol:

node[register {RegistryVersion.version = expected_version}] ->
  node[heartbeat {MemberState.member_id = my_id}] ->
  reader[lookup {PresenceState.status = ALIVE}] => done

Repair protocol:

reaper[expire {PresenceState.last_heartbeat_at < now - ttl}] => refresh
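
A heartbeat-TTL registry sketch covering register, heartbeat, expire, and lookup. It assumes an injected clock and no suspicion phase before eviction; the `Registry` class is a hypothetical stand-in.

```python
from typing import Dict, List


class Registry:
    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.version = 0                               # RegistryVersion.version
        self.last_heartbeat_at: Dict[str, float] = {}  # PresenceState per member

    def register(self, member_id: str, now: float) -> None:
        self.last_heartbeat_at[member_id] = now
        self.version += 1

    def heartbeat(self, member_id: str, now: float) -> bool:
        if member_id not in self.last_heartbeat_at:
            return False  # expired or unknown members must re-register
        self.last_heartbeat_at[member_id] = now
        return True

    def expire(self, now: float) -> List[str]:
        # Guard: PresenceState.last_heartbeat_at < now - ttl
        dead = sorted(m for m, t in self.last_heartbeat_at.items() if t < now - self.ttl)
        for member_id in dead:
            del self.last_heartbeat_at[member_id]
        if dead:
            self.version += 1  # membership view changed; watchers should refresh
        return dead

    def lookup(self) -> List[str]:
        return sorted(self.last_heartbeat_at)
```

Rejecting heartbeats from expired members forces an explicit rejoin, which is where a higher incarnation number would be assigned in a fuller design.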

`I11 Control Plane + Snapshot Distribution` #

Entities: ConfigState, SnapshotState, AppliedVersionState
Source of truth: ConfigState
Write paths: config mutate, publish snapshot, agent apply version
Read paths: config read, watch/version stream, local snapshot read
Sequence: admin -> control plane -> config truth -> snapshot publisher/watch -> agent -> local apply
Failure modes:
- out-of-order snapshot apply
  - prevention: monotonic version checks, ACK/NACK discipline
  - repair: refetch full snapshot, replay from current version
- partial rollout
  - prevention: staged rollout tracking, applied-version reporting
  - repair: rollback revision, republish to missing agents
Scaling bottlenecks: fanout to many agents, hot tenant config, snapshot size
- mitigation: delta snapshots, snapshot CDN/brokers, tenant partitioning
Connections:
- input: policy/admin truth
- output: I08, I17, I15, A11
- decorator/middleware: control-plane layer for many serving and evaluation systems

Load-Bearing Protocols #

Source of truth in protocol:

ConfigState
- version
SnapshotState
- version
AppliedVersionState
- local_version
- error_rate

Main protocol:

admin[mutate_config {ConfigState.version = expected_version}] ->
  publisher[publish_snapshot {SnapshotState.version = ConfigState.version}] ->
  agent[apply {SnapshotState.version > AppliedVersionState.local_version}] ->
  agent[ack {AppliedVersionState.local_version = SnapshotState.version}] => done

Repair/control protocol:

controller[rollback {AppliedVersionState.error_rate > rollout_threshold or rollout_health = BAD}] => republish

Substrate Cleavage #

I11 is value-separable because authoring, publication, propagation, local apply, health observation, and rollback have different truth states and failure modes.

Compact flow:

author -> validate -> commit config truth -> publish version -> distribute snapshot/delta
       -> agent apply -> report applied version/health -> advance rollout or rollback

Substrate slice	Value it provides	Truth / state
Authoring / intent	captures desired config or policy change	`ConfigIntentState`, `ChangeRequestState`
Validation / guardrail	rejects unsafe or invalid changes before publication	`ValidationState`, `PolicyCheckState`
Versioned truth store	owns authoritative config history	`ConfigState`, `ConfigVersionState`, `RevisionHistoryState`
Publication	converts truth into an immutable release/snapshot	`SnapshotState`, `DeltaState`, `PublicationState`
Distribution / fanout	moves snapshot or delta to many targets	`DistributionState`, `WatchCursorState`, `FanoutState`
Local apply	makes a target serve using a specific version	`AppliedVersionState`, `LocalSnapshotState`
Health / convergence observation	tells whether rollout is safe to continue	`TargetHealthState`, `ConvergenceState`, `ApplyAckState`
Rollout / rollback control	advances, pauses, or reverts versions	`RolloutState`, `RollbackState`, `WaveState`
Drift / repair	detects and corrects targets serving the wrong version	`DriftState`, `RepairState`

Canonical substrate APIs:

Boundary	API shape
Authoring API	`propose_change`, `validate_change`, `approve_change`
Truth mutation API	`put_config(expected_version, patch)`
Publication API	`publish_snapshot(version)` or `publish_delta(from_version, to_version)`
Watch / fetch API	`watch(from_version)` or `get_snapshot(version)`
Apply API	`apply(version, snapshot_or_delta)`
ACK / health API	`ack(version, status, health_signals)`
Rollout API	`advance_wave`, `pause_rollout`, `rollback(version)`
Repair API	`refetch_full_snapshot`, `reconcile_drift(target_id)`

Core invariants:

ConfigState.version is monotonic.
A target must not apply a version older than its current AppliedVersionState.local_version.
A published snapshot must correspond to exactly one config version.
Rollout may advance only from observed health and convergence state, not from publication success alone.
Drift repair must converge targets back to a declared desired version.
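
Two of these invariants, monotonic apply and health-gated rollout, can be sketched in a few lines. The `Agent` class, the `may_advance` helper, and the `("ACK", version)` tuple shape are assumptions made for illustration; they are not the wire format of any real control plane.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical target holding AppliedVersionState.local_version."""
    local_version: int = 0
    config: dict = None

    def apply(self, version, snapshot):
        # Guard: the incoming version must be newer than the applied version.
        if version <= self.local_version:
            return ("NACK", self.local_version)
        self.config = dict(snapshot)
        self.local_version = version
        return ("ACK", self.local_version)


def may_advance(acks, wave, required_version):
    # Rollout advances from observed ACKs, not from publication success alone.
    return all(acks.get(target) == ("ACK", required_version) for target in wave)


agent = Agent()
first = agent.apply(2, {"limit": 100})
stale = agent.apply(1, {"limit": 50})  # a delayed, older snapshot arrives late
acks = {"agent-1": first, "agent-2": ("NACK", 1)}
```

Here the late version 1 snapshot is rejected without touching the served config, and a wave containing `agent-2` cannot advance until that agent reports the required version.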

Common cleavage examples:

System	Slice it teaches
`xDS`	versioned control-plane publication and target ACK/NACK
Kubernetes watch/resourceVersion	watch, cache, and replay from versioned API truth
feature-flag platform	local snapshot apply, targeting rules, and staged rollout
Argo CD / GitOps controller	desired config truth, apply, drift detection, rollback

Third-Layer Mechanism Lens #

Canonical unit of work: snapshot publication, apply target
Most relevant design-space dimensions: authorityDelegation, progressVisibility, reconciliation, coordAuthority
Common mechanism variants: push vs pull apply, full snapshot vs delta distribution, central publisher vs tiered fanout/cache hierarchy
Dominant invariant families: progress-never-regresses, stale-actor-must-not-commit, drift-eventually-corrected
Canonical failure signatures: stale policy apply, partial rollout, drift not reconciled, reconciliation thrash, fanout overload
Good real-system anchors: Envoy xDS, Kubernetes watch/resourceVersion distribution, feature-flag control planes

`I12 Workflow + External Side Effect` #

Entities: WorkflowState, OutboxEvent, DeliveryAttemptState, EffectResultState
Source of truth: WorkflowState, OutboxEvent
Write paths: create workflow, guarded transition, emit outbox, record delivery attempt/result
Read paths: workflow status read, replay/reconciliation read
Sequence: client -> workflow service -> workflow truth + outbox -> effect worker -> external system -> result update
Failure modes:
- crash after transition but before side effect
  - prevention: transactional outbox, idempotency key
  - repair: replay outbox, reconcile with provider
- retry ambiguity against provider
  - prevention: idempotent external API or provider-side dedup keys
  - repair: reconciliation poll/manual correction
Scaling bottlenecks: worker backlog, external provider bottleneck, hot workflow rows
- mitigation: queue buffering, provider-specific rate shaping, partition workflows by tenant/key
Connections:
- input: many product/process triggers
- output: external systems, notifications, payments
- decorator/middleware: effect-handling shell around state machines

Load-Bearing Protocols #

Source of truth in protocol:

WorkflowState
- version
- state
OutboxEvent
- status
DeliveryAttemptState
- attempt_id
EffectResultState
- provider_result

Main protocol:

client[start_workflow {WorkflowState.version = expected_version}] ->
  workflow_service[transition {WorkflowState.state in allowed_predecessors}] ->
  worker[deliver_effect {OutboxEvent.status = READY}] ->
  worker[record_result {DeliveryAttemptState.attempt_id = my_attempt}] => done

Repair protocol:

reconciler[replay_outbox {OutboxEvent.status = READY or EffectResultState.provider_result = AMBIGUOUS}] => retry
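
A minimal sketch of the transactional outbox plus idempotency key, using an in-memory SQLite database so the state change and the outbox row really do commit together. The table layout, the `FakeProvider` class, and the `crash_before_mark` flag are hypothetical names introduced only for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE workflow (id TEXT PRIMARY KEY, state TEXT, version INTEGER)")
db.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, workflow_id TEXT, status TEXT)")
db.execute("INSERT INTO workflow VALUES ('wf-1', 'PENDING', 1)")
db.commit()


def transition(workflow_id, expected_version, new_state):
    # The state change and the outbox row commit in one transaction, or not at all.
    with db:
        cur = db.execute(
            "UPDATE workflow SET state = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_state, workflow_id, expected_version),
        )
        if cur.rowcount != 1:
            return False  # stale version: the guarded transition is rejected
        event_id = f"{workflow_id}:{expected_version + 1}"  # doubles as idempotency key
        db.execute("INSERT INTO outbox VALUES (?, ?, 'READY')", (event_id, workflow_id))
    return True


class FakeProvider:
    """Hypothetical external provider that dedups on the idempotency key."""

    def __init__(self):
        self.charges = {}

    def charge(self, idempotency_key):
        return self.charges.setdefault(idempotency_key, "OK")


def deliver_ready(provider, crash_before_mark=False):
    rows = db.execute("SELECT event_id FROM outbox WHERE status = 'READY'").fetchall()
    for (event_id,) in rows:
        provider.charge(event_id)
        if crash_before_mark:
            return  # simulated crash: the effect happened but the row is still READY
        with db:
            db.execute("UPDATE outbox SET status = 'DONE' WHERE event_id = ?", (event_id,))


provider = FakeProvider()
ok = transition("wf-1", expected_version=1, new_state="CHARGING")
stale = transition("wf-1", expected_version=1, new_state="CHARGING")
deliver_ready(provider, crash_before_mark=True)  # ambiguous delivery
deliver_ready(provider)                          # reconciler replays the outbox
```

The replay is safe only because the provider dedups on the key; without provider-side dedup, the second delivery would need a reconciliation poll instead.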

Third-Layer Mechanism Lens #

Canonical unit of work: workflow instance
Most relevant design-space dimensions: recovery, progressVisibility, reconciliation, authorityDelegation
Common mechanism variants: replay-based durable execution vs compensation-heavy saga, central orchestrator vs broker-mediated step execution, event-emission vs checkpoint-write progress
Dominant invariant families: progress-never-regresses, completion-only-after-prerequisite-effect-is-durable, stale-actor-must-not-commit
Canonical failure signatures: duplicate execution, stale completion, stuck in nonterminal state, ambiguous provider result, retry storm
Good real-system anchors: Temporal, Cadence, Step Functions, Sagas, Life Beyond Distributed Transactions

`I13 Shared Subject Coordination` #

Entities: SubjectState, OperationLog, VersionState, SessionState
Source of truth: SubjectState, OperationLog
Write paths: submit op, sequence/merge op, snapshot subject, advance version
Read paths: subscribe to op stream, fetch snapshot, replay from version
Sequence: client session -> coordinator -> op log + subject state -> fanout to subscribers
Failure modes:
- out-of-order apply
  - prevention: per-subject sequencing or causality checks
  - repair: replay from op log, rebuild subject snapshot
- divergence
  - prevention: authoritative merge/sequence discipline, versioned sync
  - repair: resync from snapshot plus missing ops
Scaling bottlenecks: hot subject coordinator, replay cost, subscriber fanout
- mitigation: shard by subject, snapshots, local session buffering
Connections:
- input: collaborative client edits
- output: A17, A14
- decorator/middleware: coordination substrate for shared mutable products

Load-Bearing Protocols #

Source of truth in protocol:

VersionState
- base_version
SubjectState
- version
OperationLog
- offset

Main protocol:

client[submit_op {VersionState.base_version = expected_version}] ->
  coordinator[sequence_or_merge {SubjectState.version = expected_version}] ->
  coordinator[publish_op {OperationLog.offset = next}] ->
  subscriber[apply {OperationLog.offset > local_offset}] => done

Repair protocol:

client[resync {VersionState.base_version != SubjectState.version}] => retry
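
The submit/sequence/apply/resync loop can be sketched with a central sequencer, which is only one of the variants this archetype allows (OT and CRDT merge are not shown). `Coordinator`, `Client`, and the append-only string "document" are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical per-subject sequencer owning SubjectState and OperationLog."""
    version: int = 0
    log: list = field(default_factory=list)  # list index = offset

    def submit_op(self, base_version, op):
        # Guard: the client must have built its op on the current version.
        if base_version != self.version:
            return None  # caller must resync and retry
        self.log.append(op)
        self.version += 1
        return self.version

    def ops_since(self, offset):
        return self.log[offset:]


@dataclass
class Client:
    local_offset: int = 0
    text: str = ""

    def resync(self, coordinator):
        # Replay only ops past local_offset, in log order.
        for op in coordinator.ops_since(self.local_offset):
            self.text += op
            self.local_offset += 1


coord = Coordinator()
alice, bob = Client(), Client()
accepted = coord.submit_op(alice.local_offset, "a")
rejected = coord.submit_op(bob.local_offset, "b")  # bob is still at version 0
bob.resync(coord)
retried = coord.submit_op(bob.local_offset, "b")
alice.resync(coord)
bob.resync(coord)
```

Because every subscriber applies the same log in offset order, both clients converge on the same text even though one submission had to be retried.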

`I14 Immutable Artifact Namespace + Delivery` #

Entities: NamespaceHeadState, ManifestState, ArtifactBlobState, DistributionState
Source of truth: NamespaceHeadState, ManifestState, ArtifactBlobState
Write paths: upload blob, write manifest, advance namespace head/tag, distribute/cache artifact
Read paths: resolve tag/head, fetch manifest, fetch blob
Sequence: publisher -> blob store -> manifest store -> namespace head CAS -> clients resolve/fetch
Failure modes:
- content uploaded but namespace not advanced
  - prevention: publish-content-first then head-move discipline
  - repair: retry head advance or garbage collect orphaned content
- namespace CAS race
  - prevention: head compare-and-swap, immutable manifests
  - repair: retry against current head, materialize conflict/version
Scaling bottlenecks: hot namespace metadata, popular blob amplification, sync storms
- mitigation: CDN for blobs, metadata sharding, client-side delta sync
Connections:
- input: build/publish pipelines
- output: A18, deployment systems, storage clients
- decorator/middleware: immutable delivery substrate under user-facing namespace systems

Load-Bearing Protocols #

Source of truth in protocol:

ArtifactBlobState
- digest
ManifestState
- id
NamespaceHeadState
- version
- advance_failed

Main protocol:

publisher[upload_blob {ArtifactBlobState.digest absent}] ->
  publisher[write_manifest {ManifestState.id absent}] ->
  publisher[advance_head {NamespaceHeadState.version = expected_version}] => done

Repair protocol:

gc[collect_orphaned_blob {NamespaceHeadState.advance_failed = true}] => retry
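
A minimal sketch of publish-content-first followed by a head compare-and-swap, with a garbage collector for content that never became reachable. `ArtifactStore` and its method names are hypothetical, and the collector is deliberately simplistic: it treats anything unreachable from a current head as garbage, which a real registry would soften with retention policy.

```python
import hashlib


class ArtifactStore:
    """Hypothetical in-memory blob store, manifest store, and namespace head."""

    def __init__(self):
        self.blobs = {}      # digest -> bytes
        self.manifests = {}  # manifest id -> list of blob digests
        self.heads = {}      # tag -> (version, manifest id)

    def upload_blob(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # immutable: same digest, same bytes
        return digest

    def write_manifest(self, digests):
        manifest_id = hashlib.sha256("\n".join(digests).encode()).hexdigest()
        self.manifests.setdefault(manifest_id, list(digests))
        return manifest_id

    def advance_head(self, tag, expected_version, manifest_id):
        version, _ = self.heads.get(tag, (0, None))
        if version != expected_version:
            return False  # CAS lost: retry against the current head
        self.heads[tag] = (version + 1, manifest_id)
        return True

    def collect_orphans(self):
        live_manifests = {manifest_id for _, manifest_id in self.heads.values()}
        live_blobs = {d for m in live_manifests for d in self.manifests[m]}
        orphans = set(self.blobs) - live_blobs
        for digest in orphans:
            del self.blobs[digest]
        self.manifests = {m: self.manifests[m] for m in live_manifests}
        return orphans


store = ArtifactStore()
m1 = store.write_manifest([store.upload_blob(b"layer-1")])
m2 = store.write_manifest([store.upload_blob(b"layer-2")])
won = store.advance_head("latest", expected_version=0, manifest_id=m1)
lost = store.advance_head("latest", expected_version=0, manifest_id=m2)
orphans = store.collect_orphans()
```

The losing publisher's content was uploaded but never referenced, so it is exactly the "content uploaded but namespace not advanced" case: either retry the head advance against version 1 or let the collector reclaim it.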

`I15 Execution Fleet + Worker Substrate` #

Entities: WorkerState, CapacityState, PlacementState, ExecutionLeaseState, ExecutionAttemptState, RuntimeSlotState
Source of truth: CapacityState, PlacementState, ExecutionLeaseState, ExecutionAttemptState
Ownership mode: attempt-scoped execution on top of lease-backed ownership via ExecutionLeaseState.owner_id + expiry_at and ExecutionAttemptState.attempt_id
Write paths: register/heartbeat worker, place runnable work, start execution, renew lease, persist completion, release capacity, preempt/evict placement
Read paths: worker/capacity read, placement queue read, execution status read
Sequence: invoker/scheduler -> placement service -> capacity + placement truth -> worker -> completion -> reconciler
Failure modes:
- duplicate placement
  - prevention: guarded placement transition, capacity reservation before dispatch
  - repair: cancel duplicate attempt, fence stale completion
- capacity leak after worker crash
  - prevention: lease expiry + reconciler, completion path releases slot
  - repair: reclaim slot, requeue stranded work
- preempted or evicted work continues acting
  - prevention: lease/token fencing on post-preemption completion and side effects, monotonic placement/attempt version
  - repair: cancel stale attempt, reclaim slot, requeue if policy allows
Scaling bottlenecks: worker saturation, placement contention, cold starts, heartbeat fan-in
- mitigation: warm pools, hierarchical schedulers, placement partitioning, local heartbeats aggregated upstream
Connections:
- input: I03, I04, I12
- output: execution results to many archetypes
- decorator/middleware: execution substrate under schedulers, scanners, ML/ETL jobs

Load-Bearing Protocols #

Source of truth in protocol:

CapacityState
- free_slots
- slot_released
PlacementState
- version
- priority
ExecutionLeaseState
- owner_id
- expiry_at
ExecutionAttemptState
- attempt_id
- state

Main runtime protocol:

scheduler[place {CapacityState.free_slots > 0 and PlacementState.version = expected_version}] ->
  worker[launch {ExecutionLeaseState.owner_id = worker_id}] ->
  worker[heartbeat {ExecutionLeaseState.expiry_at >= now}] ->
  worker[complete {ExecutionAttemptState.attempt_id = my_attempt}] => done

Repair protocol:

reconciler[reclaim {ExecutionLeaseState.expiry_at < now and ExecutionAttemptState.state != COMPLETED}] ->
  scheduler[reassign {CapacityState.free_slots > 0}] => requeue

Preemption protocol:

controller[preempt {PlacementState.priority < incoming_priority}] ->
  worker[stop {ExecutionLeaseState.owner_id = worker_id}] ->
  scheduler[reassign {CapacityState.slot_released = true}] => done
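
The main runtime and repair protocols can be sketched together to show why a stale completion has to be fenced after a reclaim. The `Run` record and `Fleet` class are assumptions for illustration and collapse several truth states into one object; preemption is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical record combining lease and attempt state for one run."""
    owner_id: str = None
    expiry_at: float = 0.0
    attempt_id: int = 0
    state: str = "PENDING"


class Fleet:
    def __init__(self, free_slots, lease_seconds):
        self.free_slots = free_slots
        self.lease_seconds = lease_seconds

    def place(self, run, worker_id, now):
        # Guard: CapacityState.free_slots > 0; the slot is reserved before dispatch.
        if self.free_slots <= 0 or run.state != "PENDING":
            return None
        self.free_slots -= 1
        run.owner_id, run.expiry_at = worker_id, now + self.lease_seconds
        run.attempt_id += 1  # every placement starts a new attempt
        run.state = "RUNNING"
        return run.attempt_id

    def complete(self, run, attempt_id):
        # Guard: ExecutionAttemptState.attempt_id = my_attempt
        if run.state != "RUNNING" or attempt_id != run.attempt_id:
            return False
        run.state = "COMPLETED"
        self.free_slots += 1
        return True

    def reclaim(self, run, now):
        # Guard: lease expired and the attempt has not completed.
        if run.state == "RUNNING" and run.expiry_at < now:
            run.owner_id, run.state = None, "PENDING"
            self.free_slots += 1
            return True
        return False


fleet = Fleet(free_slots=1, lease_seconds=30.0)
run = Run()
first = fleet.place(run, "worker-a", now=0.0)
reclaimed = fleet.reclaim(run, now=31.0)   # worker-a stopped heartbeating
second = fleet.place(run, "worker-b", now=31.0)
stale = fleet.complete(run, first)         # worker-a wakes up late
fresh = fleet.complete(run, second)
```

The slot count returns to its starting value only because reclaim and completion each release exactly once; a completion path that ignored the attempt check would release the slot twice.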

Third-Layer Mechanism Lens #

Canonical unit of work: run, sometimes allocation
Most relevant design-space dimensions: authorityDelegation, progressVisibility, recovery, coordAuthority
Common mechanism variants: leader-led placement vs worker-pull claim, heartbeat-pull vs lease-presence, requeue vs failover, warm-pool-heavy vs cold-start-heavy fleets
Dominant invariant families: at-most-one-current-valid-owner, stale-actor-must-not-commit, every-eligible-item-is-owned-or-completed-or-rediscoverable
Canonical failure signatures: duplicate placement, stale completion, orphaned claim, capacity leak, heartbeat fan-in overload
Good real-system anchors: Borg, Omega, Kubernetes scheduler, Lambda fleets, Nomad

`I16 Key-Scoped Mutable State / Replicated KV` #

Entities: KeyState, ReplicaVersionState, TTLState, EvictionState
Source of truth: KeyState, ReplicaVersionState, TTLState
Write paths: put/update key, conditional overwrite, expire key, replicate key, evict key
Read paths: point get, range get if supported, replica read
Sequence: client -> KV API -> partition leader/replica set -> replication -> reader
Failure modes:
- stale read after failover
  - prevention: quorum/leader read policy, replica version checks
  - repair: read repair, anti-entropy replication
- ghost expired key
  - prevention: authoritative TTL semantics, expiry visibility rules
  - repair: expiry sweep, tombstone propagation
Scaling bottlenecks: hot key, leader hotspot, memory pressure
- mitigation: key splitting, replication-aware caching, bounded value size, hot-key isolation
Connections:
- input: application writes, control-plane updates
- output: sessions, counters, small serving truth for many systems
- decorator/middleware: can serve as substrate under I08, I10, I17

Load-Bearing Protocols #

Source of truth in protocol:

KeyState
- version
ReplicaVersionState
- term
- version
- diverged
TTLState
- expires_at

Main protocol:

client[put {KeyState.version = expected_version}] ->
  replica_set[replicate {ReplicaVersionState.term = current_term}] ->
  reader[get {ReplicaVersionState.version >= required_version}] => done

Repair protocol:

repair[anti_entropy {ReplicaVersionState.diverged = true}] => replay
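
A minimal sketch of the conditional put, the version-checked replica read, TTL visibility, and a one-way anti-entropy pass. The `Replica` class, the tuple layout, and the `required_version` parameter are illustrative assumptions; terms and quorums are left out.

```python
class Replica:
    """Hypothetical single replica; entries are (version, value, expires_at)."""

    def __init__(self):
        self.data = {}

    def put(self, key, expected_version, value, expires_at=None):
        version = self.data.get(key, (0, None, None))[0]
        if version != expected_version:
            return None  # conditional overwrite lost
        self.data[key] = (version + 1, value, expires_at)
        return version + 1

    def get(self, key, now, required_version=0):
        version, value, expires_at = self.data.get(key, (0, None, None))
        if version < required_version:
            raise LookupError("replica is behind required_version")
        if expires_at is not None and expires_at <= now:
            return None  # expired keys are invisible even before the sweep runs
        return value


def anti_entropy(source, target):
    # Copy every entry whose version on the source is newer than on the target.
    for key, entry in source.data.items():
        if entry[0] > target.data.get(key, (0, None, None))[0]:
            target.data[key] = entry


leader, follower = Replica(), Replica()
v1 = leader.put("session:1", 0, "alice", expires_at=100.0)
conflict = leader.put("session:1", 0, "bob", expires_at=100.0)
try:
    follower.get("session:1", now=1.0, required_version=v1)
    behind = False
except LookupError:
    behind = True
anti_entropy(leader, follower)
value = follower.get("session:1", now=1.0, required_version=v1)
expired = follower.get("session:1", now=100.0)
```

Refusing the read when the replica is behind is one policy; redirecting to the leader is the other common choice and fits the same guard.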

`I17 Traffic Steering / Request Mediation Plane` #

Entities: RouteState, BackendHealthState, PolicyState, AffinityState, ConnectionState
Source of truth: RouteState, BackendHealthState, PolicyState
Write paths: update route/policy, mark backend health, establish/reuse connection state, record mediation decision
Read paths: request path route lookup, health read, affinity lookup, policy evaluation
Sequence: client request -> steering/data plane -> route/policy snapshot -> backend selection -> backend
Failure modes:
- routing to dead backend
  - prevention: health-aware selection, passive + active health, outlier ejection
  - repair: retry/re-route, drain bad backend, refresh health state
- stale policy enforcement
  - prevention: versioned policy snapshot, monotonic config apply
  - repair: republish config, invalidate bad local snapshot
Scaling bottlenecks: hot VIP/route, TLS and connection tables, health-check fanout
- mitigation: connection pooling, tiered gateways, route partitioning, aggregate health signals
Connections:
- input: I11 config, I10 membership, I08 admission policy
- output: mediates traffic to almost any serving archetype
- decorator/middleware: classic middleware plane in front of backends

Load-Bearing Protocols #

Source of truth in protocol:

RouteState
- version
BackendHealthState
- status
PolicyState
- allows
RetryBudget
- remaining

Main protocol:

client[resolve_route {RouteState.version >= local_route_version}] ->
  proxy[select_backend {BackendHealthState.status = HEALTHY}] ->
  proxy[forward {PolicyState.allows = true}] => done

Retry/drain protocol:

proxy[retry {BackendHealthState.status = UNHEALTHY and RetryBudget.remaining > 0}] => retry
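
Health-aware selection with a bounded retry budget can be sketched as one function. The `forward` signature, the `send` callback, and the dict-based budget are assumptions for illustration; they are not the API of any particular proxy.

```python
def forward(request, backends, health, send, retry_budget):
    """Try healthy backends in order, spending one budget unit per retry."""
    attempts = 0
    for backend in backends:
        # Guard: BackendHealthState.status = HEALTHY
        if health.get(backend) != "HEALTHY":
            continue
        if attempts > 0:
            # Guard: RetryBudget.remaining > 0
            if retry_budget["remaining"] <= 0:
                break
            retry_budget["remaining"] -= 1
        attempts += 1
        try:
            return backend, send(backend, request)
        except ConnectionError:
            health[backend] = "UNHEALTHY"  # passive health signal
    return None, None


def send(backend, request):
    # Hypothetical transport: backend b2 always resets the connection.
    if backend == "b2":
        raise ConnectionError("connection reset")
    return f"{backend} handled {request}"


health = {"b1": "UNHEALTHY", "b2": "HEALTHY", "b3": "HEALTHY"}
budget = {"remaining": 1}
chosen, response = forward("GET /", ["b1", "b2", "b3"], health, send, budget)
```

With the budget exhausted the same failure returns no backend at all, which is the intended behavior: the proxy sheds the request instead of amplifying load during an outage.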

Third-Layer Mechanism Lens #

Canonical unit of work: route decision
Most relevant design-space dimensions: roleTopology, authorityDelegation, progressVisibility, reconciliation
Common mechanism variants: local snapshot routing vs remote policy lookup, passive vs active health evaluation, fail-open vs fail-closed mediation, retry-at-proxy vs retry-at-client
Dominant invariant families: drift-eventually-corrected, stale-actor-must-not-commit, bounded-retry-no-amplification
Canonical failure signatures: routing black hole, retry storm, stale policy apply, drift not reconciled, reconciliation flap
Good real-system anchors: Envoy, xDS, Maglev, Tail at Scale, API gateways

`I18 Telemetry / Time-Series Pipeline` #

Entities: SampleState, SeriesState, LabelIndexState, RuleState, AlertState, BlockState
Source of truth: SampleState, SeriesState, LabelIndexState, RuleState, AlertState
Write paths: ingest sample/event, compact blocks, update label index, evaluate rules, transition alert state
Read paths: range query, top-k/time-window query, dashboard read, alert state read
Sequence: agent/exporter -> ingest tier -> WAL/block store + label index -> rule engine -> dashboard/alert consumer
Failure modes:
- dropped sample / late sample skew
  - prevention: WAL before ack, bounded lateness policy, clock/ordering discipline
  - repair: replay WAL, backfill sample ranges if available
- alert flapping
  - prevention: hysteresis, stable evaluation windows, dedup across alertmanager tiers
  - repair: suppress noisy alert state, recompute from recent window
Scaling bottlenecks: ingest throughput, high-cardinality labels, query fanout, compaction I/O
- mitigation: remote-write sharding, label/cardinality controls, rollups, tiered query storage
Connections:
- input: exporters, logs/events from I05, worker stats from I15
- output: monitoring for all archetypes
- decorator/middleware: observability overlay on top of every other system

Load-Bearing Protocols #

Source of truth in protocol:

BlockState
- wal_available
SeriesState
- window_complete
AlertState
- rule_result
SampleState
- late
- dropped

Main protocol:

exporter[emit_sample] ->
  ingest[append_wal {BlockState.wal_available = true}] ->
  rule_engine[evaluate {SeriesState.window_complete = true}] ->
  alert_manager[transition_alert {AlertState.rule_result = FIRING}] => done

Repair protocol:

backfill[replay_wal {SampleState.late = true or SampleState.dropped = true}] => retry
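
The alert-flapping prevention named above (hysteresis plus a stable evaluation window) can be sketched as a small state machine on the `transition_alert` step. `HysteresisAlert` and its three thresholds are hypothetical names; real rule engines express the same idea with their own configuration.

```python
class HysteresisAlert:
    """Fire after `for_windows` consecutive breaches; clear only below `clear_below`."""

    def __init__(self, fire_above, clear_below, for_windows):
        self.fire_above = fire_above
        self.clear_below = clear_below
        self.for_windows = for_windows
        self.state = "OK"
        self.streak = 0

    def evaluate(self, value):
        if self.state == "OK":
            # Count consecutive breaching windows; any good window resets the streak.
            self.streak = self.streak + 1 if value > self.fire_above else 0
            if self.streak >= self.for_windows:
                self.state, self.streak = "FIRING", 0
        elif value < self.clear_below:
            # Clearing uses a lower threshold than firing, so values in between hold state.
            self.state = "OK"
        return self.state


alert = HysteresisAlert(fire_above=0.9, clear_below=0.7, for_windows=2)
states = [alert.evaluate(v) for v in [0.95, 0.5, 0.95, 0.95, 0.8, 0.85, 0.6]]
```

A single spike (the first sample) never fires, and values hovering between the two thresholds (0.8, 0.85) do not clear a firing alert.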

`I19 Replicated Chunk / Block / File Storage Substrate` #

Entities: NamespaceState, MetadataState, ChunkPlacementState, ReplicaState, WriterLeaseState, RepairState
Source of truth: NamespaceState, MetadataState, ChunkPlacementState, ReplicaState, WriterLeaseState
Ownership mode: lease-backed mutable-writer authority via WriterLeaseState.owner_id + expiry_at
Write paths: create/update namespace metadata, allocate chunk/block, write chunk, update placement, repair/rebalance replicas
Read paths: metadata lookup, chunk location read, chunk fetch, replica health read
Sequence: client -> metadata service -> chunk placement lookup -> storage nodes -> metadata/placement update -> repair loops
Failure modes:
- metadata/data divergence
  - prevention: metadata-first placement discipline with committed placement updates, write lease on mutable path
  - repair: scrub/reconcile metadata against chunk replicas, rebuild placement map
- replica under-count after node loss
  - prevention: replica-count tracking, failure detection, repair scheduling
  - repair: copy from surviving replicas, rebalance placement
Scaling bottlenecks: metadata master/partition hotspot, hot file/chunk, repair bandwidth, small-file amplification
- mitigation: metadata sharding, chunking, background rebalance windows, compaction/packing for small files
Connections:
- input: publishers, checkpoint writers, sync clients
- output: backs A18, checkpoints, media/file systems
- decorator/middleware: storage substrate under namespace/versioned systems and execution checkpoints

Load-Bearing Protocols #

Source of truth in protocol:

ChunkPlacementState
- replicas_needed
MetadataState
- version
WriterLeaseState
- owner_id
- expiry_at
ReplicaState
- count

Main protocol:

client[allocate_chunk {ChunkPlacementState.replicas_needed > 0}] ->
  metadata_service[commit_placement {MetadataState.version = expected_version}] ->
  storage_node[write_chunk {WriterLeaseState.owner_id = writer_id and WriterLeaseState.expiry_at >= now}] ->
  metadata_service[finalize_write {ReplicaState.count >= target_replica_count}] => done

Repair protocol:

repair_worker[re_replicate {ReplicaState.count < target_replica_count}] => rebalance

`I20 Computation / Dataflow / DAG Execution` #

Entities: GraphState, OperatorState, StageState, TaskAttemptState, InputPartitionState, ShuffleState, CheckpointState, OutputState, WatermarkState
Source of truth: GraphState, StageState, TaskAttemptState, CheckpointState, OutputState
Ownership mode: attempt-scoped execution via TaskAttemptState.attempt_id; graph versioning via GraphState.version
Write paths: submit graph, plan stages, release ready stage, claim/run task attempt, publish shuffle block, complete checkpoint, commit output
Read paths: graph/job status read, task status read, checkpoint read, shuffle block read, output/result read, watermark/progress read
Sequence: submitter -> planner -> scheduler -> workers -> shuffle/checkpoint services -> output committer -> sink/result store
Failure modes:
- stale task attempt commits output
  - prevention: attempt-scoped temp output, commit guard on current TaskAttemptState.attempt_id
  - repair: reject stale commit, rerun task from input, shuffle, or checkpoint
- checkpoint before output durability
  - prevention: checkpoint completion waits for durable operator state and sink precommit/commit boundary
  - repair: restore from last completed checkpoint, abort ambiguous sink transactions
- lost shuffle block
  - prevention: durable or recomputable shuffle metadata, producer attempt tracking
  - repair: rerun producing task or fetch from replicated shuffle storage
- watermark advances too far
  - prevention: bounded-lateness policy and monotonic watermark guards per input
  - repair: late-data side output, correction/retraction, or window recomputation if supported
Scaling bottlenecks: shuffle fanout, hot key/group, checkpoint I/O, scheduler bottleneck, worker saturation, state-store growth
- mitigation: key salting, partition rebalancing, incremental checkpoints, local recovery, autoscaling, backpressure
Connections:
- input: I05 logs/streams, I19 files/chunks, I04 frontier scans
- output: I06 projections/indexes, I18 telemetry aggregates, materialized datasets
- decorator/middleware: uses I15 for worker placement and I11 for graph/config rollout

Load-Bearing Protocols #

Source of truth in protocol:

GraphState
- version
- operators
- edges
StageState
- dependencies
- status
TaskAttemptState
- attempt_id
- owner_id
CheckpointState
- checkpoint_id
- completed
OutputState
- committed_version
WatermarkState
- event_time_frontier

Main protocol:

submitter[submit_graph {GraphState.version = new}] ->
  planner[plan_stages {GraphState.version = current}] ->
  scheduler[release_stage {all upstream StageState.status = SUCCEEDED}] ->
  worker[run_task {TaskAttemptState.attempt_id = current}] ->
  shuffle_service[publish_block {TaskAttemptState.attempt_id = current}] ->
  checkpoint_coordinator[complete_checkpoint {CheckpointState.completed = true}] ->
  output_committer[commit_output {all partitions complete and attempt_id = current}] => done

Repair protocol:

scheduler[detect_failed_attempt {heartbeat expired or shuffle block missing}] ->
  scheduler[reissue_task {TaskAttemptState.attempt_id = new}] ->
  worker[recompute_from_input_or_checkpoint] ->
  output_committer[reject_stale_commit {attempt_id != current}]
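
The stale-attempt guard on output commit can be sketched with attempt-scoped temporary output. `OutputCommitter` and its in-memory dictionaries are hypothetical stand-ins for TaskAttemptState and OutputState; shuffle and checkpoints are not modeled.

```python
class OutputCommitter:
    """Hypothetical committer: only the current attempt may promote its temp output."""

    def __init__(self):
        self.current_attempt = {}  # task_id -> attempt_id
        self.temp = {}             # (task_id, attempt_id) -> rows
        self.committed = {}        # task_id -> rows

    def issue_attempt(self, task_id):
        self.current_attempt[task_id] = self.current_attempt.get(task_id, 0) + 1
        return self.current_attempt[task_id]

    def write_temp(self, task_id, attempt_id, rows):
        # Each attempt writes to its own location, so attempts never overwrite each other.
        self.temp[(task_id, attempt_id)] = list(rows)

    def commit(self, task_id, attempt_id):
        rows = self.temp.pop((task_id, attempt_id), None)
        stale = attempt_id != self.current_attempt.get(task_id)
        if rows is None or stale or task_id in self.committed:
            return False  # reject and discard: stale, missing, or already committed
        self.committed[task_id] = rows
        return True


committer = OutputCommitter()
a1 = committer.issue_attempt("map-0")
committer.write_temp("map-0", a1, ["slow result"])
a2 = committer.issue_attempt("map-0")  # scheduler reissued after heartbeat expiry
committer.write_temp("map-0", a2, ["fresh result"])
stale = committer.commit("map-0", a1)
fresh = committer.commit("map-0", a2)
```

The slow first attempt finishes its work but cannot publish it, which is what makes speculative or reissued attempts safe.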

Concrete anchors:

MapReduce: batch DAG with input splits, map stage, shuffle/sort barrier, reduce stage, and output files.
Flink: streaming DAG with operators, keyed state, checkpoint barriers, watermarks, and checkpoint-gated sink commits.

`I21 Trust Boundary / Cryptographic Proof Substrate` #

Use this archetype when the hard part is not the hot-path authorization decision itself, but the substrate that makes trust decisions verifiable across domains.

Entities: PrincipalState, IdentityBindingState, KeyMaterialState, GrantState, SignedStatementState, RevocationState, AttestationState, AuditLogState, TrustBundleState
Source of truth: PrincipalState, IdentityBindingState, KeyMaterialState, GrantState, RevocationState, TrustBundleState
Ownership mode: delegated authority via issuer-controlled keys and signed statements; freshness bounded by RevocationState and TrustBundleState.version
Write paths: bind identity, issue credential/claim, rotate key, publish trust bundle, revoke grant/key, record signed statement, append audit event, verify attestation
Read paths: verify identity, fetch trust bundle, check revocation, validate signed statement, read audit/provenance, evaluate attestation evidence
Sequence: issuer -> identity/key truth -> signed credential/statement -> verifier -> revocation/trust-bundle check -> audit/transparency
Failure modes:
- stale credential accepted after revocation
  - prevention: short lifetimes, revocation version checks, trust-bundle freshness
  - repair: publish revocation, force re-issuance, invalidate cached decisions
- wrong principal binding
  - prevention: proof-of-possession, audience binding, issuer constraints
  - repair: revoke binding, rotate affected keys, audit impacted statements
- forged or replayed signed statement
  - prevention: nonce/timestamp/audience binding, signature verification, replay cache for sensitive writes
  - repair: add compromised signer to revocation state, regenerate statements
- issuer/operator compromise
  - prevention: threshold signing, transparency log, separation of duties
  - repair: key ceremony, log audit, rotate trust roots
Scaling bottlenecks: verifier hot path, revocation fanout, trust-bundle propagation, audit log write volume, signing service throughput
- mitigation: local trust-bundle cache, short-lived credentials, append-only audit log partitioning, signer sharding/HSM pools
Connections:
- input: principals, workloads, artifacts, devices, human users
- output: protects I01, I05, I11, I14, I15, I16, I17, I20
- decorator/middleware: trust substrate across every cross-domain boundary

Load-Bearing Protocols #

Source of truth in protocol:

PrincipalState
- principal_id
- status
IdentityBindingState
- subject
- issuer
- audience
KeyMaterialState
- key_id
- public_key
- valid_from
- valid_until
RevocationState
- revoked_key_ids
- revoked_grants
- version
TrustBundleState
- version
- issuer_roots
SignedStatementState
- statement_id
- signature
- subject
- claims

Main protocol:

issuer[bind_identity {PrincipalState.status = ACTIVE}] ->
  issuer[issue_statement {KeyMaterialState.valid_until > now}] ->
  verifier[verify_signature {TrustBundleState.version fresh}] ->
  verifier[check_revocation {RevocationState.version >= required_version}] ->
  audit[append_verification {SignedStatementState.statement_id = verified}] => accepted

Repair/control protocol:

security_controller[revoke {compromise_detected = true}] ->
  publisher[publish_trust_bundle {TrustBundleState.version = next}] ->
  verifier[reject_cached_statement {statement.key_id in RevocationState.revoked_key_ids}] => repaired

Substrate Cleavage #

I21 is value-separable because identity binding, key custody, statement issuance, verification, revocation, audit, and attestation are different trust boundaries.

Compact flow:

establish principal -> bind key/identity -> issue signed claim -> distribute trust roots
                    -> verify claim -> check freshness/revocation -> audit/prove

Substrate slice	Value it provides	Truth / state
Principal registry	names the actor or workload	`PrincipalState`
Identity binding	binds subject to key, workload, user, device, or artifact	`IdentityBindingState`
Key custody / rotation	controls signing authority and key lifetime	`KeyMaterialState`, `KeyRotationState`
Grant / claim issuance	creates verifiable statements	`GrantState`, `SignedStatementState`
Trust-bundle distribution	tells verifiers which issuers/keys to trust	`TrustBundleState`, `RootVersionState`
Verification hot path	validates signatures, audience, expiry, and issuer	`VerificationState`, `DecisionCacheState`
Revocation / freshness	invalidates compromised or expired authority	`RevocationState`, `FreshnessState`
Audit / transparency	makes issuance and verification accountable	`AuditLogState`, `TransparencyLogState`
Attestation	proves runtime, build, device, or environment properties	`AttestationState`, `EvidenceState`

Canonical substrate APIs:

Boundary	API shape
Principal API	`create_principal`, `disable_principal`
Binding API	`bind_identity(subject, public_key, issuer)`
Key API	`rotate_key`, `publish_jwks_or_bundle`, `retire_key`
Issuance API	`issue_statement(subject, claims, audience, ttl)`
Verification API	`verify_statement(statement, audience, freshness_requirement)`
Revocation API	`revoke_key`, `revoke_grant`, `publish_revocation(version)`
Audit API	`append_audit_event`, `query_statement_lineage`
Attestation API	`submit_evidence`, `verify_attestation(policy)`

Core invariants:

A verifier must accept only statements signed by a trusted, non-revoked issuer key.
A statement must be bound to the intended subject and audience.
Revocation and trust-bundle versions must be fresh enough for the risk boundary.
Key rotation must not create an unbounded window where old and new authority both over-admit.
Audit/provenance records must be append-only or tamper-evident for disputed decisions.
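
The first three invariants can be sketched as an ordered verification check. One loud assumption: the example uses a shared-secret HMAC from the Python standard library as a stand-in for a real asymmetric signature, so the "trust bundle" here maps key IDs to secrets rather than public keys. `issue_statement`, `verify_statement`, and the payload field names are also hypothetical.

```python
import hashlib
import hmac
import json


def sign(key, payload):
    # HMAC stand-in for a signature over a canonical encoding of the payload.
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def issue_statement(key_id, key, subject, audience, expires_at):
    payload = {"key_id": key_id, "sub": subject, "aud": audience, "exp": expires_at}
    return {"payload": payload, "signature": sign(key, payload)}


def verify_statement(stmt, trust_bundle, revoked_key_ids, audience, now):
    payload = stmt["payload"]
    key = trust_bundle.get(payload["key_id"])
    if key is None:
        return "untrusted issuer key"
    if payload["key_id"] in revoked_key_ids:
        return "revoked key"
    if not hmac.compare_digest(stmt["signature"], sign(key, payload)):
        return "bad signature"
    if payload["aud"] != audience:
        return "wrong audience"
    if payload["exp"] <= now:
        return "expired"
    return "accepted"


bundle = {"k1": b"demo-secret"}
stmt = issue_statement("k1", bundle["k1"], "workload-web", "payments", expires_at=100)
tampered = {
    "payload": {**stmt["payload"], "sub": "workload-admin"},
    "signature": stmt["signature"],
}
```

Every check fails closed with a distinct reason, which also gives the audit path something precise to record for disputed decisions.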

Common cleavage examples:

System	Slice it teaches
`SPIFFE / SPIRE`	workload identity and trust-bundle distribution
`OIDC / JWKS`	signed token issuance and verifier key distribution
`Sigstore`	artifact identity, signing, transparency, and keyless provenance
`TUF`	signed metadata, key rotation, and update trust thresholds
`SLSA / in-toto`	signed provenance and build/deploy attestation

Practical Grouping #

Control and ownership: I01, I02, I10, I11
Progress and execution: I03, I04, I05, I12, I15, I20
Serving and mediation: I07, I08, I17
Storage and namespace: I14, I16, I19
Search/derived/observability: I06, I18, I20
Identity and trust: I09, I21

Use the dominant one first, then compose the rest as:

input archetype
output archetype
decorator/middleware
substrate

Best Canonical Study Object #

Not every archetype is best learned through the same kind of object.

Use the strongest study object for the shape:

protocol spec when the archetype is protocol-shaped
seminal paper when the archetype is architecture or control-loop shaped
mechanism family when correctness comes mainly from a reusable mechanism
implementation lineage when the archetype is best learned through real system families

Archetype	Best study object type	Best canonical study object
`I01 Coordination / Consensus Metadata`	`seminal paper`	`Raft`
`I02 Claim / Lease / Exclusive Ownership`	`mechanism family`	`lease + fencing token`
`I03 Due-Time Release + Claimable Run`	`mechanism family`	`timing wheel / delayed queue / due-index scanner`
`I04 Frontier Scan + Claimable Run`	`mechanism family`	`frontier + checkpoint + resumable scan`
`I05 Append Log + Consumer Progress`	`protocol spec`	`Kafka protocol`
`I06 Projection / Index / Search Pipeline`	`implementation lineage`	`CDC/projector/indexer systems`
`I07 Cache / Origin Projection / Edge Delivery`	`protocol spec`	`HTTP Caching`
`I08 Traffic Shaping / Admission Control`	`mechanism family`	`token bucket / leaky bucket / concurrency limiter / fair queuing`
`I09 Sequence / Identifier Generation`	`implementation lineage`	`Snowflake-style ID generation + range leasing`
`I10 Membership / Presence / Registry`	`seminal paper`	`SWIM`
`I11 Control Plane + Snapshot Distribution`	`protocol spec`	`xDS`
`I12 Workflow + External Side Effect`	`mechanism family`	`transactional outbox + saga/reconciliation`
`I13 Shared Subject Coordination`	`mechanism family`	`OT / CRDT / central sequencer`
`I14 Immutable Artifact Namespace + Delivery`	`protocol spec`	`OCI Distribution Spec`
`I15 Execution Fleet + Worker Substrate`	`seminal paper / implementation lineage`	`Borg`
`I16 Key-Scoped Mutable State / Replicated KV`	`implementation lineage`	`Dynamo / Bigtable / FoundationDB / etcd-like KV families`
`I17 Traffic Steering / Request Mediation Plane`	`implementation lineage`	`Envoy/xDS + load-balancer retry/outlier-ejection model`
`I18 Telemetry / Time-Series Pipeline`	`implementation lineage`	`Prometheus/TSDB design`
`I19 Replicated Chunk / Block / File Storage Substrate`	`seminal paper / implementation lineage`	`GFS`
`I20 Computation / Dataflow / DAG Execution`	`implementation lineage`	`MapReduce / Flink / Spark / Beam`
`I21 Trust Boundary / Cryptographic Proof Substrate`	`protocol spec / implementation lineage`	`SPIFFE / SPIRE` (adjacent: `Sigstore`, `TUF`, `SLSA / in-toto`)
