Infra Archetype Taxonomy Reference
Use this as the canonical infra taxonomy after the shift from the older six-bucket infra scheme to the normalized I01-I21 set.
For each archetype, this note lists:
- entities
- source-of-truth entities
- write paths
- read paths
- sequence shape
- failure modes with prevention and repair
- scaling bottlenecks and mitigation
- how the archetype connects to other archetypes
The goal is to keep the archetypes operational and composable, not just nameable.
This note is the second layer after the infra family sheet:
- families are for generating candidate paths quickly
- archetypes are for making those paths precise in terms of truth, transitions, ownership, repair, and bottlenecks
Use this note when you need to answer:
- what is authoritative?
- what transition is guarded?
- what stale actor or stale revision must be fenced?
- what repair loop restores correctness?
- what bottleneck is canonical for this shape?
- what deployment and trust posture is the clean default for this shape?
Protocol Notation #
For infra interview recall, it is often useful to render an archetype as one or more compact protocols.
Assume the verbs are API or protocol operations, not just vague English labels.
So a protocol step should read like:
actor[operation {guard on truth}] -> next actor
Where:
- actor is the component issuing or owning the step
- operation is the protocol/API operation
- guard on truth is the predicate that makes the step legal
Examples:
- claimant[claim {ClaimState.owner_id = null or LeaseState.expiry_at < now}]
- consumer[commit_progress {SinkEffectState.applied(event_id) = true}]
- agent[apply {incoming.version > AppliedVersionState.local_version}]
- worker[complete {ExecutionAttemptState.attempt_id = my_attempt}]
Branches #
Protocols often branch.
Use:
- -> for mainline progression
- |-> for guarded branch
- => done|retry|requeue|reclaim|rollback for compact terminal outcomes
Example:
scheduler[place {CapacityState.free_slots > 0}] ->
worker[launch {ExecutionLeaseState.owner_id = worker_id}] ->
worker[complete {ExecutionAttemptState.attempt_id = my_attempt}] => done
|-> reconciler[reclaim {ExecutionLeaseState.expiry_at < now}] -> scheduler[reassign] => requeue
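The notation above can be read as data: each step is an actor, an operation, and a guard predicate over truth. The following is a minimal sketch of that reading, not a reference implementation. The `Step` and `run` names, the flat `Truth` dictionary, and the rule that a failed guard with no legal branch yields `retry` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Truth = Dict[str, object]  # hypothetical flat view of the truth objects


@dataclass
class Step:
    actor: str
    operation: str
    guard: Callable[[Truth], bool]  # the {guard on truth} predicate
    outcome: str = "done"           # terminal outcome if the protocol ends here


def run(mainline: Sequence[Step], branches: Sequence[Step], truth: Truth) -> str:
    """Walk the mainline; on the first failed guard, take the first legal branch."""
    for step in mainline:
        if not step.guard(truth):
            for branch in branches:
                if branch.guard(truth):
                    return branch.outcome
            return "retry"  # assumption: no legal branch means the caller retries later
    return mainline[-1].outcome


# The scheduler/worker example, with illustrative field names.
mainline = [
    Step("scheduler", "place", lambda t: t["free_slots"] > 0),
    Step("worker", "launch", lambda t: t["lease_owner"] == "w1"),
    Step("worker", "complete", lambda t: t["attempt_id"] == 7),
]
reclaim = Step("reconciler", "reclaim",
               lambda t: t["lease_expiry_at"] < t["now"], outcome="requeue")
```

Writing guards as plain predicates keeps the recall form honest: a step is legal only if its guard holds against current truth.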
Recall Rule #
For most infra archetypes, do not try to memorize a full sequence diagram. Memorize:
- main protocol
- repair/failure protocol
- optional control/update protocol
That usually captures the load-bearing mechanics better than prose alone.
Before the protocol, list the source-of-truth objects used by the guards, and nest the critical attributes under each object.
Use this shape:
Source of truth in protocol:
- TruthObjectA
  - critical_attr_1
  - critical_attr_2
- TruthObjectB
  - critical_attr_3
Ownership-Shaped State #
Ownership-bearing archetypes should be read through one of three state shapes.
owner-only #
Use when ownership ends only by:
- explicit release
- explicit overwrite
- versioned transition
Canonical fields:
- owner_id
- version or equivalent guarded revision
lease-backed ownership #
Use when ownership is:
- soft-state
- renewal-based
- reclaimable after timeout
- protected by fencing
Canonical fields:
- owner_id
- expiry_at
- optional epoch, term, or fencing token
This may appear as:
- one combined object
  - for example ExecutionLeaseState.owner_id, ExecutionLeaseState.expiry_at
- or split objects
  - for example ClaimState.owner_id + LeaseState.expiry_at + OwnerEpoch.current
attempt-scoped execution #
Use when correctness depends on:
- which execution generation is current
- stale completion rejection
- replay or reassignment after failure
Canonical fields:
- attempt_id
- usually paired with either:
  - owner-only
  - or lease-backed ownership
Normalization Rule #
When reclaim or stale-actor fencing depends on timeout, the protocol should expose:
- who owns it
- when that ownership expires
- and, if needed, which generation or epoch is current
So:
- owner_id answers who is allowed to act now
- expiry_at answers until when that authority is valid
- attempt_id or epoch answers which generation is allowed to commit
Archetype To Family Fit #
Each archetype should feel like a mechanically precise realization of one family center of gravity, even when adjacent families or overlays are present.
Use this compact fit table:
| Archetype | Default family fit | Why it fits |
|---|---|---|
| I01 Coordination / Consensus Metadata | Coordination / Authority | authoritative metadata mutation, revisions, watches, and leadership correctness |
| I02 Claim / Lease / Exclusive Ownership | Coordination / Authority | ownership and fencing are the primary correctness center |
| I03 Due-Time Release + Claimable Run | Execution / Resource Management | due-time eligibility releases work into execution |
| I04 Frontier Scan + Claimable Run | Execution / Resource Management | uncovered-work discovery releases work into execution |
| I05 Append Log + Consumer Progress | Messaging / Log / Stream | append, consume, progress, replay, retention |
| I06 Projection / Index / Search Pipeline | Serving State / Storage or Observability / Telemetry | derived read truth and rebuild semantics dominate |
| I07 Cache / Origin Projection / Edge Delivery | Serving State / Storage | read acceleration and freshness around origin truth |
| I08 Traffic Shaping / Admission Control | Routing / Mediation or Execution / Resource Management | demand gating and fairness dominate the hot path |
| I09 Sequence / Identifier Generation | Coordination / Authority | uniqueness and monotonic allocation dominate correctness |
| I10 Membership / Presence / Registry | Membership / Registry | register, heartbeat, expire, lookup, watch |
| I11 Control Plane + Snapshot Distribution | Control Plane / Distribution | versioned config truth, publication, apply, rollback |
| I12 Workflow + External Side Effect | no single family default | workflow correctness and side-effect reconciliation are the center |
| I13 Shared Subject Coordination | no common family default | shared subject sequencing and merge correctness dominate |
| I14 Immutable Artifact Namespace + Delivery | Artifact / Rollout / Release | immutable version publication and delivery truth dominate |
| I15 Execution Fleet + Worker Substrate | Execution / Resource Management | placement, capacity, execution lease, completion, reconciliation |
| I16 Key-Scoped Mutable State / Replicated KV | Serving State / Storage | current serving truth and replication semantics dominate |
| I17 Traffic Steering / Request Mediation Plane | Routing / Mediation | routing, backend health, stickiness, retries, drain |
| I18 Telemetry / Time-Series Pipeline | Observability / Telemetry | ingest, aggregate, query, alert, retention |
| I19 Replicated Chunk / Block / File Storage Substrate | Serving State / Storage | chunk placement, replica repair, namespace and metadata truth |
| I20 Computation / Dataflow / DAG Execution | Computation / Dataflow | graph topology, operator state, shuffle, checkpoint, and output visibility |
| I21 Trust Boundary / Cryptographic Proof Substrate | Identity / Trust / Policy | principal binding, key material, signed statements, revocation, audit, and attestation across trust boundaries |
If an archetype still feels too coarse after this view, add the secondary runtime lens from:
That overlay is most helpful for run, stream, workflow, fleet, telemetry, and storage-substrate systems.
For DB-like replicated-state systems, also use:
That one is narrower and is most useful for A07, I16, I01, and I02.
For the post-topology trust boundary pass, use:
That lens should usually be applied after:
- archetype selection
- correctness reasoning
- scale reasoning
- deployment topology choice
not before.
Post-Topology Security / Privacy Lens #
Apply security and privacy after you have already chosen:
- serving domain
- coordination domain
- fault domain
- recovery domain
At that point, ask:
- who is trusted in each domain?
- what can each domain observe?
- what can each domain mutate?
- what must hold across each boundary:
- confidentiality
- integrity
- authenticity
- unlinkability
- auditability
Default substrate #
For most infra archetypes, the default substrate is:
- M1 cryptographic identity binding
- M2 authenticated encryption
- M4 signed or MAC’d versioned state
Compressed:
- identity
- secure channel or encrypted payload
- signed and versioned control/state objects
Everything else is specialized:
- M3 for forward secrecy
- M5 for rotating identifiers / unlinkability
- M6 for privacy-preserving proof or authorization
- M7 for aggregate privacy
- M8 for access-pattern privacy
- M9 for attested execution on untrusted hosts
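The M4 move in the default substrate (signed or MAC’d versioned state) can be sketched with a shared-key MAC. This is an illustration only, assuming a pre-shared key and JSON encoding. The function names `sign_state` and `verify_and_accept` are hypothetical, and a real deployment might prefer asymmetric signatures so that verifiers cannot forge state.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_state(key: bytes, version: int, payload: dict) -> dict:
    """Bind the version and payload together under one MAC."""
    body = json.dumps({"version": version, "payload": payload}, sort_keys=True)
    mac = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "mac": mac}


def verify_and_accept(key: bytes, signed: dict, local_version: int) -> Optional[dict]:
    """Return the state only if it is authentic and newer than what we hold."""
    expected = hmac.new(key, signed["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["mac"]):
        return None  # integrity or authenticity failure
    state = json.loads(signed["body"])
    if state["version"] <= local_version:
        return None  # stale or replayed state is fenced by the version check
    return state
```

Covering the version inside the MAC is the point: an attacker who replays an old but validly signed object is still rejected by the freshness check.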
Compact template #
After deployment topology, add this block:
principals:
adversary:
trusted domains:
untrusted domains:
sensitive boundaries:
baseline substrate: M1 + M2 + M4
specialized move if needed:
defeats:
does not defeat:
Archetype defaults #
| Archetype | Default security fit after topology | Common special move |
|---|---|---|
| I01 | signed and versioned quorum state over authenticated channels | Byzantine or attested control only if quorum members are not trusted |
| I02 | signed owner / epoch / lease tokens | stronger lease issuer trust story if lease authority itself is suspect |
| I03 | signed schedule truth | encrypt runnable payload if scheduler should not read it |
| I04 | signed checkpoint and frontier truth | signed completion proof or attestation if workers may lie |
| I05 | authenticated producers/consumers, encrypted transport or payload, signed sequence metadata | forward secrecy for E2EE-style logs |
| I06 | normal plaintext-index regime over secure channels | searchable encryption only when server plaintext visibility is unacceptable |
| I07 | signed cache/config/content metadata over secure transport | signed-content edge if CDN/edge should be untrusted |
| I08 | authenticated subject plus signed policy/versioning | anonymous tokens for privacy-preserving rate limits |
| I09 | authenticated range allocator plus signed range/version truth | stronger fencing if allocation rights are delegated |
| I10 | authenticated members and signed heartbeat/view state | rotating identifiers for privacy-sensitive presence/discovery |
| I11 | signed snapshots/config with version/freshness checks | transparency or multi-signer approval for high-trust control planes |
| I12 | signed workflow state and side-effect receipts | encrypt payload or add proof if side effects are disputed or sensitive |
| I13 | signed operations and versioned merge state | encrypted collaboration when operator should not read content |
| I14 | signed artifact publication and signed head advances | transparency log for supply-chain accountability |
| I15 | authenticated workers plus signed placement/lease state | TEE attestation for confidential workloads |
| I16 | authenticated clients, encrypted values, signed versioned state | oblivious access only for strong access-pattern privacy |
| I17 | authenticated hops, mTLS, signed routes/policies | onion-routing composition for unlinkability |
| I18 | signed exporters/samples | differential privacy for user-level analytics privacy |
| I19 | encrypted chunks and signed metadata/manifests | proof-of-possession/replication or oblivious read only if needed |
| I20 | signed graph definitions, authenticated workers, signed checkpoints/output commits | TEE attestation for confidential computation or signed lineage proofs for disputed outputs |
| I21 | workload identity, signed claims, revocation state, and tamper-evident audit | transparency logs, threshold signing, or attestation when issuer/operator trust is not enough |
I01 Coordination / Consensus Metadata #
- Entities: LeaseState, LeadershipState, MembershipState, RevisionedConfig
- Source of truth: LeaseState, LeadershipState, MembershipState, RevisionedConfig
- Ownership mode: lease-backed authority via LeaseState.holder_id + expiry_at, fenced by LeadershipState.term
- Write paths: acquire leadership, renew lease, release leadership, mutate metadata, watch registration
- Read paths: current leader read, quorum-backed metadata read, watch/revision stream
- Sequence: client -> coordination API -> quorum store -> watch stream -> watchers
- Failure modes:
  - split brain
    - prevention: quorum-backed guarded transition, monotonic term/epoch, fencing
    - repair: revoke stale leader, force new election, resync watchers from revision
  - watch gap on reconnect
    - prevention: revision cursor on watch, compaction boundary checks
    - repair: replay from retained revision or reload full snapshot
- Scaling bottlenecks: quorum write latency, hot metadata key, watch fanout
  - mitigation: shard metadata by coordination domain, narrow hot keys, snapshot-plus-watch fanout tiers
- Connections:
  - input: admin/control writes, membership changes
  - output: I02, I10, I11, I15
  - decorator/middleware: coordination layer under other archetypes that need authoritative election or revisioning
Load-Bearing Protocols #
Source of truth in protocol:
- RevisionedConfig
  - version
  - compaction_floor
- LeadershipState
  - term
- LeaseState
  - holder_id
  - expiry_at
Main protocol:
client[mutate_metadata {RevisionedConfig.version = expected_version}] ->
quorum_store[commit {quorum reachable}] ->
watcher[resume_watch {watch_revision >= RevisionedConfig.compaction_floor}] => done
Repair/control protocol:
leader_candidate[acquire_leadership {LeadershipState.term = expected_term and quorum reachable}] ->
leader[renew_lease {LeaseState.holder_id = me and LeaseState.expiry_at >= now}] ->
follower[observe_revision {RevisionedConfig.version > local_version}] => done
|-> watcher[reload_snapshot {watch_revision < RevisionedConfig.compaction_floor or watch_gap_detected = true}] => requeue
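The guarded mutation and the watch resume versus snapshot reload split can be sketched with an in-memory store. This is a single-process illustration that assumes quorum commit has already happened. `MetadataStore` and its method names are hypothetical, and `watch_revision` is taken to mean the last revision the watcher has seen.

```python
class MetadataStore:
    """Toy stand-in for RevisionedConfig: version, compaction_floor, retained log."""

    def __init__(self):
        self.version = 0
        self.compaction_floor = 0
        self.log = {}  # revision -> value, only retained revisions

    def mutate(self, expected_version, value):
        # Guard: RevisionedConfig.version = expected_version (compare-and-set).
        if expected_version != self.version:
            return False
        self.version += 1
        self.log[self.version] = value
        return True

    def compact(self, floor):
        # Drop revisions below the floor; watchers behind it must reload.
        for revision in [r for r in self.log if r < floor]:
            del self.log[revision]
        self.compaction_floor = max(self.compaction_floor, floor)

    def resume_watch(self, watch_revision):
        # Branch: a watcher behind the compaction floor cannot replay.
        if watch_revision < self.compaction_floor:
            return ("reload_snapshot", self.version)
        events = [(r, self.log[r]) for r in range(watch_revision + 1, self.version + 1)]
        return ("events", events)
```

A stale writer fails the compare-and-set instead of overwriting, and a watcher that fell behind compaction is told to reload rather than silently missing revisions.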
I02 Claim / Lease / Exclusive Ownership #
- Entities: ClaimState, LeaseState, OwnerEpoch
- Source of truth: ClaimState, LeaseState, OwnerEpoch
- Ownership mode: lease-backed ownership split across ClaimState.owner_id, LeaseState.expiry_at, and OwnerEpoch.current
- Write paths: claim, renew, release, expire stale claim, reclaim abandoned work
- Read paths: current owner read, lease expiry read
- Sequence: claimant -> lease service -> lease store -> claimant acts with epoch -> downstream fenced write
- Failure modes:
  - duplicate claim
    - prevention: guarded claim transition, unique active owner, lease + epoch
    - repair: expire/reap stale claim, reconcile duplicate actors downstream
  - stale holder acting after expiry
    - prevention: fencing token / epoch check on every downstream commit
    - repair: reassign ownership, replay or reconcile partial side effects
- Scaling bottlenecks: hot claim key, renew storms, lease-manager contention
  - mitigation: partition claim domains, batch renewals, longer leases with fencing, local caches for read-only ownership observation
- Connections:
  - input: I03, I04, I15
  - output: protects A08, A12, I15, I17
  - decorator/middleware: ownership guard around mutable truth or execution
Load-Bearing Protocols #
Source of truth in protocol:
- ClaimState
  - owner_id
- LeaseState
  - expiry_at
- OwnerEpoch
  - current
Main protocol:
claimant[claim {ClaimState.owner_id = null or LeaseState.expiry_at < now}] ->
holder[renew {OwnerEpoch.current = my_epoch}] ->
holder[commit {OwnerEpoch.current = my_epoch}] => done
Repair/fencing protocol:
downstream_writer[reject_commit {OwnerEpoch.current != presented_epoch}] => retry
|-> reaper[reclaim {LeaseState.expiry_at < now}] => requeue
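The claim, renew, and fenced-commit guards above fit in a few lines. The sketch below folds ClaimState, LeaseState, and OwnerEpoch into one hypothetical `Lease` object and passes `now` explicitly so the behavior is deterministic. Bumping the epoch on every successful claim is an assumption about how the fencing token is minted.

```python
class Lease:
    """Toy combined view of owner_id, expiry_at, and the current epoch."""

    def __init__(self):
        self.owner_id = None
        self.expiry_at = 0.0
        self.epoch = 0

    def claim(self, claimant, now, ttl):
        # Guard: owner_id = null or expiry_at < now.
        if self.owner_id is not None and self.expiry_at >= now:
            return None  # still validly held by someone else
        self.owner_id = claimant
        self.expiry_at = now + ttl
        self.epoch += 1  # new generation fences the previous holder
        return self.epoch

    def renew(self, holder, epoch, now, ttl):
        # Guard: caller is the current generation and the lease has not lapsed.
        if self.owner_id != holder or self.epoch != epoch or self.expiry_at < now:
            return False
        self.expiry_at = now + ttl
        return True


def fenced_commit(lease, presented_epoch):
    """Downstream check: reject any commit that presents a stale epoch."""
    return presented_epoch == lease.epoch
```

The downstream check matters more than the lease clock: a zombie holder whose lease has expired is stopped by the epoch comparison even if it never notices the expiry.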
Third-Layer Mechanism Lens #
- Canonical unit of work: claim
- Most relevant design-space dimensions: authorityDelegation, progressVisibility, recovery, coordAuthority
- Common mechanism variants: lease-acquired authority vs claim-acquired ownership, heartbeat/renewal vs pure TTL expiry, centralized lease truth vs consensus-backed lease truth
- Dominant invariant families: at-most-one-current-valid-owner, stale-actor-must-not-commit, every-eligible-item-is-owned-or-completed-or-rediscoverable
- Canonical failure signatures: split-brain ownership, zombie write, orphaned claim, false expiry, renew storm
- Good real-system anchors: Chubby, ZooKeeper recipes, etcd leases, SQS visibility-timeout-style claim loops
I03 Due-Time Release + Claimable Run #
- Entities: ScheduleState, RunnableState, AttemptState, DueIndexState
- Source of truth: ScheduleState, RunnableState
- Write paths: schedule job, materialize due job, claim runnable attempt, complete/fail attempt
- Read paths: due scan by time bucket, runnable fetch, history/progress read
- Sequence: client -> scheduler API -> schedule store -> due scanner -> runnable store/queue -> worker
- Failure modes:
  - due item materialized twice
    - prevention: checkpoint-after-durable-materialization, idempotent release keyed by logical run
    - repair: dedup runnable records, reconcile duplicate attempts
  - due item never materialized
    - prevention: overdue reconciliation sweep, monotonic scan checkpoints
    - repair: rescan overdue buckets, rebuild runnable set from schedule truth
- Scaling bottlenecks: due bucket hotspot, bursty wakeups, runnable queue bursts
  - mitigation: bucket sharding, jitter, hierarchical timing wheels, downstream fleet buffering
- Connections:
  - input: user schedules, cron, retries
  - output: I15 or I02 for claim/execution
  - decorator/middleware: often front door to workflow/orchestration systems
Load-Bearing Protocols #
Source of truth in protocol:
- ScheduleState
  - version
  - next_due_at
- DueIndexState
  - bucket_start
- RunnableState
  - status
- AttemptState
  - attempt_id
Main protocol:
client[schedule {ScheduleState.version = expected_version}] ->
due_scanner[materialize_runnable {DueIndexState.bucket_start <= now and RunnableState.status = ABSENT}] ->
worker[claim {RunnableState.status = READY}] ->
worker[complete {AttemptState.attempt_id = my_attempt}] => done
Repair protocol:
sweeper[rescan_overdue {ScheduleState.next_due_at < now and RunnableState.status = ABSENT}] => requeue
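Idempotent release keyed by logical run is what makes both the scanner and the sweeper safe to re-run. A minimal sketch, assuming schedule truth is a map from job ID to `next_due_at` and that the logical run key is the pair `(job_id, next_due_at)`; the function name and data shapes are illustrative only.

```python
def materialize_due(schedules, runnable, now):
    """Release every due run that is still ABSENT; safe to call repeatedly.

    schedules: dict of job_id -> next_due_at (stand-in for ScheduleState)
    runnable:  dict of run_key -> status     (stand-in for RunnableState)
    """
    released = []
    for job_id, next_due_at in sorted(schedules.items()):
        run_key = (job_id, next_due_at)  # logical run identity
        # Guard: due and RunnableState.status = ABSENT.
        if next_due_at <= now and run_key not in runnable:
            runnable[run_key] = "READY"
            released.append(run_key)
    return released
```

Because the guard checks for absence under the logical run key, a scanner crash between materialization and checkpoint produces a repeated scan, not a duplicate run.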
Third-Layer Mechanism Lens #
- Canonical unit of work: run
- Most relevant design-space dimensions: lifecycleShape, recovery, progressVisibility, authorityDelegation
- Common mechanism variants: due-index scan vs timing-wheel release, broker-mediated runnable queue vs direct fleet handoff, requeue vs retry-in-place after failure
- Dominant invariant families: every-eligible-item-is-owned-or-completed-or-rediscoverable, progress-never-regresses, completion-only-after-prerequisite-effect-is-durable
- Canonical failure signatures: missed due release, duplicate run materialization, lost checkpoint, retry storm, lateness burst
- Good real-system anchors: Quartz, EventBridge Scheduler, Airflow scheduler, hierarchical timing wheels
I04 Frontier Scan + Claimable Run #
- Entities: FrontierState, ClaimState, CheckpointState, ProgressState
- Source of truth: FrontierState, CheckpointState
- Ownership mode: lease-backed batch claim via ClaimState.owner_id + expiry_at
- Write paths: expand frontier, claim uncovered work, checkpoint covered progress, requeue incomplete work
- Read paths: frontier scan, progress read, claim ownership read
- Sequence: frontier manager -> frontier store -> workers claim batches -> checkpoint update -> next frontier opens
- Failure modes:
  - frontier advanced too far
    - prevention: checkpoint only after durable success, guarded frontier advance
    - repair: rescan from last safe checkpoint, rebuild uncovered set
  - uncovered work skipped
    - prevention: coverage invariant over checkpoint/frontier, resumable scan discipline
    - repair: anti-entropy rescan, replay unfinished partitions
- Scaling bottlenecks: hot checkpoint row, skewed ranges, claim bursts
  - mitigation: partition frontier, hierarchical checkpoints, skew-aware shard splitting
- Connections:
  - input: discovery systems, batch scanners, DAG dependency release
  - output: I15, I06, A10
  - decorator/middleware: often paired with I05 or I15 for execution
Load-Bearing Protocols #
Source of truth in protocol:
- CheckpointState
  - covered_cursor
- FrontierState
  - high_watermark_cursor
- ClaimState
  - owner_id
  - expiry_at
- ProgressState
  - batch_done
Main protocol:
frontier_manager[claim_range {CheckpointState.covered_cursor < FrontierState.high_watermark_cursor}] ->
worker[process_batch {ClaimState.owner_id = my_worker}] ->
worker[checkpoint {ProgressState.batch_done = true}] => done
Repair protocol:
reconciler[requeue_range {ClaimState.expiry_at < now and ProgressState.batch_done = false}] => requeue
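The coverage invariant (checkpoint only after durable success, never past uncovered work) reduces to advancing the cursor across contiguous completed batches only. A minimal sketch, assuming batches are numbered by integer index and `covered_cursor` names the next uncovered batch; both conventions are assumptions for illustration.

```python
def advance_checkpoint(covered_cursor, high_watermark_cursor, done_batches):
    """Advance covered_cursor over contiguous completed batches only.

    covered_cursor:        index of the next batch not yet covered
    high_watermark_cursor: exclusive upper bound of the opened frontier
    done_batches:          set of batch indices with durable success
    """
    while covered_cursor < high_watermark_cursor and covered_cursor in done_batches:
        covered_cursor += 1
    return covered_cursor
```

For example, with batches 0, 1, and 3 done, the cursor stops at 2: batch 3 finished out of order, but it does not let the checkpoint jump past the unfinished batch 2.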
Third-Layer Mechanism Lens #
- Canonical unit of work: batch
- Most relevant design-space dimensions: lifecycleShape, progressVisibility, reconciliation, recovery
- Common mechanism variants: centralized frontier manager vs partition-local frontier ownership, checkpoint-write vs event-emission progress, periodic resweep vs continuous anti-entropy
- Dominant invariant families: every-eligible-item-is-owned-or-completed-or-rediscoverable, progress-never-regresses, drift-eventually-corrected
- Canonical failure signatures: uncovered work skipped, frontier advanced too far, lost checkpoint, orphaned claim, duplicate batch claim
- Good real-system anchors: Mercator, crawler frontiers, repair sweepers, compaction scanners
I05 Append Log + Consumer Progress #
- Entities: LogSegment, Offset, ConsumerProgress, PartitionState
- Source of truth: LogSegment, ConsumerProgress
- Write paths: append record, commit consumer progress, compact/retain segments
- Read paths: fetch from offset, replay from offset, lag read
- Sequence: producer -> broker/log -> consumer fetch -> effect -> progress commit
- Failure modes:
  - offset advanced before effect committed
    - prevention: effect-before-offset discipline, idempotent sink, transactional commit where possible
    - repair: replay from safe offset, reconcile sink/effect ambiguity
  - duplicate consumption
    - prevention: idempotent consumer, inbox dedup, fenced consumer group progress
    - repair: replay with dedup or downstream reconciliation
- Scaling bottlenecks: hot partition, broker I/O, rebalance churn
  - mitigation: partition key design, batch I/O, consumer group tuning, tiered storage
- Connections:
  - input: producers, CDC, event emitters
  - output: I06, I12, A14, analytics systems
  - decorator/middleware: backbone under many async systems
Load-Bearing Protocols #
Source of truth in protocol:
- PartitionState
  - leader
- LogSegment
  - high_watermark
- ConsumerProgress
  - next_offset
- SinkEffectState
  - durable
Main protocol:
producer[append {PartitionState.leader = current_leader}] ->
consumer[fetch {ConsumerProgress.next_offset <= LogSegment.high_watermark}] ->
consumer[process] ->
consumer[commit_progress {SinkEffectState.durable = true}] => done
Repair protocol:
consumer[replay {SinkEffectState.durable = false or SinkEffectState.durable = unknown}] => retry
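Effect-before-offset plus an idempotent sink is what makes replay safe. The sketch below models the log as a list of `(event_id, value)` pairs and the sink's dedup record as a set of applied event IDs; these shapes and the `consume` name are assumptions for illustration.

```python
def consume(log, next_offset, sink, applied_ids):
    """Process records from next_offset and return the new next_offset.

    The offset advances only after the effect is recorded, and the sink
    skips event IDs it has already applied, so replay cannot duplicate.
    """
    while next_offset < len(log):
        event_id, value = log[next_offset]
        if event_id not in applied_ids:   # idempotent sink (inbox dedup)
            sink.append(value)            # the effect
            applied_ids.add(event_id)     # effect recorded as durable
        next_offset += 1                  # commit progress after the effect
    return next_offset
```

If the consumer crashes after the effect but before the progress commit, the repair is simply to call `consume` again from the last committed offset; the dedup set absorbs the overlap.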
I06 Projection / Index / Search Pipeline #
- Entities: SourceState, IndexEntryState, ProjectionState, ProjectorCheckpoint
- Source of truth: SourceState
- Write paths: source mutation, projector apply, delete/tombstone propagation, reindex
- Read paths: projection query, search query, freshness read
- Sequence: source writer -> source truth -> projector/indexer -> projection/index -> reader
- Failure modes:
  - stale projection
    - prevention: ordered projector checkpoints, replayable source log, monotonic apply
    - repair: replay or rebuild from source truth
  - missing entry / tombstone not propagated
    - prevention: tombstone handling, completeness checks, backfill sweeps
    - repair: reindex affected range or full rebuild
- Scaling bottlenecks: fanout on write, query fanout, rebuild cost
  - mitigation: partitioned projectors, async pipelines, hierarchical indexes, background rebuild lanes
- Connections:
  - input: I05, source truth from many product archetypes
  - output: A05, A15, monitoring/search surfaces
  - decorator/middleware: derived read path for many primary archetypes
Load-Bearing Protocols #
Source of truth in protocol:
- SourceState
  - version
- ProjectorCheckpoint
  - next_offset
  - gap_detected
- ProjectionState
  - version
Main protocol:
writer[mutate_source {SourceState.version = expected_version}] ->
projector[apply {ProjectorCheckpoint.next_offset = expected_offset}] ->
indexer[publish_projection {ProjectionState.version = source_version}] => done
Repair protocol:
rebuilder[reindex {ProjectorCheckpoint.gap_detected = true}] => replay
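Monotonic apply with gap detection, followed by rebuild from source truth, can be sketched as below. The `Projector` class is hypothetical, the source log is modeled as an ordered list of `(key, value)` pairs, and a value of `None` is assumed to mean a tombstone.

```python
class Projector:
    """Toy projector: applies source records strictly in offset order."""

    def __init__(self):
        self.next_offset = 0
        self.gap_detected = False
        self.projection = {}

    def apply(self, offset, key, value):
        if offset < self.next_offset:
            return "duplicate"            # already applied; ignore
        if offset > self.next_offset:
            self.gap_detected = True      # a record was skipped
            return "gap"
        if value is None:
            self.projection.pop(key, None)  # tombstone removes the entry
        else:
            self.projection[key] = value
        self.next_offset += 1
        return "applied"

    def rebuild(self, source_log):
        """Repair: discard the projection and replay from source truth."""
        self.projection.clear()
        self.next_offset = 0
        self.gap_detected = False
        for offset, (key, value) in enumerate(source_log):
            self.apply(offset, key, value)
```

The projector never guesses across a gap; it flags it and leaves the repair to a replay, which keeps the projection derivable from source truth alone.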
I07 Cache / Origin Projection / Edge Delivery #
- Entities: OriginState, CacheEntry, PurgeVersion, EdgePolicyState
- Source of truth: OriginState
- Write paths: populate cache, refresh cache, invalidate/purge cache, propagate edge policy
- Read paths: edge/cache read, origin fallback read, freshness/version read
- Sequence: client -> edge/cache -> cache miss -> origin -> cache fill -> later invalidation/purge
- Failure modes:
  - stale cache
    - prevention: TTL/versioned purge discipline, origin version checks on fill
    - repair: purge, refresh, force origin read
  - cache stampede
    - prevention: single-flight fill, request coalescing, soft TTL with refresh ahead
    - repair: temporarily shed load to protect origin, backfill cache gradually
- Scaling bottlenecks: hot key, miss storm, invalidation fanout
  - mitigation: request collapsing, regional caches, stale-while-revalidate, purge trees
- Connections:
  - input: I06, I14, any origin truth
  - output: read acceleration to product and infra systems
  - decorator/middleware: read-path decorator in front of truth or projection
Load-Bearing Protocols #
Source of truth in protocol:
- CacheEntry
  - expires_at
  - miss
  - version
- OriginState
  - version
- PurgeVersion
  - version
Read/fill protocol:
client[read_cache {CacheEntry.expires_at >= now}] => done
|-> cache[fill {CacheEntry.miss = true}] ->
origin[read {OriginState.version >= requested_version}] ->
cache[publish_entry {PurgeVersion.version = expected_version}] => done
Invalidation protocol:
invalidator[purge {PurgeVersion.version = expected_version}] ->
edge[drop_entry {CacheEntry.version < PurgeVersion.version}] => refresh
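The read/fill and invalidation guards combine into one freshness test: an entry is served only if it is unexpired and not older than the purge version. A minimal single-node sketch, assuming the origin is a dict of key to `(value, version)`. `EdgeCache` and its `origin_reads` counter are hypothetical names, and single-flight fill is left out.

```python
class EdgeCache:
    """Toy edge cache with TTL expiry and versioned purge."""

    def __init__(self, origin, ttl):
        self.origin = origin      # key -> (value, version); stands in for OriginState
        self.ttl = ttl
        self.entries = {}         # key -> (value, version, expires_at)
        self.purge_version = 0
        self.origin_reads = 0     # counts fills, to make misses observable

    def read(self, key, now):
        entry = self.entries.get(key)
        if entry is not None:
            value, version, expires_at = entry
            # Hit only if unexpired and not invalidated by a newer purge.
            if expires_at >= now and version >= self.purge_version:
                return value
        value, version = self.origin[key]   # miss: fall back to origin truth
        self.origin_reads += 1
        self.entries[key] = (value, version, now + self.ttl)
        return value

    def purge(self, version):
        # Entries with version < purge_version are treated as dropped.
        self.purge_version = max(self.purge_version, version)
```

Treating purge as a version floor rather than a per-key delete means a late or repeated purge message is harmless: the floor only moves forward.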
I08 Traffic Shaping / Admission Control #
- Entities: BudgetState, WindowState, TokenBucketState, ConcurrencyState, PolicyState
- Source of truth: BudgetState, WindowState, TokenBucketState, ConcurrencyState, PolicyState
- Write paths: evaluate admit/reject/defer, consume tokens or slots, refill/reset, update shaping policy
- Read paths: current budget/policy read, decision audit read
- Sequence: request -> evaluator -> budget/policy store or local snapshot -> admit/reject/defer -> downstream
- Failure modes:
  - over-admit under race
    - prevention: atomic budget update, local-fast-path with bounded drift, concurrency caps
    - repair: shed/defer excess load, reset counters from truth
  - stale policy apply
    - prevention: versioned policy snapshots, monotonic apply
    - repair: republish policy, invalidate local snapshot, reconcile incorrect decisions if needed
- Scaling bottlenecks: hot tenant key, evaluator hot path, policy fanout
  - mitigation: local token buckets with periodic sync, shard budgets, route-local admission tiers
- Connections:
  - input: I11 policies, I17 request flow
  - output: protects I17, I15, external dependencies
  - decorator/middleware: request-path gate before expensive downstream systems
Load-Bearing Protocols #
Source of truth in protocol:
- TokenBucketState
  - available_tokens
- ConcurrencyState
  - available_slots
- PolicyState
  - version
  - denies
Main protocol:
request[evaluate_admission {TokenBucketState.available_tokens > 0 and PolicyState.version >= local_policy_version}] ->
evaluator[consume_budget {ConcurrencyState.available_slots > 0}] => done
Reject/defer protocol:
evaluator[reject {TokenBucketState.available_tokens = 0 or PolicyState.denies = true}] => retry
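The token side of the admission guard is a standard token bucket. The sketch below is a local, single-threaded version with an explicit clock; `TokenBucket` and its parameters are illustrative names, and the concurrency-slot and policy-version guards are left out.

```python
class TokenBucket:
    """Toy token bucket: refill lazily on each admission check."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated_at = now

    def admit(self, now, cost=1.0):
        # Refill for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = max(self.updated_at, now)
        # Guard: enough tokens available for this request.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # reject or defer; caller may retry later
```

In a distributed evaluator the refill and consume would need to be one atomic update against shared truth, or a local bucket with bounded drift, as the failure-mode list notes.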
I09 Sequence / Identifier Generation #
- Entities: CounterState, RangeLeaseState, WorkerIdState, EpochState
- Source of truth: CounterState, RangeLeaseState, WorkerIdState, EpochState
- Ownership mode: lease-backed range ownership via RangeLeaseState.owner_id + expiry_at, fenced by EpochState.current
- Write paths: allocate counter/range, claim worker ID, generate ID, renew worker lease
- Read paths: generated ID response, generator health read
- Sequence: client -> allocator or local generator -> counter/range lease state -> id returned
- Failure modes:
  - duplicate IDs
    - prevention: unique worker identity or leased ranges, monotonic epoch, guarded local sequence advance
    - repair: fence bad generator, rotate worker IDs, repair duplicates only if downstream can reconcile
  - non-monotonic IDs
    - prevention: monotonic local clock discipline or logical sequence fallback
    - repair: epoch bump and restart generation lane
- Scaling bottlenecks: central allocator hotspot, worker-id contention
  - mitigation: range leasing, local generation, wider worker-id space
- Connections:
  - input: control-plane worker registration
  - output: IDs for many archetypes
  - decorator/middleware: identity decorator on write paths
Load-Bearing Protocols #
Source of truth in protocol:
- RangeLeaseState
  - owner_id
  - expiry_at
  - max_value
- EpochState
  - current
- WorkerIdState
  - conflict
Main protocol:
client[allocate_range {RangeLeaseState.owner_id = null or RangeLeaseState.expiry_at < now}] ->
generator[issue_id {EpochState.current = my_epoch and LocalCounter.value < RangeLeaseState.max_value}] => done
Repair protocol:
allocator[rotate_worker_id {WorkerIdState.conflict = true}] => retry
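Range leasing moves the allocator off the hot path: each generator issues IDs locally until its range is exhausted. A minimal sketch, assuming disjoint half-open ranges and omitting lease expiry and epoch fencing for brevity; `RangeAllocator` and `Generator` are hypothetical names.

```python
class RangeAllocator:
    """Central truth: hands out disjoint half-open ranges [start, end)."""

    def __init__(self, range_size):
        self.range_size = range_size
        self.next_start = 0

    def allocate(self):
        start = self.next_start
        self.next_start += self.range_size
        return start, start + self.range_size


class Generator:
    """Local generator: issues IDs from its leased range, then asks for more."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.value, self.max_value = allocator.allocate()

    def issue_id(self):
        # Guard: LocalCounter.value < RangeLeaseState.max_value.
        if self.value >= self.max_value:
            self.value, self.max_value = self.allocator.allocate()
        issued = self.value
        self.value += 1
        return issued
```

Uniqueness holds because ranges never overlap. IDs are not globally monotonic across generators, which is the usual trade made for removing the central hotspot.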
I10 Membership / Presence / Registry #
- Entities: MemberState, PresenceState, RegistryVersion
- Source of truth: MemberState, PresenceState, RegistryVersion
- Write paths: register member, heartbeat, expire member, update registry version
- Read paths: membership watch, presence query, service lookup
- Sequence: node/session -> registry -> heartbeat updates -> watchers/readers consume membership view
- Failure modes:
  - false death
    - prevention: suspicion before eviction, heartbeat grace, version/incarnation rules
    - repair: rejoin with higher incarnation, anti-entropy membership sync
  - ghost member
    - prevention: expiry sweep, heartbeat TTL, watch versioning
    - repair: explicit purge or rebuild registry from active heartbeats
- Scaling bottlenecks: heartbeat fan-in, watch fanout, hot group membership
  - mitigation: hierarchical membership, piggyback/gossip where appropriate, cached registry reads
- Connections:
  - input: node health, session state
  - output: I17, I15, I01
  - decorator/middleware: discovery/presence overlay for routing or assignment
Load-Bearing Protocols #
Source of truth in protocol:
- RegistryVersion
  - version
- MemberState
  - member_id
- PresenceState
  - status
  - last_heartbeat_at
Main protocol:
node[register {RegistryVersion.version = expected_version}] ->
node[heartbeat {MemberState.member_id = my_id}] ->
reader[lookup {PresenceState.status = ALIVE}] => done
Repair protocol:
reaper[expire {PresenceState.last_heartbeat_at < now - ttl}] => refresh
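Register, heartbeat, and the expiry sweep can be sketched as a small in-memory registry. The `Registry` class is hypothetical. Bumping the registry version on every membership change is an assumption about how watchers would detect a new view, and the suspicion-before-eviction step is omitted.

```python
class Registry:
    """Toy membership registry with heartbeat TTL expiry."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.last_heartbeat_at = {}  # member_id -> timestamp of last heartbeat
        self.version = 0             # bumped whenever the member set changes

    def heartbeat(self, member_id, now):
        if member_id not in self.last_heartbeat_at:
            self.version += 1        # first heartbeat doubles as registration
        self.last_heartbeat_at[member_id] = now

    def expire(self, now):
        # Guard: PresenceState.last_heartbeat_at < now - ttl.
        dead = sorted(m for m, t in self.last_heartbeat_at.items() if t < now - self.ttl)
        for member_id in dead:
            del self.last_heartbeat_at[member_id]
        if dead:
            self.version += 1
        return dead

    def alive(self):
        return sorted(self.last_heartbeat_at)
```

An expired member that is actually alive simply heartbeats again and rejoins, which is the cheap repair for false death in this simplified model.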
I11 Control Plane + Snapshot Distribution #
- Entities: ConfigState, SnapshotState, AppliedVersionState
- Source of truth: ConfigState
- Write paths: config mutate, publish snapshot, agent apply version
- Read paths: config read, watch/version stream, local snapshot read
- Sequence: admin -> control plane -> config truth -> snapshot publisher/watch -> agent -> local apply
- Failure modes:
  - out-of-order snapshot apply
    - prevention: monotonic version checks, ACK/NACK discipline
    - repair: refetch full snapshot, replay from current version
  - partial rollout
    - prevention: staged rollout tracking, applied-version reporting
    - repair: rollback revision, republish to missing agents
- Scaling bottlenecks: fanout to many agents, hot tenant config, snapshot size
  - mitigation: delta snapshots, snapshot CDN/brokers, tenant partitioning
- Connections:
  - input: policy/admin truth
  - output: I08, I17, I15, A11
  - decorator/middleware: control-plane layer for many serving and evaluation systems
Load-Bearing Protocols #
Source of truth in protocol:
- ConfigState
  - version
- SnapshotState
  - version
- AppliedVersionState
  - local_version
  - error_rate
Main protocol:
admin[mutate_config {ConfigState.version = expected_version}] ->
publisher[publish_snapshot {SnapshotState.version = ConfigState.version}] ->
agent[apply {SnapshotState.version > AppliedVersionState.local_version}] ->
agent[ack {AppliedVersionState.local_version = SnapshotState.version}] => done
Repair/control protocol:
controller[rollback {AppliedVersionState.error_rate > rollout_threshold or rollout_health = BAD}] => republish
Substrate Cleavage #
I11 is value-separable because authoring, publication, propagation, local apply, health observation, and rollback have different truth states and failure modes.
Compact flow:
author -> validate -> commit config truth -> publish version -> distribute snapshot/delta
-> agent apply -> report applied version/health -> advance rollout or rollback
| Substrate slice | Value it provides | Truth / state |
|---|---|---|
| Authoring / intent | captures desired config or policy change | ConfigIntentState, ChangeRequestState |
| Validation / guardrail | rejects unsafe or invalid changes before publication | ValidationState, PolicyCheckState |
| Versioned truth store | owns authoritative config history | ConfigState, ConfigVersionState, RevisionHistoryState |
| Publication | converts truth into an immutable release/snapshot | SnapshotState, DeltaState, PublicationState |
| Distribution / fanout | moves snapshot or delta to many targets | DistributionState, WatchCursorState, FanoutState |
| Local apply | makes a target serve using a specific version | AppliedVersionState, LocalSnapshotState |
| Health / convergence observation | tells whether rollout is safe to continue | TargetHealthState, ConvergenceState, ApplyAckState |
| Rollout / rollback control | advances, pauses, or reverts versions | RolloutState, RollbackState, WaveState |
| Drift / repair | detects and corrects targets serving the wrong version | DriftState, RepairState |
Canonical substrate APIs:
| Boundary | API shape |
|---|---|
| Authoring API | propose_change, validate_change, approve_change |
| Truth mutation API | put_config(expected_version, patch) |
| Publication API | publish_snapshot(version) or publish_delta(from_version, to_version) |
| Watch / fetch API | watch(from_version) or get_snapshot(version) |
| Apply API | apply(version, snapshot_or_delta) |
| ACK / health API | ack(version, status, health_signals) |
| Rollout API | advance_wave, pause_rollout, rollback(version) |
| Repair API | refetch_full_snapshot, reconcile_drift(target_id) |
Core invariants:
- ConfigState.version is monotonic.
- A target must not apply a version older than its current AppliedVersionState.local_version.
- A published snapshot must correspond to exactly one config version.
- Rollout may advance only from observed health and convergence state, not from publication success alone.
- Drift repair must converge targets back to a declared desired version.
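The rollout invariant (advance only from observed health and convergence) can be made concrete with a small decision function. This is a sketch under assumptions: `TargetReport`, `rollout_decision`, and the 5% default `error_threshold` are hypothetical, and real controllers usually add soak time and minimum sample counts.

```python
from dataclasses import dataclass


@dataclass
class TargetReport:
    target_id: str
    applied_version: int  # reported through ack(version, status, health_signals)
    error_rate: float


def rollout_decision(desired_version, wave_targets, reports, error_threshold=0.05):
    """Return 'advance', 'wait', or 'rollback' for one rollout wave."""
    by_id = {r.target_id: r for r in reports}
    converged = [
        by_id[t] for t in wave_targets
        if t in by_id and by_id[t].applied_version == desired_version
    ]
    # Any converged target reporting bad health triggers rollback.
    if any(r.error_rate > error_threshold for r in converged):
        return "rollback"
    # Publication success alone is not enough: every target in the wave must
    # have reported the desired version before the next wave starts.
    if len(converged) < len(wave_targets):
        return "wait"
    return "advance"
```

A target that never reports is treated the same as one still on the old version, so silence cannot advance a wave.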
Common cleavage examples:
| System | Slice it teaches |
|---|---|
| xDS | versioned control-plane publication and target ACK/NACK |
| Kubernetes watch/resourceVersion | watch, cache, and replay from versioned API truth |
| feature-flag platform | local snapshot apply, targeting rules, and staged rollout |
| Argo CD / GitOps controller | desired config truth, apply, drift detection, rollback |
Third-Layer Mechanism Lens #
- Canonical unit of work: snapshot publication, apply target
- Most relevant design-space dimensions: authorityDelegation, progressVisibility, reconciliation, coordAuthority
- Common mechanism variants: push vs pull apply, full snapshot vs delta distribution, central publisher vs tiered fanout/cache hierarchy
- Dominant invariant families: progress-never-regresses, stale-actor-must-not-commit, drift-eventually-corrected
- Canonical failure signatures: stale policy apply, partial rollout, drift not reconciled, reconciliation thrash, fanout overload
- Good real-system anchors: Envoy xDS, Kubernetes watch/resourceVersion distribution, feature-flag control planes
I12 Workflow + External Side Effect #
- Entities: WorkflowState, OutboxEvent, DeliveryAttemptState, EffectResultState
- Source of truth: WorkflowState, OutboxEvent
- Write paths: create workflow, guarded transition, emit outbox, record delivery attempt/result
- Read paths: workflow status read, replay/reconciliation read
- Sequence: client -> workflow service -> workflow truth + outbox -> effect worker -> external system -> result update
- Failure modes:
  - crash after transition but before side effect
    - prevention: transactional outbox, idempotency key
    - repair: replay outbox, reconcile with provider
  - retry ambiguity against provider
    - prevention: idempotent external API or provider-side dedup keys
    - repair: reconciliation poll/manual correction
- Scaling bottlenecks: worker backlog, external provider bottleneck, hot workflow rows
  - mitigation: queue buffering, provider-specific rate shaping, partition workflows by tenant/key
- Connections:
  - input: many product/process triggers
  - output: external systems, notifications, payments
  - decorator/middleware: effect-handling shell around state machines
Load-Bearing Protocols #
Source of truth in protocol:
- WorkflowState: version, state
- OutboxEvent: status
- DeliveryAttemptState: attempt_id
- EffectResultState: provider_result
Main protocol:
client[start_workflow {WorkflowState.version = expected_version}] ->
workflow_service[transition {WorkflowState.state in allowed_predecessors}] ->
worker[deliver_effect {OutboxEvent.status = READY}] ->
worker[record_result {DeliveryAttemptState.attempt_id = my_attempt}] => done
Repair protocol:
reconciler[replay_outbox {OutboxEvent.status = READY or EffectResultState.provider_result = AMBIGUOUS}] => retry
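A transactional outbox with an idempotency key can be sketched with the standard-library `sqlite3` module. The table layout, `ALLOWED_PREDECESSORS`, the `CHARGING` state, and the in-process `provider_seen` set (standing in for provider-side dedup) are all assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE workflow (id TEXT PRIMARY KEY, state TEXT, version INTEGER);
    CREATE TABLE outbox (event_id TEXT PRIMARY KEY, workflow_id TEXT, status TEXT);
""")
db.execute("INSERT INTO workflow VALUES ('wf-1', 'PENDING', 1)")
db.commit()

ALLOWED_PREDECESSORS = {"CHARGING": {"PENDING"}}


def transition(workflow_id, new_state, expected_version):
    """Guarded transition plus outbox insert in one local transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        state, version = db.execute(
            "SELECT state, version FROM workflow WHERE id = ?", (workflow_id,)
        ).fetchone()
        if version != expected_version or state not in ALLOWED_PREDECESSORS[new_state]:
            return False
        db.execute(
            "UPDATE workflow SET state = ?, version = version + 1 WHERE id = ?",
            (new_state, workflow_id),
        )
        # The event id doubles as the idempotency key sent to the provider.
        db.execute(
            "INSERT INTO outbox VALUES (?, ?, 'READY')",
            (f"{workflow_id}:{new_state}", workflow_id),
        )
    return True


provider_seen = set()  # stand-in for provider-side dedup on the idempotency key
provider_calls = []    # effects the provider actually applied


def deliver_ready_events():
    """Worker and reconciler loop: safe to run repeatedly."""
    rows = db.execute("SELECT event_id FROM outbox WHERE status = 'READY'").fetchall()
    for (event_id,) in rows:
        if event_id not in provider_seen:  # provider applies the effect at most once
            provider_seen.add(event_id)
            provider_calls.append(event_id)
        with db:
            db.execute(
                "UPDATE outbox SET status = 'DELIVERED' WHERE event_id = ?", (event_id,)
            )
```

If the worker crashes between the provider call and the status update, the event stays `READY`, the reconciler redelivers it, and the provider's dedup key absorbs the duplicate.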
Third-Layer Mechanism Lens #
- Canonical unit of work: workflow instance
- Most relevant design-space dimensions: recovery, progressVisibility, reconciliation, authorityDelegation
- Common mechanism variants: replay-based durable execution vs compensation-heavy saga, central orchestrator vs broker-mediated step execution, event-emission vs checkpoint-write progress
- Dominant invariant families: progress-never-regresses, completion-only-after-prerequisite-effect-is-durable, stale-actor-must-not-commit
- Canonical failure signatures: duplicate execution, stale completion, stuck in nonterminal state, ambiguous provider result, retry storm
- Good real-system anchors: Temporal, Cadence, Step Functions, Sagas, Life Beyond Distributed Transactions
I13 Shared Subject Coordination #
- Entities: SubjectState, OperationLog, VersionState, SessionState
- Source of truth: SubjectState, OperationLog
- Write paths: submit op, sequence/merge op, snapshot subject, advance version
- Read paths: subscribe to op stream, fetch snapshot, replay from version
- Sequence: client session -> coordinator -> op log + subject state -> fanout to subscribers
- Failure modes:
  - out-of-order apply
    - prevention: per-subject sequencing or causality checks
    - repair: replay from op log, rebuild subject snapshot
  - divergence
    - prevention: authoritative merge/sequence discipline, versioned sync
    - repair: resync from snapshot plus missing ops
- Scaling bottlenecks: hot subject coordinator, replay cost, subscriber fanout
  - mitigation: shard by subject, snapshots, local session buffering
- Connections:
  - input: collaborative client edits
  - output: A17, A14
  - decorator/middleware: coordination substrate for shared mutable products
Load-Bearing Protocols #
Source of truth in protocol:
- VersionState: base_version
- SubjectState: version
- OperationLog: offset
Main protocol:
client[submit_op {VersionState.base_version = expected_version}] ->
coordinator[sequence_or_merge {SubjectState.version = expected_version}] ->
coordinator[publish_op {OperationLog.offset = next}] ->
subscriber[apply {OperationLog.offset > local_offset}] => done
Repair protocol:
client[resync {VersionState.base_version != SubjectState.version}] => retry
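The submit and resync loop can be sketched with a central sequencer that rejects ops built on a stale base version. This is the simplest of the variants this archetype covers: it retries instead of merging, so it is not OT or a CRDT. `Coordinator`, `Client`, and the `"RESYNC"`/`"ACCEPTED"` statuses are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    version: int = 0                         # SubjectState.version
    log: list = field(default_factory=list)  # OperationLog, list index = offset

    def submit_op(self, base_version: int, op: str):
        # Guard: the client's base version must match subject truth.
        if base_version != self.version:
            return ("RESYNC", self.version)
        self.log.append(op)
        self.version += 1
        return ("ACCEPTED", self.version)

    def ops_since(self, offset: int) -> list:
        return self.log[offset:]


@dataclass
class Client:
    base_version: int = 0                    # VersionState.base_version
    doc: list = field(default_factory=list)

    def resync(self, coordinator: Coordinator) -> None:
        # Repair: pull the missing ops and apply them in log order.
        for op in coordinator.ops_since(self.base_version):
            self.doc.append(op)
            self.base_version += 1

    def edit(self, coordinator: Coordinator, op: str) -> int:
        """Submit until accepted; return the number of attempts used."""
        attempts = 0
        while True:
            attempts += 1
            status, _ = coordinator.submit_op(self.base_version, op)
            self.resync(coordinator)  # also applies our own op once it is sequenced
            if status == "ACCEPTED":
                return attempts
```

Because every client applies ops only in log order, two clients that have replayed the same prefix of the log hold the same document.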
I14 Immutable Artifact Namespace + Delivery #
- Entities: NamespaceHeadState, ManifestState, ArtifactBlobState, DistributionState
- Source of truth: NamespaceHeadState, ManifestState, ArtifactBlobState
- Write paths: upload blob, write manifest, advance namespace head/tag, distribute/cache artifact
- Read paths: resolve tag/head, fetch manifest, fetch blob
- Sequence: publisher -> blob store -> manifest store -> namespace head CAS -> clients resolve/fetch
- Failure modes:
  - content uploaded but namespace not advanced
    - prevention: publish-content-first then head-move discipline
    - repair: retry head advance or garbage collect orphaned content
  - namespace CAS race
    - prevention: head compare-and-swap, immutable manifests
    - repair: retry against current head, materialize conflict/version
- Scaling bottlenecks: hot namespace metadata, popular blob amplification, sync storms
  - mitigation: CDN for blobs, metadata sharding, client-side delta sync
- Connections:
  - input: build/publish pipelines
  - output: A18, deployment systems, storage clients
  - decorator/middleware: immutable delivery substrate under user-facing namespace systems
Load-Bearing Protocols #
Source of truth in protocol:
- ArtifactBlobState: digest
- ManifestState: id
- NamespaceHeadState: version, advance_failed
Main protocol:
publisher[upload_blob {ArtifactBlobState.digest absent}] ->
publisher[write_manifest {ManifestState.id absent}] ->
publisher[advance_head {NamespaceHeadState.version = expected_version}] => done
Repair protocol:
gc[collect_orphaned_blob {NamespaceHeadState.advance_failed = true}] => retry
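The content-first, head-last discipline can be sketched as an in-memory registry. `Registry` and its method names are hypothetical, and the manifest format (a JSON object with a `layers` list) is an assumption for illustration, not the OCI manifest schema.

```python
import hashlib
import json


class Registry:
    def __init__(self):
        self.blobs = {}      # digest -> bytes             (ArtifactBlobState)
        self.manifests = {}  # manifest id -> JSON body    (ManifestState)
        self.heads = {}      # tag -> (version, manifest)  (NamespaceHeadState)

    def upload_blob(self, content: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # idempotent: same bytes, same key
        return digest

    def write_manifest(self, blob_digests: list) -> str:
        body = json.dumps({"layers": blob_digests}, sort_keys=True)
        manifest_id = "sha256:" + hashlib.sha256(body.encode()).hexdigest()
        self.manifests.setdefault(manifest_id, body)  # immutable once written
        return manifest_id

    def advance_head(self, tag: str, expected_version: int, manifest_id: str) -> bool:
        version, _ = self.heads.get(tag, (0, None))
        if version != expected_version:  # CAS lost: another publisher moved the tag
            return False
        self.heads[tag] = (version + 1, manifest_id)
        return True

    def orphaned_blobs(self) -> set:
        # Blobs not reachable from any current head are GC candidates.
        live = set()
        for _, manifest_id in self.heads.values():
            live.update(json.loads(self.manifests[manifest_id])["layers"])
        return set(self.blobs) - live
```

A publisher that loses the CAS leaves only unreachable content behind, so readers never observe a half-published artifact. A real collector would also need a grace period so it does not delete blobs belonging to a publish that is still in flight.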
I15 Execution Fleet + Worker Substrate #
- Entities: WorkerState, CapacityState, PlacementState, ExecutionLeaseState, ExecutionAttemptState, RuntimeSlotState
- Source of truth: CapacityState, PlacementState, ExecutionLeaseState, ExecutionAttemptState
- Ownership mode: attempt-scoped execution on top of lease-backed ownership via ExecutionLeaseState.owner_id + expiry_at and ExecutionAttemptState.attempt_id
- Write paths: register/heartbeat worker, place runnable work, start execution, renew lease, persist completion, release capacity, preempt/evict placement
- Read paths: worker/capacity read, placement queue read, execution status read
- Sequence: invoker/scheduler -> placement service -> capacity + placement truth -> worker -> completion -> reconciler
- Failure modes:
  - duplicate placement
    - prevention: guarded placement transition, capacity reservation before dispatch
    - repair: cancel duplicate attempt, fence stale completion
  - capacity leak after worker crash
    - prevention: lease expiry + reconciler, completion path releases slot
    - repair: reclaim slot, requeue stranded work
  - preempted or evicted work continues acting
    - prevention: lease/token fencing on post-preemption completion and side effects, monotonic placement/attempt version
    - repair: cancel stale attempt, reclaim slot, requeue if policy allows
- Scaling bottlenecks: worker saturation, placement contention, cold starts, heartbeat fan-in
  - mitigation: warm pools, hierarchical schedulers, placement partitioning, local heartbeats aggregated upstream
- Connections:
  - input: I03, I04, I12
  - output: execution results to many archetypes
  - decorator/middleware: execution substrate under schedulers, scanners, ML/ETL jobs
Load-Bearing Protocols #
Source of truth in protocol:
- CapacityState: free_slots, slot_released
- PlacementState: version, priority
- ExecutionLeaseState: owner_id, expiry_at
- ExecutionAttemptState: attempt_id, state
Main runtime protocol:
scheduler[place {CapacityState.free_slots > 0 and PlacementState.version = expected_version}] ->
worker[launch {ExecutionLeaseState.owner_id = worker_id}] ->
worker[heartbeat {ExecutionLeaseState.expiry_at >= now}] ->
worker[complete {ExecutionAttemptState.attempt_id = my_attempt}] => done
Repair protocol:
reconciler[reclaim {ExecutionLeaseState.expiry_at < now and ExecutionAttemptState.state != COMPLETED}] ->
scheduler[reassign {CapacityState.free_slots > 0}] => requeue
Preemption protocol:
controller[preempt {PlacementState.priority < incoming_priority}] ->
worker[stop {ExecutionLeaseState.owner_id = worker_id}] ->
scheduler[reassign {CapacityState.slot_released = true}] => done
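The lease and attempt guards in the runtime and repair protocols can be sketched as plain functions over one execution record. `Execution`, `place`, `heartbeat`, `complete`, and `reclaim` are hypothetical names, time is passed in explicitly to keep the example deterministic, and capacity accounting is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Execution:
    owner_id: Optional[str] = None  # ExecutionLeaseState.owner_id
    expiry_at: float = 0.0          # ExecutionLeaseState.expiry_at
    attempt_id: int = 0             # ExecutionAttemptState.attempt_id
    state: str = "PENDING"          # ExecutionAttemptState.state


def place(ex, worker_id, now, ttl):
    # A new placement bumps attempt_id, which fences every earlier attempt.
    ex.owner_id, ex.expiry_at = worker_id, now + ttl
    ex.attempt_id += 1
    ex.state = "RUNNING"
    return ex.attempt_id


def heartbeat(ex, worker_id, attempt_id, now, ttl):
    # Guard: caller still owns the lease, on the current attempt, unexpired.
    if ex.owner_id != worker_id or ex.attempt_id != attempt_id or ex.expiry_at < now:
        return False
    ex.expiry_at = now + ttl
    return True


def complete(ex, attempt_id):
    # Guard: ExecutionAttemptState.attempt_id = my_attempt.
    if ex.attempt_id != attempt_id or ex.state != "RUNNING":
        return False
    ex.state = "COMPLETED"
    return True


def reclaim(ex, now):
    # Guard: lease expired and the attempt has not completed.
    if ex.expiry_at < now and ex.state == "RUNNING":
        ex.owner_id, ex.state = None, "PENDING"
        return True
    return False
```

A worker that stalls past its lease and later wakes up fails both `heartbeat` and `complete`, because the reassigned placement carries a newer `attempt_id`.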
Third-Layer Mechanism Lens #
- Canonical unit of work: run, sometimes allocation
- Most relevant design-space dimensions: authorityDelegation, progressVisibility, recovery, coordAuthority
- Common mechanism variants: leader-led placement vs worker-pull claim, heartbeat-pull vs lease-presence, requeue vs failover, warm-pool-heavy vs cold-start-heavy fleets
- Dominant invariant families: at-most-one-current-valid-owner, stale-actor-must-not-commit, every-eligible-item-is-owned-or-completed-or-rediscoverable
- Canonical failure signatures: duplicate placement, stale completion, orphaned claim, capacity leak, heartbeat fan-in overload
- Good real-system anchors: Borg, Omega, Kubernetes scheduler, Lambda fleets, Nomad
I16 Key-Scoped Mutable State / Replicated KV #
- Entities: KeyState, ReplicaVersionState, TTLState, EvictionState
- Source of truth: KeyState, ReplicaVersionState, TTLState
- Write paths: put/update key, conditional overwrite, expire key, replicate key, evict key
- Read paths: point get, range/get if supported, replica read
- Sequence: client -> KV API -> partition leader/replica set -> replication -> reader
- Failure modes:
  - stale read after failover
    - prevention: quorum/leader read policy, replica version checks
    - repair: read repair, anti-entropy replication
  - ghost expired key
    - prevention: authoritative TTL semantics, expiry visibility rules
    - repair: expiry sweep, tombstone propagation
- Scaling bottlenecks: hot key, leader hotspot, memory pressure
  - mitigation: key splitting, replication-aware caching, bounded value size, hot-key isolation
- Connections:
  - input: application writes, control-plane updates
  - output: sessions, counters, small serving truth for many systems
  - decorator/middleware: can serve as substrate under I08, I10, I17
Load-Bearing Protocols #
Source of truth in protocol:
- KeyState: version
- ReplicaVersionState: term, version, diverged
- TTLState: expires_at
Main protocol:
client[put {KeyState.version = expected_version}] ->
replica_set[replicate {ReplicaVersionState.term = current_term}] ->
reader[get {ReplicaVersionState.version >= required_version}] => done
Repair protocol:
repair[anti_entropy {ReplicaVersionState.diverged = true}] => replay
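The versioned put, version-gated read, and anti-entropy repair can be sketched as follows. The sketch assumes a single leader assigns versions, so "newer version wins" is safe. It does not handle concurrent divergent writers, terms, or TTLs. `Replica` and `anti_entropy` are hypothetical names.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def put(self, key, expected_version, value):
        version, _ = self.data.get(key, (0, None))
        if version != expected_version:  # conditional overwrite guard
            return None
        self.data[key] = (version + 1, value)
        return version + 1

    def get(self, key, required_version=0):
        version, value = self.data.get(key, (0, None))
        if version < required_version:   # replica too stale for this read
            raise LookupError("replica behind required version")
        return value


def anti_entropy(source, target):
    """Copy every key where source holds a newer version; return repaired keys."""
    repaired = []
    for key, (version, value) in source.data.items():
        if target.data.get(key, (0, None))[0] < version:
            target.data[key] = (version, value)
            repaired.append(key)
    return repaired
```

A client that just wrote version `n` can pass `required_version=n` on its next read, turning a silent stale read after failover into an explicit error the caller can retry elsewhere.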
I17 Traffic Steering / Request Mediation Plane #
- Entities: RouteState, BackendHealthState, PolicyState, AffinityState, ConnectionState
- Source of truth: RouteState, BackendHealthState, PolicyState
- Write paths: update route/policy, mark backend health, establish/reuse connection state, record mediation decision
- Read paths: request path route lookup, health read, affinity lookup, policy evaluation
- Sequence: client request -> steering/data plane -> route/policy snapshot -> backend selection -> backend
- Failure modes:
  - routing to dead backend
    - prevention: health-aware selection, passive + active health, outlier ejection
    - repair: retry/re-route, drain bad backend, refresh health state
  - stale policy enforcement
    - prevention: versioned policy snapshot, monotonic config apply
    - repair: republish config, invalidate bad local snapshot
- Scaling bottlenecks: hot VIP/route, TLS and connection tables, health-check fanout
  - mitigation: connection pooling, tiered gateways, route partitioning, aggregate health signals
- Connections:
  - input: I11 config, I10 membership, I08 admission policy
  - output: mediates traffic to almost any serving archetype
  - decorator/middleware: classic middleware plane in front of backends
Load-Bearing Protocols #
Source of truth in protocol:
- RouteState: version
- BackendHealthState: status
- PolicyState: allows
- RetryBudget: remaining
Main protocol:
client[resolve_route {RouteState.version >= local_route_version}] ->
proxy[select_backend {BackendHealthState.status = HEALTHY}] ->
proxy[forward {PolicyState.allows = true}] => done
Retry/drain protocol:
proxy[retry {BackendHealthState.status = UNHEALTHY and RetryBudget.remaining > 0}] => retry
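Health-aware selection with a bounded retry budget can be sketched as a single function. `forward`, the `send` callback, and the per-request `retry_budget` are assumptions for illustration. Production proxies typically budget retries as a fraction of overall traffic rather than per request.

```python
def forward(request, backends, healthy, send, retry_budget):
    """Try healthy backends in order; return (response, retries_used).

    `healthy` is a mutable set of backend names; `send(backend, request)`
    returns a response or raises ConnectionError.
    """
    retries = 0
    for backend in backends:
        if backend not in healthy:
            continue  # guard: BackendHealthState.status = HEALTHY
        try:
            return send(backend, request), retries
        except ConnectionError:
            healthy.discard(backend)     # passive health: eject on failure
            if retries >= retry_budget:  # guard: RetryBudget.remaining > 0
                raise
            retries += 1
    raise ConnectionError("no healthy backend available")
```

Once the budget is spent the original error is surfaced rather than retried, which is what keeps one failing backend from amplifying into a retry storm.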
Third-Layer Mechanism Lens #
- Canonical unit of work: route decision
- Most relevant design-space dimensions: roleTopology, authorityDelegation, progressVisibility, reconciliation
- Common mechanism variants: local snapshot routing vs remote policy lookup, passive vs active health evaluation, fail-open vs fail-closed mediation, retry-at-proxy vs retry-at-client
- Dominant invariant families: drift-eventually-corrected, stale-actor-must-not-commit, bounded-retry-no-amplification
- Canonical failure signatures: routing black hole, retry storm, stale policy apply, drift not reconciled, reconciliation flap
- Good real-system anchors: Envoy, xDS, Maglev, Tail at Scale, API gateways
I18 Telemetry / Time-Series Pipeline #
- Entities: SampleState, SeriesState, LabelIndexState, RuleState, AlertState, BlockState
- Source of truth: SampleState, SeriesState, LabelIndexState, RuleState, AlertState
- Write paths: ingest sample/event, compact blocks, update label index, evaluate rules, transition alert state
- Read paths: range query, top-k/time-window query, dashboard read, alert state read
- Sequence: agent/exporter -> ingest tier -> WAL/block store + label index -> rule engine -> dashboard/alert consumer
- Failure modes:
  - dropped sample / late sample skew
    - prevention: WAL before ack, bounded lateness policy, clock/ordering discipline
    - repair: replay WAL, backfill sample ranges if available
  - alert flapping
    - prevention: hysteresis, stable evaluation windows, dedup across alertmanager tiers
    - repair: suppress noisy alert state, recompute from recent window
- Scaling bottlenecks: ingest throughput, high-cardinality labels, query fanout, compaction I/O
  - mitigation: remote-write sharding, label/cardinality controls, rollups, tiered query storage
- Connections:
  - input: exporters, logs/events from I05, worker stats from I15
  - output: monitoring for all archetypes
  - decorator/middleware: observability overlay on top of every other system
Load-Bearing Protocols #
Source of truth in protocol:
- BlockState: wal_available
- SeriesState: window_complete
- AlertState: rule_result
- SampleState: late, dropped
Main protocol:
exporter[emit_sample] ->
ingest[append_wal {BlockState.wal_available = true}] ->
rule_engine[evaluate {SeriesState.window_complete = true}] ->
alert_manager[transition_alert {AlertState.rule_result = FIRING}] => done
Repair protocol:
backfill[replay_wal {SampleState.late = true or SampleState.dropped = true}] => retry
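The hysteresis listed as alert-flapping prevention can be sketched as a small state machine on the alert transition. `AlertTracker` and its `fire_after` / `resolve_after` thresholds are hypothetical. They play the role of a stable evaluation window and are not the semantics of any particular alerting system.

```python
class AlertTracker:
    def __init__(self, fire_after=3, resolve_after=2):
        self.fire_after = fire_after        # consecutive breaches before FIRING
        self.resolve_after = resolve_after  # consecutive clears before OK
        self.status = "OK"
        self.streak = 0

    def evaluate(self, breached: bool) -> str:
        # Count only consecutive evaluations that disagree with current status.
        disagrees = breached if self.status == "OK" else not breached
        self.streak = self.streak + 1 if disagrees else 0
        needed = self.fire_after if self.status == "OK" else self.resolve_after
        if self.streak >= needed:
            self.status = "FIRING" if self.status == "OK" else "OK"
            self.streak = 0
        return self.status
```

A single noisy evaluation in either direction resets the streak instead of flipping the alert, so a value oscillating around the threshold does not produce a notification per evaluation.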
I19 Replicated Chunk / Block / File Storage Substrate #
- Entities: NamespaceState, MetadataState, ChunkPlacementState, ReplicaState, WriterLeaseState, RepairState
- Source of truth: NamespaceState, MetadataState, ChunkPlacementState, ReplicaState, WriterLeaseState
- Ownership mode: lease-backed mutable-writer authority via WriterLeaseState.owner_id + expiry_at
- Write paths: create/update namespace metadata, allocate chunk/block, write chunk, update placement, repair/rebalance replicas
- Read paths: metadata lookup, chunk location read, chunk fetch, replica health read
- Sequence: client -> metadata service -> chunk placement lookup -> storage nodes -> metadata/placement update -> repair loops
- Failure modes:
  - metadata/data divergence
    - prevention: metadata-first placement discipline with committed placement updates, write lease on mutable path
    - repair: scrub/reconcile metadata against chunk replicas, rebuild placement map
  - replica under-count after node loss
    - prevention: replica-count tracking, failure detection, repair scheduling
    - repair: copy from surviving replicas, rebalance placement
- Scaling bottlenecks: metadata master/partition hotspot, hot file/chunk, repair bandwidth, small-file amplification
  - mitigation: metadata sharding, chunking, background rebalance windows, compaction/packing for small files
- Connections:
  - input: publishers, checkpoint writers, sync clients
  - output: backs A18, checkpoints, media/file systems
  - decorator/middleware: storage substrate under namespace/versioned systems and execution checkpoints
Load-Bearing Protocols #
Source of truth in protocol:
- ChunkPlacementState: replicas_needed
- MetadataState: version
- WriterLeaseState: owner_id, expiry_at
- ReplicaState: count
Main protocol:
client[allocate_chunk {ChunkPlacementState.replicas_needed > 0}] ->
metadata_service[commit_placement {MetadataState.version = expected_version}] ->
storage_node[write_chunk {WriterLeaseState.owner_id = writer_id and WriterLeaseState.expiry_at >= now}] ->
metadata_service[finalize_write {ReplicaState.count >= target_replica_count}] => done
Repair protocol:
repair_worker[re_replicate {ReplicaState.count < target_replica_count}] => rebalance
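The re-replication guard can be sketched as a planner that compares placement truth against the set of live nodes. `plan_repairs` and its argument shapes are hypothetical, and destination choice is plain sorted order, whereas a real system would weigh rack diversity, free space, and repair bandwidth.

```python
def plan_repairs(placement, live_nodes, target_replica_count):
    """Return {chunk_id: (source_node, [destination_nodes])} for under-replicated chunks.

    `placement` maps chunk_id -> list of nodes holding a replica;
    `live_nodes` is the set of nodes currently considered alive.
    """
    plan = {}
    for chunk_id, replicas in sorted(placement.items()):
        surviving = sorted(set(replicas) & live_nodes)
        missing = target_replica_count - len(surviving)
        if missing <= 0 or not surviving:
            continue  # healthy, or no surviving replica to copy from
        candidates = sorted(live_nodes - set(surviving))
        plan[chunk_id] = (surviving[0], candidates[:missing])
    return plan
```

Chunks with no surviving replica are skipped by this planner. They need a different repair path, such as restore from backup or erasure-coded reconstruction, and should be surfaced rather than silently ignored.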
I20 Computation / Dataflow / DAG Execution #
- Entities: GraphState, OperatorState, StageState, TaskAttemptState, InputPartitionState, ShuffleState, CheckpointState, OutputState, WatermarkState
- Source of truth: GraphState, StageState, TaskAttemptState, CheckpointState, OutputState
- Ownership mode: attempt-scoped execution via TaskAttemptState.attempt_id; graph versioning via GraphState.version
- Write paths: submit graph, plan stages, release ready stage, claim/run task attempt, publish shuffle block, complete checkpoint, commit output
- Read paths: graph/job status read, task status read, checkpoint read, shuffle block read, output/result read, watermark/progress read
- Sequence: submitter -> planner -> scheduler -> workers -> shuffle/checkpoint services -> output committer -> sink/result store
- Failure modes:
  - stale task attempt commits output
    - prevention: attempt-scoped temp output, commit guard on current TaskAttemptState.attempt_id
    - repair: reject stale commit, rerun task from input, shuffle, or checkpoint
  - checkpoint before output durability
    - prevention: checkpoint completion waits for durable operator state and sink precommit/commit boundary
    - repair: restore from last completed checkpoint, abort ambiguous sink transactions
  - lost shuffle block
    - prevention: durable or recomputable shuffle metadata, producer attempt tracking
    - repair: rerun producing task or fetch from replicated shuffle storage
  - watermark advances too far
    - prevention: bounded-lateness policy and monotonic watermark guards per input
    - repair: late-data side output, correction/retraction, or window recomputation if supported
- Scaling bottlenecks: shuffle fanout, hot key/group, checkpoint I/O, scheduler bottleneck, worker saturation, state-store growth
  - mitigation: key salting, partition rebalancing, incremental checkpoints, local recovery, autoscaling, backpressure
- Connections:
  - input: I05 logs/streams, I19 files/chunks, I04 frontier scans
  - output: I06 projections/indexes, I18 telemetry aggregates, materialized datasets
  - decorator/middleware: uses I15 for worker placement and I11 for graph/config rollout
Load-Bearing Protocols #
Source of truth in protocol:
- GraphState: version, operators, edges
- StageState: dependencies, status
- TaskAttemptState: attempt_id, owner_id
- CheckpointState: checkpoint_id, completed
- OutputState: committed_version
- WatermarkState: event_time_frontier
Main protocol:
submitter[submit_graph {GraphState.version = new}] ->
planner[plan_stages {GraphState.version = current}] ->
scheduler[release_stage {all upstream StageState.status = SUCCEEDED}] ->
worker[run_task {TaskAttemptState.attempt_id = current}] ->
shuffle_service[publish_block {TaskAttemptState.attempt_id = current}] ->
checkpoint_coordinator[complete_checkpoint {CheckpointState.completed = true}] ->
output_committer[commit_output {all partitions complete and attempt_id = current}] => done
Repair protocol:
scheduler[detect_failed_attempt {heartbeat expired or shuffle block missing}] ->
scheduler[reissue_task {TaskAttemptState.attempt_id = new}] ->
worker[recompute_from_input_or_checkpoint] ->
output_committer[reject_stale_commit {attempt_id != current}]
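The attempt-scoped commit guard can be sketched as an output committer that stages per attempt and promotes only the current one. `OutputCommitter` and its in-memory dictionaries are hypothetical stand-ins for temp directories and a durable commit record.

```python
class OutputCommitter:
    def __init__(self):
        self.current_attempt = {}  # task_id -> current attempt_id
        self.temp = {}             # (task_id, attempt_id) -> staged rows
        self.committed = {}        # task_id -> committed rows

    def issue_attempt(self, task_id):
        # Reissuing a task makes every earlier attempt stale.
        self.current_attempt[task_id] = self.current_attempt.get(task_id, 0) + 1
        return self.current_attempt[task_id]

    def stage(self, task_id, attempt_id, rows):
        # Each attempt writes only to its own temp location.
        self.temp[(task_id, attempt_id)] = list(rows)

    def commit(self, task_id, attempt_id):
        key = (task_id, attempt_id)
        stale = attempt_id != self.current_attempt.get(task_id)
        # Guard: attempt is current, output was staged, task not yet committed.
        if stale or task_id in self.committed or key not in self.temp:
            self.temp.pop(key, None)  # discard stale or duplicate output
            return False
        self.committed[task_id] = self.temp.pop(key)
        return True
```

A slow original attempt that finishes after its replacement was issued has its commit rejected and its temp output discarded, so the sink sees each task's output once.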
Concrete anchors:
- MapReduce: batch DAG with input splits, map stage, shuffle/sort barrier, reduce stage, and output files.
- Flink: streaming DAG with operators, keyed state, checkpoint barriers, watermarks, and checkpoint-gated sink commits.
I21 Trust Boundary / Cryptographic Proof Substrate #
Use this archetype when the hard part is not the hot-path authorization decision itself, but the substrate that makes trust decisions verifiable across domains.
- Entities: PrincipalState, IdentityBindingState, KeyMaterialState, GrantState, SignedStatementState, RevocationState, AttestationState, AuditLogState, TrustBundleState
- Source of truth: PrincipalState, IdentityBindingState, KeyMaterialState, GrantState, RevocationState, TrustBundleState
- Ownership mode: delegated authority via issuer-controlled keys and signed statements; freshness bounded by RevocationState and TrustBundleState.version
- Write paths: bind identity, issue credential/claim, rotate key, publish trust bundle, revoke grant/key, record signed statement, append audit event, verify attestation
- Read paths: verify identity, fetch trust bundle, check revocation, validate signed statement, read audit/provenance, evaluate attestation evidence
- Sequence: issuer -> identity/key truth -> signed credential/statement -> verifier -> revocation/trust-bundle check -> audit/transparency
- Failure modes:
  - stale credential accepted after revocation
    - prevention: short lifetimes, revocation version checks, trust-bundle freshness
    - repair: publish revocation, force re-issuance, invalidate cached decisions
  - wrong principal binding
    - prevention: proof-of-possession, audience binding, issuer constraints
    - repair: revoke binding, rotate affected keys, audit impacted statements
  - forged or replayed signed statement
    - prevention: nonce/timestamp/audience binding, signature verification, replay cache for sensitive writes
    - repair: add compromised signer to revocation state, regenerate statements
  - issuer/operator compromise
    - prevention: threshold signing, transparency log, separation of duties
    - repair: key ceremony, log audit, rotate trust roots
- Scaling bottlenecks: verifier hot path, revocation fanout, trust-bundle propagation, audit log write volume, signing service throughput
  - mitigation: local trust-bundle cache, short-lived credentials, append-only audit log partitioning, signer sharding/HSM pools
- Connections:
  - input: principals, workloads, artifacts, devices, human users
  - output: protects I01, I05, I11, I14, I15, I16, I17, I20
  - decorator/middleware: trust substrate across every cross-domain boundary
Load-Bearing Protocols #
Source of truth in protocol:
- PrincipalState: principal_id, status
- IdentityBindingState: subject, issuer, audience
- KeyMaterialState: key_id, public_key, valid_from, valid_until
- RevocationState: revoked_key_ids, revoked_grants, version
- TrustBundleState: version, issuer_roots
- SignedStatementState: statement_id, signature, subject, claims
Main protocol:
issuer[bind_identity {PrincipalState.status = ACTIVE}] ->
issuer[issue_statement {KeyMaterialState.valid_until > now}] ->
verifier[verify_signature {TrustBundleState.version fresh}] ->
verifier[check_revocation {RevocationState.version >= required_version}] ->
audit[append_verification {SignedStatementState.statement_id = verified}] => accepted
Repair/control protocol:
security_controller[revoke {compromise_detected = true}] ->
publisher[publish_trust_bundle {TrustBundleState.version = next}] ->
verifier[reject_cached_statement {statement.key_id in RevocationState.revoked_key_ids}] => repaired
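The verifier side of the protocol can be sketched with standard-library primitives. One loud assumption: HMAC-SHA256 stands in for a real asymmetric signature scheme, so here the trust bundle holds shared secrets rather than public keys, which a real deployment would not do. `sign`, `verify`, the `aud`/`exp` claim names, and the dictionary shapes are hypothetical.

```python
import hashlib
import hmac
import json


def sign(key: bytes, key_id: str, claims: dict) -> dict:
    # Issuer side. HMAC is a stand-in for an asymmetric signature.
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"key_id": key_id, "claims": claims, "signature": signature}


def verify(statement, trust_bundle, revocation, audience, now, required_revocation_version):
    """Return (accepted, reason). Every check must pass."""
    # Guard: RevocationState.version >= required_version.
    if revocation["version"] < required_revocation_version:
        return False, "revocation state too stale"
    key = trust_bundle["keys"].get(statement["key_id"])
    if key is None:
        return False, "unknown issuer key"
    if statement["key_id"] in revocation["revoked_key_ids"]:
        return False, "issuer key revoked"
    payload = json.dumps(statement["claims"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, statement["signature"]):
        return False, "bad signature"
    claims = statement["claims"]
    if claims["aud"] != audience:
        return False, "wrong audience"
    if claims["exp"] <= now:
        return False, "expired"
    return True, "ok"
```

The revocation freshness check runs before the signature check on purpose: a verifier holding stale revocation state cannot tell a good key from a compromised one, so it refuses to decide.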
Substrate Cleavage #
I21 is value-separable because identity binding, key custody, statement issuance, verification, revocation, audit, and attestation are different trust boundaries.
Compact flow:
establish principal -> bind key/identity -> issue signed claim -> distribute trust roots
-> verify claim -> check freshness/revocation -> audit/prove
| Substrate slice | Value it provides | Truth / state |
|---|---|---|
| Principal registry | names the actor or workload | PrincipalState |
| Identity binding | binds subject to key, workload, user, device, or artifact | IdentityBindingState |
| Key custody / rotation | controls signing authority and key lifetime | KeyMaterialState, KeyRotationState |
| Grant / claim issuance | creates verifiable statements | GrantState, SignedStatementState |
| Trust-bundle distribution | tells verifiers which issuers/keys to trust | TrustBundleState, RootVersionState |
| Verification hot path | validates signatures, audience, expiry, and issuer | VerificationState, DecisionCacheState |
| Revocation / freshness | invalidates compromised or expired authority | RevocationState, FreshnessState |
| Audit / transparency | makes issuance and verification accountable | AuditLogState, TransparencyLogState |
| Attestation | proves runtime, build, device, or environment properties | AttestationState, EvidenceState |
Canonical substrate APIs:
| Boundary | API shape |
|---|---|
| Principal API | create_principal, disable_principal |
| Binding API | bind_identity(subject, public_key, issuer) |
| Key API | rotate_key, publish_jwks_or_bundle, retire_key |
| Issuance API | issue_statement(subject, claims, audience, ttl) |
| Verification API | verify_statement(statement, audience, freshness_requirement) |
| Revocation API | revoke_key, revoke_grant, publish_revocation(version) |
| Audit API | append_audit_event, query_statement_lineage |
| Attestation API | submit_evidence, verify_attestation(policy) |
Core invariants:
- A verifier must accept only statements signed by a trusted, non-revoked issuer key.
- A statement must be bound to the intended subject and audience.
- Revocation and trust-bundle versions must be fresh enough for the risk boundary.
- Key rotation must not create an unbounded window where old and new authority both over-admit.
- Audit/provenance records must be append-only or tamper-evident for disputed decisions.
Common cleavage examples:
| System | Slice it teaches |
|---|---|
| SPIFFE / SPIRE | workload identity and trust-bundle distribution |
| OIDC / JWKS | signed token issuance and verifier key distribution |
| Sigstore | artifact identity, signing, transparency, and keyless provenance |
| TUF | signed metadata, key rotation, and update trust thresholds |
| SLSA / in-toto | signed provenance and build/deploy attestation |
Practical Grouping #
- Control and ownership: I01, I02, I10, I11
- Progress and execution: I03, I04, I05, I12, I15, I20
- Serving and mediation: I07, I08, I17
- Storage and namespace: I14, I16, I19
- Search/derived/observability: I06, I18, I20
- Identity and trust: I09, I21
Use the dominant one first, then compose the rest as:
- input archetype
- output archetype
- decorator/middleware
- substrate
Best Canonical Study Object #
Not every archetype is best learned through the same kind of object.
Use the strongest study object for the shape:
- protocol spec when the archetype is protocol-shaped
- seminal paper when the archetype is architecture or control-loop shaped
- mechanism family when correctness comes mainly from a reusable mechanism
- implementation lineage when the archetype is best learned through real system families
| Archetype | Best study object type | Best canonical study object |
|---|---|---|
| I01 Coordination / Consensus Metadata | seminal paper | Raft |
| I02 Claim / Lease / Exclusive Ownership | mechanism family | lease + fencing token |
| I03 Due-Time Release + Claimable Run | mechanism family | timing wheel / delayed queue / due-index scanner |
| I04 Frontier Scan + Claimable Run | mechanism family | frontier + checkpoint + resumable scan |
| I05 Append Log + Consumer Progress | protocol spec | Kafka protocol |
| I06 Projection / Index / Search Pipeline | implementation lineage | CDC/projector/indexer systems |
| I07 Cache / Origin Projection / Edge Delivery | protocol spec | HTTP Caching |
| I08 Traffic Shaping / Admission Control | mechanism family | token bucket / leaky bucket / concurrency limiter / fair queuing |
| I09 Sequence / Identifier Generation | implementation lineage | Snowflake-style ID generation + range leasing |
| I10 Membership / Presence / Registry | seminal paper | SWIM |
| I11 Control Plane + Snapshot Distribution | protocol spec | xDS |
| I12 Workflow + External Side Effect | mechanism family | transactional outbox + saga/reconciliation |
| I13 Shared Subject Coordination | mechanism family | OT / CRDT / central sequencer |
| I14 Immutable Artifact Namespace + Delivery | protocol spec | OCI Distribution Spec |
| I15 Execution Fleet + Worker Substrate | seminal paper / implementation lineage | Borg |
| I16 Key-Scoped Mutable State / Replicated KV | implementation lineage | Dynamo / Bigtable / FoundationDB / etcd-like KV families |
| I17 Traffic Steering / Request Mediation Plane | implementation lineage | Envoy/xDS + load-balancer retry/outlier-ejection model |
| I18 Telemetry / Time-Series Pipeline | implementation lineage | Prometheus/TSDB design |
| I19 Replicated Chunk / Block / File Storage Substrate | seminal paper / implementation lineage | GFS |
| I20 Computation / Dataflow / DAG Execution | implementation lineage | MapReduce / Flink / Spark / Beam |
| I21 Trust Boundary / Cryptographic Proof Substrate | protocol / implementation lineage | SPIFFE / SPIRE; adjacent: Sigstore, TUF, SLSA / in-toto |