Infra Archetype NFR Practice Worksheet

Use this as the infra-first counterpart to the broader mixed prompt worksheet.

This note is intentionally organized by I01-I21, not by product prompt.

The goal is:

  1. start from the archetype
  2. calculate the first load-bearing variables
  3. identify the first scaling bottleneck
  4. then run a short cross-cutting NFR pass

This should be used with:


How To Use #

For a prompt:

  1. classify to the dominant infra archetype
  2. steal the sample-prompt row below
  3. calculate the listed variables before drawing boxes
  4. state the first pressure check out loud
  5. finish with the cross-cutting questions in the last column

If the system is hybrid:

  • pick the dominant row first
  • then import one adjacent row
  • do not average them into a generic worksheet

Archetype Worksheet #

| Archetype | Sample prompts | First NFRs to calculate | Starter formulas | First pressure check | Cross-cutting follow-up |
|---|---|---|---|---|---|
| I01 Coordination / Consensus Metadata | metadata service, leader election service, config store | quorum write TPS, watch fanout, election rate, session renew rate | quorum_write_RPS = metadata_mutations_per_sec; watch_delivery_RPS = watchers*updates_per_sec; renew_RPS = active_sessions/lease_seconds | quorum RTT or hot metadata key | fail closed or open under quorum loss; max stale watch gap; compaction/replay window; blast radius of leader failover |
| I02 Claim / Lease / Exclusive Ownership | lock service, lease manager, shard owner election | claim TPS, renew rate, hot-key contention, stale-holder fence checks | claim_RPS = claim_attempts_per_sec; renew_RPS = active_leases/lease_seconds; contention_RPS = claim_RPS*hot_key_share | hot claim key or renew storm | fence token required at which downstream write; duplicate-owner damage if fencing fails; reclaim delay budget; false-expiry tolerance |
| I03 Due-Time Release + Claimable Run | cron scheduler, delayed job queue, reminder service | due burst RPS, runnable backlog, lateness SLA, claim TPS | due_RPS = jobs_due_in_peak_window/window_seconds; backlog_seconds = runnable_queue_depth/worker_claim_rate; claim_RPS = due_RPS*(1+retry_factor) | due-time burstiness or claim backlog | acceptable fire-time delay; duplicate run tolerance; clock-skew budget; replay after scheduler crash |
| I04 Frontier Scan + Claimable Run | web crawler, batch scanner, ETL sweep, compliance scanner | frontier claim rate, scan coverage rate, checkpoint cadence, rediscovery rate | claim_RPS = workers*batches_per_sec; coverage_seconds = frontier_items/scan_rate; checkpoint_WPS = workers/checkpoint_interval_sec | frontier contention or checkpoint lag | progress cursor truth; resumability after crash; duplicate-scan budget; freshness vs scan cost |
| I05 Append Log + Consumer Progress | pub/sub, event bus, durable queue, commit log | append throughput, partition hotness, consumer lag, replay window | append_Bps = append_RPS*avg_record_bytes; partition_write_RPS = append_RPS*hot_partition_share; lag_seconds = consumer_backlog/consume_rate | hot partition or replay lag | ordering scope per partition or key; commit-progress durability; replay cost after consumer loss; backpressure on slow consumers |
| I06 Projection / Index / Search Pipeline | search index, materialized view, feed projector, read model builder | source ingest RPS, projection fanout, index lag, rebuild throughput | projection_WPS = source_WPS*avg_projection_updates; lag_seconds = queued_updates/projector_rate; rebuild_seconds = corpus_bytes/rebuild_Bps | write amplification or projector lag | rebuildability from source truth; freshness bound for queries; backfill isolation from live traffic; correctness under out-of-order events |
| I07 Cache / Origin Projection / Edge Delivery | distributed cache, CDN metadata cache, API response cache | read RPS, hit ratio, miss storm size, invalidation rate, memory footprint | origin_RPS = read_RPS*(1-hit_ratio); miss_burst_RPS = read_RPS*simultaneous_expiry_share; memory_bytes = keys*avg_entry_bytes | hot key or cache miss storm | stale-read tolerance; invalidation propagation bound; stampede control; fail-open vs fail-through on origin error |
| I08 Traffic Shaping / Admission Control | rate limiter, quota manager, overload shedder, concurrency limiter | evaluator RPS, budget update rate, hot-tenant rate, queue/admit depth | eval_RPS = requests_per_sec; budget_WPS = eval_RPS*enforced_fraction; hot_tenant_RPS = eval_RPS*top_tenant_share; queue_wait_seconds = queued_requests/admit_rate | evaluator hot path or hot budget key | fairness unit per tenant/user/request class; fail-open vs fail-closed; budget propagation lag; overload behavior under partial outages |
| I09 Sequence / Identifier Generation | snowflake service, monotonic ticket dispenser, order-number allocator | ID allocation RPS, block lease rate, worker-id pool pressure, skew window | id_RPS = ids_per_sec; block_lease_RPS = id_RPS/block_size; worker_pool_util = active_generators/worker_id_pool; rollback_risk_window = max_clock_skew_sec | allocator hotspot or clock rollback | uniqueness vs monotonicity requirement; gap tolerance; allocator failover semantics; epoch rollover handling |
| I10 Membership / Presence / Registry | service registry, presence system, node registry | registration rate, heartbeat fan-in, false-death window, watch fanout | heartbeat_RPS = members/heartbeat_interval_sec; expiry_scan_RPS = members/expiry_scan_interval_sec; watch_push_RPS = watchers*membership_changes_per_sec | heartbeat fan-in or watch fanout | false-death budget; lookup freshness bound; ghost-member cleanup; anti-entropy after missed watches |
| I11 Control Plane + Snapshot Distribution | feature flag platform, config distribution, xDS-like control plane | config mutate RPS, publication fanout, apply lag, snapshot bytes | fanout_RPS = targets*updates_per_sec; update_Bps = fanout_RPS*avg_snapshot_bytes; convergence_seconds = targets/apply_ack_rate | config fanout or slow apply convergence | monotonic version rule; rollback budget; partial rollout detection; target behavior on control-plane partition |
| I12 Workflow + External Side Effect | payment workflow, webhook delivery engine, provisioning workflow | transition TPS, side-effect RPS, retry rate, stuck-workflow cardinality | transition_RPS = active_entities*transitions_per_day/86400; effect_RPS = transition_RPS*(1+retry_factor); stuck_items = workflows_in_nonterminal_state*stuck_fraction | side-effect latency or retry amplification | idempotency surface; reconciliation scan cadence; exactly-once vs at-least-once effect semantics; poison workflow handling |
| I13 Shared Subject Coordination | collaborative editor, whiteboard, shared cursor state | ops per subject, concurrent editors, broadcast fanout, snapshot cadence | op_RPS = editors_per_subject*ops_per_editor_sec; broadcast_RPS = op_RPS*active_subscribers; replay_ops = ops_since_snapshot | single-subject coordinator or replay cost | ordering model per subject; merge/conflict semantics; late join replay budget; subject hotspot isolation |
| I14 Immutable Artifact Namespace + Delivery | artifact registry, blob/object store namespace, image/package distribution | publish RPS, fetch throughput, metadata RPS, GC backlog | publish_RPS = artifacts_published_per_sec; fetch_Bps = downloads_per_sec*avg_artifact_bytes; gc_backlog_bytes = unreferenced_bytes_pending_collection | metadata namespace hotspot or origin bandwidth | immutability guarantee; rollback by pointer or republish; retention/GC safety; cross-region replication lag |
| I15 Execution Fleet + Worker Substrate | serverless runtime, CI runner fleet, GPU job fleet, remote execution | arrival RPS, concurrent executions, worker-slot count, heartbeat rate, cold-start rate | concurrency = arrival_RPS*avg_run_seconds; slot_count = concurrency/utilization_target; heartbeat_RPS = active_workers/heartbeat_interval_sec; cold_start_RPS = launches_per_sec*cold_start_fraction | worker saturation, placement contention, or heartbeat fan-in | placement truth and fencing; preemption semantics; reclaim lag after worker loss; admission vs queueing under overload |
| I16 Key-Scoped Mutable State / Replicated KV | key-value store, session store, profile store, cart store | read RPS, write RPS, hot-key share, replication bandwidth, compaction/write amplification | repl_Bps = write_RPS*avg_write_bytes*replication_factor; hot_key_RPS = read_RPS*top_key_share; storage_write_amp = physical_write_Bps/logical_write_Bps | leader hotspot, hot key, or replication pressure | consistency level per key; failover read/write semantics; anti-entropy/repair budget; compaction impact on tail latency |
| I17 Traffic Steering / Request Mediation Plane | API gateway, load balancer, service mesh router, WAF | ingress RPS, active connections, route table size, health-check fanout, retry amplification | conn_count = clients*avg_open_conns; health_RPS = backends*checks_per_sec; retry_RPS = ingress_RPS*retry_fraction; policy_eval_RPS = ingress_RPS*rules_checked_per_request | hot VIP, connection table pressure, or retry amplification | route-config freshness; drain semantics; fail-open vs fail-closed policy checks; tail latency under retries and outlier ejection |
| I18 Telemetry / Time-Series Pipeline | metrics system, alerting platform, logs pipeline, infra monitoring | ingest RPS, active series/cardinality, rule eval fanout, query scan bytes, retention bytes | ingest_RPS = emitters*samples_per_sec; active_series = emitters*metrics_per_emitter*label_cardinality_factor; query_scan_Bps = queried_points_per_sec*bytes_per_point; retention_bytes = ingest_Bps*retention_seconds | high-cardinality blowup, ingest fan-in, or query lag | telemetry must not destabilize workload; sampling/drop policy under overload; alert delay budget; retention vs cost tradeoff |
| I19 Replicated Chunk / Block / File Storage Substrate | distributed file system, block store, chunk store, object-storage substrate | metadata ops RPS, chunk throughput, repair bandwidth, placement skew, rebuild time | chunk_Bps = io_ops_per_sec*avg_chunk_bytes; repair_Bps = lost_replica_bytes/repair_window_sec; rebuild_seconds = lost_bytes/effective_repair_Bps; metadata_RPS = namespace_ops_per_sec | metadata hotspot, repair bandwidth, or hot chunk | replica placement policy; durability target after correlated failure; degraded-read performance; background repair budget vs foreground IO |
| I20 Computation / Dataflow / DAG Execution | MapReduce/Spark/Flink-like engine, DAG scheduler, streaming dataflow engine | input throughput, task concurrency, shuffle bytes, checkpoint bytes, output commit rate, watermark lag | task_concurrency = input_partitions*avg_parallelism_per_partition; shuffle_Bps = records_per_sec*avg_record_bytes*shuffle_fanout; checkpoint_Bps = state_bytes/checkpoint_interval_sec; watermark_lag_seconds = event_time_now - watermark_time | shuffle pressure, checkpoint I/O, hot key, or scheduler bottleneck | stale attempt commit guard; exactly-once sink boundary; backpressure behavior; recovery time from latest checkpoint |
| I21 Trust Boundary / Cryptographic Proof Substrate | workload identity platform, artifact signing/provenance, trust-bundle distribution, revocation service | verification RPS, signing RPS, trust-bundle fanout, revocation freshness, audit write throughput | verify_RPS = protected_requests_per_sec; sign_RPS = issued_statements_per_sec; trust_bundle_fanout_RPS = verifiers*bundle_updates_per_sec; audit_WPS = verification_events_per_sec*audit_sample_rate | verifier hot path, stale revocation, or trust-bundle propagation lag | credential TTL; revocation freshness SLA; issuer compromise blast radius; audit retention and tamper evidence |

Cross-Cutting NFR Pass #

After you do the archetype row, force this short pass.

1. Correctness / Consistency #

Ask:

  • what invariant is load-bearing here?
  • what stale actor, stale version, or duplicate effect is unacceptable?
  • what is the minimum consistency scope: key, partition, subject, workflow, quorum?

2. Availability / Failure Policy #

Ask:

  • should this path fail closed or fail open under uncertainty?
  • what is allowed to degrade independently?
  • what is the smallest useful partial service mode?

3. Durability / Recoverability #

Ask:

  • what acknowledged state must survive crash?
  • what can be rebuilt from source truth or replay?
  • what is the acceptable loss window for transient buffers, caches, snapshots, or telemetry?

4. Tail Latency / Freshness #

Ask:

  • which path needs low p95 or p99?
  • where is bounded staleness acceptable?
  • is freshness measured in milliseconds, seconds, minutes, or rollout waves?

5. Isolation / Backpressure #

Ask:

  • can one tenant, hot key, or hot subject overload others?
  • where does backpressure appear first?
  • what is the admission or shedding rule when capacity is exhausted?

6. Cost / Repair Budget #

Ask:

  • what background work grows with success: replay, compaction, repair, scan, watch fanout, checkpointing?
  • what budget do you reserve for non-foreground work?
  • what gets worse first when repair falls behind?

Quick Archetype-to-Cross-Cutting Emphasis #

Use these as the first extra questions after the main row.

| Archetype | Cross-cutting emphasis |
|---|---|
| I01 | correctness, quorum availability, watch freshness |
| I02 | fencing correctness, reclaim lag, fail-closed semantics |
| I03 | lateness SLA, duplicate-run tolerance, backlog recovery |
| I04 | checkpoint durability, coverage freshness, resumability |
| I05 | ordering, lag, replay cost, slow-consumer backpressure |
| I06 | freshness, rebuildability, backfill isolation |
| I07 | stale tolerance, stampede control, origin protection |
| I08 | fairness, overload policy, fail-open vs fail-closed |
| I09 | uniqueness, monotonicity, allocator failover |
| I10 | false death, lookup freshness, watch recovery |
| I11 | monotonic apply, rollout safety, rollback speed |
| I12 | idempotency, reconciliation, poison-work handling |
| I13 | per-subject ordering, hotspot isolation, replay budget |
| I14 | immutability, retention/GC safety, fetch bandwidth |
| I15 | placement fencing, cold starts, reclaim lag, admission policy |
| I16 | consistency level, replication lag, compaction cost |
| I17 | route freshness, retry amplification, drain behavior |
| I18 | cardinality control, alert delay, drop policy under overload |
| I19 | durability target, repair bandwidth, degraded-mode performance |

Variable Dimensions And Estimation Rules #

Use this section when the interviewer gives you only partial inputs.

For each archetype:

  • scale units give the dimension of each variable (for example writes/second, bytes, or seconds)
  • decision rules tell you how to derive a plausible estimate quickly

I01 Coordination / Consensus Metadata #

  • Scale units:
    • quorum_write_RPS: writes/second
    • watch_delivery_RPS: watch events/second
    • renew_RPS: renewals/second
    • election_rate: elections/hour or elections/day
  • Decision rules:
    • start from number of operators, controllers, or automation jobs that mutate metadata
    • if prompt says N clients watch config or membership, assume watch fanout is N * updates_per_sec
    • if lease/session TTL is given, derive renew rate as active_sessions / TTL_seconds
    • if failover is said to be rare, model election rate as low steady-state background plus burst during incidents
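
The I01 rules above can be sketched as two one-line estimators. This is only an illustration: the function names and every input value (session count, TTL, watcher count, update rate) are hypothetical, not figures from a real system.

```python
def renew_rps(active_sessions: int, ttl_seconds: float) -> float:
    """Renewals per second, assuming each session renews once per TTL."""
    return active_sessions / ttl_seconds


def watch_delivery_rps(watchers: int, updates_per_sec: float) -> float:
    """Watch events per second, assuming every watcher sees every update."""
    return watchers * updates_per_sec


# Hypothetical inputs: 50k sessions with a 10 s TTL; 20k watchers; 0.5 updates/s.
renew = renew_rps(active_sessions=50_000, ttl_seconds=10)
watch = watch_delivery_rps(watchers=20_000, updates_per_sec=0.5)
print(f"renew_RPS={renew:.0f}, watch_delivery_RPS={watch:.0f}")
```

With these example inputs the watch path carries twice the event rate of the renew path, which is the kind of asymmetry worth stating before drawing boxes.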

I02 Claim / Lease / Exclusive Ownership #

  • Scale units:
    • claim_RPS: claim attempts/second
    • renew_RPS: renewals/second
    • contention_RPS: contended claim attempts/second on hottest key or shard
    • reclaim_delay: seconds
  • Decision rules:
    • estimate total claim attempts from workers, jobs, or contenders entering the system per second
    • if TTL is provided, renew rate is active_leases / TTL_seconds
    • if prompt says a small subset of resources are popular, apply a hot-key share to total claim volume
    • set reclaim delay from lease TTL + detection lag + reassignment lag

I03 Due-Time Release + Claimable Run #

  • Scale units:
    • due_RPS: newly due jobs/second
    • backlog_seconds: seconds of runnable backlog
    • claim_RPS: claims/second
    • lateness_SLA: seconds or minutes
  • Decision rules:
    • convert jobs/day or jobs/hour into average rate, then separately estimate the peak due bucket
    • if due times cluster on minute boundaries, model a peak-to-average multiplier rather than using average only
    • compute backlog in time, not just count, using queue_depth / effective_claim_or_execute_rate
    • if interviewer says reminders should feel near real time, keep lateness in single-digit seconds; if it is batch, use minutes
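
A minimal sketch of the I03 rules, separating the average rate from the peak due bucket and expressing backlog in time. All numbers are invented for illustration, and the helper names are assumptions rather than part of any scheduler API.

```python
SECONDS_PER_DAY = 86_400


def average_rps(jobs_per_day: int) -> float:
    """Average release rate if jobs were spread evenly across the day."""
    return jobs_per_day / SECONDS_PER_DAY


def due_rps(jobs_due_in_peak_window: int, window_seconds: float) -> float:
    """Release rate inside the busiest due-time window."""
    return jobs_due_in_peak_window / window_seconds


def backlog_seconds(queue_depth: int, claim_rate: float) -> float:
    """Runnable backlog expressed as time to drain, not as a count."""
    return queue_depth / claim_rate


# Hypothetical workload: 8.64M jobs/day, with 60k of them due in one 60 s window.
avg = average_rps(8_640_000)
peak = due_rps(60_000, 60)
peak_to_avg = peak / avg
backlog = backlog_seconds(queue_depth=30_000, claim_rate=1_500)
print(f"avg={avg:.0f}/s peak={peak:.0f}/s ratio={peak_to_avg:.0f}x backlog={backlog:.0f}s")
```

Sizing from the average alone would miss the 10x burst here, and a 20 s backlog would already break a single-digit-second lateness target.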

I04 Frontier Scan + Claimable Run #

  • Scale units:
    • claim_RPS: frontier claims/second
    • coverage_seconds: seconds or hours to revisit the full frontier
    • checkpoint_WPS: checkpoints/second
    • rediscovery_rate: items rediscovered/second or rediscovery ratio
  • Decision rules:
    • derive claim rate from worker count and batch size/frequency
    • derive coverage time from total frontier items / effective scan rate
    • checkpoint frequency should scale with amount of work you can afford to replay after crash
    • if prompt emphasizes freshness, push coverage interval down; if it emphasizes cost, allow longer revisit intervals

I05 Append Log + Consumer Progress #

  • Scale units:
    • append_Bps: bytes/second
    • append_RPS: records/second
    • partition_write_RPS: writes/second on hottest partition
    • lag_seconds: seconds
    • replay_window: hours or days
  • Decision rules:
    • start from producer count and per-producer event rate
    • estimate bytes separately from record count because batching/compression can change bottlenecks
    • if key distribution is skewed, apply a hot-partition share rather than assuming uniform spread
    • use consumer backlog / consumer throughput to express lag in seconds since that maps better to SLAs
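
The I05 rules can be walked through with a small calculation. The producer count, record size, hot-partition share, and backlog below are assumed example values only.

```python
# Hypothetical inputs for an append-log estimate.
producers = 2_000
events_per_producer_sec = 50
avg_record_bytes = 500
hot_partition_share = 0.05     # assumed share of writes landing on the hottest partition
consumer_backlog = 3_000_000   # records not yet consumed
consume_rate = 120_000         # records/second the consumer group sustains

append_rps = producers * events_per_producer_sec
append_bps = append_rps * avg_record_bytes               # bytes estimated separately from records
partition_write_rps = append_rps * hot_partition_share   # skewed, not uniform
lag_seconds = consumer_backlog / consume_rate            # lag in time maps to SLAs

print(append_rps, append_bps, round(partition_write_rps), lag_seconds)
```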

I06 Projection / Index / Search Pipeline #

  • Scale units:
    • source_WPS: source writes/second
    • projection_WPS: derived writes/second
    • lag_seconds: seconds or minutes
    • rebuild_seconds: minutes, hours, or days
  • Decision rules:
    • identify the canonical source-of-truth mutation rate first
    • multiply by average number of downstream index or projection updates per source mutation
    • if prompt involves fanout by follower/subscriber/tag, model projection amplification explicitly
    • set rebuild time from total corpus size and realistic sustained rebuild bandwidth, not peak hardware bandwidth

I07 Cache / Origin Projection / Edge Delivery #

  • Scale units:
    • read_RPS: reads/second
    • origin_RPS: cache misses/second reaching origin
    • miss_burst_RPS: miss burst requests/second
    • invalidation_rate: invalidations/second
    • memory_bytes: bytes
  • Decision rules:
    • derive read rate from active clients times per-client request rate
    • derive origin load from read_RPS * (1 - hit_ratio)
    • if TTL expiry is synchronized, estimate miss burst separately from steady-state miss rate
    • size memory from hot working set, not total corpus, unless prompt says full-cache mirror
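
As a rough sketch of the I07 rules, the snippet below keeps steady-state miss load and synchronized-expiry burst load as separate numbers. The read rate, hit ratio, expiry share, and key counts are illustrative assumptions.

```python
def origin_rps(read_rps: float, hit_ratio: float) -> float:
    """Steady-state cache misses per second reaching origin."""
    return read_rps * (1 - hit_ratio)


def miss_burst_rps(read_rps: float, simultaneous_expiry_share: float) -> float:
    """Miss rate when a share of hot entries expires at the same moment."""
    return read_rps * simultaneous_expiry_share


# Hypothetical inputs: 200k reads/s, 95% hit ratio, 20% of reads hit keys expiring together.
steady = origin_rps(200_000, 0.95)
burst = miss_burst_rps(200_000, 0.20)
memory_bytes = 50_000_000 * 2_048  # hot keys * average entry bytes (working set, not corpus)

print(f"steady={steady:.0f}/s burst={burst:.0f}/s memory={memory_bytes / 1e9:.1f} GB")
```

The burst figure is four times the steady-state figure in this example, which is why the two are estimated separately.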

I08 Traffic Shaping / Admission Control #

  • Scale units:
    • eval_RPS: decisions/second
    • budget_WPS: budget mutations/second
    • hot_tenant_RPS: requests/second for hottest tenant
    • queue_wait_seconds: seconds
  • Decision rules:
    • start with total incoming requests on the guarded path
    • only a fraction of requests may mutate budget state; separate evaluation from mutation
    • if prompt is multi-tenant, explicitly estimate top-tenant share
    • if system queues before admit/reject, express overload in queue-wait time rather than raw queue length

I09 Sequence / Identifier Generation #

  • Scale units:
    • id_RPS: IDs/second
    • block_lease_RPS: block leases/second
    • worker_pool_util: fraction or percent
    • rollback_risk_window: milliseconds or seconds
  • Decision rules:
    • derive ID rate from operations that need new IDs, not from all requests
    • if using range leasing, convert allocation rate into coordinator lease rate via id_RPS / block_size
    • if generators need unique worker IDs, compare active generators to available ID slots
    • if prompt requires monotonicity, ask or assume a maximum tolerable clock rollback window

I10 Membership / Presence / Registry #

  • Scale units:
    • heartbeat_RPS: heartbeats/second
    • expiry_scan_RPS: expiry checks/second
    • watch_push_RPS: membership updates delivered/second
    • false_death_window: seconds
  • Decision rules:
    • derive heartbeat rate from member count and heartbeat interval
    • if expiry scanning is centralized, estimate check volume per scan interval
    • watch push load is watchers * meaningful membership changes per second
    • set false-death window from heartbeat interval, missed-heartbeat threshold, and network jitter budget
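
One way to turn the I10 rules into numbers is sketched below. The additive false-death formula (interval times missed-heartbeat threshold, plus jitter budget) is an assumption made for this example; the note itself only names the three inputs. All values are hypothetical.

```python
def heartbeat_rps(members: int, heartbeat_interval_sec: float) -> float:
    """Heartbeat fan-in at the registry."""
    return members / heartbeat_interval_sec


def false_death_window_sec(heartbeat_interval_sec: float,
                           missed_threshold: int,
                           jitter_budget_sec: float) -> float:
    """Assumed model: declare death after N missed heartbeats plus jitter slack."""
    return heartbeat_interval_sec * missed_threshold + jitter_budget_sec


# Hypothetical fleet: 100k members, 5 s heartbeats, 3 misses allowed, 2 s jitter budget.
fan_in = heartbeat_rps(100_000, 5)
window = false_death_window_sec(5, 3, 2)
print(f"heartbeat_RPS={fan_in:.0f}, false_death_window={window:.0f}s")
```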

I11 Control Plane + Snapshot Distribution #

  • Scale units:
    • fanout_RPS: target updates/second
    • update_Bps: bytes/second distributed
    • convergence_seconds: seconds or minutes
    • config_mutate_RPS: control writes/second
  • Decision rules:
    • derive mutate rate from humans, deploy controllers, or automation systems changing truth
    • turn rollout size into fanout by dividing targets by desired rollout interval
    • multiply target update rate by average snapshot or delta size to get distribution bandwidth
    • derive convergence from fleet size and realistic ack/apply throughput, not ideal broadcast speed

I12 Workflow + External Side Effect #

  • Scale units:
    • transition_RPS: state transitions/second
    • effect_RPS: side-effect attempts/second
    • retry_rate: retries/second or retry fraction
    • stuck_items: count
  • Decision rules:
    • estimate transition rate from active entities and transitions per entity per unit time
    • side-effect rate is usually transitions * (1 + retry_factor) rather than equal to transition rate
    • estimate stuck-work count from timeout rate, provider failure rate, or reconciliation lag
    • if prompt is payment/provisioning/booking, assume idempotency and retries are first-class, not edge cases

I13 Shared Subject Coordination #

  • Scale units:
    • op_RPS: operations/second per subject
    • broadcast_RPS: fanout deliveries/second
    • replay_ops: operations
    • snapshot_cadence: seconds, minutes, or ops between snapshots
  • Decision rules:
    • derive per-subject op rate from concurrent editors times per-editor operation frequency
    • fanout is per-subject op rate times active subscribers, not total system users
    • size replay budget by the maximum join latency or reconnect cost you can tolerate
    • if prompt has hot rooms/docs/canvases, model hottest-subject load separately from average subject load

I14 Immutable Artifact Namespace + Delivery #

  • Scale units:
    • publish_RPS: publishes/second
    • fetch_Bps: bytes/second
    • metadata_RPS: metadata ops/second
    • gc_backlog_bytes: bytes
  • Decision rules:
    • estimate publish rate from build, release, or upload frequency
    • estimate fetch throughput from download concurrency and average artifact size
    • separate metadata ops from bulk bytes because metadata hotspots often break first
    • derive GC backlog from publish churn times retention window before unreachable data can be removed

I15 Execution Fleet + Worker Substrate #

  • Scale units:
    • arrival_RPS: executions/second arriving
    • concurrency: concurrent running executions
    • slot_count: worker slots
    • heartbeat_RPS: heartbeats/second
    • cold_start_RPS: cold starts/second
  • Decision rules:
    • derive arrival rate from incoming jobs, invocations, or tasks
    • derive concurrency using Little’s Law: arrival_rate * average_run_time
    • derive slot count by dividing concurrency by target utilization, not by theoretical max
    • estimate heartbeat fan-in from active workers and interval
    • if prompt is bursty serverless or CI, estimate cold starts separately from average launches
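
The Little's Law step above can be illustrated with a short calculation. The arrival rate, run time, and utilization target are made-up inputs.

```python
import math


def concurrency(arrival_rps: float, avg_run_seconds: float) -> float:
    """Little's Law: average in-flight executions = arrival rate * average run time."""
    return arrival_rps * avg_run_seconds


def slot_count(concurrent: float, utilization_target: float) -> int:
    """Slots needed so the fleet runs at the target utilization, not at 100%."""
    return math.ceil(concurrent / utilization_target)


# Hypothetical inputs: 2,000 executions/s arriving, 250 ms average run, 70% utilization target.
in_flight = concurrency(2_000, 0.25)
slots = slot_count(in_flight, 0.70)
print(f"concurrency={in_flight:.0f}, slots={slots}")
```

Dividing by the utilization target rather than assuming full utilization leaves headroom for bursts and cold starts.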

I16 Key-Scoped Mutable State / Replicated KV #

  • Scale units:
    • read_RPS: reads/second
    • write_RPS: writes/second
    • hot_key_RPS: hottest-key reads or writes/second
    • repl_Bps: replication bytes/second
    • storage_write_amp: ratio
  • Decision rules:
    • estimate read/write rate from active clients and per-client operation rate
    • apply a skew factor for hottest keys rather than assuming uniform traffic
    • derive replication bandwidth from write rate, average mutation size, and replica count
    • if storage engine details matter, separate logical writes from physical storage amplification
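
A small sketch of the I16 rules follows. Inputs are hypothetical, and write amplification is taken here as physical bytes written divided by logical bytes written, the usual definition for storage engines.

```python
# Hypothetical inputs for a replicated KV estimate.
read_rps = 400_000
write_rps = 50_000
avg_write_bytes = 1_024
replication_factor = 3
top_key_share = 0.01                 # assumed share of reads on the hottest key
physical_write_bps = 512_000_000     # assumed bytes/second hitting disk after compaction

logical_write_bps = write_rps * avg_write_bytes
repl_bps = logical_write_bps * replication_factor
hot_key_rps = read_rps * top_key_share
storage_write_amp = physical_write_bps / logical_write_bps

print(repl_bps, round(hot_key_rps), storage_write_amp)
```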

I17 Traffic Steering / Request Mediation Plane #

  • Scale units:
    • ingress_RPS: requests/second
    • conn_count: active connections
    • health_RPS: health checks/second
    • retry_RPS: retries/second
    • policy_eval_RPS: policy evaluations/second
  • Decision rules:
    • derive ingress from clients or upstream services and per-client request rate
    • derive active connections from open-session model, not from request rate alone
    • derive health-check load from backend count and health-check cadence
    • estimate retries as a fraction of ingress under normal and degraded conditions separately

I18 Telemetry / Time-Series Pipeline #

  • Scale units:
    • ingest_RPS: samples or events/second
    • active_series: count
    • query_scan_Bps: bytes/second scanned
    • retention_bytes: bytes
    • alert_delay: seconds
  • Decision rules:
    • start from emitter count and per-emitter metrics/log events rate
    • derive active series from metric names times label combinations, not just host count
    • query scan cost should be estimated from points touched per query and concurrent dashboards/alerts
    • retention bytes comes from ingest throughput times retention duration after compression assumptions
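
The I18 rules can be sketched as below. The emitter counts, label factor, scrape interval, and post-compression bytes per sample are illustrative assumptions; the sketch also assumes one sample per active series per scrape interval.

```python
# Hypothetical inputs for a metrics pipeline estimate.
emitters = 10_000
metrics_per_emitter = 200
label_cardinality_factor = 5       # assumed label combinations per metric name
scrape_interval_sec = 10           # assumed: one sample per series per interval
bytes_per_sample_compressed = 2    # assumed on-disk size after compression
retention_seconds = 30 * 86_400    # 30 days

active_series = emitters * metrics_per_emitter * label_cardinality_factor
ingest_rps = active_series / scrape_interval_sec
ingest_bps = ingest_rps * bytes_per_sample_compressed
retention_bytes = ingest_bps * retention_seconds

print(f"series={active_series:,} ingest={ingest_rps:,.0f}/s retention={retention_bytes / 1e12:.2f} TB")
```

Note that series count is driven by label combinations, so the same 10k hosts could produce far more series if the label factor grows.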

I19 Replicated Chunk / Block / File Storage Substrate #

  • Scale units:
    • metadata_RPS: metadata operations/second
    • chunk_Bps: chunk/block bytes/second
    • repair_Bps: repair bytes/second
    • rebuild_seconds: seconds, hours, or days
    • placement_skew: fraction or percent imbalance
  • Decision rules:
    • estimate metadata rate from namespace ops like create/open/list/rename/attach
    • estimate chunk throughput from foreground IO volume and average block/chunk size
    • derive repair bandwidth from durability target and time-to-repair requirement after replica loss
    • if prompt mentions rack/AZ awareness, explicitly model placement skew and correlated failure domains

Capacity Estimation From NFR Targets #

This is the sizing section.

Use it when the interviewer asks:

  • how many nodes or shards do you need?
  • how much bandwidth or storage do you need?
  • how many workers, consumers, or replicas does the NFR imply?

For each archetype:

  • capacity units tell you what you are sizing
  • capacity rules tell you how to derive a first-cut number from the NFR target

I01 Coordination / Consensus Metadata #

  • Capacity units:
    • quorum groups
    • metadata partitions
    • watch fanout replicas
    • network bandwidth for watch delivery
  • Capacity rules:
    • required_quorum_groups = ceil(quorum_write_RPS / sustainable_write_RPS_per_group)
    • required_watch_replicas = ceil(watch_delivery_RPS / sustainable_watch_events_per_replica)
    • if watch fanout load dominates but write rate is low, size separate fanout replicas rather than more quorum writers
    • if one metadata domain exceeds per-group latency or write target, split into another partition rather than stretching one quorum group

I02 Claim / Lease / Exclusive Ownership #

  • Capacity units:
    • lease-service partitions
    • renew-handling replicas
    • reclaim workers
  • Capacity rules:
    • required_partitions = ceil(claim_RPS / sustainable_claim_RPS_per_partition)
    • required_renew_capacity = ceil(renew_RPS / sustainable_renew_RPS_per_replica)
    • required_reclaim_workers = ceil(expired_claims_per_sec / reclaims_per_worker_sec)
    • if hottest-key contention exceeds single-partition capacity, no amount of replica scaling fixes it; size around smaller claim domains instead

I03 Due-Time Release + Claimable Run #

  • Capacity units:
    • releaser/scanner workers
    • runnable queue partitions
    • execution workers
  • Capacity rules:
    • required_releasers = ceil(due_RPS / jobs_released_per_worker_sec)
    • required_workers = ceil(due_RPS / jobs_completed_per_worker_sec)
    • required_queue_partitions = ceil(claim_RPS / sustainable_claim_RPS_per_partition)
    • if lateness SLA is L, keep backlog_seconds < L; otherwise add releaser throughput or execution throughput depending on which stage is saturated

I04 Frontier Scan + Claimable Run #

  • Capacity units:
    • scan workers
    • frontier shards
    • checkpoint write throughput
  • Capacity rules:
    • required_scan_workers = ceil(frontier_items / target_coverage_seconds / items_scanned_per_worker_sec)
    • required_frontier_shards = ceil(claim_RPS / sustainable_claim_RPS_per_shard)
    • required_checkpoint_WPS = workers / checkpoint_interval_sec
    • if target coverage interval tightens, worker count scales roughly inversely with allowed revisit time
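
To see the inverse scaling in the last rule, the scan-worker formula can be evaluated at two coverage targets. Frontier size and per-worker scan rate are hypothetical.

```python
import math


def required_scan_workers(frontier_items: int,
                          target_coverage_seconds: float,
                          items_scanned_per_worker_sec: float) -> int:
    """Workers needed to revisit the whole frontier within the coverage target."""
    return math.ceil(frontier_items / target_coverage_seconds / items_scanned_per_worker_sec)


# Hypothetical frontier of 1B items, each worker scanning 50 items/s.
daily = required_scan_workers(1_000_000_000, 86_400, 50)
half_day = required_scan_workers(1_000_000_000, 43_200, 50)
print(daily, half_day)
```

Halving the allowed revisit time roughly doubles the worker count in this example.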

I05 Append Log + Consumer Progress #

  • Capacity units:
    • partitions
    • brokers
    • consumer workers
    • storage bytes
  • Capacity rules:
    • required_partitions = max(ceil(append_RPS / target_records_per_partition_sec), ceil(append_Bps / target_bytes_per_partition_sec))
    • required_brokers = ceil(required_partitions / target_partitions_per_broker)
    • required_consumers = ceil(append_RPS / records_processed_per_consumer_sec)
    • required_storage = append_Bps * retention_seconds * replication_factor
    • if lag SLA is L, consumer throughput must exceed ingest enough that steady-state lag_seconds < L
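
A worked pass over the I05 capacity rules, using assumed per-partition, per-broker, and per-consumer targets. None of these limits come from a specific log system; they are placeholders to show the arithmetic.

```python
import math

# Hypothetical NFR inputs.
append_rps = 100_000
append_bps = 50_000_000
retention_seconds = 7 * 86_400
replication_factor = 3

# Assumed sustainable targets.
target_records_per_partition_sec = 10_000
target_bytes_per_partition_sec = 4_000_000
target_partitions_per_broker = 4
records_processed_per_consumer_sec = 8_000

# Partition count is set by whichever limit (records or bytes) binds first.
required_partitions = max(
    math.ceil(append_rps / target_records_per_partition_sec),
    math.ceil(append_bps / target_bytes_per_partition_sec),
)
required_brokers = math.ceil(required_partitions / target_partitions_per_broker)
required_consumers = math.ceil(append_rps / records_processed_per_consumer_sec)
required_storage = append_bps * retention_seconds * replication_factor

print(required_partitions, required_brokers, required_consumers, required_storage)
```

Here the byte limit, not the record limit, decides the partition count, which is the reason for taking the max of the two.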

I06 Projection / Index / Search Pipeline #

  • Capacity units:
    • projector/indexer workers
    • index shards
    • rebuild lanes
    • query replicas
  • Capacity rules:
    • required_projectors = ceil(projection_WPS / writes_applied_per_projector_sec)
    • required_query_replicas = ceil(query_RPS / queries_served_per_replica_sec)
    • required_shards = ceil(index_bytes / target_bytes_per_shard) or ceil(query_fanout / acceptable_shard_fanout)
    • if freshness SLA is F, provision projector throughput so queued_updates / projector_rate < F

I07 Cache / Origin Projection / Edge Delivery #

  • Capacity units:
    • cache nodes
    • memory bytes
    • POPs or cache tiers
    • origin capacity behind misses
  • Capacity rules:
    • required_memory = hot_working_set_bytes / target_memory_utilization
    • required_cache_nodes = ceil(required_memory / usable_memory_per_node)
    • required_origin_RPS = read_RPS * (1 - hit_ratio_target)
    • if miss-storm peak is the real NFR, size origin and cache fill path for miss_burst_RPS, not average miss rate
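
The cache sizing rules above might be applied as in this sketch. Working-set size, utilization target, and usable memory per node are assumed values.

```python
import math

# Hypothetical inputs.
hot_working_set_bytes = 600 * 10**9       # 600 GB hot working set
target_memory_utilization = 0.75          # leave headroom for fragmentation and growth
usable_memory_per_node = 48 * 10**9       # 48 GB usable per cache node

required_memory = hot_working_set_bytes / target_memory_utilization
required_cache_nodes = math.ceil(required_memory / usable_memory_per_node)

print(f"memory={required_memory / 1e9:.0f} GB, nodes={required_cache_nodes}")
```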

I08 Traffic Shaping / Admission Control #

  • Capacity units:
    • evaluator replicas
    • policy distribution replicas
    • budget-store partitions
    • queue slots
  • Capacity rules:
    • required_evaluators = ceil(eval_RPS / decisions_per_replica_sec)
    • required_budget_partitions = ceil(budget_WPS / budget_updates_per_partition_sec)
    • required_queue_slots = queue_wait_SLA_seconds * admit_rate
    • if fail-closed decision latency must stay under a p99 target, size evaluator capacity from peak eval_RPS, not average
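
The difference between sizing from peak and from average can be made concrete with a short calculation. Request rates, per-replica throughput, and the queue-wait target are hypothetical.

```python
import math


def required_evaluators(eval_rps: float, decisions_per_replica_sec: float) -> int:
    """Evaluator replicas needed to absorb a given decision rate."""
    return math.ceil(eval_rps / decisions_per_replica_sec)


# Hypothetical inputs: 100k average and 300k peak decisions/s, 25k decisions/s per replica.
avg_sized = required_evaluators(100_000, 25_000)
peak_sized = required_evaluators(300_000, 25_000)

# Queue slots follow from the wait target and admit rate (Little's Law again).
queue_wait_sla_seconds = 0.25
admit_rate = 5_000
required_queue_slots = queue_wait_sla_seconds * admit_rate

print(avg_sized, peak_sized, required_queue_slots)
```

An evaluator tier sized from the average would run at three times its capacity during the peak in this example.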

I09 Sequence / Identifier Generation #

  • Capacity units:
    • allocator replicas
    • worker-id space
    • lease block size
  • Capacity rules:
    • required_allocator_RPS = id_RPS / block_size
    • required_allocator_replicas = ceil(required_allocator_RPS / leases_per_allocator_sec)
    • required_worker_id_space >= peak_concurrent_generators
    • if allocator path is hot, the first capacity move is larger block size because it lowers central lease traffic directly
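
A quick sketch of why block size is the first lever: central lease traffic falls in direct proportion to it. The ID rate and per-allocator lease throughput are assumed numbers.

```python
import math


def allocator_load(id_rps: float, block_size: int, leases_per_allocator_sec: float):
    """Return (lease requests per second, allocator replicas needed)."""
    lease_rps = id_rps / block_size
    replicas = math.ceil(lease_rps / leases_per_allocator_sec)
    return lease_rps, replicas


# Hypothetical: 500k IDs/s, each allocator sustaining 200 block leases/s.
small_block = allocator_load(500_000, block_size=1_000, leases_per_allocator_sec=200)
large_block = allocator_load(500_000, block_size=10_000, leases_per_allocator_sec=200)
print(small_block, large_block)
```

The tradeoff, not shown in the numbers, is that larger blocks leave larger ID gaps when a generator crashes with an unused range.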

I10 Membership / Presence / Registry #

  • Capacity units:
    • registry write partitions
    • watch fanout replicas
    • read replicas or caches
  • Capacity rules:
    • required_registry_capacity = ceil(heartbeat_RPS / heartbeats_processed_per_replica_sec)
    • required_watch_replicas = ceil(watch_push_RPS / pushes_per_replica_sec)
    • required_read_replicas = ceil(lookup_RPS / lookups_per_replica_sec) when lookup path is separate
    • if freshness NFR is loose, increase the heartbeat interval; this directly lowers the write capacity needed
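
A quick sketch of the heartbeat-interval lever. It assumes one heartbeat per member per interval, so `heartbeat_RPS = members / heartbeat_interval_sec`; that derivation, the function name, and the numbers are assumptions for illustration.

```python
import math

def registry_replicas(members, heartbeat_interval_sec,
                      heartbeats_processed_per_replica_sec):
    """Write-side replicas needed to absorb heartbeats."""
    # Assumed model: every member sends exactly one heartbeat per interval.
    heartbeat_rps = members / heartbeat_interval_sec
    return math.ceil(heartbeat_rps / heartbeats_processed_per_replica_sec)

tight = registry_replicas(300_000, heartbeat_interval_sec=5,
                          heartbeats_processed_per_replica_sec=10_000)
loose = registry_replicas(300_000, heartbeat_interval_sec=15,
                          heartbeats_processed_per_replica_sec=10_000)
print(tight, loose)
```

Tripling the interval cuts the write-side replica count to a third, at the cost of slower failure detection.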

I11 Control Plane + Snapshot Distribution #

  • Capacity units:
    • control-plane writers
    • fanout workers
    • distribution bandwidth
    • rollout waves
  • Capacity rules:
    • required_fanout_workers = ceil(fanout_RPS / updates_pushed_per_worker_sec)
    • required_bandwidth_Bps = fanout_RPS * avg_snapshot_or_delta_bytes
    • max_targets_per_wave = rollout_SLA_seconds * apply_ack_rate
    • if convergence SLA is tight, size by peak wave rather than by average update cadence
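
The I11 rules can be worked through as below. This is a sketch with invented inputs; `fanout_rps` is taken at the peak wave, per the last rule.

```python
import math

def snapshot_distribution_capacity(fanout_rps, updates_pushed_per_worker_sec,
                                   avg_snapshot_or_delta_bytes,
                                   rollout_sla_seconds, apply_ack_rate):
    """Fanout workers, bandwidth in bytes/sec, and the per-wave target ceiling."""
    workers = math.ceil(fanout_rps / updates_pushed_per_worker_sec)
    bandwidth_bps = fanout_rps * avg_snapshot_or_delta_bytes
    max_targets_per_wave = rollout_sla_seconds * apply_ack_rate
    return workers, bandwidth_bps, max_targets_per_wave

workers, bandwidth_bps, max_targets = snapshot_distribution_capacity(
    fanout_rps=50_000,                 # hypothetical peak-wave push rate
    updates_pushed_per_worker_sec=2_500,
    avg_snapshot_or_delta_bytes=4_096,
    rollout_sla_seconds=300,
    apply_ack_rate=40,                 # targets acknowledging per second
)
print(workers, bandwidth_bps, max_targets)
```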

I12 Workflow + External Side Effect #

  • Capacity units:
    • workflow workers
    • side-effect workers
    • reconciliation scanners
    • idempotency-store throughput
  • Capacity rules:
    • required_workers = ceil(effect_RPS / effects_processed_per_worker_sec)
    • required_reconcilers = ceil((stuck_items / target_reconciliation_window_seconds) / items_scanned_per_reconciler_sec)
    • required_idempotency_store_RPS = effect_RPS * idempotency_reads_writes_per_effect
    • if retry amplification dominates, size from peak retry scenario, not clean-path effect rate
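
A worked pass over the I12 rules, as a sketch only. The inputs are hypothetical, and `effect_rps` should be the peak-retry figure when retry amplification dominates.

```python
import math

def workflow_capacity(effect_rps, effects_processed_per_worker_sec,
                      stuck_items, target_reconciliation_window_seconds,
                      items_scanned_per_reconciler_sec,
                      idempotency_reads_writes_per_effect):
    """Side-effect workers, reconcilers, and idempotency-store ops/sec."""
    workers = math.ceil(effect_rps / effects_processed_per_worker_sec)
    # Scan rate needed to drain the stuck backlog inside the window.
    required_scan_rate = stuck_items / target_reconciliation_window_seconds
    reconcilers = math.ceil(required_scan_rate / items_scanned_per_reconciler_sec)
    idempotency_store_rps = effect_rps * idempotency_reads_writes_per_effect
    return workers, reconcilers, idempotency_store_rps

workers, reconcilers, store_rps = workflow_capacity(
    effect_rps=4_000,                           # hypothetical, retries included
    effects_processed_per_worker_sec=50,
    stuck_items=1_800_000,
    target_reconciliation_window_seconds=600,
    items_scanned_per_reconciler_sec=500,
    idempotency_reads_writes_per_effect=2,      # assumed: one read plus one write
)
print(workers, reconcilers, store_rps)
```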

I13 Shared Subject Coordination #

  • Capacity units:
    • subject coordinators
    • broadcast workers
    • snapshot storage/write throughput
  • Capacity rules:
    • required_subject_capacity = hottest_subject_op_RPS / ops_ordered_per_coordinator_sec
    • required_broadcast_workers = ceil(broadcast_RPS / fanout_events_per_worker_sec)
    • required_snapshot_WPS = hot_subjects / snapshot_interval_sec
    • size from hottest subject, not average subject, if the NFR is per-document or per-room responsiveness

I14 Immutable Artifact Namespace + Delivery #

  • Capacity units:
    • metadata partitions
    • origin storage bandwidth
    • edge cache/mirror nodes
    • GC workers
  • Capacity rules:
    • required_metadata_partitions = ceil(metadata_RPS / metadata_ops_per_partition_sec)
    • required_origin_Bps = fetch_Bps * miss_ratio_to_origin
    • required_edge_nodes = ceil(fetch_Bps / sustainable_edge_Bps_per_node)
    • required_gc_workers = ceil((gc_backlog_bytes / target_gc_window_seconds) / bytes_reclaimed_per_worker_sec)

I15 Execution Fleet + Worker Substrate #

  • Capacity units:
    • worker slots
    • scheduler replicas
    • warm pool instances
    • heartbeat-processing capacity
  • Capacity rules:
    • required_slots = ceil(concurrency / target_utilization)
    • required_scheduler_replicas = ceil(arrival_RPS / placements_per_scheduler_sec)
    • required_warm_pool = cold_start_sensitive_arrival_RPS * warm_window_seconds
    • required_heartbeat_capacity = ceil(heartbeat_RPS / heartbeats_processed_per_replica_sec)
    • if queueing SLA is tight, size slots from peak burst concurrency, not average concurrency
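
The slot and warm-pool rules can be sketched as follows. The sketch assumes `concurrency = arrival_RPS * avg_task_seconds` (Little's law), which the rules above do not state explicitly; the function name and numbers are illustrative.

```python
import math

def fleet_capacity(arrival_rps, avg_task_seconds, target_utilization,
                   placements_per_scheduler_sec,
                   cold_start_sensitive_arrival_rps, warm_window_seconds):
    """Worker slots, scheduler replicas, and warm-pool size."""
    # Assumed: in-flight work = arrival rate * average task duration.
    concurrency = arrival_rps * avg_task_seconds
    slots = math.ceil(concurrency / target_utilization)
    schedulers = math.ceil(arrival_rps / placements_per_scheduler_sec)
    warm_pool = math.ceil(cold_start_sensitive_arrival_rps * warm_window_seconds)
    return slots, schedulers, warm_pool

average = fleet_capacity(500, 3, 0.75, 200, 50, 10)
# Tight queueing SLA: rerun with the peak burst arrival rate instead.
peak = fleet_capacity(800, 3, 0.75, 200, 50, 10)
print(average, peak)
```

Sizing from the average gives 2,000 slots here, while the peak burst needs 3,200; the gap is the queueing risk the last rule warns about.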

I16 Key-Scoped Mutable State / Replicated KV #

  • Capacity units:
    • shards
    • replicas
    • disk/network throughput
    • cache capacity for hot keys
  • Capacity rules:
    • required_shards = max(ceil(write_RPS / writes_per_shard_sec), ceil(data_bytes / target_bytes_per_shard))
    • required_replicas = durability_target_implied_replica_count
    • required_repl_bandwidth = repl_Bps
    • required_hot_key_cache = hot_key_working_set_bytes if hot reads dominate
    • if p99 write latency is capped tightly, size shard count from hottest shard write rate, not average shard rate
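
A sketch of the shard-count rule with a hot-shard adjustment. `skew_factor` (hottest shard load divided by average shard load) is a hypothetical knob introduced here to model "size from the hottest shard"; it is not defined in the rules above.

```python
import math

def required_shards(write_rps, writes_per_shard_sec,
                    data_bytes, target_bytes_per_shard, skew_factor=1.0):
    """Larger of the write-bound and storage-bound shard counts."""
    # Assumed model: the hottest shard carries skew_factor times the average load.
    by_writes = math.ceil(write_rps * skew_factor / writes_per_shard_sec)
    by_bytes = math.ceil(data_bytes / target_bytes_per_shard)
    return max(by_writes, by_bytes)

TIB, GIB = 2**40, 2**30
avg_sized = required_shards(90_000, 5_000, 10 * TIB, 512 * GIB)
hot_sized = required_shards(90_000, 5_000, 10 * TIB, 512 * GIB, skew_factor=3.0)
print(avg_sized, hot_sized)
```

With uniform load the storage bound wins (20 shards); with a 3x hot shard the write bound takes over (54 shards).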

I17 Traffic Steering / Request Mediation Plane #

  • Capacity units:
    • proxy/gateway instances
    • connection table entries
    • health-check workers
    • route/policy distribution replicas
  • Capacity rules:
    • required_instances = max(ceil(ingress_RPS / requests_per_instance_sec), ceil(conn_count / connections_per_instance))
    • required_health_capacity = ceil(health_RPS / checks_per_worker_sec)
    • required_policy_capacity = ceil(policy_eval_RPS / policy_evals_per_replica_sec)
    • if p99 latency is the main NFR, size from connection-heavy and retry-heavy peak, not clean-path request average
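
The instance rule takes the larger of two independent bounds, which can be illustrated like this. Numbers and names are hypothetical.

```python
import math

def required_instances(ingress_rps, requests_per_instance_sec,
                       conn_count, connections_per_instance):
    """Return (instance_count, binding_constraint)."""
    by_rps = math.ceil(ingress_rps / requests_per_instance_sec)
    by_conns = math.ceil(conn_count / connections_per_instance)
    binding = "connections" if by_conns > by_rps else "requests"
    return max(by_rps, by_conns), binding

instances, binding = required_instances(
    ingress_rps=400_000,
    requests_per_instance_sec=25_000,
    conn_count=3_000_000,            # e.g. many long-lived idle connections
    connections_per_instance=100_000,
)
print(instances, binding)
```

Here request throughput alone would suggest 16 instances, but the connection table forces 30.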

I18 Telemetry / Time-Series Pipeline #

  • Capacity units:
    • ingest shards
    • query replicas
    • storage bytes
    • rollup workers
  • Capacity rules:
    • required_ingest_shards = ceil(ingest_RPS / samples_per_shard_sec)
    • required_query_replicas = ceil(query_scan_Bps / bytes_scanned_per_replica_sec)
    • required_storage = retention_bytes
    • required_rollup_workers = ceil(active_series / series_aggregated_per_worker_sec)
    • if cost ceiling is part of the NFR, solve for max ingest or retention under that storage budget before sizing hardware
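
The cost-ceiling rule can be sketched by inverting the storage formula. This assumes `retention_bytes = ingest_RPS * bytes_per_sample * retention_seconds` and ignores replication, indexes, and rollups; that model, the names, and the numbers are all assumptions.

```python
SECONDS_PER_DAY = 86_400

def max_retention_seconds(storage_budget_bytes, ingest_rps, bytes_per_sample):
    """Longest retention that fits the budget at a fixed ingest rate."""
    return storage_budget_bytes // (ingest_rps * bytes_per_sample)

def max_ingest_rps(storage_budget_bytes, retention_seconds, bytes_per_sample):
    """Highest ingest rate that fits the budget at a fixed retention."""
    return storage_budget_bytes // (retention_seconds * bytes_per_sample)

budget = 200 * 10**12   # hypothetical 200 TB storage ceiling
retention_s = max_retention_seconds(budget, ingest_rps=2_000_000, bytes_per_sample=2)
retention_days = retention_s // SECONDS_PER_DAY
ingest_cap = max_ingest_rps(budget, retention_seconds=400 * SECONDS_PER_DAY,
                            bytes_per_sample=2)
print(retention_days, ingest_cap)
```

Only after one of these two caps is fixed does it make sense to size ingest shards and rollup workers.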

I19 Replicated Chunk / Block / File Storage Substrate #

  • Capacity units:
    • metadata partitions
    • storage nodes/disks
    • repair workers/bandwidth
    • replica bytes
  • Capacity rules:
    • required_metadata_partitions = ceil(metadata_RPS / metadata_ops_per_partition_sec)
    • required_storage_nodes = max(ceil(chunk_Bps / usable_Bps_per_node), ceil(total_stored_bytes / usable_bytes_per_node))
    • required_repair_Bps = lost_replica_bytes / target_repair_window_seconds
    • required_total_storage = logical_data_bytes * replication_factor / usable_storage_fraction
    • if durability NFR says repair within T after one-node loss, size repair bandwidth directly from that target rather than from foreground IO average
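
A sketch of the repair-bandwidth and node-count rules using integer arithmetic. It reads the two node-count expressions as independent bounds and takes the larger; that reading, the helper names, and the numbers are assumptions.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large byte counts."""
    return -(-a // b)

def repair_bandwidth_bps(lost_replica_bytes, target_repair_window_seconds):
    """Bytes/sec needed to re-replicate a lost node inside the window."""
    return ceil_div(lost_replica_bytes, target_repair_window_seconds)

def storage_nodes(chunk_bps, usable_bps_per_node,
                  total_stored_bytes, usable_bytes_per_node):
    """Larger of the throughput-bound and capacity-bound node counts."""
    return max(ceil_div(chunk_bps, usable_bps_per_node),
               ceil_div(total_stored_bytes, usable_bytes_per_node))

TB = 10**12
repair_bps = repair_bandwidth_bps(36 * TB, 4 * 3600)   # one node lost, 4 h target
nodes = storage_nodes(
    chunk_bps=60 * 10**9, usable_bps_per_node=500 * 10**6,
    total_stored_bytes=2_000 * TB * 3,                  # logical bytes * replication factor
    usable_bytes_per_node=30 * TB,
)
print(repair_bps, nodes)
```

In this example the repair target alone demands 2.5 GB/s of cluster-wide repair bandwidth, independent of foreground IO.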

I20 Computation / Dataflow / DAG Execution #

  • Capacity units:
    • scheduler throughput
    • worker task slots
    • shuffle bandwidth/storage
    • checkpoint bandwidth/storage
    • sink commit throughput
  • Capacity rules:
    • required_task_slots = ceil(task_concurrency / target_slot_utilization)
    • required_scheduler_capacity = ceil(task_launch_RPS / launches_per_scheduler_sec)
    • required_shuffle_Bps = shuffle_Bps
    • required_checkpoint_Bps = checkpoint_Bps
    • required_sink_commit_capacity = ceil(output_commit_RPS / commits_per_committer_sec)
    • if correctness depends on exactly-once output, size from checkpoint and commit boundaries, not only from clean-path operator throughput

I21 Trust Boundary / Cryptographic Proof Substrate #

  • Capacity units:
    • verifier replicas/cache capacity
    • signer/HSM throughput
    • trust-bundle distribution fanout
    • revocation publication latency
    • audit log partitions
  • Capacity rules:
    • required_verifiers = ceil(verify_RPS / verifications_per_verifier_sec)
    • required_signers = ceil(sign_RPS / signatures_per_signer_sec)
    • required_bundle_fanout_workers = ceil(trust_bundle_fanout_RPS / bundle_updates_per_worker_sec)
    • required_audit_partitions = ceil(audit_WPS / writes_per_partition_sec)
    • if stale revocation is the critical risk, size freshness from revocation propagation SLA rather than average issuer throughput

Interview One-Liner #

For infra prompts, I would first classify the dominant archetype, compute the first load-bearing variables for that shape, name the first bottleneck, and then run a short cross-cutting pass over correctness, availability, durability, freshness, isolation, and repair budget.