Infra HLD Diagram Discipline Cheat Sheet
Use this note as the infra-side counterpart to the product HLD diagram discipline note.
The main rule:
Draw the diagram around the contract, invariants, and ownership of critical state. Then attach the execution paths, workers, and projections.
Do not start with:
- Kafka
- Redis
- Raft
- Elasticsearch
- generic cloud boxes
Start with:
- the contract
- the critical state
- the hard path
1. Start from the contract #
Before drawing anything, write the system promise.
Examples:
Lock service #
- one valid holder at a time
- lease-based ownership
- stale holders must be fenced
Queue #
- durable enqueue
- at-least-once delivery
- visibility timeout
Scheduler #
- jobs become runnable at or after `run_at`
- one worker should own a run at a time
- retry is supported
Why:
- infra diagrams are driven by semantics more than user flow
- the contract tells you which boxes must exist
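To illustrate how a contract forces specific boxes into the diagram, here is a minimal in-memory sketch of the Queue promises above (at-least-once delivery with a visibility timeout). The class name `ToyQueue` and its methods are hypothetical, time is passed in explicitly as `now`, and nothing here is durable; it only shows why per-message visibility state has to live somewhere.

```python
class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not durable)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}         # msg_id -> body
        self.invisible_until = {}  # msg_id -> time the message becomes visible again
        self.next_id = 0

    def enqueue(self, body):
        self.next_id += 1
        self.messages[self.next_id] = body
        self.invisible_until[self.next_id] = 0.0
        return self.next_id

    def receive(self, now):
        """Return (msg_id, body) for the first visible message, or None."""
        for msg_id, body in self.messages.items():
            if self.invisible_until[msg_id] <= now:
                # Hide the message instead of deleting it: delivery is not completion.
                self.invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def ack(self, msg_id):
        """Only an explicit ack removes the message."""
        self.messages.pop(msg_id, None)
        self.invisible_until.pop(msg_id, None)


q = ToyQueue(visibility_timeout=30)
q.enqueue("resize-image")
first = q.receive(now=0)         # delivered, hidden until t=30
hidden = q.receive(now=10)       # still invisible
redelivered = q.receive(now=31)  # never acked, so it is delivered again
q.ack(redelivered[0])
empty = q.receive(now=100)
```

The redelivery at `now=31` is the at-least-once promise in action, and it is the reason a Visibility Timeout Manager box earns its place in the diagram.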
2. Write the invariants next to the diagram #
For infra systems, invariants should be visible.
Examples:
- only one holder may act
- no stale config may override newer config
- offset must not advance before durable processing
- frontier must not skip uncovered work
Why:
- these invariants explain why you need certain stores and checks
- they help the interviewer follow the design
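As a small illustration of the invariant "offset must not advance before durable processing", the sketch below uses a plain list as the log and another list as a stand-in for a durable sink. `OffsetTracker`, `consume`, and the `fail_at` crash hook are hypothetical names introduced only for this example.

```python
class OffsetTracker:
    """Toy consumer progress record; stands in for a durable offset store."""

    def __init__(self):
        self.committed = 0  # next offset to process


def consume(log, tracker, sink, fail_at=None):
    """Process messages in order; advance the offset only after the sink write."""
    for offset in range(tracker.committed, len(log)):
        if offset == fail_at:
            raise RuntimeError("simulated crash before the durable write")
        sink.append(log[offset])        # durable processing happens first
        tracker.committed = offset + 1  # only then does the offset advance


log = ["a", "b", "c"]
tracker, sink = OffsetTracker(), []
try:
    consume(log, tracker, sink, fail_at=1)
except RuntimeError:
    pass
after_crash = tracker.committed  # offset 1 was never committed
consume(log, tracker, sink)      # a restart resumes from the committed offset
```

Because the offset trails the write, a crash can cause reprocessing but never a skipped message. A crash between the sink write and the offset update would produce a duplicate, which is why this ordering pairs with at-least-once semantics.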
3. Draw the canonical state owner first #
The first real box should usually be the thing that owns critical truth.
Examples:
- Metadata Store
- Lease Store
- Queue Log
- Schedule Store
- Config Store
- Frontier Store
Rule:
- the source of truth should be obvious
- derived views should come later
4. Draw the control path before the data path #
In infra interviews, correctness often lives in the control path.
Examples:
Lock service #
- client
- lock API
- metadata store
- watch / renew path
Config service #
- admin
- config API
- config store
- snapshot publisher
- local evaluators
Scheduler #
- scheduler API
- schedule store
- due scanner
- runnable queue
- workers
Why:
- infra systems often have one small critical control path and one large execution path
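The Scheduler control path above can be sketched in a few lines. This is an illustrative toy, assuming an in-memory `ScheduleStore` and a list as the runnable queue; the state names `scheduled`, `queued`, and `claimed` are assumptions made for the example, not part of the original note.

```python
class ScheduleStore:
    """Toy canonical schedule state: job_id -> record."""

    def __init__(self):
        self.jobs = {}

    def add(self, job_id, run_at):
        self.jobs[job_id] = {"run_at": run_at, "state": "scheduled", "owner": None}

    def claim(self, job_id, worker):
        """Conditional update: succeeds for at most one worker per run."""
        job = self.jobs[job_id]
        if job["state"] != "queued":
            return False
        job["state"], job["owner"] = "claimed", worker
        return True


def due_scan(store, runnable_queue, now):
    """Move jobs whose run_at has passed onto the runnable queue, once each."""
    for job_id, job in sorted(store.jobs.items()):
        if job["state"] == "scheduled" and job["run_at"] <= now:
            job["state"] = "queued"
            runnable_queue.append(job_id)


store, runnable = ScheduleStore(), []
store.add("report", run_at=100)
store.add("cleanup", run_at=200)
due_scan(store, runnable, now=150)                # only "report" is due
first_claim = store.claim("report", "worker-1")   # wins the run
second_claim = store.claim("report", "worker-2")  # loses: already claimed
```

All of the correctness lives in the store and the claim check; the workers that actually run the job never appear, which is the point of drawing the control path first.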
5. Put the execution plane in a separate lane #
Separate:
- control plane
- execution plane
Examples:
Queue #
- control-ish state:
  - message durability
  - visibility timeout
  - consumer progress
- execution plane:
  - consumers
  - handlers
Scheduler #
- control:
  - schedule metadata
  - due scan
  - claim state
- execution:
  - workers
  - downstream job handlers
This keeps the diagram readable.
6. Draw ownership, leases, and epochs explicitly #
For claim/lease systems, always show:
- who owns what
- where lease state lives
- where fencing token / epoch comes from
Examples:
- `LeaseState`
- `ShardOwnership`
- `OwnerEpoch`
If stale actor risk matters, the token must appear in the diagram.
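A minimal sketch of why the token belongs in the diagram: the lease store hands out a strictly increasing epoch with each grant, and the protected resource rejects any write carrying an older one. `LeaseStore` and `ProtectedResource` are hypothetical in-memory classes, and time is passed in as `now` to keep the example deterministic.

```python
class LeaseStore:
    """Toy canonical lease state: one holder plus a monotonic epoch."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0
        self.epoch = 0

    def acquire(self, client, now, ttl):
        """Grant the lease if it is free or expired; return a fencing token or None."""
        if self.holder is not None and now < self.expires_at:
            return None
        self.holder = client
        self.expires_at = now + ttl
        self.epoch += 1  # every new grant gets a strictly larger token
        return self.epoch


class ProtectedResource:
    """Downstream resource that rejects writes carrying a stale token."""

    def __init__(self):
        self.highest_seen = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_seen:
            return False  # the stale holder is fenced
        self.highest_seen = token
        self.value = value
        return True


store, resource = LeaseStore(), ProtectedResource()
t1 = store.acquire("A", now=0, ttl=10)       # A holds epoch 1
blocked = store.acquire("B", now=5, ttl=10)  # lease still valid, B is refused
t2 = store.acquire("B", now=11, ttl=10)      # A's lease expired, B gets epoch 2
ok_b = resource.write(t2, "from B")
ok_a = resource.write(t1, "from A")          # A wakes up late with its old token
```

The lease alone cannot stop client A from acting after expiry; only the check at the resource does. That is why both the token source and the validating box should be drawn.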
7. Show async boundaries clearly #
Use different arrows or labels for:
- sync metadata write
- async scan
- async publish
- worker claim
- replay / watch
Examples:
- config store -> snapshot publisher
- schedule store -> due scanner
- queue log -> consumer
- metadata store -> watch clients
Do not blur sync and async arrows.
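The distinction can be made concrete with a toy config service, sketched here under the assumption of an in-memory store and an outbox queue drained by a separate publisher. `ConfigService`, `put`, and `publish_pending` are hypothetical names for illustration.

```python
from collections import deque


class ConfigService:
    """Toy service with one synchronous write path and one asynchronous publish path."""

    def __init__(self):
        self.store = {}        # canonical state, written synchronously
        self.outbox = deque()  # changes waiting for the publisher
        self.published = []    # what downstream snapshots have received

    def put(self, key, value):
        """Sync arrow: returns only after the canonical write is recorded."""
        self.store[key] = value
        self.outbox.append((key, value))
        return "ok"

    def publish_pending(self):
        """Async arrow: run later by a background publisher; lag is tolerated."""
        count = 0
        while self.outbox:
            self.published.append(self.outbox.popleft())
            count += 1
        return count


svc = ConfigService()
result = svc.put("dark-mode", True)
lagging = list(svc.published)  # nothing published yet: the async side is behind
sent = svc.publish_pending()
```

Between `put` returning and `publish_pending` running, the store is ahead of every consumer. Labeling that arrow as async tells the reader the lag is expected rather than a bug.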
8. Draw replay and recovery paths if they matter #
Infra designs often need repair paths in the HLD.
Examples:
- reconciliation worker
- reindexer
- lease expiry scanner
- replay from log
- snapshot rebuild
If the main correctness story depends on repair, show the repair box.
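A reconciliation worker can be as simple as a loop that compares derived state against the canonical store and repairs the differences. The sketch below assumes both sides fit in plain dictionaries; `reconcile` is a hypothetical helper, not a reference to any particular system.

```python
def reconcile(canonical, derived):
    """Make `derived` match `canonical`; return the sorted keys that were repaired."""
    repaired = []
    for key, value in canonical.items():
        if key not in derived or derived[key] != value:
            derived[key] = value  # missing or stale entry: copy from the source of truth
            repaired.append(key)
    for key in list(derived):
        if key not in canonical:
            del derived[key]      # orphan entry: canonical state no longer has it
            repaired.append(key)
    return sorted(repaired)


canonical = {"job-1": "done", "job-2": "running"}
derived = {"job-1": "running", "job-3": "done"}
repaired = reconcile(canonical, derived)
```

The direction of the copy is the important part: repair always flows from the canonical store to the derived view, never the other way.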
9. Separate canonical truth from derived state #
Examples:
- Config Store vs Local Snapshot
- Queue Log vs Consumer Cache
- Source Metrics vs Aggregated Dashboard
- Membership Truth vs Presence View
Reason:
- infra questions often hinge on whether you can show the interviewer which state is authoritative
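For the config store and local snapshot pair, one common way to keep the derived copy honest is a version check on apply, which also enforces the earlier invariant that no stale config may override newer config. This is a sketch with hypothetical `ConfigStore` and `LocalSnapshot` classes; a single integer version is an assumption made for simplicity.

```python
class ConfigStore:
    """Canonical store: every write bumps a version."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def put(self, key, value):
        self.version += 1
        self.data = {**self.data, key: value}
        return self.version, dict(self.data)  # snapshot to ship to clients


class LocalSnapshot:
    """Derived copy held by an agent or SDK."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, data):
        if version <= self.version:
            return False  # stale or duplicate snapshot is ignored
        self.version, self.data = version, data
        return True


store, snap = ConfigStore(), LocalSnapshot()
s1 = store.put("limit", 10)
s2 = store.put("limit", 20)
applied_new = snap.apply(*s2)  # the newer snapshot happens to arrive first
applied_old = snap.apply(*s1)  # the delayed older snapshot is rejected
```

The local snapshot never decides what the config is; it only decides whether an incoming copy is newer than what it already holds.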
10. Draw hot partitions or hot keys if they are central #
Examples:
- due-time bucket in scheduler
- hot tenant in rate limiter
- hot lock id
- hot queue partition
- the infra equivalent of celebrity fanout: a hot config rollout or a hot topic partition
If scale is a major deep dive, the hotspot should appear in the diagram.
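For the due-time bucket hotspot, one common mitigation is to split each time bucket into sub-buckets keyed by a hash of the job id. The sketch below is an assumption about how that might look; `bucket_key`, the 60-second bucket width, and the count of 8 sub-buckets are all illustrative choices.

```python
import hashlib


def bucket_key(run_at, job_id, bucket_seconds=60, sub_buckets=8):
    """Return (time bucket, sub-bucket); the sub-bucket spreads one hot minute across shards."""
    time_bucket = int(run_at) // bucket_seconds
    digest = hashlib.sha256(job_id.encode()).digest()
    sub = digest[0] % sub_buckets  # stable per job id, so lookups stay deterministic
    return time_bucket, sub


# 1000 jobs all due in the same minute land in the same time bucket
# but are spread over several sub-buckets.
keys = {bucket_key(3600, f"job-{i}") for i in range(1000)}
```

The trade-off is that a due scanner must now read every sub-bucket for a given time bucket, so the fan-in on the read side should appear in the diagram alongside the split.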
11. Canonical drawing sequence #
Use this order every time:
- write contract and invariants
- draw client / caller
- draw API / control service
- draw canonical metadata or source-truth store
- draw async scanner / publisher / worker
- draw execution plane or downstream handlers
- draw derived views or local snapshots
- draw repair / reconciliation path
- mark the hard spot for deep dive
This prevents infra diagrams from becoming tool soup.
12. Default skeletons by infra system type #
Coordination / lock service #
- Client
- Lock API
- Metadata Store
- Watch / Renew Path
- Downstream Protected Resource
Optional:
- Fencing Token Validator
Queue #
- Producer
- Broker API
- Message Log
- Visibility Timeout Manager
- Consumer
- DLQ
Scheduler #
- Scheduler API
- Schedule Store
- Due Scanner
- Runnable Queue
- Worker
- Run State Store
Config / feature flag / policy #
- Admin / Control Client
- Config API
- Config Store
- Snapshot Publisher
- Agents / SDKs
- Local Snapshot Cache
Rate limiter #
- Client
- Decision Service
- Budget / Token Store
- maybe Local Token Cache
- Reconciliation / Refill Path
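To show what the Budget / Token Store and the refill path might hold, here is a sketch of a token bucket, which is one common rate-limiting scheme and an assumption here, since the skeleton does not name a specific algorithm. `TokenBucket` is a hypothetical single-process class with time passed in as `now`.

```python
class TokenBucket:
    """Toy token bucket: the token count is the state a Budget / Token Store would own."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last_refill = now

    def allow(self, now, cost=1.0):
        """Refill lazily based on elapsed time, then try to spend `cost` tokens."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = max(self.last_refill, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=1)
decisions = [bucket.allow(0), bucket.allow(0), bucket.allow(0), bucket.allow(1)]
```

In a distributed design the same counter would live in the shared store, with any local token cache treated as derived state that the reconciliation path keeps within budget.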
Metrics / tracing #
- Agent / Emitter
- Ingest Service
- Durable Log / Time-Series Store
- Aggregation Pipeline
- Query Service
- Dashboard / Alert Engine
13. Questions to ask for every box #
For each box, answer:
- what contract does this box enforce?
- what state does it own?
- is it source truth or derived state?
- is it sync or async?
- what invariant would break if this box disappeared?
If you cannot answer these, simplify.
14. What to say while drawing #
Use lines like:
- This box owns the canonical lease state.
- This path is synchronous because correctness depends on it.
- This worker is asynchronous because bounded lag is acceptable here.
- These local snapshots are derived; the control store remains canonical.
- I’m showing the reconciliation loop because repair is part of the design, not an afterthought.
15. Common mistakes #
- drawing Raft/Kafka/Redis before defining the contract
- not showing the canonical state owner
- hiding lease / epoch / offset state
- mixing control truth with execution state
- skipping reconciliation or replay paths
- drawing every operational detail instead of the correctness-critical pieces
- not distinguishing sync control path from async execution path
16. Interview one-liner #
For infra HLDs I start from the contract and invariants, then draw the canonical state owner, then the control path, then the execution path, and finally the repair or replay path. That keeps the diagram focused on correctness instead of tools.