Distributed Systems Design Space #
Status: Archive Candidate
Use design_space_v2.md instead for the newer third-layer mechanism framing.
A reference for thinking about distributed-system design through dimensional decomposition. Built up across many iterations of pressure-testing for senior/staff-level system design interviews.
What this is, honestly #
This is a vocabulary harvesting tool at one altitude, not a closed ontology. The named cells below are points in unstated generative spaces. The vocabulary is useful for fluent recall, recognition, and shared language. It is not the complete structure of system design.
Three things to understand before using it:
The cells aren’t exhaustive. Every dimension keeps revealing new cells under pressure. The lists below are the production-common shapes; new shapes appear as systems evolve. When you encounter a system that doesn’t fit, name the underlying property rather than forcing it into an existing slot.
The dimensions aren’t generatively decomposed. Each named dimension is itself a list of observed cells, not a cross-product of generative axes. The structural axes that would close each dimension are sketched in the Generative Structure section below.
The dimensions aren’t fully orthogonal. Several pairs (lifecycle ↔ durability, mutability ↔ atomicity, order ↔ coordination) share underlying structural concerns. The framework’s “DAG-ordered selection across decomposed dimensions” claim is partly false — dimensions are correlated through structural coupling, not just through explicit constraints.
Despite these limits, the vocabulary is sufficient for ~80–90% of senior-IC interview surface area. Use it for what it does. Don’t use it for what it doesn’t.
Domain coverage #
| Domain | Status | Center of gravity |
|---|---|---|
| Storage / Data-Model | Built | Entity shape, partition, replication, order, coordination |
| Process / Control | Built | Operation lifecycle, role topology, authority delegation |
| Computation / Dataflow | Partial via I20 | Computation graph, operator semantics, data movement |
| Resource / Capacity | Missing | Allocation policy, preemption, reclamation |
| Network / Routing | Missing | Routing topology, channel, negotiation, secrecy |
| Decision / ML | Missing | Decision shape, latency regime, update cadence |
| Document / Collaboration | Missing | Document model, edit semantics, presence |
The built domains plus the partial I20 computation/dataflow archetype cover most senior-IC interview surface area, but this is still not a closed ontology.
Storage / Data-Model Domain #
Dimensions #
Structural Tier #
writeModel (Layer 0) — what the entity intrinsically is, in terms of how new state gets created
- event: append-only events, state derived by fold
- snapshot: overwrite the latest state
- log: append-only physical log, possibly compacted
- commutative: operations that commute regardless of order
- derived: projection from another source of truth
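
To make the event versus snapshot distinction concrete, here is a minimal sketch assuming a toy account balance. The `Event` type, the event kinds, and `SnapshotStore` are hypothetical names introduced for illustration, not part of any system named in this document.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrew" (hypothetical event kinds)
    amount: int


def apply(balance: int, event: Event) -> int:
    """Fold step: next state from the current state plus one event."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrew":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def fold(events: list[Event], initial: int = 0) -> int:
    """event write model: state is derived from the log, never stored directly."""
    return reduce(apply, events, initial)


class SnapshotStore:
    """snapshot write model: each write overwrites the latest state."""

    def __init__(self) -> None:
        self.balance = 0

    def write(self, balance: int) -> None:
        self.balance = balance  # the previous value is gone


log = [Event("deposited", 100), Event("withdrew", 30), Event("deposited", 5)]
current = fold(log)          # state after all three events
earlier = fold(log[:2])      # state as of the second event, recoverable only from the log
```

The practical difference is that the event model can answer "what was the state at point N" by folding a prefix, while the snapshot model keeps only the last write.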
semanticRole (Layer 0) — what the entity points to or contains
- value: the entity is the data
- pointer: names another entity
- projection: derived view of underlying state
- blob: opaque bytes
lifecycle (Layer 0) — how long the entity is intended to exist
- ephemeral: exists for a session or less
- leased: exists for a bounded TTL
- durable: persists until explicitly deleted
- immutable: write-once, never modified
- archived: preserved long-term, cold storage
mutability (Layer 1) — how the entity changes once created
- immutable: never changes after write
- monotonic: changes only in one direction (CRDT-style)
- overwriteable: fully replaceable
- versioned: old versions retained alongside new
- tombstone: deleted via a tombstone marker
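
The monotonic cell is easiest to see in a grow-only counter, the simplest state-based CRDT. The sketch below is illustrative only; the class name `GCounter` and its methods are assumptions, not the API of any particular library.

```python
class GCounter:
    """Grow-only counter: per-replica counts that can only increase."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
```

Because state only moves up the max lattice, no coordination is needed for the replicas to agree on the final value.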
partition (Layer 2) — how data is divided across nodes
- single: no partitioning
- hash: by hash of key
- range: by sorted ranges
- id-space: by structured ID space (Snowflake, UUID)
- tenant: by customer/tenant boundary
- time: by time bucket
- locality: by geographic or network proximity
- content: by hash of content
- load: adaptive based on access patterns (Slicer)
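
A minimal sketch of the hash and range cells, assuming string keys. The function names and the split points are hypothetical; real systems typically layer virtual nodes or consistent hashing on top of this.

```python
import bisect
import hashlib


def hash_partition(key: str, num_partitions: int) -> int:
    """hash: spreads keys evenly but destroys key ordering."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def range_partition(key: str, split_points: list[str]) -> int:
    """range: split_points must be sorted; partition i holds keys in
    [split_points[i-1], split_points[i]), so range scans stay local."""
    return bisect.bisect_right(split_points, key)


splits = ["g", "n", "t"]  # four partitions: <g, [g,n), [n,t), >=t
```

The trade-off the two cells encode: hash gives load spread at the cost of scans, range gives cheap scans at the cost of hot spots on skewed key ranges.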
replication (Layer 2) — how data is copied for fault tolerance
- single: no replication
- leader-follower: single-leader async or sync
- multi-leader: multiple acceptors
- leaderless: quorum-based (Dynamo-style)
- consensus: quorum-replicated state machine (Raft, Paxos)
- chain: chain replication
- erasure: erasure-coded fragments
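
For the leaderless cell, the usual quorum condition is that read and write quorums overlap, i.e. R + W > N. A small sketch, with hypothetical function names:

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("quorum sizes must be between 1 and N")
    return r + w > n


def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums intersect."""
    return n // 2 + 1
```

Overlap is necessary for a read to see the latest acknowledged write, but on its own it does not give linearizability: sloppy quorums and concurrent writes can still produce stale or conflicting reads.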
Mechanism Tier #
order (Layer 3) — what ordering guarantees the system provides
- linearizable: global real-time-aligned total order
- sequential: global total order, not real-time-aligned
- causal: partial order respecting happens-before
- per-key: per-key total order
- per-partition: per-partition total order
- per-producer: per-producer FIFO
- eventual: no order, replicas converge
- k-sortable: bounded-skew total order
- commutative: operations commute, order irrelevant
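
The causal cell is a partial order, which means some pairs of events are simply not ordered. Vector clocks are one common way to track this; the sketch below assumes clocks represented as dicts from replica id to counter, with missing entries treated as zero. The function name `compare` is hypothetical.

```python
def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Compare two vector clocks under happens-before."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happens-before b
    if b_le_a:
        return "after"       # b happens-before a
    return "concurrent"      # neither ordered: a total order would have to invent one
```

A "concurrent" result is exactly where a causal system needs a merge or conflict-resolution rule, and where a linearizable system would have paid for coordination instead.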
coordination (Layer 3) — how concurrent access is regulated
- none: no coordination
- optimistic: assume no conflict, validate at commit (OCC)
- pessimistic: locks held during work (2PL)
- fenced: epoch/fence tokens prevent stale ops
- consensus: quorum agreement before commit
- snapshot: read from a consistent snapshot (MVCC)
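
The fenced cell can be sketched as a store that remembers the highest token it has accepted and rejects anything older. `FencedStore` is a hypothetical name, and the token values are illustrative; in practice the tokens would come from a lock or lease service that increments them on every grant.

```python
class FencedStore:
    """Storage that rejects writes carrying a stale fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # a newer lock holder has already written
        self.highest_token = token
        self.data[key] = value
        return True


store = FencedStore()
# Client 1 was granted token 33, then paused; client 2 was granted token 34.
accepted_new = store.write(34, "file", "client-2")
accepted_stale = store.write(33, "file", "client-1")  # client 1 wakes up late
```

The check lives in the storage layer on purpose: the paused client cannot know its lease expired, so only the resource being protected can reliably refuse it.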
atomicity (Layer 3) — granularity of atomic operations
- single-key: atomic per record
- multi-key-local: atomic across keys on one shard
- multi-key-distrib: distributed atomicity across shards (2PC)
- saga: compensating transactions, no atomicity
- best-effort: no atomicity guarantees
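
A minimal sketch of the saga cell: each step carries a compensation, and a failure triggers the compensations of already-completed steps in reverse order. The `run_saga` helper and the order-flow step names are hypothetical, and the sketch assumes compensations themselves do not fail.

```python
from typing import Callable

# Each step pairs an action with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> bool:
    """Run actions in order; on failure, compensate completed steps in reverse."""
    compensations: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True


# Hypothetical order flow: charging fails after inventory was reserved.
journal: list[str] = []


def charge_card() -> None:
    raise RuntimeError("card declined")


ok = run_saga([
    (lambda: journal.append("reserve inventory"),
     lambda: journal.append("release inventory")),
    (charge_card, lambda: journal.append("refund card")),
])
```

Note that between the reservation and its release, other readers could observe the intermediate state. That visibility is what "no atomicity" means for this cell.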
Operational Tier (largely DERIVED, not freely chosen) #
delivery (Layer 4) — message/operation delivery guarantees
- at-most-once: may lose, never duplicate
- at-least-once: never lose, may duplicate
- exactly-once: neither lose nor duplicate
- bounded-retry: retry up to N times then give up
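
In practice, an exactly-once effect is often built from at-least-once delivery plus an idempotent consumer. The sketch below assumes each message carries a unique id; `IdempotentConsumer` and the message ids are hypothetical.

```python
class IdempotentConsumer:
    """Applies each message's effect once, even if it is delivered repeatedly."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.total = 0

    def handle(self, message_id: str, amount: int) -> bool:
        if message_id in self.seen:
            return False  # duplicate delivery: acknowledge, do nothing
        self.total += amount
        self.seen.add(message_id)
        return True


consumer = IdempotentConsumer()
deliveries = [("m1", 10), ("m2", 5), ("m1", 10)]  # m1 is redelivered
results = [consumer.handle(mid, amount) for mid, amount in deliveries]
```

In a real system the dedup record and the effect would need to be committed atomically (for example, in the same database transaction); this in-memory sketch sidesteps that.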
failure (Layer 4) — how node failures are detected
- timeout: request timeout
- heartbeat: periodic liveness ping
- lease: TTL-based liveness
- probe: active health check
- quorum-fd: quorum-based failure detector (Raft)
- explicit: explicit registration/deregistration
durability (Layer 4) — what survives crashes
- volatile: RAM only
- buffered: written to disk but not fsynced
- fsynced: fsynced to local disk
- replicated: replicated to multiple nodes
- archived: geo-replicated cold storage
Storage Clusters (canonical fingerprints) #
| Cluster | Canonical signature |
|---|---|
| Event-Sourced | event + value/projection + immutable + per-key + log-shipping |
| Content-Addressed | snapshot + blob/value + immutable + content + leaderless |
| Object-Storage | snapshot + blob + durable/archived + hash/locality + erasure or replicated |
| Mutable-Pointer | snapshot + pointer + leaderless or consensus + linearizable |
| Operational-State | snapshot + value + leader-follower or consensus + linearizable |
| Consensus-State | log/snapshot + value + consensus + linearizable + fsynced+ |
| Eventually-Consistent KV | snapshot + value + leaderless + eventual + read-repair |
| CRDT | commutative + value + leaderless + commutative + monotonic |
Canonical storage systems #
| System | Cluster | Distinguishing fingerprint |
|---|---|---|
| Postgres single-instance | Operational-State | snapshot + value + single + linearizable + MVCC |
| Postgres + sync replica | Operational-State | + leader-follower + replicated |
| Spanner directory | Operational-State | consensus + TrueTime-aligned + multi-key-distrib |
| CockroachDB | Operational-State | range + consensus + serializable |
| Cassandra | Eventually-Consistent KV | hash + leaderless + eventual + LWW |
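| DynamoDB | Eventually-Consistent KV | hash + per-partition leader (Multi-Paxos replication group, unlike the leaderless design of the Dynamo paper) + eventually consistent reads by default, strongly consistent reads optional |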
| DynamoDB Global Tables | Eventually-Consistent KV | + multi-leader + LWW conflict resolution |
| Riak with CRDTs | CRDT | hash + leaderless + commutative + monotonic |
| Automerge | CRDT | event + commutative + multi-leader |
| Datomic | Event-Sourced | event + immutable + log + per-key linearizable |
| EventStoreDB | Event-Sourced | event + per-stream FIFO + log |
| Kafka topic-partition | Event-Sourced | log + per-partition total + leader-follower (ISR) |
| Git object store | Content-Addressed | content-hash + immutable + blob |
| IPFS | Content-Addressed | content-hash + immutable + DHT routing |
| S3 | Object-Storage | hash + erasure + per-key linearizable strong-read |
| S3 Glacier | Object-Storage | + archived |
| Azure Blob LRS/ZRS/GRS | Object-Storage | (configurable replication scope) |
| Redis (default) | Mutable-Pointer | hash + single-leader + per-key linearizable + volatile |
| Memcached | Operational-State | hash + single + per-key + volatile |
| etcd | Consensus-State | range + consensus + linearizable + fsynced+replicated |
| ZooKeeper | Consensus-State | range + consensus + linearizable + fsynced+replicated |
| Consul KV | Consensus-State | range + consensus + linearizable |
| TigerBeetle | Consensus-State | + strict-serializable + double-entry |
| FoundationDB | Consensus-State | range + consensus + serializable + multi-key-distrib |
| MongoDB (default) | Operational-State | hash + leader-follower + causal + replicated |
| MongoDB w/ majority writes | Operational-State | + linearizable + replicated |
| HBase | Operational-State | range + leader-per-region + per-key linearizable |
| BigTable | Operational-State | range + per-key linearizable |
| ClickHouse | Operational-State | + columnar + range + bulk-loaded |
| Elasticsearch | Operational-State | hash + leader-follower + eventual reads |
| Snowflake | Operational-State | range + multi-version + serializable + columnar |
| Aurora DSQL | Operational-State | + serializable + globally-distributed |
| YugabyteDB | Consensus-State | range + consensus + serializable |
| ScyllaDB | Eventually-Consistent KV | hash + leaderless + eventual |
| Couchbase | Eventually-Consistent KV | hash + multi-leader + LWW |
| Backblaze Vault | Object-Storage | + erasure (17+3) + locality |
| Ceph RADOS | Object-Storage | hash (CRUSH) + replicated or erasure |
| HDFS | Object-Storage | block-hash + replicated 3x or erasure |
| Glacier Deep Archive | Object-Storage | + archived + erasure |
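
The erasure entries in the table above trade storage overhead against the number of losses tolerated. As a rough arithmetic sketch, assuming an MDS code such as Reed-Solomon (where any `data_shards` of the total suffice to reconstruct), and with hypothetical function names:

```python
def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per logical byte for a (data + parity) erasure code."""
    return (data_shards + parity_shards) / data_shards


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte for N full copies."""
    return float(copies)


vault = erasure_overhead(17, 3)      # the 17+3 layout: survives loss of any 3 shards
triple = replication_overhead(3)     # 3x replication: survives loss of any 2 copies
```

Under these assumptions the 17+3 layout stores about 1.18 bytes per logical byte versus 3.0 for triple replication, while tolerating one more loss. The cost shows up elsewhere, in reconstruction reads and repair traffic.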
Process / Control Domain #
Dimensions #
Structural Tier #
lifecycleShape (Layer 0) — the state-machine spine of one unit of work
- claim-execute-release: claim resource, do work, release (locks, leases)
- request-reply: synchronous bounded interaction (RPC)
- propose-vote-commit: multi-party agreement (Paxos, Raft, 2PC)
- submit-progress-complete: long-running with progress (workflows, jobs)
- observe-decide-act: control loop, continuous reconciliation
- fire-forget: emit and don’t track
- pull-batch-checkpoint: pull work in batches, checkpoint progress
roleTopology (Layer 0) — how participants relate
- client-server: asymmetric two-party
- leader-led: leader-driven coordination
- peer-to-peer: symmetric peers
- broker-mediated: through a queue/broker
- hierarchical: multi-level (cluster → node → pod)
- gossip: symmetric eventual propagation
authorityDelegation (Layer 1) — where the right to act comes from
- static-assigned: configured at deploy
- lease-acquired: granted by an authority service for a TTL
- claim-acquired: claimed without prior authority (CAS-style)
- quorum-elected: elected by quorum vote (Raft leader)
- gossip-elected: elected through gossip convergence
- human-assigned: manual operator assignment
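
The combination of claim-execute-release with lease-acquired authority can be sketched as a queue whose claims expire. This is an in-memory illustration under stated assumptions: `LeaseQueue` is a hypothetical name, time is passed in explicitly rather than read from a clock, and a real broker would persist this state.

```python
from typing import Optional


class LeaseQueue:
    """Work queue where a claim is a time-bounded lease, not ownership."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.pending: list[str] = []
        self.leases: dict[str, tuple[str, float]] = {}  # task -> (worker, expiry)

    def submit(self, task: str) -> None:
        self.pending.append(task)

    def claim(self, worker: str, now: float) -> Optional[str]:
        self._requeue_expired(now)
        if not self.pending:
            return None
        task = self.pending.pop(0)
        self.leases[task] = (worker, now + self.lease_seconds)
        return task

    def release(self, task: str, worker: str, now: float) -> bool:
        holder = self.leases.get(task)
        if holder is None or holder[0] != worker or holder[1] <= now:
            return False  # lease was lost; the work may be redone elsewhere
        del self.leases[task]
        return True

    def _requeue_expired(self, now: float) -> None:
        for task, (_, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[task]
                self.pending.append(task)


q = LeaseQueue(lease_seconds=30)
q.submit("t1")
first = q.claim("w1", now=0)       # w1 holds the lease until t=30
blocked = q.claim("w2", now=10)    # nothing available while the lease is live
second = q.claim("w2", now=31)     # lease expired, task requeued to w2
late = q.release("t1", "w1", now=35)   # w1 finishes late and is refused
done = q.release("t1", "w2", now=35)
```

The late worker may already have produced side effects by the time its release is refused, which is why this shape tends to pair with idempotent work or fencing.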
participantCardinality (Layer 1) — how many parties
- single-actor: solo
- two-party: exactly two
- n-fixed: fixed N participants
- n-dynamic: N varies at runtime
- broadcast: all listeners
Mechanism Tier #
reconciliation (Layer 2) — how drift is corrected
- none: fire-and-forget, no reconciliation
- event-driven: react to changes
- periodic-resweep: sweep on schedule
- drift-correct: detect mismatch, correct
- reactive: only on failure
- controller-loop: continuous observe-decide-act
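
The decide step of a controller-loop is a pure comparison of desired against observed state. A minimal sketch, assuming state is a map from workload name to replica count; the `reconcile` function and the action verbs are hypothetical.

```python
def reconcile(desired: dict[str, int],
              observed: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return (verb, name, count) actions that move observed toward desired."""
    actions: list[tuple[str, str, int]] = []
    for name in sorted(desired.keys() | observed.keys()):
        want = desired.get(name, 0)
        have = observed.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    return actions


plan = reconcile(
    desired={"web": 3, "worker": 1},
    observed={"web": 1, "cron": 2},
)
```

Because the function is level-triggered (it looks at current state, not at the events that led there), running it again after a missed event or a crash still produces the right plan.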
recovery (Layer 2) — what happens after failure
- retry-in-place: same actor retries
- requeue: return to queue, different actor picks up
- failover: different node takes over
- compensate: saga-style rollback
- manual: alert, human intervenes
- idempotent-replay: replay from log
- abandon: drop and log
progressVisibility (Layer 2) — how observable is in-flight work
- heartbeat-pull: periodic pings
- lease-presence: alive iff lease is held
- checkpoint-write: durable checkpoints
- external-poll: observer queries status
- event-emission: push events on transitions
- none: opaque
coordAuthority (Layer 3) — where coordination state lives
- central-coord: single coordinator node
- consensus-state: replicated via consensus
- lease-distributed: leases issued to workers
- gossip-converged: gossip-propagated, eventually agreed
- client-driven: no server-side coordination
Operational Tier (DERIVED) #
latencyBudget (Layer 4) — sub-ms, interactive, bulk, batch, eventually
throughputPattern (Layer 4) — single-flight, bounded-parallel, unbounded-parallel, windowed-batch, streaming
blastRadius (Layer 4) — single-op, worker-pool, region, tenant, global
Process Clusters #
| Cluster | Canonical signature |
|---|---|
| Claim-Lease | claim-execute-release + broker-mediated + lease-acquired + requeue |
| Scheduled-Job | submit-progress-complete + leader-led + scheduled + requeue |
| Frontier-Scan | pull-batch-checkpoint + leader-led + checkpoint-write |
| Workflow | submit-progress-complete + leader-led + lease-acquired + idempotent-replay + event-driven |
| Execution-Fleet | claim-execute-release + leader-led + lease-acquired + heartbeat-pull |
| Control-Loop | observe-decide-act + leader-led + controller-loop + continuous |
| Consensus-Protocol | propose-vote-commit + n-fixed + quorum-elected + consensus-state |
| Gossip-Convergence | observe-decide-act + gossip + gossip-elected + gossip-converged |
Canonical process systems #
| System | Cluster | Distinguishing fingerprint |
|---|---|---|
| AWS SQS | Claim-Lease | broker + claim + lease + requeue |
| Redis Redlock | Claim-Lease | claim + multi-broker + lease (controversial safety) |
| Postgres SKIP LOCKED | Claim-Lease | claim + db-as-broker + lease via tx |
| HashiCorp Vault leases | Claim-Lease | leader-led + sequence-fenced |
| AWS EventBridge Scheduler | Scheduled-Job | leader-led + central-scheduler |
| Apache Airflow | Scheduled-Job | leader-led + scheduler + DAG workflows |
| K8s CronJob | Scheduled-Job | hierarchical + controller-driven |
| Quartz | Scheduled-Job | leader-led + central-scheduler |
| Cassandra repair | Frontier-Scan | per-partition + checkpoint-write |
| Temporal/Cadence | Workflow | history-server + replay + dedupe-key |
| AWS Step Functions | Workflow | central-scheduler + replay |
| Argo Workflows | Workflow | leader-led + DAG + event-driven |
| K8s scheduler+kubelet | Execution-Fleet | controller + lease + heartbeat |
| HashiCorp Nomad | Execution-Fleet | leader-led + lease |
| Apache Mesos | Execution-Fleet | hierarchical (master + frameworks + agents) |
| Spark driver/executor | Execution-Fleet | submit-progress + leader-driven + replay |
| Apache Flink | Execution-Fleet | streaming + leader + checkpointing |
| K8s controllers | Control-Loop | controller + leader-led + continuous |
| Istio Pilot | Control-Loop | controller + xDS push |
| Envoy xDS | Control-Loop | controller + dynamic config |
| Raft (etcd, Consul) | Consensus-Protocol | propose-vote-commit + quorum |
| Multi-Paxos | Consensus-Protocol | propose-vote-commit + quorum |
| Cassandra gossip | Gossip-Convergence | peer + gossip-elected |
| SWIM (HashiCorp Serf) | Gossip-Convergence | peer + gossip + failure detection |
Generative Structure (the structure beneath the vocabulary) #
The named cells above are popular points in unstated generative spaces. The structural axes that would close each dimension:
Partition function f: K → Π #
- Input domain — intrinsic-key | extrinsic-attribute | value-content | workload-state | infrastructure-state | graph-structure
- Function form — algorithmic-deterministic | table-lookup | inferential | rule-based | oracle
- Temporal stability — static | boundary-event | continuous
- Output structure — flat-fixed | flat-elastic | hierarchical | overlapping
Cross-product = 360 cells (6 × 5 × 3 × 4). Named values like P-hash, P-range, P-load, P-lookup, P-graph are points in this space. Missing-but-real shapes include: workload-adaptive content-routing (Slicer-style), graph-cut partitioning, criticality-tier partitioning, trust-boundary partitioning.
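
The partition-function space can be enumerated directly from the four axes listed above. In this sketch the axis values are copied from the list; the placement of P-hash at a specific coordinate is an assumption made for illustration, not something the taxonomy states.

```python
from itertools import product

PARTITION_AXES = {
    "input_domain": ["intrinsic-key", "extrinsic-attribute", "value-content",
                     "workload-state", "infrastructure-state", "graph-structure"],
    "function_form": ["algorithmic-deterministic", "table-lookup", "inferential",
                      "rule-based", "oracle"],
    "temporal_stability": ["static", "boundary-event", "continuous"],
    "output_structure": ["flat-fixed", "flat-elastic", "hierarchical", "overlapping"],
}

# Every combination of one value per axis is one cell of the generative space.
cells = list(product(*PARTITION_AXES.values()))

# Hypothetical coordinate for the named cell P-hash.
P_HASH = ("intrinsic-key", "algorithmic-deterministic", "static", "flat-fixed")
```

Seen this way, a named cell is one tuple out of `len(cells)`, and an unfamiliar system can be described by its coordinates even when no name exists for it.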
Replication scheme #
- Write authority topology — single | quorum | distributed | chain
- Acknowledgment threshold — N=1 | N=majority | N=W | N=all | N=ISR
- Read consultation pattern — leader-only | leader-lease | any-one | quorum-R | all-replicas
- Conflict admissibility — impossible | by-timestamp | by-version-vector | by-merge | by-fail-and-retry
- Convergence strategy — log-shipping | state-transfer | read-repair | anti-entropy | deterministic-replay
- Membership change — static | view-change | joint-consensus | gossip-based | external-coordinator
- Replica symmetry — symmetric | asymmetric-by-position | asymmetric-by-capability | asymmetric-by-content
Cross-product = 50,000 cells (4 × 5 × 5 × 5 × 5 × 5 × 4, before constraints). Named topologies are regions in this space. The asymmetric-by-capability value of the replica-symmetry axis (witness replicas, learners, observers) is what the named-cell taxonomy most often misses.
Order (partial relation R over events) #
- Scope of relation — global | per-key | per-partition | per-stream | per-causal-cone | none
- Completeness within scope — total | partial | vacuously total
- Real-time alignment — aligned | session-aligned | bounded-skew | unaligned | inapplicable
- Happens-before preservation — preserved | not-preserved
- Observability across observers — agreed | agreed-eventually | per-observer
- Persistence of order — monotonic | reversible
- Conflict resolution — impossible | by-timestamp | by-version-vector | by-merge | by-fail-and-retry
Cross-product = ~5,400 cells (heavily constrained). Order has the largest gap between named-cell taxonomy and structural reality among all dimensions — the consistency-models literature has been carving distinctions for 40 years without exhausting the space.
Process (the dimensions themselves are clusters of structural concerns) #
The 11 process dimensions aren’t truly orthogonal. They project onto roughly 8 underlying axes:
- Work-unit lifecycle topology — linear / branching / cyclic / infinite
- Termination property — must-terminate / need-not-terminate / contingent
- Participant structure — count × symmetry
- Authority distribution — conferral × time-bound × revocability
- Feedback mechanism — trigger × direction × granularity × correction
- Failure handling — action × scope × idempotence assumption
- Idempotence and replayability — replay model × determinism
- Coordination-state consistency — strong / bounded / eventual / per-client
The 11 named dimensions are themselves named-cell vocabularies for sub-spaces of these 8 axes.
Known structural limits (dimensional overlaps) #
Pairs that are highly coupled (probably should be merged or flagged as one concern):
- lifecycle ↔ durability: both project from “retention-and-reliability commitment”; the shared archived value is a clear redundancy
- mutability ↔ atomicity: both project from “concurrency exposure”
- order ↔ coordination: strong forced relationship between order strength and coordination floor
Pairs that are partly coupled (handleable via constraints):
- partition ↔ replication (failure-domain interactions)
- replication ↔ failure (failure detection is a function of replication topology)
- order ↔ delivery (semantics interact)
- authorityDelegation ↔ coordAuthority (overlapping concerns about coordination state)
Pairs that are genuinely orthogonal:
- partition ↔ order
- partition ↔ mutability
- writeModel ↔ partition
- semanticRole ↔ partition
About half the dimension pairs are not fully orthogonal. The framework’s claim of “12 dimensions, 1B cell cartesian space” is somewhat inflated — the real design space, after removing redundancy, is closer to 6–8 truly independent axes.
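
The “1B cell” figure can be checked by multiplying the number of named cells in each of the 12 storage dimensions. The counts below are tallied from the cell lists in the Storage / Data-Model Domain section; the variable names are hypothetical.

```python
from math import prod

# Number of named cells per storage dimension, counted from the lists above.
STORAGE_CELL_COUNTS = {
    "writeModel": 5, "semanticRole": 4, "lifecycle": 5, "mutability": 5,
    "partition": 9, "replication": 7,
    "order": 9, "coordination": 6, "atomicity": 5,
    "delivery": 4, "failure": 6, "durability": 5,
}

total_cells = prod(STORAGE_CELL_COUNTS.values())
```

The product comes to roughly 1.02 billion, which matches the headline number only if all 12 dimensions vary independently. Every coupled pair listed above removes combinations from that total.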
What falls outside the framework #
Concrete things the framework cannot describe well:
Routing layer — anycast, DNS-balanced load balancing, consistent-hash routing, proxy-routed (Vitess vtgate), broadcast scatter-gather. The framework has no “routing topology” dimension. Real systems (CDNs, service meshes, public DNS) need this dimension as first-class.
Multi-aspect concerns — Real concerns (anycast, replication, partitioning, service discovery, consensus) have multiple aspects: function (what mapping it implements), process (what lifecycle it requires), protocol (what messages flow), resource (what it consumes). The framework forces concerns into a single domain. The clean storage/process split is a structural simplification that loses information.
Composition across domains — A workflow engine has process shape AND uses an event log (storage shape). A CDN has routing AND replication AND caching. The framework has no first-class composition operator.
Out-of-scope domains entirely:
- Computation/dataflow (Spark, Flink, TensorFlow at the graph level)
- Resource/capacity (K8s scheduler at the bin-packing level, GPU schedulers)
- Network/protocol (TCP, gRPC, mTLS, BGP)
- Decision/ML (recommenders, fraud detection, model serving)
- Document/collaboration (Google Docs, Figma, Notion, OT/CRDT systems)
Quick lookup: when something doesn’t fit #
When an interviewer asks about a system whose design doesn’t map cleanly to the framework, the right move depends on why it doesn’t fit:
| Symptom | Likely cause | What to do |
|---|---|---|
| “Doesn’t match any named cell” | New cell in an existing dimension | Name the underlying property (e.g., “this partitions on regulatory boundary, which is a new flavor of P-tenant”) |
| “Has multiple aspects fighting” | Multi-aspect concern (anycast, routing) | Acknowledge the multi-aspect nature; describe each aspect on its own terms |
| “Whole class missing” | Out-of-scope domain | Flag the domain gap explicitly (“this is a dataflow problem, the storage/process lens captures only the persistence layer”) |
| “Two named cells both seem to fit” | Dimension overlap | Pick one, explain the coupling (“technically L-archived and Du-archived both apply, since the retention and durability tiers are coupled here”) |
| “All choices seem locked once one is picked” | Constraint cascade | Surface the cascade explicitly — that’s a strong signal of structural derivation |
Strong interview move: after answering, name which dimension you used and what cell within it. “I’d choose R-consensus + O-linearizable + Co-consensus + Du-replicated for the metadata service. The consensus replication forces the linearizability and the consensus coordination — it’s one structural choice expressed across three dimensions.”
Reading list (organized by dimension) #
Each dimension has a recommended primary source, plus depth options. For each, the design-focused entry point is listed first, then theoretical depth.
Foundational (read first) #
- Kleppmann, Designing Data-Intensive Applications (DDIA), O’Reilly 2017 — the field’s standard for design-decision framing across most dimensions
- Helland, “Life Beyond Distributed Transactions”, CIDR 2007 — introduces entity/activity primitives that underlie most subsequent decompositions
Order and consistency #
- DDIA Chapter 9 (Consistency and Consensus) — design-focused, primary source
- Jepsen consistency map at jepsen.io/consistency — interactive reference for all named consistency models in 2026
- TigerBeetle docs on consistency — modern design-focused treatment
- Burckhardt, Principles of Eventual Consistency (2014, free PDF) — generative structural treatment, mathematical
- Viotti & Vukolić, “Consistency in Non-Transactional Distributed Storage Systems” (CSUR 2016) — best survey, partial-order lattice diagram
- Adya, “Weak Consistency” (MIT PhD thesis 1999) — generative isolation hierarchy via forbidden phenomena
- Bailis et al., “Highly Available Transactions” (VLDB 2014) — unifies isolation, consistency, session guarantees
Coordination and concurrency #
- DDIA Chapters 7-9 — primary design source
- Kleppmann, “How to do distributed locking” (2016 blog) — single most-referenced piece on production locking; introduces fencing token vocabulary
- Hellerstein & Alvaro, “Keeping CALM” (CACM 2020) — coordination avoidance via monotonicity
- Bailis, Coordination Avoidance in Distributed Databases (Berkeley PhD thesis 2015) — when coordination is necessary vs convention
- Bernstein, Hadzilacos, Goodman, Concurrency Control and Recovery in Database Systems (1987, free PDF) — classical depth on 2PL, OCC, MVCC
- Weikum & Vossen, Transactional Information Systems (2002) — modern comprehensive transaction-processing reference
- Junqueira & Reed, ZooKeeper (O’Reilly 2013) — coordination-kernel design philosophy
- Richardson, Microservices Patterns Chapters 4-5 (2018) — saga pattern, the modern alternative to 2PC
Replication #
- DDIA Chapter 5 — primary design source
- Schneider, “Implementing Fault-Tolerant Services Using the State Machine Approach” (CSUR 1990) — structural decomposition of replication
- Howard, Distributed Consensus Revised (Cambridge PhD 2019) — modern consensus protocol family treatment
- Howard et al., “Flexible Paxos” (OPODIS 2016) — generalizes quorum requirements
- Petrov, Database Internals Chapter 11 — implementation-focused complement to DDIA
- DeCandia et al., “Dynamo” (SOSP 2007): production engineering of leaderless replication
Partition #
- DDIA Chapter 6: design-focused
- Adams et al., “Slicer” (OSDI 2016): workload-adaptive sharding (P-lookup)
- Curino et al., “Schism” (VLDB 2010): workload-driven partitioning as graph problem
- Pavlo, Curino, Zdonik, “Horticulture” (SIGMOD 2012): skew-aware automatic partitioning
- Mahmud et al., “A Survey of Data Partitioning” (Big Data Mining 2020): taxonomy survey
Delivery semantics #
- Akidau, Chernyak, Lax, Streaming Systems Chapter 5 (O’Reilly 2018) — primary, especially exactly-once-effect vs exactly-once-delivery
- DDIA Chapters 11-12 — storage-system perspective on delivery
- Richardson, Microservices Patterns Chapters 3, 7 — outbox pattern, transactional outbox with CDC
- Stripe idempotency-key documentation — practical API-boundary pattern
- Confluent delivery semantics docs at docs.confluent.io — Kafka exactly-once current state
- Morling’s blog (morling.dev) — Debezium creator on dual writes and CDC
Reconciliation regime #
- Ibryam & Huß, Kubernetes Patterns (2nd ed, 2023) Chapters 24-25 — primary, Operator and Controller patterns
- Burns et al., Kubernetes: Up and Running (3rd ed) Chapter 17 — extending Kubernetes
- The Kubebuilder Book at book.kubebuilder.io — practical reconciler engineering
- Beyer et al., Site Reliability Engineering Chapters 7, 9 — operational philosophy of reconciliation
- Devismes et al., Introduction to Distributed Self-Stabilizing Algorithms (Morgan & Claypool 2019) — theoretical foundation
- Dijkstra, “Self-Stabilizing Systems in Spite of Distributed Control” (CACM 1974) — foundational paper, 2 pages
Erasure coding #
- Plank, “Erasure Codes for Storage Systems” (USENIX ;login: 2013): primary engineering primer
- Plank & Huang, FAST 2013 tutorial slides: production-grounded
- Huang et al., “Erasure Coding in Windows Azure Storage” (USENIX ATC 2012): first production LRC
- Sathiamoorthy et al., “XORing Elephants” (VLDB 2013): Facebook LRC for HDFS
- Balaji et al., “Erasure Coding for Distributed Storage: An Overview” (arXiv 1806.04437, 2018): structured survey including regenerating codes
- Backblaze Reed-Solomon library + blog post: pedagogical, free
- Russ Cox, “Finite Field Arithmetic and Reed-Solomon Coding” (research.swtch.com/field): clearest accessible treatment
- Roth, Introduction to Coding Theory (Cambridge 2006): rigorous textbook
Content hashing / addressing #
- Pro Git Chapter 10 (free at git-scm.com) — best practical introduction; build a tiny CAS by hand
- GitHub blog “Git’s Database Internals” series — production engineering depth
- IPFS Merkle DAG documentation — modern canonical treatment
- Katz & Lindell, Introduction to Modern Cryptography Chapter 5 — collision resistance, Merkle-Damgård, formal foundations
- Boneh & Shoup, A Graduate Course in Applied Cryptography (free at toc.cryptobook.us) — more rigorous alternative
- Muthitacharoen et al., “A Low-Bandwidth Network File System” (SOSP 2001) — foundational content-defined chunking (LBFS)
- Xia et al., “FastCDC” (USENIX ATC 2016, free) — modern fast chunking
- Quinlan & Dorward, “Venti” (USENIX 2002) — Plan 9’s CAS, predates Git
Process / control / coordination #
- Burckhardt et al., “Replicated Data Types: Specification, Verification, Optimality” (POPL 2014) — algebraic decomposition of CRDTs
- Aguilera & Walfish, “No Time for Asynchrony” (HotOS 2009) — timing assumptions as structural axis
- Cachin, Guerraoui, Rodrigues, Introduction to Reliable and Secure Distributed Programming (2nd ed 2011) — abstraction-stack approach
- Helland, “Heisenberg Was on the Write Track” (CIDR 2015) — coordination under uncertainty
Reading-time recommendations #
| Time budget | Reading priority |
|---|---|
| ~10 hours | DDIA Chapters 5, 7-9, 11-12 (the core framework in a single text) |
| ~25 hours | + Akidau Streaming Systems Ch. 5 + Kleppmann locking blog + Jepsen consistency map + Plank ;login: primer |
| ~50 hours | + Richardson Microservices Patterns Ch. 3-5, 7 + Kubernetes Patterns Ch. 24-25 + Helland’s “Life Beyond Distributed Transactions” + Bailis HAT paper |
| ~120 hours | + Burckhardt’s Principles of Eventual Consistency + Adya thesis + Bailis Coordination Avoidance thesis + Self-Stabilizing Algorithms textbook |
For interview prep specifically, the ~25 hour budget is sufficient. Beyond that is intellectual depth rather than interview readiness.
Honest closing note #
This framework is a teaching taxonomy with constraint structure, not a working ontology. Its value is in giving you internalized structure for recognizing common patterns and deriving consequences from initial choices. It is not the complete structure of system design, and shouldn’t be used as one.
Three durable warnings:
When the framework gives you a confident answer to a structural question, double-check whether the dimensions you used are actually orthogonal. Half of the dimension pairs are coupled.
When something doesn’t fit any named cell, the framework is telling you something honest: you’re at a missing cell, an out-of-scope domain, or a multi-aspect concern. Don’t force the fit.
The named-cell vocabulary is what you’ll use in interviews because that’s how engineers actually talk. The generative axes are what you should know in case an interviewer probes deeper. Most won’t.
Good luck.