Skip to main content
  1. System Design Components/

Dynamo Process And Storage Traces #

Source: dynamo.pdf

Archetype Read #

Dynamo is primarily a storage system.

Dominant archetypes:

  • I16 Replicated KV
  • I09 Shared Subject + Mergeable Edits for divergent versions and reconciliation

Supporting process shapes:

  • membership view
  • snapshot/distribution apply for routing and membership propagation
  • repair / anti-entropy batch
  • hinted handoff as a deferred transfer workflow

So unlike Borg:

  • Borg is mostly process, with storage as control substrate
  • Dynamo is mostly storage, with process as replication/repair substrate

Main Judgment #

The main storage truth in Dynamo is:

  • a key’s replicated multi-version state
  • plus causal metadata
  • plus quorum visibility rules

The main process logic around it is:

  • coordinator selection
  • sloppy quorum execution
  • hinted handoff
  • gossip membership
  • anti-entropy / read repair

1. Storage Trace: Replicated Key Version Under Quorum #

This is the core Dynamo trace.

Core lines #

  • coordinator_visible_version
  • replica_versions
  • write_quorum_state
  • read_visible_versions
  • client_trying_put_or_get

Reusable frame mapping #

  • truth locus -> coordinator_visible_version plus replica_versions
  • replica/derived state -> per-replica object copies
  • durability boundary -> write_quorum_state
  • visibility -> read_visible_versions
  • actor trying to write/read/apply -> client_trying_put_or_get

Trace #

time →

coordinator_visible_version  D1 -------------- D2 pending -------- D2 ---------------
replica_A_version            D1 -------------- D2 --------------- D2 ---------------
replica_B_version            D1 -------------- D1 --------------- D2 ---------------
write_quorum_state           W@D1 ------------ W pending -------- W@D2 -------------
read_visible_versions        {D1} ------------ {D1} ------------ {D2} -------------
client_trying_put_or_get     none ------------ put(D2) --------- get()->{D2} ------

What it teaches #

  • a write can exist on some replicas before it reaches the configured write quorum
  • read visibility follows the quorum/read path, not just one replica’s local state
  • storage truth is a replicated key-version state, not a single leader value

State loci #

  • authoritative:
    • replica set for the key under quorum policy
  • local/replica:
    • each replica node’s local object version
  • cached/derived:
    • coordinator’s gathered version set, client read result
  • repair:
    • replication catch-up / read repair

Core invariant #

  • a version must not be treated as successfully durable/visible under the configured policy before enough replicas have accepted it

2. Storage Trace: Divergent Versions And Vector Clock Causality #

This is the distinctive Dynamo storage shape.

Core lines #

  • object_versions
  • vector_clock_frontier
  • replica_A_versions
  • replica_B_versions
  • client_trying_reconcile

Reusable frame mapping #

  • truth locus -> object_versions
  • replica/derived state -> replica_A_versions, replica_B_versions
  • durability/currentness boundary -> vector_clock_frontier
  • visibility -> versions surfaced to the client on read
  • actor trying to write/read/apply -> client_trying_reconcile

Trace #

time →

object_versions            {D1} ------------ {D2,D3} ---------------- {D4} -----------
vector_clock_frontier      [x:1] ----------- [x:2] || [y:1] ---------- [x:2,y:1,z:1]
replica_A_versions         {D2} ------------ {D2} -------------------- {D4} -----------
replica_B_versions         {D1} ------------ {D3} -------------------- {D4} -----------
client_trying_reconcile    none ------------ get()->{D2,D3} ---------- merge put(D4) --

What it teaches #

  • currentness is not always a single scalar version
  • the truth object may contain concurrent branches
  • reads can surface multiple versions that the client must reconcile

State loci #

  • authoritative:
    • distributed per-key version set with vector clocks
  • local/replica:
    • each replica’s local branch/head set
  • cached/derived:
    • client-visible returned versions and reconcile context
  • repair:
    • client merge followed by writeback

Core invariant #

  • versions that are causally unrelated must not be silently discarded as if one subsumed the other

3. Storage Trace: Read Repair #

Dynamo uses reads as an opportunistic repair moment.

Core lines #

  • coordinator_returned_version
  • stale_replica_version
  • fresh_replica_version
  • repair_writeback_state
  • coordinator_trying_read_repair

Reusable frame mapping #

  • truth locus -> coordinator_returned_version
  • replica/derived state -> stale_replica_version, fresh_replica_version
  • durability/freshness boundary -> causally newer version already known by coordinator
  • visibility -> read-returned value
  • actor trying to write/read/apply -> coordinator_trying_read_repair

Trace #

time →

coordinator_returned_version  D4 ------------- D4 ------------- D4
stale_replica_version         D2 ------------- D2 ------------- D4
fresh_replica_version         D4 ------------- D4 ------------- D4
repair_writeback_state        none ----------- repairing ------ repaired
coordinator_trying_read_repair get()->{D4} --- writeback D4 --- done

What it teaches #

  • a read may reveal stale replicas
  • coordinator can repair stale copies while serving the read
  • visibility and repair are coupled

State loci #

  • authoritative:
    • freshest causally acceptable replica set
  • local/replica:
    • stale and fresh replicas holding different versions
  • cached/derived:
    • coordinator’s assembled read result
  • repair:
    • read-repair writeback from coordinator

Core invariant #

  • stale replicas discovered on a read must not remain authoritative if a causally newer version is already known

4. Storage Trace: Hinted Handoff Replica Durability #

This is a storage-support trace because the system temporarily stores a replica on the wrong node.

Core lines #

  • intended_replica_set
  • reachable_write_set
  • hinted_replica_state
  • durable_replica_count
  • coordinator_trying_handoff

Reusable frame mapping #

  • truth locus -> intended_replica_set
  • replica/derived state -> reachable_write_set, hinted_replica_state
  • durability boundary -> durable_replica_count
  • visibility -> eventual intended replica durability after handoff
  • actor trying to write/read/apply -> coordinator_trying_handoff

Trace #

time →

intended_replica_set       {A,B,C} ---------- {A,B,C} ---------- {A,B,C}
reachable_write_set        {B,C,D} ---------- {B,C,D} ---------- {A,B,C}
hinted_replica_state       none ------------- D holds hint(A) --- delivered to A
durable_replica_count      2+hint ----------- 2+hint ----------- 3 intended
coordinator_trying_handoff put via D -------- retain hint ------ transfer then delete

What it teaches #

  • sloppy quorum changes where durability temporarily lives
  • the hinted copy is part of the durability story until returned
  • replica locality and durability locality can diverge

State loci #

  • authoritative:
    • intended replica set for the key
  • local/replica:
    • reachable nodes including hint holder
  • cached/derived:
    • hint metadata showing intended recipient
  • repair:
    • hinted handoff transfer path

Core invariant #

  • a hinted replica must preserve durability until the intended replica safely receives the object

5. Storage Trace: Merkle Anti-Entropy #

This is Dynamo’s background replica synchronization substrate.

Core lines #

  • range_merkle_root_A
  • range_merkle_root_B
  • divergence_scope
  • repair_transfer_state
  • actor_trying_anti_entropy

Reusable frame mapping #

  • truth locus -> converged key-range contents summarized by Merkle roots
  • replica/derived state -> range_merkle_root_A, range_merkle_root_B
  • durability/freshness boundary -> anti-entropy convergence over divergent scope
  • visibility -> synchronized replica range
  • actor trying to write/read/apply -> actor_trying_anti_entropy

Trace #

time →

range_merkle_root_A       h100 ------------ h100 ------------- h101
range_merkle_root_B       h100 ------------ h095 mismatch ---- h101
divergence_scope          none ------------ subtree/key range - none
repair_transfer_state     idle ------------ syncing ---------- synchronized
actor_trying_anti_entropy compare roots ---- descend tree ---- transfer missing keys

What it teaches #

  • anti-entropy is a range-level repair process over storage truth
  • Merkle roots compress the comparison frontier
  • repair targets only divergent subtrees/keys

State loci #

  • authoritative:
    • actual key-range contents on each replica
  • local/replica:
    • each node’s Merkle tree for the range
  • cached/derived:
    • exchanged root/subtree hashes
  • repair:
    • anti-entropy subtree walk and key transfer

Core invariant #

  • replicas for a key range must eventually converge by transferring only the divergent keys or subranges

6. Process Trace: Put Coordination Under Sloppy Quorum #

This is the main Dynamo process trace.

Core lines #

  • request_state
  • coordinator_node
  • reachable_preference_nodes
  • write_ack_progress
  • actor_trying_to_coordinate

Reusable frame mapping #

  • state -> request_state
  • owner/view -> coordinator_node, reachable_preference_nodes
  • monotonic marker -> write_ack_progress
  • validity boundary -> configured W threshold under current reachable set
  • actor trying to advance -> actor_trying_to_coordinate

Trace #

time →

request_state              RECEIVED -------- FORWARDED -------- ACKED ---------- REPAIR_PENDING
coordinator_node           Sx -------------- Sx -------------- Sx ------------- Sx
reachable_preference_nodes {A,B,D} --------- {A,B,D} --------- W satisfied ---- hinted handoff left
write_ack_progress         0/2 ------------- 1/2 ------------ 2/2 ------------ 2/2 + hint
actor_trying_to_coordinate accept put ------ send to replicas - reply success -- schedule handoff

What it teaches #

  • the coordinator is a process role, not the long-term truth locus
  • write success is tied to W acknowledgements, not necessarily the full intended replica set
  • sloppy quorum leaves deferred repair work behind

State loci #

  • authoritative:
    • request outcome as determined by coordinator plus replica acknowledgements
  • local/execution:
    • coordinator node handling the put
  • cached/derived:
    • reachable preference-list view during the request
  • repair:
    • hinted handoff / later replica synchronization

Core invariant #

  • a put must not be acknowledged as successful before the configured write threshold is met

7. Process Trace: Membership View And Preference List #

Dynamo relies on gossip-based membership and eventually consistent routing views.

Core lines #

  • member_set
  • membership_epoch_or_history_tip
  • preference_list_for_key
  • staleness_boundary
  • actor_trying_to_route_request

Reusable frame mapping #

  • state -> local routing/membership state for a key
  • owner/view -> member_set, preference_list_for_key
  • monotonic marker -> membership_epoch_or_history_tip
  • validity boundary -> staleness_boundary
  • actor trying to advance -> actor_trying_to_route_request

Trace #

time →

member_set                 {A,B,C,D} -------- {A,B,C,D,E} ------ {A,B,C,D,E}
membership_history_tip     m40 -------------- m41 -------------- m41
preference_list_for_key    [A,B,C] ---------- [A,B,E] ---------- [A,B,E]
staleness_boundary         fresh ------------ stale at client --- refreshed
actor_trying_to_route_request client->A ----- client uses old C -- client refreshes

What it teaches #

  • membership and routing views can lag
  • clients may briefly route using stale membership
  • preference list is a derived control view from membership/ring state

State loci #

  • authoritative:
    • persisted membership change history across nodes
  • local/execution:
    • client or node using its current preference-list view
  • cached/derived:
    • local routing table / preference list cache
  • repair:
    • membership refresh / gossip reconciliation

Core invariant #

  • request routing must eventually converge to the current preference list, and stale routing views must be refreshable

8. Process Trace: Hinted Handoff Workflow #

This is a good process-shaped Dynamo workflow.

Core lines #

  • hint_state
  • holder_node
  • intended_recipient
  • delivery_deadline_or_retry
  • actor_trying_to_transfer_hint

Reusable frame mapping #

  • state -> hint_state
  • owner/view -> holder_node, intended_recipient
  • monotonic marker -> hint lifecycle progression
  • validity boundary -> delivery_deadline_or_retry
  • actor trying to advance -> actor_trying_to_transfer_hint

Trace #

time →

hint_state                  CREATED -------- STORED ---------- IN_TRANSFER ----- CLEARED
holder_node                 D -------------- D --------------- D -------------- none
intended_recipient          A -------------- A --------------- A -------------- A
delivery_deadline_or_retry  t10 ------------ t20 ------------ retry now ------ -
actor_trying_to_transfer_hint none --------- wait for A ------ send replica ---- delete hint

What it teaches #

  • hinted handoff is a deferred repair workflow over storage truth
  • the hint has its own lifecycle
  • correct deletion happens only after successful transfer

State loci #

  • authoritative:
    • hint state on the holder until transfer completes
  • local/execution:
    • node currently storing and sending the hint
  • cached/derived:
    • intended-recipient metadata and retry state
  • repair:
    • handoff retry loop until recipient accepts

Core invariant #

  • a hint must not be dropped before the intended replica safely receives the data

9. Process Trace: Gossip Membership Reconciliation #

This is the main decentralized process substrate.

Core lines #

  • local_membership_history
  • peer_membership_history
  • reconciled_history_tip
  • next_gossip_tick
  • actor_trying_to_gossip

Reusable frame mapping #

  • state -> local reconciliation progress between peers
  • owner/view -> local_membership_history, peer_membership_history
  • monotonic marker -> reconciled_history_tip
  • validity boundary -> next_gossip_tick
  • actor trying to advance -> actor_trying_to_gossip

Trace #

time →

local_membership_history   m40 ------------- m40,m41 --------- m41
peer_membership_history    m40 ------------- m40 ------------ m40,m41
reconciled_history_tip     m40 ------------- reconciling ----- m41
next_gossip_tick           t1 -------------- t2 ------------- t3
actor_trying_to_gossip     A<->B ---------- exchange -------- merge histories

What it teaches #

  • membership is maintained as propagated history
  • nodes reconcile histories, not just one current scalar bit
  • process truth is eventually converged through repeated pairwise sync

State loci #

  • authoritative:
    • distributed merged membership history
  • local/execution:
    • each node’s current local history view
  • cached/derived:
    • peer snapshot exchanged during gossip
  • repair:
    • periodic gossip reconciliation

Core invariant #

  • membership change histories must merge monotonically and must not lose previously observed changes

Minimum Dynamo Trace Set #

If you only want the representative Dynamo traces, drill these:

  • storage:

    • Replicated Key Version Under Quorum
    • Divergent Versions And Vector Clock Causality
    • Hinted Handoff Replica Durability
    • Merkle Anti-Entropy
  • process:

    • Put Coordination Under Sloppy Quorum
    • Hinted Handoff Workflow
    • Membership View And Preference List

That is enough to capture the main Dynamo shapes.