Distributed Lock Service (ZooKeeper / etcd-class) #

This note models a distributed lock service where clients acquire, renew, release, and observe locks safely across many nodes, with lease semantics, fencing tokens, and crash recovery.

Step 1 - Normalize #

Assume the baseline prompt is:

design a distributed lock service
clients acquire and release named locks
locks should auto-expire if holder crashes
clients may watch lock ownership changes
system scales across many locks and clients

Normalize into state-affecting paths.

Requirement	Actor	Operation	State touched	Priority
Client acquires lock	Client	state transition	`S1` `update target` `LockState`	C1
Client renews lock lease	Client	state transition	`S1` `update target` `LockState`	C1
Client releases lock	Client	state transition	`S1` `update target` `LockState`	C1
System expires stale lock	System	async process	`S1` `hidden write target` `LockState`	C1
Client reads current lock holder	Client	read source	`S1` `read source target` `LockState`	R1
Client registers watch on lock	Client	append event	`S1` `create target` `LockWatchRegistration`	R1
System emits lock-change watch event	System	async process	`S1` `hidden write target` `LockWatchEvent`	R1
System routes lock key to current owner/leader	System	read source	`S1` `read source target` `PartitionMap`	C1
System reassigns shard ownership after node failure	System	state transition	`S1` `update target` `PartitionOwnership`	C1

Notes on normalization #

Important choices:

acquire/renew/release are state transitions
- lock lifecycle changes over time
stale expiry is explicit
- crash recovery is central correctness logic
current lock holder read is a source read
watch registration and watch events are distinct from lock truth

This system is fundamentally:

exclusive claim + lease + fencing

not:

generic config storage
queue delivery

Step 2 - Critical Path Selection #

Requirement	Priority class	Why
Acquire lock	C1	duplicate holders break correctness immediately
Renew lock lease	C1	stale or lost renewal changes holder validity
Release lock	C1	lock handoff depends on correct release
Expire stale lock	C1	crash recovery and stale-holder cleanup are core correctness
Read current lock holder	R1	important serving path for clients
Register / deliver watch	R1	useful but downstream of correct lock truth
Route lock key to shard owner	C1	wrong routing can split lock truth
Reassign shard ownership	C1	failover must preserve exclusivity and fencing

Baseline critical paths #

Main C1 paths:

P1 acquire lock
P2 renew lock
P3 release lock
P4 expire stale lock
P5 route to shard owner
P6 reassign shard ownership

Main R1 paths:

P7 read lock state
P8 watch registration and delivery

This design is driven by:

one valid holder at a time
lease expiry on crash
fenced transitions so stale holders cannot act

Step 3 - Primary State Extraction #

For a distributed lock service, the minimal primary state is the current lock lifecycle, client/session lifecycle, and routing/ownership state.

Object	Source	Needed for C1/R1?	Decomposition action	Class	Primary?	Owner	Evolution	Scope kind	Scope value
LockState	direct noun	Yes	keep as candidate	process	Yes	service	state machine	instance	lock_key
ClientSession	hidden write target	Yes	keep as candidate	process	Yes	service	state machine	instance	session_id
LockWatchRegistration	direct noun	Yes	keep as candidate	relationship	Yes	service	append-only	relation	client_id + lock_key
LockWatchEvent	hidden write target	No	keep as candidate	event	No	derived	append-only	collection	lock_key
PartitionOwnership	hidden write target	Yes	keep as candidate	process	Yes	service	state machine	instance	shard_id
PartitionMap	hidden write target	Yes	keep as candidate	entity	Yes	service	overwrite	collection	lock shards
LockStatusView	derived read model	No	reject as UI artifact	projection	No	derived	overwrite	collection	tenant or shard

Important modeling choices #

`LockState` #

This is the central correctness object.

Likely fields:

lock_key
holder_session_id
holder_client_id
epoch or fencing token
expiry
state

States:

FREE
HELD
EXPIRED
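
The fields and states above can be sketched as a small record type. This is a minimal illustration, not a definitive schema: the names `LockStatus` and `is_acquirable`, the integer `epoch`, and the use of a float deadline for `expiry` are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LockStatus(Enum):
    FREE = "FREE"
    HELD = "HELD"
    EXPIRED = "EXPIRED"


@dataclass
class LockState:
    lock_key: str
    state: LockStatus = LockStatus.FREE
    holder_session_id: Optional[str] = None
    holder_client_id: Optional[str] = None
    epoch: int = 0        # fencing token; assumed to only ever increase
    expiry: float = 0.0   # absolute deadline in seconds on the shard leader's clock (assumed)

    def is_acquirable(self, now: float) -> bool:
        """A lock can be claimed if it is not HELD, or if its lease has lapsed."""
        return self.state is not LockStatus.HELD or now > self.expiry
```

Treating a lapsed lease as acquirable is one possible reading of "acquirable"; a stricter design could require the expiry transition to commit first.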

`ClientSession` #

Primary because:

real lock services usually tie lock validity to a session/lease lifecycle
if the session dies, held locks must eventually expire
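
A rough sketch of how session liveness could drive lock reclamation follows. The heartbeat-based `is_active` check, the `timeout` default, and the helper `locks_to_reclaim` are all hypothetical; the note itself only states that locks are tied to a session/lease lifecycle.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClientSession:
    session_id: str
    last_heartbeat: float   # time of the most recent heartbeat seen by the service
    timeout: float = 10.0   # assumed session timeout in seconds

    def is_active(self, now: float) -> bool:
        return now - self.last_heartbeat <= self.timeout


def locks_to_reclaim(session: ClientSession, held: Dict[str, str], now: float) -> List[str]:
    """Return lock keys held by a dead session. `held` maps lock_key -> session_id."""
    if session.is_active(now):
        return []
    return sorted(k for k, sid in held.items() if sid == session.session_id)
```

The returned keys are only candidates: each one would still go through the guarded expiry transition rather than being freed directly.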

`LockWatchRegistration` #

Kept explicit because:

watches are part of the product surface
registration may be tied to current session

Minimal strict primary set #

The strongest minimal set is:

LockState
ClientSession
PartitionOwnership
PartitionMap

With:

LockWatchRegistration as an optional but useful explicit primary object

Step 4 - Hard Invariants #

For a ZooKeeper/etcd-class lock service, the hard invariants are about one valid holder per lock, valid renew/release only by the current holder, and safe expiry plus fencing.

Path	Tier	Type	Invariant statement
`P1` acquire lock	HARD	uniqueness	Key `lock_key` maps to at most one logical outcome `current valid lock holder` within lock scope.
`P1` acquire lock	HARD	eligibility	Action `acquire_lock` is valid only if current `LockState(lock_key)` is acquirable and current session is active at decision time.
`P2` renew lock	HARD	eligibility	Action `renew_lock` is valid only if `LockState(lock_key)` is currently held by the same session and epoch/token matches at decision time.
`P3` release lock	HARD	eligibility	Action `release_lock` is valid only if `LockState(lock_key)` is currently held by the same session and epoch/token matches at decision time.
`P4` expire stale lock	HARD	eligibility	Action `expire_lock` is valid only if current `LockState(lock_key)` is still held, expiry has passed, and epoch/token is unchanged at decision time.
`P5` route to shard owner	HARD	uniqueness	Key `shard_id` maps to at most one logical outcome `current authoritative owner` within `shard_id`.
`P6` reassign shard ownership	HARD	eligibility	Action `reassign_shard` is valid only if `current owner is failed or relinquished and candidate owner is eligible and sufficiently current` on `shard_id` at decision time.
`P7` read lock state	HARD	freshness	Read path reflects authoritative lock and session state within configured consistency bound.
`P8` watch delivery	SOFT	freshness	Watch stream reflects authoritative lock-state changes within watch propagation bound.

What matters most #

1. One valid holder per lock #

This is the primary correctness rule.

2. Renew/release are fenced #

Only the current holder with the current epoch/token may extend or release the lock.

3. Expiry must be revalidated #

A timeout worker must not expire a lock blindly: before applying the transition it has to confirm that the lock is still held, the expiry has passed, and the epoch/token is unchanged.

4. Watch delivery is secondary to lock truth #

Watches are important, but source truth is the lock/session state.

Step 5 - Execution Context #

For the strict baseline distributed lock service:

Field	Value	Why
Topology	single service distributed	one logical lock service spread across many nodes
Write coordination scope	per object scope	correctness is per `lock_key` and per shard ownership scope
Read consistency target	strong only	stale lock reads can break mutual exclusion
Holder model	client	clients temporarily hold locks
Compensation acceptable?	No	duplicate lock holders cannot be repaired later safely

Derived implications #

holder_may_crash = true
- clients can fail while holding locks
cross_service_write = false
- baseline keeps lock, session, and ownership state within one logical service
bounded_staleness_allowed = false
- lock acquisition and validation need authoritative current state
cross_service_atomicity_required = false
- no multi-service transaction required in baseline
exclusive_claim_required = true
- mutual exclusion is the core product
guarded_by_current_state = true
- acquire, renew, release, and expiry all depend on current state

What this implies #

This pushes us toward:

one authoritative writer per lock shard
lease-backed lock ownership
session-linked expiry
fencing token/epoch for stale-holder protection

Step 6 - Deterministic Mechanism Selection #

Path	Write shape	Base mechanism	Required companions
`P1` acquire lock	exclusive claim	lease	epoch/fencing token, session heartbeat
`P2` renew lock	guarded state transition	CAS on `(state, version)`	epoch/fencing token
`P3` release lock	guarded state transition	CAS on `(state, version)`	epoch/fencing token
`P4` expire stale lock	guarded state transition	leader-applied guarded transition	epoch/fencing token, timeout scan
`P5` route to shard owner	exclusive claim	lease	fencing token, heartbeat
`P6` reassign shard ownership	guarded state transition	CAS on `(state, version)`	fencing token, shard catch-up check

Why these fit #

Acquire #

This is the canonical exclusive-claim path:

one holder wins
ownership is temporary
expiry matters

Renew/release #

These are not blind writes. They must verify:

current holder matches
epoch/token matches
session is still active

So they are guarded transitions.
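
One way to picture a guarded transition is a compare-and-set on a versioned record. The sketch below uses a hypothetical in-memory `VersionedStore` as a stand-in for the authoritative per-shard store; a real service would apply the same check inside its replicated state machine.

```python
import threading
from typing import Dict, Tuple


class VersionedStore:
    """In-memory stand-in for the authoritative per-shard store (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, dict]] = {}
        self._mu = threading.Lock()

    def get(self, key: str) -> Tuple[int, dict]:
        with self._mu:
            version, value = self._data.get(key, (0, {}))
            return version, dict(value)

    def compare_and_set(self, key: str, expected_version: int, new_value: dict) -> bool:
        """Write only if nobody changed the record since it was read."""
        with self._mu:
            current_version, _ = self._data.get(key, (0, {}))
            if current_version != expected_version:
                return False
            self._data[key] = (current_version + 1, dict(new_value))
            return True
```

A renew or release would read the record, verify holder and epoch, and then call `compare_and_set` with the version it read. A writer holding an old version simply loses.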

Expiry #

Expiry is also guarded:

only expired, unchanged lock state may be transitioned back to free

Canonical substrate implied #

The baseline now points to:

sharded lock service
one authoritative owner per shard
lease-like lock records
current session state
fenced renew/release/expiry transitions

Step 7 - Read Model / Source of Truth #

For a distributed lock service, truth is direct source state. Watches are derived.

Concept	Truth	Read path	Rebuild path
`C1` current lock holder and lease	`LockState`	read source directly	authoritative lock-state store
`C2` current client/session validity	`ClientSession`	read source directly	authoritative session store
`C3` shard ownership	`PartitionOwnership`	read source directly	authoritative ownership store
`C4` shard routing map	`PartitionMap`	read source directly	authoritative routing metadata
`C5` lock watch stream	lock/session state changes	materialized view	rebuild from authoritative lock-state transitions
`C6` status dashboards	derived from lock and session state	materialized view	recompute from authoritative state

Important point #

For the core semantics:

acquire/renew/release/read all use authoritative LockState
session validity is authoritative source truth
watch notifications are derived from committed state changes

Step 8 - Failure Handling #

Path	Retry	Competing writers	Crash after commit	Publish failure	Stale holder
`P1` acquire lock	retry safe; loser gets failure or later success	only one claimant should win current lock epoch	if acquire committed and client crashes, lock stays held until expiry	watch delivery may lag	stale holder fenced by epoch/token
`P2` renew lock	retry with current epoch/token	stale renew loses guarded transition	committed renewal survives crash if persisted	watch delivery may lag	old epoch rejected
`P3` release lock	retry with current epoch/token	stale release loses guarded transition	committed release survives crash if persisted	watch delivery may lag	old epoch rejected
`P4` expire stale lock	timeout scan retry safe	only one expiry transition should win for current expired state	scanner crash delays cleanup; next scan retries	watch delivery may lag	prior holder blocked once epoch/version advanced
`P5` route to shard owner	retry after refreshing shard map	only one valid owner should exist	if owner changed, refreshed map points to new owner	n/a	stale owner rejected by fencing token
`P6` reassign shard ownership	retry failover transition safely	only one reassignment wins current ownership state	promoted owner crash triggers later reassignment	n/a	old owner fenced and must not continue serving
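
The "retry after refreshing shard map" behaviour in the routing row might look roughly like this on the client or gateway side. `NotOwnerError`, `get_map`, and `send` are hypothetical names chosen for the sketch.

```python
from typing import Callable, Dict


class NotOwnerError(Exception):
    """Raised by a node that no longer owns the shard (hypothetical error type)."""


def call_with_reroute(shard_id: int,
                      get_map: Callable[[], Dict[int, str]],
                      send: Callable[[str], str],
                      max_attempts: int = 3) -> str:
    """Send to the mapped owner; on a stale route, re-fetch the map and retry."""
    for _ in range(max_attempts):
        owner = get_map()[shard_id]   # fetched fresh on every attempt
        try:
            return send(owner)
        except NotOwnerError:
            continue                  # stale route; the next attempt re-reads the map
    raise RuntimeError("no authoritative owner found for shard %d" % shard_id)
```

The retry is only safe because the old owner refuses to serve once fenced; routing alone does not provide exclusivity.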

What matters most #

1. Fencing token / epoch #

This is the core stale-holder defense.

Bad case:

client A acquires lock
A pauses or partitions
lock expires
client B acquires lock
A resumes and continues acting

Without fencing:

both can act as holder

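So every external action taken under the lock should carry the current epoch/token when possible, and the resource being protected should reject any request whose token is older than one it has already accepted.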
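
The bad case above can be replayed against a resource that checks tokens. This is a sketch under the assumption that the protected resource can store the highest token it has seen; `FencedResource` and the token values 33 and 34 are illustrative only.

```python
class FencedResource:
    """Hypothetical downstream resource that remembers the highest token it has seen."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.log = []

    def write(self, token: int, payload: str) -> bool:
        # Reject any writer whose token is older than one already accepted.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.log.append((token, payload))
        return True


resource = FencedResource()
accepted_a = resource.write(token=33, payload="A: first write")   # A holds epoch 33
accepted_b = resource.write(token=34, payload="B: takes over")    # lock expired, B holds epoch 34
rejected_a = resource.write(token=33, payload="A: resumes late")  # stale holder A is fenced off
```

Here `accepted_a` and `accepted_b` are true and `rejected_a` is false, so A's late write never reaches the log even though A still believes it holds the lock.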

2. Expiry is necessary but not sufficient #

Lease timeout reclaims the lock, but fencing is what prevents stale post-expiry actions.

3. Watch lag must not affect correctness #

Clients should treat watches as hints, not as the sole source of lock truth.
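
A client-side handler that follows this rule could use the watch event only as a trigger to re-read authoritative state. The function name, the event shape, and the `read_authoritative` callback are assumptions for the sketch.

```python
from typing import Callable, Optional


def on_watch_event(lock_key: str,
                   event: dict,
                   read_authoritative: Callable[[str], dict],
                   my_session_id: str) -> Optional[int]:
    """Return the epoch to use if this session really holds the lock, else None.

    `event` is deliberately ignored for the decision: it only triggers a re-read.
    """
    current = read_authoritative(lock_key)
    if current.get("state") == "HELD" and current.get("holder_session_id") == my_session_id:
        return current["epoch"]
    return None
```

Even if a delayed event claims that this session is the holder, the handler acts only on what the authoritative read returns.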

Step 9 - Scale Adjustments #

Hotspot	Type	First response
hot lock keys	contention hotspot	isolate hot locks on dedicated shards (sharding by lock key spreads many keys but does not relieve a single hot key), or push users toward coarser-grained coordination alternatives
renewal traffic	write throughput hotspot	lengthen lease duration within acceptable failover bounds and batch/sessionize renewals
watch fanout	fan-out hotspot	derive watch delivery from committed state stream and decouple it from lock truth
strong reads on hot locks	read hotspot	keep reads on authoritative owner; avoid stale replica reads for correctness paths
failover churn	contention hotspot	stabilize shard leadership and avoid aggressive reassignment
session reconnect storms	contention hotspot	stagger reconnects and rate-limit watch/session restoration

What scales well #

A lock service scales well only while the coordination data it holds stays relatively small.

It scales by:

sharding lock keys
keeping one authoritative owner per shard
making lock records small
treating watch delivery as secondary
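
Sharding lock keys needs a mapping that every node computes identically. A minimal sketch, assuming a fixed shard count and a hypothetical helper `shard_for`:

```python
import hashlib


def shard_for(lock_key: str, num_shards: int) -> int:
    """Map a lock key to a shard id using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(lock_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing reshuffles most keys when `num_shards` changes, so a deployment that resizes often would more likely route through an explicit `PartitionMap` or a consistent-hashing scheme.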

What fails first #

Usually:

one or a few very hot locks
renewal storms
watch fanout spikes
clients relying on watches instead of source truth

Canonical design conclusion #

The mechanical outcome is:

primary state:
- LockState
- ClientSession
- PartitionOwnership
- PartitionMap
critical invariants:
- one valid holder per lock
- renew/release valid only for current holder epoch/token
- stale locks expire safely
- exclusive shard ownership for lock truth
mechanisms:
- lease
- guarded renew/release/expiry transitions
- exclusive claim for lock acquire
- fenced shard ownership
reads:
- direct authoritative reads for lock/session truth
- watches as derived notifications

Polished interview answer #

I’d design the lock service as a sharded strongly consistent system with one authoritative owner per lock shard. Acquiring a lock is an exclusive claim that creates a lease-backed LockState record with an epoch or fencing token. Renew, release, and timeout expiry are guarded transitions that only succeed if the current holder and epoch still match. Client sessions are tracked explicitly so crashed clients eventually lose their locks, and stale holders are prevented from acting by the fencing token even after they resume. Watches are derived from committed lock-state transitions, but correctness-critical reads come directly from authoritative lock and session state.

Concrete Substrate #

I’ll choose a sharded strongly consistent metadata service with lease-backed lock records as the concrete baseline, because it matches the mechanics we derived:

exclusive claim on acquire
guarded renew/release/expiry
session-linked lease validity
one owner per shard

Concrete tech family:

lock service in Go or Java
authoritative state in a replicated metadata store or service-owned Raft state machine
metadata/control:
- built-in Raft consensus per shard or a small etcd-like control layer

Each shard leader stores:

LockState(lock_key)
ClientSession(session_id)
watch registrations
timeout/expiry index

This is effectively the same substrate family as ZooKeeper/etcd-style coordination stores, with a narrower product surface focused on lock semantics.

Operation Layer #

1. Acquire lock #

API

AcquireLock(lock_key, session_id, ttl, expected_free=true)

Initiator

client

Entry point

gateway or any lock-service node

Authoritative decider

current shard leader for lock_key

Precondition

session is active
lock currently acquirable

Transition

LockState(lock_key) = HELD(holder_session_id, epoch, expiry)

Response

{acquired: true|false, epoch, expiry}

Failure cases

competing claimant loses
stale routing -> retry with updated shard map
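
The acquire path could be sketched as follows on the shard leader. `ShardLeader`, `Lock`, and `active_sessions` are hypothetical names, `now` is passed in explicitly to keep the example deterministic, and the `expected_free` parameter from the API line is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Lock:
    holder_session_id: Optional[str] = None
    epoch: int = 0
    expiry: float = 0.0


class ShardLeader:
    """Single-threaded stand-in for the shard leader's state machine (hypothetical)."""

    def __init__(self) -> None:
        self.locks: Dict[str, Lock] = {}
        self.active_sessions: Set[str] = set()

    def acquire_lock(self, lock_key: str, session_id: str, ttl: float, now: float) -> dict:
        lock = self.locks.setdefault(lock_key, Lock())
        held = lock.holder_session_id is not None and now <= lock.expiry
        if session_id not in self.active_sessions or held:
            return {"acquired": False, "epoch": lock.epoch, "expiry": lock.expiry}
        lock.holder_session_id = session_id
        lock.epoch += 1            # new fencing token for every new holder
        lock.expiry = now + ttl
        return {"acquired": True, "epoch": lock.epoch, "expiry": lock.expiry}
```

Because the leader is the single decider for the key, two competing claimants are serialized and only the first sees `acquired: true` for a given epoch.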

2. Renew lock #

API

RenewLock(lock_key, session_id, epoch, ttl)

Initiator

client

Entry point

gateway or any node

Authoritative decider

shard leader

Precondition

lock is currently held by session_id
epoch matches current state

Transition

extend expiry

Response

{renewed: true|false, expiry}
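
A renew check along these lines is sketched below, with the lock record shown as a plain dict. Rejecting a renewal once the lease has already lapsed is an extra assumption of this sketch, not something the preconditions above spell out.

```python
def renew_lock(lock: dict, session_id: str, epoch: int, ttl: float, now: float) -> dict:
    """Extend the lease only for the current holder, current epoch, unexpired lease."""
    ok = (lock["holder_session_id"] == session_id
          and lock["epoch"] == epoch
          and now <= lock["expiry"])
    if ok:
        lock["expiry"] = now + ttl
    return {"renewed": ok, "expiry": lock["expiry"]}
```

A failed renew returns the unchanged expiry, which lets the caller see that its view of the lease is stale.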

3. Release lock #

API

ReleaseLock(lock_key, session_id, epoch)

Initiator

client

Entry point

gateway or any node

Authoritative decider

shard leader

Precondition

lock is currently held by session_id
epoch matches current state

Transition

HELD -> FREE

Response

{released: true|false}
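
Release can be sketched the same way, again with a dict standing in for the lock record. Leaving the epoch untouched on release and advancing it on the next acquire is an assumption of this example.

```python
def release_lock(lock: dict, session_id: str, epoch: int) -> dict:
    """HELD -> FREE, only for the current holder and epoch."""
    if lock["holder_session_id"] != session_id or lock["epoch"] != epoch:
        return {"released": False}
    lock["holder_session_id"] = None
    lock["expiry"] = 0.0
    # The epoch is left as-is here; the next acquire is assumed to advance it.
    return {"released": True}
```

Retrying a release that already committed returns `released: false` but does not disturb the state, so a client that lost the first response can treat either answer as "no longer holding the lock".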

4. Expire stale lock #

API

internal background process

Initiator

system

Entry point

shard leader

Authoritative decider

shard leader

Precondition

current time > expiry
lock state and epoch unchanged

Transition

HELD -> FREE, advancing the epoch/version so the prior holder is fenced
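
The timeout scan with revalidation might be sketched with a min-heap as the expiry index. The index layout `(expiry, lock_key, epoch)` and the function name `expire_due` are assumptions; the point is that each popped entry is rechecked against current state before anything is freed.

```python
import heapq
from typing import Dict, List, Tuple


def expire_due(locks: Dict[str, dict],
               index: List[Tuple[float, str, int]],
               now: float) -> List[str]:
    """Pop due entries from the expiry index and free locks that are still unchanged.

    Index entries are (expiry, lock_key, epoch) snapshots taken when the lease was set.
    """
    freed = []
    while index and index[0][0] < now:
        expiry, key, epoch = heapq.heappop(index)
        lock = locks.get(key)
        # Revalidate: skip entries made stale by a renew, release, or re-acquire.
        if lock is None or lock["holder_session_id"] is None:
            continue
        if lock["epoch"] != epoch or lock["expiry"] != expiry:
            continue
        lock["holder_session_id"] = None
        lock["epoch"] += 1   # advance the epoch so the old holder is fenced
        freed.append(key)
    return freed
```

Renewals push a new index entry rather than editing the old one, so stale entries are expected and are discarded by the revalidation step.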

5. Register watch #

API

WatchLock(lock_key, from_version?)

Initiator

client

Entry point

gateway or watch endpoint

Authoritative decider

watch subsystem on committed lock-state stream

Precondition

session active

Transition

create transient or session-bound watch registration

Response

watch id / stream handle
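
Watch delivery derived from committed transitions can be illustrated with an append-only log and a replay cursor. `CommittedLog` is a hypothetical in-memory stand-in, and treating a missing `from_version` as "replay all retained history" is an assumption; a real service might instead start from the current version.

```python
from typing import Dict, Iterator, List, Optional


class CommittedLog:
    """Append-only list of committed lock-state changes (hypothetical stand-in)."""

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def append(self, lock_key: str, state: str, epoch: int) -> int:
        version = len(self.entries) + 1
        self.entries.append({"version": version, "lock_key": lock_key,
                             "state": state, "epoch": epoch})
        return version

    def watch_lock(self, lock_key: str, from_version: Optional[int] = None) -> Iterator[Dict]:
        """Yield committed changes for lock_key with version >= from_version."""
        start = 1 if from_version is None else from_version
        for entry in self.entries:
            if entry["lock_key"] == lock_key and entry["version"] >= start:
                yield entry
```

Because events are read from the committed log, a reconnecting client can resume from the last version it saw without the watch path ever writing lock state.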

Entry Point vs Decider vs Responder #

Path	Entry point	Authoritative decider	Physical responder	Logical responder
`AcquireLock`	gateway / any node	shard leader	leader or front node	lock service
`RenewLock`	gateway / any node	shard leader	leader or front node	lock service
`ReleaseLock`	gateway / any node	shard leader	leader or front node	lock service
expiry	shard leader	shard leader	internal	lock service
watch	watch endpoint	committed state stream / shard leader	watch-serving node	lock service
shard failover	follower / coordination layer	shard quorum / lease store	new leader / control plane	lock service

Concrete HLD #

Main components:

client gateway
- routes lock operations to current shard leader
shard leaders
- authoritative owners of lock and session state
- maintain expiry index
shard followers
- replicate committed lock/session state
watch service
- emits lock-change notifications from committed state transitions
metadata/control service
- tracks shard ownership and routing

Short Interview Version #

I’d build the distributed lock service as a sharded strongly consistent system with one authoritative owner per lock shard. Acquiring a lock is an exclusive claim that creates a lease-backed LockState record with an epoch or fencing token. Renew, release, and timeout expiry are guarded transitions that only succeed if the current holder and epoch still match. Client sessions are tracked explicitly so crashed clients eventually lose their locks, and stale holders are prevented from acting by the fencing token even after they resume. Watches are derived from committed lock-state transitions, but correctness-critical reads come directly from authoritative lock and session state.
