Table of Contents

Pub/Sub Notification System Analysis Note #

This note captures the full step-by-step analysis for a fanout-style pub/sub notification system: topic publication, subscriber relationship state, delivery fanout, retry/ack lifecycle, and shard ownership.

Step 1 — Normalize #

Assume the baseline prompt is:

design a pub/sub notification system
publishers publish events to topics
many subscribers receive the event
fanout is the core model
subscribers may be online/offline
system should scale across nodes

Normalize into state-affecting paths.

Requirement	Actor	Operation	State touched	Priority
Publisher publishes notification to topic	Client	append event	`S1` `create target` `PublishedEvent`	C1
Subscriber registers subscription to topic	Client	append event	`S1` `create target` `SubscriptionEdge`	C1
Subscriber unsubscribes from topic	Client	state transition	`S1` `update target` `SubscriptionEdge`	C1
System fans out published event to subscribers	System	async process	`S1` `hidden write target` `DeliveryRecord`	C1
Subscriber acknowledges delivered notification	Client	state transition	`S1` `update target` `DeliveryRecord`	C1
System retries undelivered notification	System	async process	`S1` `hidden write target` `DeliveryRecord`	C1
System routes topic/shard to current owner	System	read source	`S1` `read source target` `PartitionMap`	C1
System reassigns shard ownership after node failure	System	state transition	`S1` `update target` `PartitionOwnership`	C1
Client reads subscription/topic status	Client	read projection	`S1` `read projection target` `TopicStatusView`	R2

Notes on normalization:

Important choices:

publish is append event
- publication is an immutable fact
subscribe is append event
- creates subscription relationship
unsubscribe is state transition
- subscription lifecycle changes
fanout is async process
- internal expansion from topic event to per-subscriber delivery
ack/retry are delivery-lifecycle transitions
routing/ownership are explicit because this is distributed infra

This system is not a replayable partition log by default. It is:

topic publication
subscriber relationship state
delivery/fanout lifecycle

Step 2 — Critical Path Selection #

Requirement	Priority class	Why
Publish notification to topic	C1	publication truth must not be lost
Register subscription	C1	subscriber-set truth determines fanout correctness
Unsubscribe from topic	C1	stale subscriptions cause incorrect future delivery
Fan out event to subscribers	C1	core delivery semantics depend on it
Acknowledge delivered notification	C1	delivery completion state depends on it
Retry undelivered notification	C1	reliability depends on it
Route topic/shard to owner	C1	wrong routing breaks publication/fanout authority
Reassign shard ownership after node failure	C1	failover must preserve publication and delivery correctness
Read subscription/topic status	R2	operational only

Baseline critical paths:

Main C1 paths:

P1 publish notification
P2 subscribe
P3 unsubscribe
P4 fanout to subscribers
P5 ack delivery
P6 retry undelivered delivery
P7 route to shard owner
P8 reassign shard ownership

This system is driven by:

immutable publication events
authoritative subscription set
per-subscriber delivery lifecycle
fanout expansion correctness
shard ownership/failover

Step 3 — Primary State Extraction #

For a fanout pub/sub system, the minimal primary state is published events, subscription edges, delivery records, and shard ownership/routing state.

Candidate object label	Candidate source	Candidate needed for C1/R1?	Candidate decomposition action	Class	Primary?	Owner	Evolution	Scope kind	Scope value
PublishedEvent	direct noun	Yes	keep as candidate	event	Yes	service	append-only	instance	event_id
SubscriptionEdge	direct noun	Yes	keep as candidate	relationship	Yes	service	state machine	relation	subscriber_id + topic_id
DeliveryRecord	hidden write target	Yes	keep as candidate	process	Yes	service	state machine	relation	event_id + subscriber_id
PartitionOwnership	hidden write target	Yes	keep as candidate	process	Yes	service	state machine	instance	shard_id
PartitionMap	hidden write target	Yes	keep as candidate	entity	Yes	service	overwrite	collection	topic shards
TopicStatusView	derived read model	No	reject as UI artifact	projection	No	derived	overwrite	collection	topic_id
RetryQueue	hidden write target	No	reject as implementation choice	projection	No	derived	overwrite	collection	shard_id

Important modeling choices:

`PublishedEvent` #

Primary because:

publication is immutable truth

`SubscriptionEdge` #

Primary because:

it defines who should receive future events
subscribe/unsubscribe correctness depends on it

`DeliveryRecord` #

Primary because:

fanout creates per-subscriber delivery lifecycle state
retry/ack semantics depend on it

`PartitionOwnership` #

Needed because:

one shard owner should orchestrate publication/fanout for that shard

`PartitionMap` #

Needed because:

routing of publication and subscription operations must be consistent

Minimal strict primary set:

PublishedEvent
SubscriptionEdge
DeliveryRecord
PartitionOwnership
PartitionMap

Step 4 — Hard Invariants #

For a fanout pub/sub system, the hard invariants are about immutable publication, authoritative subscription state, valid delivery lifecycle, and exclusive shard ownership.

Path	Tier	Type	Invariant template	Invariant statement
`P1` `write path` `Publish notification`	HARD	uniqueness	uniqueness template	Key `publish request_id` maps to at most one logical outcome `published event` within topic scope.
`P2` `write path` `Subscribe`	HARD	uniqueness	uniqueness template	Key `subscriber_id + topic_id` maps to at most one logical outcome `active subscription edge` within subscription scope.
`P3` `write path` `Unsubscribe`	HARD	eligibility	eligibility template	Action `unsubscribe` is valid only if `SubscriptionEdge(subscriber_id, topic_id) is ACTIVE` at decision time.
`P4` `write path` `Fanout delivery`	HARD	accounting	accounting template	For a committed `PublishedEvent(event_id)`, `DeliveryRecord` set equals the authoritative active subscription set for the target topic at the chosen fanout-cut semantics.
`P5` `write path` `Ack delivery`	HARD	eligibility	eligibility template	Action `ack_delivery` is valid only if `DeliveryRecord(event_id, subscriber_id) is IN_FLIGHT or DELIVERED_PENDING_ACK and current receipt/attempt token matches` at decision time.
`P6` `write path` `Retry delivery`	HARD	eligibility	eligibility template	Action `retry_delivery` is valid only if `DeliveryRecord(event_id, subscriber_id) is retryable and current attempt/visibility state allows retry` at decision time.
`P7` `write path` `Route to shard owner`	HARD	uniqueness	uniqueness template	Key `shard_id` maps to at most one logical outcome `current authoritative owner` within `shard_id`.
`P8` `write path` `Reassign shard ownership`	HARD	eligibility	eligibility template	Action `reassign_shard` is valid only if `current owner failed or relinquished and candidate owner is eligible and sufficiently current` on `shard_id` at decision time.

What matters most:

1. Subscription edge uniqueness #

No duplicate active subscription edge for the same subscriber/topic scope.

2. Delivery accounting #

Fanout correctness depends on a well-defined subscription cut:

“active at publish time” is the usual baseline

3. Per-subscriber delivery lifecycle #

Ack and retry must operate on current delivery attempt state, not stale tokens.

Step 5 — Execution Context #

For the fanout baseline:

Field	Value	Why
Topology	single service distributed	one logical pub/sub platform across many nodes
Write coordination scope	per object scope	correctness is per subscription edge, per event-subscriber delivery, and per shard ownership scope
Read consistency target	strong only	safest baseline for subscription and delivery lifecycle correctness
Holder model	client	subscribers may temporarily hold in-flight delivery attempts
Compensation acceptable?	No	lost or duplicate delivery-state transitions cannot be repaired blindly after the fact

Derived implications:

holder_may_crash = true
- subscribers can fail while handling delivery attempts
cross_service_write = false
- baseline keeps publication, subscriptions, delivery state, and routing within one logical service
bounded_staleness_allowed = false
- strict baseline disallows stale subscription/delivery state on correctness-critical paths
cross_service_atomicity_required = false
- no multi-service transaction required in baseline
exclusive_claim_required = true
- a delivery attempt should have one active current holder/attempt token where ack/retry semantics depend on it
guarded_by_current_state = true
- unsubscribe, ack, retry, and failover all depend on current state

This pushes us toward:

one authoritative owner per shard
immutable publications
versioned subscription state
per-delivery lifecycle state with guarded ack/retry

Step 6 — Deterministic Mechanism Selection #

6A. Write Shape #

Path	Why	Write shape
`P1` publish notification	immutable topic event	`append-only event`
`P2` subscribe	create active relationship edge with uniqueness	`append-only event` or `guarded state transition`
`P3` unsubscribe	active edge becomes inactive	`guarded state transition`
`P4` fanout delivery	create per-subscriber delivery state from publication + subscription set	`append-only event` plus derived delivery creation
`P5` ack delivery	valid only for current delivery attempt state	`guarded state transition`
`P6` retry delivery	valid only for retryable current delivery state	`guarded state transition`
`P7` route to shard owner	one current authoritative owner per shard	`exclusive claim`
`P8` reassign shard ownership	valid only if current ownership/failover state allows it	`guarded state transition`

6B. Base Mechanism #

Path	Write shape	Base mechanism	Required companions
`P1` publish notification	`append-only event`	`append log`	idempotency key
`P2` subscribe	`guarded state transition`	`CAS on version` or unique edge key	subscription version
`P3` unsubscribe	`guarded state transition`	`CAS on (state, version)`	subscription version
`P4` fanout delivery	`append-only event` + derived delivery creation	`append log` + single-writer shard fanout	delivery dedup key
`P5` ack delivery	`guarded state transition`	`CAS on (state, version)`	attempt token / receipt handle
`P6` retry delivery	`guarded state transition`	leader-applied guarded transition	attempt token, retry/backoff policy
`P7` route to shard owner	`exclusive claim`	`lease`	fencing token, heartbeat
`P8` reassign shard ownership	`guarded state transition`	`CAS on (state, version)`	fencing token, shard catch-up check

Why these fit:

Publish #

Publication is immutable, naturally append-only.

Subscription edge #

You can model subscribe as append-only if subscriptions are immutable edges with versioned replacement, but in the baseline practical system it is usually simpler to treat active/inactive subscription state as guarded current state.

Fanout #

Fanout is driven by immutable publication events and current subscription set. Delivery records themselves can be created append-only at first, but their lifecycle quickly becomes guarded state transitions for ack/retry.

Canonical substrate implied:

sharded pub/sub service
immutable publication stream per topic/shard
authoritative subscription store
per-subscriber delivery lifecycle state
lease-backed shard ownership

Step 7 — Read Model / Source of Truth #

For a fanout pub/sub system, truth is mostly direct source state. Topic status views are projections.

Concept	Truth	Read path	Rebuild path
`C1` `source concept` `Published event`	`PublishedEvent`	`read source directly`	authoritative event store
`C2` `source concept` `Active subscription set`	`SubscriptionEdge`	`read source directly`	authoritative subscription store
`C3` `source concept` `Per-subscriber delivery lifecycle`	`DeliveryRecord`	`read source directly`	authoritative delivery state store
`C4` `source concept` `Shard ownership`	`PartitionOwnership`	`read source directly`	authoritative ownership store
`C5` `source concept` `Shard routing map`	`PartitionMap`	`read source directly`	authoritative routing metadata
`C6` `projection concept` `Topic status / lag / dashboard`	derived from publication, subscriptions, and deliveries	`materialized view`	recompute from primary state

Important point:

For correctness-critical paths:

fanout should read authoritative subscription state
ack/retry should read authoritative delivery state
shard ownership/routing should be authoritative

Step 8 — Failure Handling #

Path	Retry	Competing writers	Crash after commit	Publish failure	Stale holder
`P1` publish notification	retry with publish `request_id` to avoid duplicate publication	competing publishes coexist	committed publication survives owner crash if replicated past commit point	publisher may retry safely with dedup key	n/a
`P2` subscribe	retry with unique subscription key/version	duplicate active edge prevented by uniqueness/versioning	committed subscribe survives crash if persisted	n/a	n/a
`P3` unsubscribe	retry with current subscription version	stale unsubscribe loses guarded transition	committed unsubscribe survives crash if persisted	n/a	n/a
`P4` fanout delivery	retry fanout safely using `(event_id, subscriber_id)` dedup	one delivery record per event-subscriber scope should be created for baseline semantics	committed delivery-record creation survives crash if replicated	downstream physical delivery may lag or be retried	n/a
`P5` ack delivery	retry with current attempt token	stale/wrong attempt token loses guarded transition	committed ack survives crash if persisted	n/a	stale subscriber attempt fenced by token
`P6` retry delivery	retry with current delivery state/version	only one valid retry transition should win current attempt scope	retry worker crash delays delivery; next scheduler retries	n/a	old attempt token becomes invalid after retry
`P7` route to shard owner	retry after refreshing shard map	only one valid owner should exist	refreshed map points to new owner	n/a	stale owner rejected by fencing token
`P8` reassign shard ownership	retry failover safely	only one reassignment wins current ownership state	promoted owner crash triggers later reassignment	n/a	old owner fenced and must not continue fanout state changes

What matters most:

1. Delivery dedup scope #

For a given publication and subscriber:

create at most one current delivery record for the baseline semantic cut

2. Attempt-token fencing #

A stale subscriber ack must not complete a newer retry attempt.

3. Publication retry dedup #

If the publish API promises idempotency:

add request-id dedup on publication

Failure summary:

This system stays correct if:

publication is durable/idempotent as needed
subscription state is authoritative
delivery-record creation is deduped by (event_id, subscriber_id)
ack/retry are guarded by current attempt token/version
stale shard owners and stale subscriber attempts are fenced

Step 9 — Scale Adjustments #

Hotspot	Type	First response
hot topic with massive subscriber fanout	fan-out hotspot	shard topic fanout, batch delivery creation, and isolate hot topics
delivery retry backlog	write throughput hotspot	bucket retries by time and process incrementally with backoff
subscription churn on hot topics	contention hotspot	partition subscription index by topic and spread writers
shard-owner CPU saturation during fanout	write throughput hotspot	add more shards and rebalance hot topics
stale status/dashboard queries	read hotspot	keep them as derived views only
delivery-record growth	storage growth hotspot	compact/archive terminal delivery records after safe retention window

What scales well:

This system scales by:

partitioning topics
partitioning subscription indexes
letting each shard owner expand publication to delivery records for its shard scope
batching downstream delivery

What fails first:

Usually:

hot topic fanout explosion
retry storms
delivery-record storage growth
subscription-index hot spots

Canonical design conclusion:

The mechanical outcome is:

primary state:
- PublishedEvent
- SubscriptionEdge
- DeliveryRecord
- PartitionOwnership
- PartitionMap
critical invariants:
- immutable publication truth
- authoritative subscription set
- one delivery record per event-subscriber baseline scope
- guarded ack/retry with attempt fencing
- fenced shard ownership
mechanisms:
- append log
- guarded subscription and delivery transitions
- lease-backed shard ownership
reads:
- direct source reads for subscriptions and delivery state
- projections only for status/metrics

Polished interview answer:

“I’d design the pub/sub notification system as a sharded fanout service. Publication is an immutable event, subscriptions are authoritative relationship state, and fanout creates per-subscriber delivery records from the current subscription set. Acknowledgment and retry are modeled as guarded transitions on delivery records, fenced by attempt tokens so stale retries or stale acks cannot corrupt delivery state. Hot topics are handled by sharding fanout work and keeping topic status views as derived projections only.”

Concrete Substrate #

I’ll choose a service-owned sharded fanout platform as the concrete baseline, because that fits the mechanics we derived:

immutable publication events
authoritative subscription edges
per-subscriber delivery records
lease-backed shard ownership

Concrete substrate:

pub/sub service cluster
topics split into shards
one leader/owner per shard
shard leader stores:
- publication stream for that shard
- subscription index for subscribers in that shard scope
- delivery records and retry schedule
followers replicate shard mutations
metadata layer tracks shard ownership and routing

Concrete tech family:

service in Go or Java
durable publication storage with RocksDB or append-only segment files
shard replication via Raft or leader-follower replication
metadata/control via etcd or internal quorum
retry scheduling via shard-local time buckets or min-heaps

Operation Layer #

1. Publish notification #

API

Publish(topic_id, payload, request_id?)

Initiator

publisher/client

Entry point

gateway or any pub/sub node

Authoritative decider

current leader for target topic shard

Precondition

valid shard routing
optional dedup request id if publisher idempotency is required

Transition

append PublishedEvent
enqueue fanout work for active subscribers in that shard scope

Response

{event_id}

Failure cases

stale routing -> retry with updated shard map
response loss -> duplicate publish unless request_id dedup is used

API

Subscribe(subscriber_id, topic_id, expected_version?)

Initiator

subscriber/client

Entry point

gateway or any node

Authoritative decider

subscription owner/shard leader

Precondition

no conflicting active subscription edge or expected version matches

Transition

create or activate SubscriptionEdge

Response

{subscription_version}

3. Unsubscribe #

API

Unsubscribe(subscriber_id, topic_id, expected_version?)

Initiator

subscriber/client

Entry point

gateway or any node

Authoritative decider

subscription owner/shard leader

Precondition

current subscription is active and version matches if supplied

Transition

mark SubscriptionEdge inactive

Response

{subscription_version}

4. Fanout delivery creation #

API

internal fanout worker flow

Initiator

system/shard leader

Entry point

shard leader

Authoritative decider

shard leader

Precondition

committed publication exists
authoritative active subscription set available

Transition

create DeliveryRecord(event_id, subscriber_id) for each matching subscriber
set initial delivery state

Response

internal success

Failure cases

duplicate fanout work deduped by (event_id, subscriber_id)

5. Ack delivery #

API

AckDelivery(event_id, subscriber_id, attempt_token)

Initiator

subscriber/client

Entry point

delivery endpoint / any node

Authoritative decider

delivery-record owner/shard leader

Precondition

delivery record is in ackable state
current attempt token matches

Transition

mark delivery terminal/acked

Response

success or stale-token failure

6. Retry delivery #

API

internal retry scheduler flow

Initiator

system

Entry point

shard leader

Authoritative decider

shard leader

Precondition

delivery record is retryable
current state/version allows retry

Transition

advance delivery attempt
assign new attempt token
reschedule actual delivery

Response

internal success

Entry Point vs Decider vs Responder #

Path	Entry point	Authoritative decider	Physical responder	Logical responder
`Publish`	gateway / any node	shard leader	leader or front node	pub/sub service
`Subscribe`	gateway / any node	subscription owner/shard leader	leader or front node	pub/sub service
`Unsubscribe`	gateway / any node	subscription owner/shard leader	leader or front node	pub/sub service
fanout creation	shard leader	shard leader	internal	pub/sub service
`AckDelivery`	delivery endpoint / any node	delivery owner/shard leader	leader or front node	pub/sub service
retry scheduler	shard leader	shard leader	internal	pub/sub service
shard failover	follower / coordination layer	shard quorum / lease store	new leader / control plane	pub/sub service

Concrete HLD #

Main components:

gateway/router
- resolves topic shard and forwards publication/subscription requests
shard leader
- authoritative owner for publication fanout and delivery lifecycle in that shard
subscription index
- authoritative subscription edges keyed by topic/subscriber scope
delivery-state store
- per-event, per-subscriber delivery lifecycle
retry scheduler
- shard-local retry timing and backoff
metadata/control service
- tracks shard ownership and routing

Concrete Technology Realizations #

Stronger infra-native answer #

Go or Java pub/sub service
RocksDB or append-only segment files for publications and delivery state
Raft or leader-follower replication per shard
etcd or internal metadata quorum for shard ownership/routing

Short interview version #

“I’d build the fanout pub/sub platform as a sharded service. Publication is immutable, subscription edges are authoritative relationship state, and the shard leader expands each publication into per-subscriber delivery records. Ack and retry are guarded transitions on those delivery records, fenced by attempt tokens, and hot topics are handled by sharding fanout and keeping status/metrics as derived views.”

Pub/Sub Notification System Analysis Note #

Step 1 — Normalize #

Step 2 — Critical Path Selection #

Step 3 — Primary State Extraction #

PublishedEvent #

SubscriptionEdge #

DeliveryRecord #

PartitionOwnership #

PartitionMap #

Step 4 — Hard Invariants #

1. Subscription edge uniqueness #

2. Delivery accounting #

3. Per-subscriber delivery lifecycle #

Step 5 — Execution Context #

Step 6 — Deterministic Mechanism Selection #

6A. Write Shape #

6B. Base Mechanism #

Publish #

Subscription edge #

Fanout #

Step 7 — Read Model / Source of Truth #

Step 8 — Failure Handling #

1. Delivery dedup scope #

2. Attempt-token fencing #

3. Publication retry dedup #

Step 9 — Scale Adjustments #

Concrete Substrate #

Operation Layer #

1. Publish notification #

2. Subscribe #

3. Unsubscribe #

4. Fanout delivery creation #

5. Ack delivery #

6. Retry delivery #

Entry Point vs Decider vs Responder #

Concrete HLD #

Concrete Technology Realizations #

Stronger infra-native answer #

Short interview version #

`PublishedEvent` #

`SubscriptionEdge` #

`DeliveryRecord` #

`PartitionOwnership` #

`PartitionMap` #