Skip to main content
  1. System Design Components/

Pub/Sub Notification System Analysis Note

Pub/Sub Notification System Analysis Note #

This note captures the full step-by-step analysis for a fanout-style pub/sub notification system: topic publication, subscriber relationship state, delivery fanout, retry/ack lifecycle, and shard ownership.

Step 1 — Normalize #

Assume the baseline prompt is:

  • design a pub/sub notification system
  • publishers publish events to topics
  • many subscribers receive the event
  • fanout is the core model
  • subscribers may be online/offline
  • system should scale across nodes

Normalize into state-affecting paths.

RequirementActorOperationState touchedPriority
Publisher publishes notification to topicClientappend eventS1
create target
PublishedEvent
C1
Subscriber registers subscription to topicClientappend eventS1
create target
SubscriptionEdge
C1
Subscriber unsubscribes from topicClientstate transitionS1
update target
SubscriptionEdge
C1
System fans out published event to subscribersSystemasync processS1
hidden write target
DeliveryRecord
C1
Subscriber acknowledges delivered notificationClientstate transitionS1
update target
DeliveryRecord
C1
System retries undelivered notificationSystemasync processS1
hidden write target
DeliveryRecord
C1
System routes topic/shard to current ownerSystemread sourceS1
read source target
PartitionMap
C1
System reassigns shard ownership after node failureSystemstate transitionS1
update target
PartitionOwnership
C1
Client reads subscription/topic statusClientread projectionS1
read projection target
TopicStatusView
R2

Notes on normalization:

Important choices:

  • publish is append event
    • publication is an immutable fact
  • subscribe is append event
    • creates subscription relationship
  • unsubscribe is state transition
    • subscription lifecycle changes
  • fanout is async process
    • internal expansion from topic event to per-subscriber delivery
  • ack/retry are delivery-lifecycle transitions
  • routing/ownership are explicit because this is distributed infra

This system is not a replayable partition log by default. It is:

  • topic publication
  • subscriber relationship state
  • delivery/fanout lifecycle

Step 2 — Critical Path Selection #

RequirementPriority classWhy
Publish notification to topicC1publication truth must not be lost
Register subscriptionC1subscriber-set truth determines fanout correctness
Unsubscribe from topicC1stale subscriptions cause incorrect future delivery
Fan out event to subscribersC1core delivery semantics depend on it
Acknowledge delivered notificationC1delivery completion state depends on it
Retry undelivered notificationC1reliability depends on it
Route topic/shard to ownerC1wrong routing breaks publication/fanout authority
Reassign shard ownership after node failureC1failover must preserve publication and delivery correctness
Read subscription/topic statusR2operational only

Baseline critical paths:

Main C1 paths:

  • P1 publish notification
  • P2 subscribe
  • P3 unsubscribe
  • P4 fanout to subscribers
  • P5 ack delivery
  • P6 retry undelivered delivery
  • P7 route to shard owner
  • P8 reassign shard ownership

This system is driven by:

  • immutable publication events
  • authoritative subscription set
  • per-subscriber delivery lifecycle
  • fanout expansion correctness
  • shard ownership/failover

Step 3 — Primary State Extraction #

For a fanout pub/sub system, the minimal primary state is published events, subscription edges, delivery records, and shard ownership/routing state.

Candidate object labelCandidate sourceCandidate needed for C1/R1?Candidate decomposition actionClassPrimary?OwnerEvolutionScope kindScope value
PublishedEventdirect nounYeskeep as candidateeventYesserviceappend-onlyinstanceevent_id
SubscriptionEdgedirect nounYeskeep as candidaterelationshipYesservicestate machinerelationsubscriber_id + topic_id
DeliveryRecordhidden write targetYeskeep as candidateprocessYesservicestate machinerelationevent_id + subscriber_id
PartitionOwnershiphidden write targetYeskeep as candidateprocessYesservicestate machineinstanceshard_id
PartitionMaphidden write targetYeskeep as candidateentityYesserviceoverwritecollectiontopic shards
TopicStatusViewderived read modelNoreject as UI artifactprojectionNoderivedoverwritecollectiontopic_id
RetryQueuehidden write targetNoreject as implementation choiceprojectionNoderivedoverwritecollectionshard_id

Important modeling choices:

PublishedEvent #

Primary because:

  • publication is immutable truth

SubscriptionEdge #

Primary because:

  • it defines who should receive future events
  • subscribe/unsubscribe correctness depends on it

DeliveryRecord #

Primary because:

  • fanout creates per-subscriber delivery lifecycle state
  • retry/ack semantics depend on it

PartitionOwnership #

Needed because:

  • one shard owner should orchestrate publication/fanout for that shard

PartitionMap #

Needed because:

  • routing of publication and subscription operations must be consistent

Minimal strict primary set:

  • PublishedEvent
  • SubscriptionEdge
  • DeliveryRecord
  • PartitionOwnership
  • PartitionMap

Step 4 — Hard Invariants #

For a fanout pub/sub system, the hard invariants are about immutable publication, authoritative subscription state, valid delivery lifecycle, and exclusive shard ownership.

PathTierTypeInvariant templateInvariant statement
P1
write path
Publish notification
HARDuniquenessuniqueness templateKey publish request_id maps to at most one logical outcome published event within topic scope.
P2
write path
Subscribe
HARDuniquenessuniqueness templateKey subscriber_id + topic_id maps to at most one logical outcome active subscription edge within subscription scope.
P3
write path
Unsubscribe
HARDeligibilityeligibility templateAction unsubscribe is valid only if SubscriptionEdge(subscriber_id, topic_id) is ACTIVE at decision time.
P4
write path
Fanout delivery
HARDaccountingaccounting templateFor a committed PublishedEvent(event_id), DeliveryRecord set equals the authoritative active subscription set for the target topic at the chosen fanout-cut semantics.
P5
write path
Ack delivery
HARDeligibilityeligibility templateAction ack_delivery is valid only if DeliveryRecord(event_id, subscriber_id) is IN_FLIGHT or DELIVERED_PENDING_ACK and current receipt/attempt token matches at decision time.
P6
write path
Retry delivery
HARDeligibilityeligibility templateAction retry_delivery is valid only if DeliveryRecord(event_id, subscriber_id) is retryable and current attempt/visibility state allows retry at decision time.
P7
write path
Route to shard owner
HARDuniquenessuniqueness templateKey shard_id maps to at most one logical outcome current authoritative owner within shard_id.
P8
write path
Reassign shard ownership
HARDeligibilityeligibility templateAction reassign_shard is valid only if current owner failed or relinquished and candidate owner is eligible and sufficiently current on shard_id at decision time.

What matters most:

1. Subscription edge uniqueness #

No duplicate active subscription edge for the same subscriber/topic scope.

2. Delivery accounting #

Fanout correctness depends on a well-defined subscription cut:

  • “active at publish time” is the usual baseline

3. Per-subscriber delivery lifecycle #

Ack and retry must operate on current delivery attempt state, not stale tokens.

Step 5 — Execution Context #

For the fanout baseline:

FieldValueWhy
Topologysingle service distributedone logical pub/sub platform across many nodes
Write coordination scopeper object scopecorrectness is per subscription edge, per event-subscriber delivery, and per shard ownership scope
Read consistency targetstrong onlysafest baseline for subscription and delivery lifecycle correctness
Holder modelclientsubscribers may temporarily hold in-flight delivery attempts
Compensation acceptable?Nolost or duplicate delivery-state transitions cannot be repaired blindly after the fact

Derived implications:

  • holder_may_crash = true

    • subscribers can fail while handling delivery attempts
  • cross_service_write = false

    • baseline keeps publication, subscriptions, delivery state, and routing within one logical service
  • bounded_staleness_allowed = false

    • strict baseline disallows stale subscription/delivery state on correctness-critical paths
  • cross_service_atomicity_required = false

    • no multi-service transaction required in baseline
  • exclusive_claim_required = true

    • a delivery attempt should have one active current holder/attempt token where ack/retry semantics depend on it
  • guarded_by_current_state = true

    • unsubscribe, ack, retry, and failover all depend on current state

This pushes us toward:

  • one authoritative owner per shard
  • immutable publications
  • versioned subscription state
  • per-delivery lifecycle state with guarded ack/retry

Step 6 — Deterministic Mechanism Selection #

6A. Write Shape #

PathWhyWrite shape
P1 publish notificationimmutable topic eventappend-only event
P2 subscribecreate active relationship edge with uniquenessappend-only event or guarded state transition
P3 unsubscribeactive edge becomes inactiveguarded state transition
P4 fanout deliverycreate per-subscriber delivery state from publication + subscription setappend-only event plus derived delivery creation
P5 ack deliveryvalid only for current delivery attempt stateguarded state transition
P6 retry deliveryvalid only for retryable current delivery stateguarded state transition
P7 route to shard ownerone current authoritative owner per shardexclusive claim
P8 reassign shard ownershipvalid only if current ownership/failover state allows itguarded state transition

6B. Base Mechanism #

PathWrite shapeBase mechanismRequired companions
P1 publish notificationappend-only eventappend logidempotency key
P2 subscribeguarded state transitionCAS on version or unique edge keysubscription version
P3 unsubscribeguarded state transitionCAS on (state, version)subscription version
P4 fanout deliveryappend-only event + derived delivery creationappend log + single-writer shard fanoutdelivery dedup key
P5 ack deliveryguarded state transitionCAS on (state, version)attempt token / receipt handle
P6 retry deliveryguarded state transitionleader-applied guarded transitionattempt token, retry/backoff policy
P7 route to shard ownerexclusive claimleasefencing token, heartbeat
P8 reassign shard ownershipguarded state transitionCAS on (state, version)fencing token, shard catch-up check

Why these fit:

Publish #

Publication is immutable, naturally append-only.

Subscription edge #

You can model subscribe as append-only if subscriptions are immutable edges with versioned replacement, but in the baseline practical system it is usually simpler to treat active/inactive subscription state as guarded current state.

Fanout #

Fanout is driven by immutable publication events and current subscription set. Delivery records themselves can be created append-only at first, but their lifecycle quickly becomes guarded state transitions for ack/retry.

Canonical substrate implied:

  • sharded pub/sub service
  • immutable publication stream per topic/shard
  • authoritative subscription store
  • per-subscriber delivery lifecycle state
  • lease-backed shard ownership

Step 7 — Read Model / Source of Truth #

For a fanout pub/sub system, truth is mostly direct source state. Topic status views are projections.

ConceptTruthRead pathRebuild path
C1
source concept
Published event
PublishedEventread source directlyauthoritative event store
C2
source concept
Active subscription set
SubscriptionEdgeread source directlyauthoritative subscription store
C3
source concept
Per-subscriber delivery lifecycle
DeliveryRecordread source directlyauthoritative delivery state store
C4
source concept
Shard ownership
PartitionOwnershipread source directlyauthoritative ownership store
C5
source concept
Shard routing map
PartitionMapread source directlyauthoritative routing metadata
C6
projection concept
Topic status / lag / dashboard
derived from publication, subscriptions, and deliveriesmaterialized viewrecompute from primary state

Important point:

For correctness-critical paths:

  • fanout should read authoritative subscription state
  • ack/retry should read authoritative delivery state
  • shard ownership/routing should be authoritative

Step 8 — Failure Handling #

PathRetryCompeting writersCrash after commitPublish failureStale holder
P1 publish notificationretry with publish request_id to avoid duplicate publicationcompeting publishes coexistcommitted publication survives owner crash if replicated past commit pointpublisher may retry safely with dedup keyn/a
P2 subscriberetry with unique subscription key/versionduplicate active edge prevented by uniqueness/versioningcommitted subscribe survives crash if persistedn/an/a
P3 unsubscriberetry with current subscription versionstale unsubscribe loses guarded transitioncommitted unsubscribe survives crash if persistedn/an/a
P4 fanout deliveryretry fanout safely using (event_id, subscriber_id) dedupone delivery record per event-subscriber scope should be created for baseline semanticscommitted delivery-record creation survives crash if replicateddownstream physical delivery may lag or be retriedn/a
P5 ack deliveryretry with current attempt tokenstale/wrong attempt token loses guarded transitioncommitted ack survives crash if persistedn/astale subscriber attempt fenced by token
P6 retry deliveryretry with current delivery state/versiononly one valid retry transition should win current attempt scoperetry worker crash delays delivery; next scheduler retriesn/aold attempt token becomes invalid after retry
P7 route to shard ownerretry after refreshing shard maponly one valid owner should existrefreshed map points to new ownern/astale owner rejected by fencing token
P8 reassign shard ownershipretry failover safelyonly one reassignment wins current ownership statepromoted owner crash triggers later reassignmentn/aold owner fenced and must not continue fanout state changes

What matters most:

1. Delivery dedup scope #

For a given publication and subscriber:

  • create at most one current delivery record for the baseline semantic cut

2. Attempt-token fencing #

A stale subscriber ack must not complete a newer retry attempt.

3. Publication retry dedup #

If the publish API promises idempotency:

  • add request-id dedup on publication

Failure summary:

This system stays correct if:

  • publication is durable/idempotent as needed
  • subscription state is authoritative
  • delivery-record creation is deduped by (event_id, subscriber_id)
  • ack/retry are guarded by current attempt token/version
  • stale shard owners and stale subscriber attempts are fenced

Step 9 — Scale Adjustments #

HotspotTypeFirst response
hot topic with massive subscriber fanoutfan-out hotspotshard topic fanout, batch delivery creation, and isolate hot topics
delivery retry backlogwrite throughput hotspotbucket retries by time and process incrementally with backoff
subscription churn on hot topicscontention hotspotpartition subscription index by topic and spread writers
shard-owner CPU saturation during fanoutwrite throughput hotspotadd more shards and rebalance hot topics
stale status/dashboard queriesread hotspotkeep them as derived views only
delivery-record growthstorage growth hotspotcompact/archive terminal delivery records after safe retention window

What scales well:

This system scales by:

  • partitioning topics
  • partitioning subscription indexes
  • letting each shard owner expand publication to delivery records for its shard scope
  • batching downstream delivery

What fails first:

Usually:

  • hot topic fanout explosion
  • retry storms
  • delivery-record storage growth
  • subscription-index hot spots

Canonical design conclusion:

The mechanical outcome is:

  • primary state:
    • PublishedEvent
    • SubscriptionEdge
    • DeliveryRecord
    • PartitionOwnership
    • PartitionMap
  • critical invariants:
    • immutable publication truth
    • authoritative subscription set
    • one delivery record per event-subscriber baseline scope
    • guarded ack/retry with attempt fencing
    • fenced shard ownership
  • mechanisms:
    • append log
    • guarded subscription and delivery transitions
    • lease-backed shard ownership
  • reads:
    • direct source reads for subscriptions and delivery state
    • projections only for status/metrics

Polished interview answer:

“I’d design the pub/sub notification system as a sharded fanout service. Publication is an immutable event, subscriptions are authoritative relationship state, and fanout creates per-subscriber delivery records from the current subscription set. Acknowledgment and retry are modeled as guarded transitions on delivery records, fenced by attempt tokens so stale retries or stale acks cannot corrupt delivery state. Hot topics are handled by sharding fanout work and keeping topic status views as derived projections only.”

Concrete Substrate #

I’ll choose a service-owned sharded fanout platform as the concrete baseline, because that fits the mechanics we derived:

  • immutable publication events
  • authoritative subscription edges
  • per-subscriber delivery records
  • lease-backed shard ownership

Concrete substrate:

  • pub/sub service cluster
  • topics split into shards
  • one leader/owner per shard
  • shard leader stores:
    • publication stream for that shard
    • subscription index for subscribers in that shard scope
    • delivery records and retry schedule
  • followers replicate shard mutations
  • metadata layer tracks shard ownership and routing

Concrete tech family:

  • service in Go or Java
  • durable publication storage with RocksDB or append-only segment files
  • shard replication via Raft or leader-follower replication
  • metadata/control via etcd or internal quorum
  • retry scheduling via shard-local time buckets or min-heaps

Operation Layer #

1. Publish notification #

API

  • Publish(topic_id, payload, request_id?)

Initiator

  • publisher/client

Entry point

  • gateway or any pub/sub node

Authoritative decider

  • current leader for target topic shard

Precondition

  • valid shard routing
  • optional dedup request id if publisher idempotency is required

Transition

  • append PublishedEvent
  • enqueue fanout work for active subscribers in that shard scope

Response

  • {event_id}

Failure cases

  • stale routing -> retry with updated shard map
  • response loss -> duplicate publish unless request_id dedup is used

2. Subscribe #

API

  • Subscribe(subscriber_id, topic_id, expected_version?)

Initiator

  • subscriber/client

Entry point

  • gateway or any node

Authoritative decider

  • subscription owner/shard leader

Precondition

  • no conflicting active subscription edge or expected version matches

Transition

  • create or activate SubscriptionEdge

Response

  • {subscription_version}

3. Unsubscribe #

API

  • Unsubscribe(subscriber_id, topic_id, expected_version?)

Initiator

  • subscriber/client

Entry point

  • gateway or any node

Authoritative decider

  • subscription owner/shard leader

Precondition

  • current subscription is active and version matches if supplied

Transition

  • mark SubscriptionEdge inactive

Response

  • {subscription_version}

4. Fanout delivery creation #

API

  • internal fanout worker flow

Initiator

  • system/shard leader

Entry point

  • shard leader

Authoritative decider

  • shard leader

Precondition

  • committed publication exists
  • authoritative active subscription set available

Transition

  • create DeliveryRecord(event_id, subscriber_id) for each matching subscriber
  • set initial delivery state

Response

  • internal success

Failure cases

  • duplicate fanout work deduped by (event_id, subscriber_id)

5. Ack delivery #

API

  • AckDelivery(event_id, subscriber_id, attempt_token)

Initiator

  • subscriber/client

Entry point

  • delivery endpoint / any node

Authoritative decider

  • delivery-record owner/shard leader

Precondition

  • delivery record is in ackable state
  • current attempt token matches

Transition

  • mark delivery terminal/acked

Response

  • success or stale-token failure

6. Retry delivery #

API

  • internal retry scheduler flow

Initiator

  • system

Entry point

  • shard leader

Authoritative decider

  • shard leader

Precondition

  • delivery record is retryable
  • current state/version allows retry

Transition

  • advance delivery attempt
  • assign new attempt token
  • reschedule actual delivery

Response

  • internal success

Entry Point vs Decider vs Responder #

PathEntry pointAuthoritative deciderPhysical responderLogical responder
Publishgateway / any nodeshard leaderleader or front nodepub/sub service
Subscribegateway / any nodesubscription owner/shard leaderleader or front nodepub/sub service
Unsubscribegateway / any nodesubscription owner/shard leaderleader or front nodepub/sub service
fanout creationshard leadershard leaderinternalpub/sub service
AckDeliverydelivery endpoint / any nodedelivery owner/shard leaderleader or front nodepub/sub service
retry schedulershard leadershard leaderinternalpub/sub service
shard failoverfollower / coordination layershard quorum / lease storenew leader / control planepub/sub service

Concrete HLD #

Main components:

  • gateway/router
    • resolves topic shard and forwards publication/subscription requests
  • shard leader
    • authoritative owner for publication fanout and delivery lifecycle in that shard
  • subscription index
    • authoritative subscription edges keyed by topic/subscriber scope
  • delivery-state store
    • per-event, per-subscriber delivery lifecycle
  • retry scheduler
    • shard-local retry timing and backoff
  • metadata/control service
    • tracks shard ownership and routing

Concrete Technology Realizations #

Stronger infra-native answer #

  • Go or Java pub/sub service
  • RocksDB or append-only segment files for publications and delivery state
  • Raft or leader-follower replication per shard
  • etcd or internal metadata quorum for shard ownership/routing

Short interview version #

“I’d build the fanout pub/sub platform as a sharded service. Publication is immutable, subscription edges are authoritative relationship state, and the shard leader expands each publication into per-subscriber delivery records. Ack and retry are guarded transitions on those delivery records, fenced by attempt tokens, and hot topics are handled by sharding fanout and keeping status/metrics as derived views.”