Pub/Sub Notification System Analysis Note
Table of Contents
Pub/Sub Notification System Analysis Note #
This note captures the full step-by-step analysis for a fanout-style pub/sub notification system: topic publication, subscriber relationship state, delivery fanout, retry/ack lifecycle, and shard ownership.
Step 1 — Normalize #
Assume the baseline prompt is:
- design a pub/sub notification system
- publishers publish events to topics
- many subscribers receive the event
- fanout is the core model
- subscribers may be online/offline
- system should scale across nodes
Normalize into state-affecting paths.
| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| Publisher publishes notification to topic | Client | append event | S1create targetPublishedEvent | C1 |
| Subscriber registers subscription to topic | Client | append event | S1create targetSubscriptionEdge | C1 |
| Subscriber unsubscribes from topic | Client | state transition | S1update targetSubscriptionEdge | C1 |
| System fans out published event to subscribers | System | async process | S1hidden write targetDeliveryRecord | C1 |
| Subscriber acknowledges delivered notification | Client | state transition | S1update targetDeliveryRecord | C1 |
| System retries undelivered notification | System | async process | S1hidden write targetDeliveryRecord | C1 |
| System routes topic/shard to current owner | System | read source | S1read source targetPartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1update targetPartitionOwnership | C1 |
| Client reads subscription/topic status | Client | read projection | S1read projection targetTopicStatusView | R2 |
Notes on normalization:
Important choices:
- publish is
append event- publication is an immutable fact
- subscribe is
append event- creates subscription relationship
- unsubscribe is
state transition- subscription lifecycle changes
- fanout is
async process- internal expansion from topic event to per-subscriber delivery
- ack/retry are delivery-lifecycle transitions
- routing/ownership are explicit because this is distributed infra
This system is not a replayable partition log by default. It is:
- topic publication
- subscriber relationship state
- delivery/fanout lifecycle
Step 2 — Critical Path Selection #
| Requirement | Priority class | Why |
|---|---|---|
| Publish notification to topic | C1 | publication truth must not be lost |
| Register subscription | C1 | subscriber-set truth determines fanout correctness |
| Unsubscribe from topic | C1 | stale subscriptions cause incorrect future delivery |
| Fan out event to subscribers | C1 | core delivery semantics depend on it |
| Acknowledge delivered notification | C1 | delivery completion state depends on it |
| Retry undelivered notification | C1 | reliability depends on it |
| Route topic/shard to owner | C1 | wrong routing breaks publication/fanout authority |
| Reassign shard ownership after node failure | C1 | failover must preserve publication and delivery correctness |
| Read subscription/topic status | R2 | operational only |
Baseline critical paths:
Main C1 paths:
P1publish notificationP2subscribeP3unsubscribeP4fanout to subscribersP5ack deliveryP6retry undelivered deliveryP7route to shard ownerP8reassign shard ownership
This system is driven by:
- immutable publication events
- authoritative subscription set
- per-subscriber delivery lifecycle
- fanout expansion correctness
- shard ownership/failover
Step 3 — Primary State Extraction #
For a fanout pub/sub system, the minimal primary state is published events, subscription edges, delivery records, and shard ownership/routing state.
| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| PublishedEvent | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | event_id |
| SubscriptionEdge | direct noun | Yes | keep as candidate | relationship | Yes | service | state machine | relation | subscriber_id + topic_id |
| DeliveryRecord | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | relation | event_id + subscriber_id |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | topic shards |
| TopicStatusView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | topic_id |
| RetryQueue | hidden write target | No | reject as implementation choice | projection | No | derived | overwrite | collection | shard_id |
Important modeling choices:
PublishedEvent #
Primary because:
- publication is immutable truth
SubscriptionEdge #
Primary because:
- it defines who should receive future events
- subscribe/unsubscribe correctness depends on it
DeliveryRecord #
Primary because:
- fanout creates per-subscriber delivery lifecycle state
- retry/ack semantics depend on it
PartitionOwnership #
Needed because:
- one shard owner should orchestrate publication/fanout for that shard
PartitionMap #
Needed because:
- routing of publication and subscription operations must be consistent
Minimal strict primary set:
PublishedEventSubscriptionEdgeDeliveryRecordPartitionOwnershipPartitionMap
Step 4 — Hard Invariants #
For a fanout pub/sub system, the hard invariants are about immutable publication, authoritative subscription state, valid delivery lifecycle, and exclusive shard ownership.
| Path | Tier | Type | Invariant template | Invariant statement |
|---|---|---|---|---|
P1write pathPublish notification | HARD | uniqueness | uniqueness template | Key publish request_id maps to at most one logical outcome published event within topic scope. |
P2write pathSubscribe | HARD | uniqueness | uniqueness template | Key subscriber_id + topic_id maps to at most one logical outcome active subscription edge within subscription scope. |
P3write pathUnsubscribe | HARD | eligibility | eligibility template | Action unsubscribe is valid only if SubscriptionEdge(subscriber_id, topic_id) is ACTIVE at decision time. |
P4write pathFanout delivery | HARD | accounting | accounting template | For a committed PublishedEvent(event_id), DeliveryRecord set equals the authoritative active subscription set for the target topic at the chosen fanout-cut semantics. |
P5write pathAck delivery | HARD | eligibility | eligibility template | Action ack_delivery is valid only if DeliveryRecord(event_id, subscriber_id) is IN_FLIGHT or DELIVERED_PENDING_ACK and current receipt/attempt token matches at decision time. |
P6write pathRetry delivery | HARD | eligibility | eligibility template | Action retry_delivery is valid only if DeliveryRecord(event_id, subscriber_id) is retryable and current attempt/visibility state allows retry at decision time. |
P7write pathRoute to shard owner | HARD | uniqueness | uniqueness template | Key shard_id maps to at most one logical outcome current authoritative owner within shard_id. |
P8write pathReassign shard ownership | HARD | eligibility | eligibility template | Action reassign_shard is valid only if current owner failed or relinquished and candidate owner is eligible and sufficiently current on shard_id at decision time. |
What matters most:
1. Subscription edge uniqueness #
No duplicate active subscription edge for the same subscriber/topic scope.
2. Delivery accounting #
Fanout correctness depends on a well-defined subscription cut:
- “active at publish time” is the usual baseline
3. Per-subscriber delivery lifecycle #
Ack and retry must operate on current delivery attempt state, not stale tokens.
Step 5 — Execution Context #
For the fanout baseline:
| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical pub/sub platform across many nodes |
| Write coordination scope | per object scope | correctness is per subscription edge, per event-subscriber delivery, and per shard ownership scope |
| Read consistency target | strong only | safest baseline for subscription and delivery lifecycle correctness |
| Holder model | client | subscribers may temporarily hold in-flight delivery attempts |
| Compensation acceptable? | No | lost or duplicate delivery-state transitions cannot be repaired blindly after the fact |
Derived implications:
holder_may_crash = true- subscribers can fail while handling delivery attempts
cross_service_write = false- baseline keeps publication, subscriptions, delivery state, and routing within one logical service
bounded_staleness_allowed = false- strict baseline disallows stale subscription/delivery state on correctness-critical paths
cross_service_atomicity_required = false- no multi-service transaction required in baseline
exclusive_claim_required = true- a delivery attempt should have one active current holder/attempt token where ack/retry semantics depend on it
guarded_by_current_state = true- unsubscribe, ack, retry, and failover all depend on current state
This pushes us toward:
- one authoritative owner per shard
- immutable publications
- versioned subscription state
- per-delivery lifecycle state with guarded ack/retry
Step 6 — Deterministic Mechanism Selection #
6A. Write Shape #
| Path | Why | Write shape |
|---|---|---|
P1 publish notification | immutable topic event | append-only event |
P2 subscribe | create active relationship edge with uniqueness | append-only event or guarded state transition |
P3 unsubscribe | active edge becomes inactive | guarded state transition |
P4 fanout delivery | create per-subscriber delivery state from publication + subscription set | append-only event plus derived delivery creation |
P5 ack delivery | valid only for current delivery attempt state | guarded state transition |
P6 retry delivery | valid only for retryable current delivery state | guarded state transition |
P7 route to shard owner | one current authoritative owner per shard | exclusive claim |
P8 reassign shard ownership | valid only if current ownership/failover state allows it | guarded state transition |
6B. Base Mechanism #
| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
P1 publish notification | append-only event | append log | idempotency key |
P2 subscribe | guarded state transition | CAS on version or unique edge key | subscription version |
P3 unsubscribe | guarded state transition | CAS on (state, version) | subscription version |
P4 fanout delivery | append-only event + derived delivery creation | append log + single-writer shard fanout | delivery dedup key |
P5 ack delivery | guarded state transition | CAS on (state, version) | attempt token / receipt handle |
P6 retry delivery | guarded state transition | leader-applied guarded transition | attempt token, retry/backoff policy |
P7 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
P8 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |
Why these fit:
Publish #
Publication is immutable, naturally append-only.
Subscription edge #
You can model subscribe as append-only if subscriptions are immutable edges with versioned replacement, but in the baseline practical system it is usually simpler to treat active/inactive subscription state as guarded current state.
Fanout #
Fanout is driven by immutable publication events and current subscription set. Delivery records themselves can be created append-only at first, but their lifecycle quickly becomes guarded state transitions for ack/retry.
Canonical substrate implied:
- sharded pub/sub service
- immutable publication stream per topic/shard
- authoritative subscription store
- per-subscriber delivery lifecycle state
- lease-backed shard ownership
Step 7 — Read Model / Source of Truth #
For a fanout pub/sub system, truth is mostly direct source state. Topic status views are projections.
| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
C1source conceptPublished event | PublishedEvent | read source directly | authoritative event store |
C2source conceptActive subscription set | SubscriptionEdge | read source directly | authoritative subscription store |
C3source conceptPer-subscriber delivery lifecycle | DeliveryRecord | read source directly | authoritative delivery state store |
C4source conceptShard ownership | PartitionOwnership | read source directly | authoritative ownership store |
C5source conceptShard routing map | PartitionMap | read source directly | authoritative routing metadata |
C6projection conceptTopic status / lag / dashboard | derived from publication, subscriptions, and deliveries | materialized view | recompute from primary state |
Important point:
For correctness-critical paths:
- fanout should read authoritative subscription state
- ack/retry should read authoritative delivery state
- shard ownership/routing should be authoritative
Step 8 — Failure Handling #
| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
P1 publish notification | retry with publish request_id to avoid duplicate publication | competing publishes coexist | committed publication survives owner crash if replicated past commit point | publisher may retry safely with dedup key | n/a |
P2 subscribe | retry with unique subscription key/version | duplicate active edge prevented by uniqueness/versioning | committed subscribe survives crash if persisted | n/a | n/a |
P3 unsubscribe | retry with current subscription version | stale unsubscribe loses guarded transition | committed unsubscribe survives crash if persisted | n/a | n/a |
P4 fanout delivery | retry fanout safely using (event_id, subscriber_id) dedup | one delivery record per event-subscriber scope should be created for baseline semantics | committed delivery-record creation survives crash if replicated | downstream physical delivery may lag or be retried | n/a |
P5 ack delivery | retry with current attempt token | stale/wrong attempt token loses guarded transition | committed ack survives crash if persisted | n/a | stale subscriber attempt fenced by token |
P6 retry delivery | retry with current delivery state/version | only one valid retry transition should win current attempt scope | retry worker crash delays delivery; next scheduler retries | n/a | old attempt token becomes invalid after retry |
P7 route to shard owner | retry after refreshing shard map | only one valid owner should exist | refreshed map points to new owner | n/a | stale owner rejected by fencing token |
P8 reassign shard ownership | retry failover safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue fanout state changes |
What matters most:
1. Delivery dedup scope #
For a given publication and subscriber:
- create at most one current delivery record for the baseline semantic cut
2. Attempt-token fencing #
A stale subscriber ack must not complete a newer retry attempt.
3. Publication retry dedup #
If the publish API promises idempotency:
- add request-id dedup on publication
Failure summary:
This system stays correct if:
- publication is durable/idempotent as needed
- subscription state is authoritative
- delivery-record creation is deduped by
(event_id, subscriber_id) - ack/retry are guarded by current attempt token/version
- stale shard owners and stale subscriber attempts are fenced
Step 9 — Scale Adjustments #
| Hotspot | Type | First response |
|---|---|---|
| hot topic with massive subscriber fanout | fan-out hotspot | shard topic fanout, batch delivery creation, and isolate hot topics |
| delivery retry backlog | write throughput hotspot | bucket retries by time and process incrementally with backoff |
| subscription churn on hot topics | contention hotspot | partition subscription index by topic and spread writers |
| shard-owner CPU saturation during fanout | write throughput hotspot | add more shards and rebalance hot topics |
| stale status/dashboard queries | read hotspot | keep them as derived views only |
| delivery-record growth | storage growth hotspot | compact/archive terminal delivery records after safe retention window |
What scales well:
This system scales by:
- partitioning topics
- partitioning subscription indexes
- letting each shard owner expand publication to delivery records for its shard scope
- batching downstream delivery
What fails first:
Usually:
- hot topic fanout explosion
- retry storms
- delivery-record storage growth
- subscription-index hot spots
Canonical design conclusion:
The mechanical outcome is:
- primary state:
PublishedEventSubscriptionEdgeDeliveryRecordPartitionOwnershipPartitionMap
- critical invariants:
- immutable publication truth
- authoritative subscription set
- one delivery record per event-subscriber baseline scope
- guarded ack/retry with attempt fencing
- fenced shard ownership
- mechanisms:
append log- guarded subscription and delivery transitions
- lease-backed shard ownership
- reads:
- direct source reads for subscriptions and delivery state
- projections only for status/metrics
Polished interview answer:
“I’d design the pub/sub notification system as a sharded fanout service. Publication is an immutable event, subscriptions are authoritative relationship state, and fanout creates per-subscriber delivery records from the current subscription set. Acknowledgment and retry are modeled as guarded transitions on delivery records, fenced by attempt tokens so stale retries or stale acks cannot corrupt delivery state. Hot topics are handled by sharding fanout work and keeping topic status views as derived projections only.”
Concrete Substrate #
I’ll choose a service-owned sharded fanout platform as the concrete baseline, because that fits the mechanics we derived:
- immutable publication events
- authoritative subscription edges
- per-subscriber delivery records
- lease-backed shard ownership
Concrete substrate:
- pub/sub service cluster
- topics split into shards
- one leader/owner per shard
- shard leader stores:
- publication stream for that shard
- subscription index for subscribers in that shard scope
- delivery records and retry schedule
- followers replicate shard mutations
- metadata layer tracks shard ownership and routing
Concrete tech family:
- service in Go or Java
- durable publication storage with RocksDB or append-only segment files
- shard replication via Raft or leader-follower replication
- metadata/control via etcd or internal quorum
- retry scheduling via shard-local time buckets or min-heaps
Operation Layer #
1. Publish notification #
API
Publish(topic_id, payload, request_id?)
Initiator
- publisher/client
Entry point
- gateway or any pub/sub node
Authoritative decider
- current leader for target topic shard
Precondition
- valid shard routing
- optional dedup request id if publisher idempotency is required
Transition
- append
PublishedEvent - enqueue fanout work for active subscribers in that shard scope
Response
{event_id}
Failure cases
- stale routing -> retry with updated shard map
- response loss -> duplicate publish unless
request_iddedup is used
2. Subscribe #
API
Subscribe(subscriber_id, topic_id, expected_version?)
Initiator
- subscriber/client
Entry point
- gateway or any node
Authoritative decider
- subscription owner/shard leader
Precondition
- no conflicting active subscription edge or expected version matches
Transition
- create or activate
SubscriptionEdge
Response
{subscription_version}
3. Unsubscribe #
API
Unsubscribe(subscriber_id, topic_id, expected_version?)
Initiator
- subscriber/client
Entry point
- gateway or any node
Authoritative decider
- subscription owner/shard leader
Precondition
- current subscription is active and version matches if supplied
Transition
- mark
SubscriptionEdgeinactive
Response
{subscription_version}
4. Fanout delivery creation #
API
- internal fanout worker flow
Initiator
- system/shard leader
Entry point
- shard leader
Authoritative decider
- shard leader
Precondition
- committed publication exists
- authoritative active subscription set available
Transition
- create
DeliveryRecord(event_id, subscriber_id)for each matching subscriber - set initial delivery state
Response
- internal success
Failure cases
- duplicate fanout work deduped by
(event_id, subscriber_id)
5. Ack delivery #
API
AckDelivery(event_id, subscriber_id, attempt_token)
Initiator
- subscriber/client
Entry point
- delivery endpoint / any node
Authoritative decider
- delivery-record owner/shard leader
Precondition
- delivery record is in ackable state
- current attempt token matches
Transition
- mark delivery terminal/acked
Response
- success or stale-token failure
6. Retry delivery #
API
- internal retry scheduler flow
Initiator
- system
Entry point
- shard leader
Authoritative decider
- shard leader
Precondition
- delivery record is retryable
- current state/version allows retry
Transition
- advance delivery attempt
- assign new attempt token
- reschedule actual delivery
Response
- internal success
Entry Point vs Decider vs Responder #
| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
Publish | gateway / any node | shard leader | leader or front node | pub/sub service |
Subscribe | gateway / any node | subscription owner/shard leader | leader or front node | pub/sub service |
Unsubscribe | gateway / any node | subscription owner/shard leader | leader or front node | pub/sub service |
| fanout creation | shard leader | shard leader | internal | pub/sub service |
AckDelivery | delivery endpoint / any node | delivery owner/shard leader | leader or front node | pub/sub service |
| retry scheduler | shard leader | shard leader | internal | pub/sub service |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | pub/sub service |
Concrete HLD #
Main components:
- gateway/router
- resolves topic shard and forwards publication/subscription requests
- shard leader
- authoritative owner for publication fanout and delivery lifecycle in that shard
- subscription index
- authoritative subscription edges keyed by topic/subscriber scope
- delivery-state store
- per-event, per-subscriber delivery lifecycle
- retry scheduler
- shard-local retry timing and backoff
- metadata/control service
- tracks shard ownership and routing
Concrete Technology Realizations #
Stronger infra-native answer #
- Go or Java pub/sub service
- RocksDB or append-only segment files for publications and delivery state
- Raft or leader-follower replication per shard
- etcd or internal metadata quorum for shard ownership/routing
Short interview version #
“I’d build the fanout pub/sub platform as a sharded service. Publication is immutable, subscription edges are authoritative relationship state, and the shard leader expands each publication into per-subscriber delivery records. Ack and retry are guarded transitions on those delivery records, fenced by attempt tokens, and hot topics are handled by sharding fanout and keeping status/metrics as derived views.”