Kafka-Class Event Streaming Platform Analysis Note #

This note captures the full step-by-step analysis for a Kafka-class event streaming platform: partitioned immutable log, consumer-group offset management, leader-based partition ownership, replication, and replayable consumption.

Step 1 — Normalize #

Assume the baseline prompt is:

design a Kafka-class event streaming platform
producers publish to topics
topics are partitioned
consumers read in order from partitions
consumers manage offsets/progress
system should support replay and scalable fanout

Normalize into state-affecting paths.

Requirement	Actor	Operation	State touched	Priority
Producer publishes event to topic partition	Client	append event	`S1` `create target` `PartitionLog`	C1
Consumer reads events from partition	Client	read source	`S1` `read source target` `PartitionLog`	R1
Consumer group commits offset/progress	Client	overwrite state	`S1` `update target` `ConsumerOffset`	C1
System routes topic partition to current leader	System	read source	`S1` `read source target` `PartitionMap`	C1
System reassigns partition leadership after node failure	System	state transition	`S1` `update target` `PartitionOwnership`	C1
System replicates partition log to followers	System	async process	`S1` `hidden write target` `ReplicaState`	C1
Consumer group rebalances partition assignment	System	async process	`S1` `hidden write target` `ConsumerAssignment`	C1
Client reads topic/lag metrics	Client	read projection	`S1` `read projection target` `TopicMetricsView`	R2

Notes on normalization:

Important choices:

publish is append event
- event log is immutable
consumer read is read source
- consumers read authoritative log
offset commit is overwrite state
- current progress is the main truth for a group/partition
partition ownership and replication are explicit because this is distributed infra
consumer rebalance is an internal coordination flow

Likely C1:

publish
offset commit
partition routing
partition leadership reassignment
replication
consumer-group assignment changes

Likely R1:

consumer read from partition

This is already clearly a log plus consumer progress, not a queue with an in-flight visibility timeout.

Step 2 — Critical Path Selection #

Requirement	Priority class	Why
Publish event to partition	C1	event durability and ordered append are core truth
Consumer reads events from partition	R1	core serving path
Consumer group commits offset/progress	C1	delivery/progress semantics depend on correct offset state
Route partition to current leader	C1	wrong routing breaks authoritative append order
Reassign partition leadership after node failure	C1	failover must preserve ordered committed log and authority
Replicate partition log to followers	C1	durability and failover correctness depend on it
Rebalance consumer assignments	C1	at most one group member may own a partition at a time, so that processing within the group stays ordered
Read topic/lag metrics	R2	operational only

Baseline critical paths:

Main C1 paths:

P1 publish event
P2 commit consumer offset
P3 route to partition leader
P4 reassign partition leadership
P5 replicate partition log
P6 rebalance consumer assignments

Main R1 path:

P7 read events from partition

What drives the design:

one authoritative append order per partition
one current leader per partition
durable replication of committed log
monotonic consumer-group progress per partition
controlled consumer-group assignment

Step 3 — Primary State Extraction #

For a Kafka-class platform, the minimal primary state is the partition log, consumer progress, partition ownership/routing, and replica state.

Candidate object label	Candidate source	Candidate needed for C1/R1?	Candidate decomposition action	Class	Primary?	Owner	Evolution	Scope kind	Scope value
PartitionLog	direct noun	Yes	keep as candidate	event	Yes	service	append-only	collection	topic_id + partition_id
ConsumerOffset	hidden write target	Yes	keep as candidate	relationship	Yes	service	overwrite	relation	group_id + topic_id + partition_id
PartitionOwnership	hidden write target	Yes	keep as candidate	process	Yes	service	state machine	instance	topic_id + partition_id
PartitionMap	hidden write target	Yes	keep as candidate	entity	Yes	service	overwrite	collection	topic partitions
ReplicaState	hidden write target	Yes	keep as candidate	process	Yes	service	state machine	relation	partition_id + replica_id
ConsumerAssignment	hidden write target	Yes	keep as candidate	relationship	Yes	service	overwrite	relation	group_id + partition_id
TopicMetricsView	derived read model	No	reject as UI artifact	projection	No	derived	overwrite	collection	topic_id

Important modeling choices:

`PartitionLog` #

This is the core truth of the system:

immutable ordered event sequence per partition
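
A minimal sketch of that idea, assuming an in-memory list stands in for the on-disk segment files. The class and method names (`PartitionLog.append`, `read`, `log_end_offset`) are illustrative and not taken from any real broker API.

```python
from typing import List, Tuple


class PartitionLog:
    """Append-only record sequence for one (topic_id, partition_id) scope."""

    def __init__(self, topic_id: str, partition_id: int) -> None:
        self.topic_id = topic_id
        self.partition_id = partition_id
        self._records: List[bytes] = []  # list index doubles as the offset

    @property
    def log_end_offset(self) -> int:
        """Offset that the next appended record will receive."""
        return len(self._records)

    def append(self, record: bytes) -> int:
        """Append one record and return its assigned offset."""
        offset = self.log_end_offset
        self._records.append(record)
        return offset

    def read(self, offset: int, max_records: int = 100) -> Tuple[List[bytes], int]:
        """Return records starting at `offset` plus the next offset to read."""
        if not 0 <= offset <= self.log_end_offset:
            raise IndexError(f"offset {offset} is outside the log")
        batch = self._records[offset:offset + max_records]
        return batch, offset + len(batch)
```

Existing records are never rewritten, so any reader can replay from any earlier offset and see the same sequence.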

`ConsumerOffset` #

This is also primary:

it records current group progress through a partition
replay behavior and delivery semantics depend on it

`PartitionOwnership` #

Needed because:

one leader/owner must serialize authoritative appends for a partition

`ReplicaState` #

Needed because:

failover must know which replicas are sufficiently current

`ConsumerAssignment` #

Needed because:

within a consumer group, partition ownership by consumers is meaningful for ordered processing and rebalance correctness

Minimal strict primary set:

PartitionLog
ConsumerOffset
PartitionOwnership
PartitionMap
ReplicaState
ConsumerAssignment

Step 4 — Hard Invariants #

For a Kafka-class platform, the hard invariants are about authoritative append order, valid partition leadership, durable replication, and monotonic consumer progress.

Path	Tier	Type	Invariant template	Invariant statement
`P1` `write path` `Publish event`	HARD	ordering	ordering template	Instances `partition log appends` are ordered by `authoritative commit order` within `topic_id + partition_id`.
`P1` `write path` `Publish event`	HARD	eligibility	eligibility template	Action `append_event` is valid only if `request is routed to current partition leader and append preconditions, if any, hold` on `topic_id + partition_id` at decision time.
`P2` `write path` `Commit consumer offset`	HARD	ordering	ordering template	Instances `consumer offsets` are ordered by `monotonic progress` within `group_id + topic_id + partition_id`.
`P2` `write path` `Commit consumer offset`	HARD	eligibility	eligibility template	Action `commit_offset` is valid only if `new_offset is not behind current committed offset for group-partition scope` at decision time.
`P3` `write path` `Route to partition leader`	HARD	uniqueness	uniqueness template	Key `topic_id + partition_id` maps to at most one logical outcome `current authoritative leader` within that partition scope.
`P4` `write path` `Reassign partition leadership`	HARD	eligibility	eligibility template	Action `reassign_partition_leader` is valid only if `current leader failed or relinquished and candidate replica is eligible and sufficiently current` on `topic_id + partition_id` at decision time.
`P5` `write path` `Replicate partition log`	HARD	accounting	accounting template	Replica state for `topic_id + partition_id` equals `authoritative committed partition log` modulo bounded replication lag.
`P6` `write path` `Rebalance consumer assignments`	HARD	uniqueness	uniqueness template	Key `group_id + topic_id + partition_id` maps to at most one logical outcome `current assigned consumer` within that group-partition scope.
`P7` `read path` `Read events from partition`	HARD or SOFT	freshness	freshness template	Read path `consume_partition` reflects authoritative partition log within `configured consistency bound`.

What matters most:

1. One authoritative append order per partition #

This is the central Kafka-like invariant.

2. Monotonic consumer progress #

Offsets should not go backward accidentally for the same group/partition scope unless the API explicitly supports reset semantics.

3. Valid failover only to sufficiently current replica #

Otherwise committed events may be lost or reordered.

4. Assignment uniqueness within group #

At most one member of a consumer group should own a given partition at any time, so that processing of that partition stays ordered.

Step 5 — Execution Context #

For the Kafka-class baseline:

Field	Value	Why
Topology	single service distributed	one logical streaming platform across many nodes
Write coordination scope	per object scope	correctness is per partition, per group-partition offset, and per assignment scope
Read consistency target	strong only	safest baseline for publish/offset correctness; read relaxation can be added later
Holder model	node	partition leadership is held by broker nodes
Compensation acceptable?	No	lost committed events or split leaders cannot be safely repaired later

Derived implications:

holder_may_crash = true
- brokers can fail while leading a partition
cross_service_write = false
- baseline keeps log, offsets, assignment, and routing within one logical service
bounded_staleness_allowed = false
- for authoritative publish and offset paths in the baseline
cross_service_atomicity_required = false
- no multi-service transaction required in baseline
exclusive_claim_required = true
- partition leadership and group-partition assignment require exclusivity
guarded_by_current_state = true
- offset commits, failover, and assignment changes depend on current state

This pushes us toward:

one leader per partition
append-only authoritative partition log
durable follower replication
monotonic offset updates
fenced partition leadership

Step 6 — Deterministic Mechanism Selection #

6A. Write Shape #

Path	Why	Write shape
`P1` publish event	immutable append to ordered partition log	`append-only event`
`P2` commit consumer offset	replace current progress value subject to monotonicity	`guarded state transition`
`P3` route to partition leader	one current authoritative leader per partition	`exclusive claim`
`P4` reassign partition leadership	valid only if ownership/failover state allows it	`guarded state transition`
`P5` replicate partition log	followers consume authoritative append stream	`append-only event`
`P6` rebalance consumer assignments	one current consumer per group-partition scope	`exclusive claim`

6B. Base Mechanism #

Path	Write shape	Base mechanism	Required companions
`P1` publish event	`append-only event`	`append log`	commit index, producer idempotency if required
`P2` commit consumer offset	`guarded state transition`	`CAS on version` or leader-applied monotonic overwrite	offset version
`P3` route to partition leader	`exclusive claim`	`lease`	fencing token, heartbeat
`P4` reassign partition leadership	`guarded state transition`	`CAS on (state, version)`	fencing token, replica catch-up check
`P5` replicate partition log	`append-only event`	`append log`	replica ack/progress
`P6` rebalance consumer assignments	`exclusive claim`	`lease` or coordinator-managed assignment epoch	assignment epoch

Why these fit:

Publish #

Kafka-class logs are fundamentally:

immutable append-only event streams

Offset commit #

Offset is current progress state, so it is not itself append-only truth. It is a current value, but one constrained by monotonicity, so a guarded overwrite/transition is the best classification.

Leadership and assignment #

Both are exclusive-ownership problems:

one current partition leader
one current consumer per group-partition assignment scope

Canonical substrate implied:

partitioned broker cluster
one leader per partition
follower replication per partition
dedicated or embedded group coordinator for assignments/offsets

Step 7 — Read Model / Source of Truth #

For a Kafka-class platform, truth is mostly direct source state. Operational metrics are projections.

Concept	Truth	Read path	Rebuild path
`C1` `source concept` `Partition event stream`	`PartitionLog`	`read source directly`	authoritative partition log segments
`C2` `source concept` `Consumer-group progress`	`ConsumerOffset`	`read source directly`	authoritative offset store
`C3` `source concept` `Partition leadership`	`PartitionOwnership`	`read source directly`	authoritative ownership store
`C4` `source concept` `Partition routing map`	`PartitionMap`	`read source directly`	authoritative routing metadata
`C5` `source concept` `Replica catch-up state`	`ReplicaState`	`read source directly`	recompute from commit index and replica progress
`C6` `projection concept` `Lag / topic metrics`	derived from logs, offsets, and replica state	`materialized view`	recompute from primary state

Important point:

For the correctness-critical baseline:

producers append to authoritative partition log via leader
consumers read from authoritative log
offsets are read and written against authoritative group-partition progress
metrics and lag dashboards are projections only
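
To illustrate why lag is only a projection, here is a small sketch that recomputes it from primary state. It assumes a committed offset means "next offset to consume" and that a partition with no commit counts from offset 0; the function name `consumer_lag` is hypothetical.

```python
from typing import Dict


def consumer_lag(
    log_end_offsets: Dict[int, int],
    committed_offsets: Dict[int, int],
) -> Dict[int, int]:
    """Derive per-partition lag for one group from log ends and commits."""
    return {
        partition: max(0, end - committed_offsets.get(partition, 0))
        for partition, end in log_end_offsets.items()
    }
```

Because the result depends only on `PartitionLog` and `ConsumerOffset`, the view can be dropped and rebuilt at any time without affecting correctness.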

Step 8 — Failure Handling #

Path	Retry	Competing writers	Crash after commit	Publish failure	Stale holder
`P1` publish event	retry with producer idempotency key/sequence if required	partition leader serializes appends; stale leader loses authority	committed event survives leader crash if replicated past commit point	producer may retry; idempotent producer avoids duplicate append	stale leader fenced by epoch/term
`P2` commit consumer offset	retry safe with monotonic offset/version checks	stale/lower offset loses guarded transition	committed offset survives crash if persisted	n/a	stale coordinator/owner fenced by assignment epoch/version
`P3` route to partition leader	retry after refreshing partition map	only one valid leader should exist	refreshed map points to new leader after failover	n/a	stale owner rejected by fencing token
`P4` reassign partition leadership	retry failover safely	only one reassignment wins current ownership state	promoted leader crash triggers later failover	n/a	old leader fenced and must not serve writes
`P5` replicate partition log	follower retries from last known offset/index	followers independently catch up from committed log	leader crash stops new replication until failover; committed log remains truth	missing append retried from commit index	stale replica cannot become leader unless sufficiently current
`P6` rebalance consumer assignments	retry rebalance with assignment epoch	only one assignment per group-partition scope should be active	rebalance coordinator crash delays reassignment but next epoch retries	n/a	stale consumer assignment fenced by generation/assignment epoch

What matters most:

1. Partition leader fencing #

An old leader must not continue accepting appends after failover.
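
One way to sketch this fencing, assuming each leader carries a monotonically increasing epoch and every replica remembers the highest epoch it has seen. `ReplicaEndpoint` and `FencedLeaderError` are illustrative names.

```python
from typing import List, Tuple


class FencedLeaderError(Exception):
    """Raised when a request carries an epoch older than the current one."""


class ReplicaEndpoint:
    """Follower-side guard that rejects appends from a superseded leader."""

    def __init__(self) -> None:
        self.current_epoch = 0
        self.entries: List[Tuple[int, bytes]] = []

    def handle_append(self, leader_epoch: int, record: bytes) -> int:
        if leader_epoch < self.current_epoch:
            # The sender lost leadership in an earlier failover.
            raise FencedLeaderError(
                f"epoch {leader_epoch} < current {self.current_epoch}"
            )
        self.current_epoch = leader_epoch  # adopt the newer epoch
        self.entries.append((leader_epoch, record))
        return len(self.entries) - 1
```

Once any replica has accepted epoch 2, a leader still operating under epoch 1 cannot gather acknowledgements from it, so its appends can never reach the commit point.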

2. Producer idempotency #

If a producer retries after a lost response, a duplicate append is possible unless producer idempotency is used.
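
A minimal sketch of sequence-based deduplication, assuming each producer numbers its records from 0 and the leader remembers only the most recent sequence per producer. All names here are hypothetical.

```python
from typing import Dict, List, Tuple


class OutOfOrderSequence(Exception):
    """Raised when a producer skips ahead or replays an old sequence."""


class IdempotentPartition:
    def __init__(self) -> None:
        self.records: List[bytes] = []
        # producer_id -> (last accepted sequence, offset it was stored at)
        self._last: Dict[str, Tuple[int, int]] = {}

    def append(self, producer_id: str, sequence: int, record: bytes) -> int:
        last = self._last.get(producer_id)
        if last is not None and sequence == last[0]:
            return last[1]  # retry of the last append: return same offset
        expected = 0 if last is None else last[0] + 1
        if sequence != expected:
            raise OutOfOrderSequence(f"expected {expected}, got {sequence}")
        self.records.append(record)
        offset = len(self.records) - 1
        self._last[producer_id] = (sequence, offset)
        return offset
```

A retry after a lost response carries the same sequence number, so it is answered with the original offset instead of creating a second copy of the record.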

3. Offset monotonicity #

Client retries or misordered commits must not silently move offsets backward unless explicit reset APIs exist.

4. Replica catch-up before leadership #

A stale replica becoming leader can lose acknowledged events.
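
The eligibility check can be sketched as follows, assuming "sufficiently current" means the replica's log end offset is at least the committed high watermark. The function name and the tie-break rule are assumptions made for illustration.

```python
from typing import Dict, Optional, Set


def pick_new_leader(
    replica_end_offsets: Dict[str, int],
    high_watermark: int,
    alive: Set[str],
) -> Optional[str]:
    """Pick a live replica that holds every committed record, if any."""
    candidates = [
        (end, replica_id)
        for replica_id, end in replica_end_offsets.items()
        if replica_id in alive and end >= high_watermark
    ]
    if not candidates:
        return None  # refuse to promote rather than lose committed events
    # Prefer the longest log; ties fall to the larger replica id.
    return max(candidates)[1]
```

Returning `None` trades availability for safety: the partition stays leaderless until an eligible replica comes back.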

Step 9 — Scale Adjustments #

Hotspot	Type	First response
hot partition with too much producer traffic	contention hotspot	increase partition count and rebalance producers across partitions
hot consumer group coordinator / offset store	write throughput hotspot	shard consumer-group metadata and isolate busy groups
replication lag on busy partitions	write throughput hotspot	batch log replication, tune follower IO, and isolate hot partitions
leader-heavy read/write pressure	contention hotspot	spread partition leadership and add more brokers
rebalance storms in consumer groups	contention hotspot	reduce eager rebalances, use cooperative/sticky assignment, and slow membership churn
log storage growth	storage growth hotspot	segment retention, compaction where applicable, and tiered/archival storage
lag/metrics fanout	read hotspot	keep lag and topic metrics as derived views only

What scales well:

This system scales by:

partitioning topics
making each partition an independent ordered append stream
replicating partitions independently
keeping offsets and assignments scoped by group-partition

Throughput grows mainly with:

number of partitions
number of brokers
replication and IO efficiency

What fails first:

Usually:

hot partitions
replication lag
rebalance churn
metadata bottlenecks around consumer groups

Canonical design conclusion:

The mechanical outcome is:

primary state:
- PartitionLog
- ConsumerOffset
- PartitionOwnership
- PartitionMap
- ReplicaState
- ConsumerAssignment
critical invariants:
- one authoritative append order per partition
- monotonic consumer progress per group-partition
- one leader per partition
- failover only to sufficiently current replica
- one current assignment per group-partition scope
mechanisms:
- append log
- lease
- guarded offset commit
- fenced leadership and assignment epochs
reads:
- direct source reads for log, offsets, and leadership
- projections only for lag/metrics

Polished interview answer:

“I’d design the event streaming platform as a partitioned immutable log system. Each partition has one leader that serializes appends and replicates them to followers. Consumers read directly from partition logs, while consumer-group progress is tracked separately as current offset state per group and partition. Leadership is lease- or epoch-backed and fenced, failover promotes only a sufficiently current replica, and consumer-group assignments are coordinated separately to ensure one active consumer per partition within a group. The main scaling levers are more partitions, balanced leadership, efficient replication, and controlling rebalance churn.”

Concrete Substrate #

I’ll choose a broker-owned partitioned log with per-partition leaders as the concrete baseline, because that fits the mechanical derivation:

immutable partitioned log
one leader per partition
follower replication
separate offset and assignment state
fenced leadership

Concrete substrate:

broker cluster
topics split into partitions
each partition stored as append-only segment files on leader and followers
leader appends events and replicates to followers
offset store and consumer-group coordinator track progress and assignment
metadata layer tracks partition leadership and routing

Concrete tech family:

broker service in Java, Go, or Scala
partition logs stored as append-only segment files on local disk
replication via Raft-like or Kafka-style leader-follower log replication
metadata/control via etcd, internal quorum, or controller nodes
optional RocksDB/state store for offsets/metadata, though offset state can also live in internal compacted topics

Operation Layer #

1. Publish event #

API

Produce(topic_id, partition_key?, records, producer_id?, sequence?)

Initiator

producer/client

Entry point

gateway or any broker

Authoritative decider

current leader for target partition

Precondition

valid partition routing
producer idempotency sequence, if enabled, must match expected order

Transition

append records to PartitionLog
advance leader log end offset

Response

{partition_id, base_offset, high_watermark?}

Failure cases

stale routing -> redirect/retry
response loss -> duplicate append unless producer idempotency is enabled

Sequence

producer sends Produce
entry broker resolves partition
forwards to partition leader
leader appends records locally
leader replicates to followers
on required ack condition, leader commits
leader replies with offsets
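
The "entry broker resolves partition" step can be sketched as a stable hash of the partition key. This example uses CRC32 purely for illustration; real platforms choose their own hash function, and `choose_partition` is a hypothetical helper.

```python
import zlib


def choose_partition(partition_key: bytes, num_partitions: int) -> int:
    """Map a key to a partition so that equal keys always land together."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(partition_key) % num_partitions
```

Because the mapping depends on `num_partitions`, increasing the partition count changes where existing keys land, which matters for any consumer relying on per-key ordering.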

2. Read events from partition #

API

Fetch(topic_id, partition_id, offset, max_bytes)

Initiator

consumer/client

Entry point

broker

Authoritative decider

partition leader in the strict baseline

Precondition

requested offset exists or is within retention bounds

Transition

none

Response

{records, next_offset, high_watermark}

Failure cases

offset out of range due to retention
stale routing

Sequence

consumer sends Fetch
broker routes to partition leader
leader reads segment files from requested offset
returns records and next offset
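
A sketch of the leader-side read, under two assumptions that go beyond the text: consumers only see records below the high watermark, and the batch is bounded by a record count (`max_records`) rather than the `max_bytes` of the API above. The names are illustrative.

```python
from typing import Dict, List, Union


class OffsetOutOfRange(Exception):
    """Requested offset was trimmed by retention or is not yet committed."""


def fetch(
    records: List[bytes],
    log_start_offset: int,
    high_watermark: int,
    offset: int,
    max_records: int,
) -> Dict[str, Union[List[bytes], int]]:
    """Serve a read; records[i] holds offset log_start_offset + i."""
    if offset < log_start_offset or offset > high_watermark:
        raise OffsetOutOfRange(f"offset {offset} is not readable")
    start = offset - log_start_offset
    # Never hand out records at or above the high watermark.
    end = min(high_watermark - log_start_offset, start + max_records)
    batch = records[start:end]
    return {
        "records": batch,
        "next_offset": offset + len(batch),
        "high_watermark": high_watermark,
    }
```

A fetch at exactly the high watermark returns an empty batch, which a consumer can treat as "caught up, poll again".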

3. Commit consumer offset #

API

CommitOffset(group_id, topic_id, partition_id, offset, expected_version?)

Initiator

consumer/client

Entry point

group coordinator or offset store endpoint

Authoritative decider

offset owner/coordinator

Precondition

offset commit must satisfy monotonicity/version rules for that group-partition scope

Transition

overwrite ConsumerOffset

Response

{committed_offset, version}

Failure cases

stale commit/version conflict
coordinator change during rebalance

Sequence

consumer sends CommitOffset
coordinator loads current offset/version
validates monotonicity/version
persists new offset
returns success
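
The sequence above can be sketched as a guarded overwrite. This assumes an in-memory dictionary as the offset store and treats a commit equal to the current offset as a safe retry; `OffsetStore`, `OffsetEntry`, and `StaleCommit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Scope = Tuple[str, str, int]  # (group_id, topic_id, partition_id)


@dataclass(frozen=True)
class OffsetEntry:
    offset: int
    version: int


class StaleCommit(Exception):
    """Raised on a version conflict or an attempt to move backward."""


class OffsetStore:
    def __init__(self) -> None:
        self._entries: Dict[Scope, OffsetEntry] = {}

    def commit(
        self, scope: Scope, offset: int, expected_version: Optional[int] = None
    ) -> OffsetEntry:
        current = self._entries.get(scope, OffsetEntry(offset=0, version=0))
        if expected_version is not None and expected_version != current.version:
            raise StaleCommit("version conflict")
        if offset < current.offset:
            raise StaleCommit("offset would move backward")
        updated = OffsetEntry(offset=offset, version=current.version + 1)
        self._entries[scope] = updated  # overwrite: only current progress is kept
        return updated
```

An explicit reset API, if the platform offers one, would bypass the monotonicity check deliberately rather than through this path.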

4. Reassign partition leadership #

API

internal election/failover flow

Initiator

system

Entry point

controller/quorum or replica group

Authoritative decider

metadata quorum / controller / replica-group protocol

Precondition

current leader failed or relinquished
candidate replica sufficiently current

Transition

PartitionOwnership updated to new leader epoch
old leader fenced

Response

updated routing metadata

Failure cases

split brain if stale leader not fenced
stale replica chosen as leader

5. Replicate partition log #

API

internal replication RPC

Initiator

partition leader

Entry point

follower replicas

Authoritative decider

leader for proposed entries; quorum/high-watermark logic for commit

Precondition

follower log matches previous offset/epoch boundary

Transition

follower appends missing entries
commit/high watermark advances after required acks

Response

follower ack/progress
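
As a sketch of how the commit point could advance, assume a record counts as committed once `required_acks` replicas (leader included) hold it. The exact rule differs between Raft-like and Kafka-style replication, so this function is illustrative only.

```python
from typing import List


def high_watermark(replica_end_offsets: List[int], required_acks: int) -> int:
    """Return the highest log end offset reached by `required_acks` replicas."""
    if not 1 <= required_acks <= len(replica_end_offsets):
        raise ValueError("required_acks must be between 1 and replica count")
    # After sorting descending, position k-1 is the offset that at least
    # k replicas have reached.
    return sorted(replica_end_offsets, reverse=True)[required_acks - 1]
```

With end offsets `[10, 8, 5]` and `required_acks=2`, every record below offset 8 exists on at least two replicas, so the watermark is 8.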

6. Rebalance consumer assignments #

API

internal group-coordinator flow

Initiator

system / group coordinator

Entry point

group coordinator

Authoritative decider

group coordinator with assignment epoch

Precondition

current group membership known

Transition

overwrite ConsumerAssignment for group-partition scopes
increment assignment generation/epoch

Response

assignment map to consumers

Failure cases

rebalance storm
stale consumer acting on old assignment
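
A minimal sketch of generation-fenced assignment, assuming a simple round-robin spread rather than the cooperative or sticky strategies a production coordinator might use. `GroupCoordinator` and its methods are hypothetical names.

```python
from typing import Dict, Iterable, List


class GroupCoordinator:
    def __init__(self, partitions: Iterable[int]) -> None:
        self.partitions: List[int] = sorted(partitions)
        self.generation = 0
        self.assignment: Dict[int, str] = {}  # partition_id -> member_id

    def rebalance(self, members: Iterable[str]) -> int:
        """Reassign all partitions and return the new generation."""
        ordered = sorted(members)
        self.generation += 1
        # A dict gives each partition exactly one owner by construction.
        self.assignment = (
            {p: ordered[i % len(ordered)] for i, p in enumerate(self.partitions)}
            if ordered
            else {}
        )
        return self.generation

    def is_current_owner(self, member: str, partition: int, generation: int) -> bool:
        """Check a member's claim before accepting its offset commit."""
        return (
            generation == self.generation
            and self.assignment.get(partition) == member
        )
```

A consumer that missed a rebalance still presents the old generation, so its commits are rejected even if it happens to be assigned the same partition again.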

Entry Point vs Decider vs Responder #

Path	Entry point	Authoritative decider	Physical responder	Logical responder
`Produce`	gateway / any broker	partition leader	leader or front broker	streaming platform
`Fetch`	broker	partition leader	leader or front broker	streaming platform
`CommitOffset`	group coordinator endpoint	offset owner/coordinator	coordinator	streaming platform
leader failover	controller/quorum	metadata quorum / replica-group protocol	new leader / controller	streaming platform
replication RPC	leader	follower + commit logic	follower ack	partition replica group
assignment rebalance	group coordinator	group coordinator	coordinator	streaming platform

Concrete HLD #

Main components:

gateway/router
- resolves topic partition and forwards producer/consumer traffic
partition leader
- authoritative append owner for one or more partitions
partition followers
- replicate partition logs and can be promoted on failover
group coordinator / offset service
- stores consumer-group offsets and assignments
metadata/controller layer
- tracks partition routing and current leaders

Concrete Technology Realizations #

Stronger infra-native answer #

broker service in Java, Scala, or Go
append-only segment files on local disk for partition logs
leader-follower partition replication
controller/quorum metadata service for partition leadership
offset and assignment metadata in compacted internal topics or durable metadata store

Short interview version #

“I’d build the platform as a broker cluster with partitioned immutable logs. Each partition has one leader that serializes appends and replicates to followers. Consumers fetch by offset directly from partition logs, and consumer-group progress is stored separately as current offset state. Partition leadership is fenced by epochs, failover promotes only a sufficiently current replica, and group coordinators manage partition assignments and rebalance generations.”
