Configuration Management System (Distributed Config Push) #
This note models a distributed configuration management system where operators update configuration centrally, the control plane computes effective config, and versioned snapshots are pushed or pulled to many serving nodes safely.
Step 1 - Normalize #
Assume the baseline prompt is:
- design a distributed configuration management system
- admins update config centrally
- many services or nodes consume config
- config changes should propagate quickly
- serving nodes should use coherent config versions
- system scales across many tenants, services, and config scopes
Normalize into state-affecting paths.
| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| Admin creates or updates config | Admin | overwrite state | update target: ConfigDefinition | C1 |
| Admin rolls back or disables config version | Admin | state transition | update target: ConfigReleaseState | C1 |
| System computes effective config snapshot | System | state transition | update target: EffectiveConfigState | C1 |
| System propagates config snapshot to consumers | System | async process | hidden write target: ConsumerConfigSnapshot | C1 |
| Consumer reads local config for request handling | Client | read source | read source target: ConsumerConfigSnapshot | C1 |
| Consumer acknowledges applied config version | Client | overwrite state | update target: ConsumerApplyState | R1 |
| User reads config inventory / rollout status | Client | read projection | read projection target: ConfigStatusView | R2 |
| System routes config scope/shard to current owner | System | read source | read source target: PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | update target: PartitionOwnership | C1 |
Notes on normalization #
Important choices:
- raw config edits are overwrite state: current desired config is current-value truth
- release/rollback is a lifecycle transition
- active version changes over time
- effective config is a computed current view
- propagation is async
- consumer request handling reads local applied snapshots, not control-plane source state on every request
This system is fundamentally:
control plane + data plane
with:
- versioned config
- monotonic rollout
Step 2 - Critical Path Selection #
| Requirement | Priority class | Why |
|---|---|---|
| Create / update config | C1 | config truth changes future behavior system-wide |
| Roll back / disable config version | C1 | rollback correctness affects safety and recovery |
| Compute effective config | C1 | consumers depend on coherent derived config, not arbitrary fragments |
| Propagate config snapshot | C1 | stale or mixed snapshots can break serving behavior |
| Consumer reads local config | C1 | this is the hot serving path |
| Consumer acknowledges applied version | R1 | useful for rollout control and debugging |
| Read config inventory / rollout status | R2 | operational only |
| Route to shard owner | C1 | wrong routing can split config truth |
| Reassign shard ownership | C1 | failover must preserve config correctness |
Baseline critical paths #
Main C1 paths:
- P1 create/update config
- P2 roll back / disable config version
- P3 compute effective config snapshot
- P4 propagate snapshot
- P5 consumer local read
- P6 route to shard owner
- P7 reassign shard ownership
Main R1 path:
- P8 consumer apply acknowledgment
This design is driven by:
- one authoritative current config definition per scope
- coherent effective config versions
- monotonic consumer rollout
Step 3 - Primary State Extraction #
For a distributed config-push system, the minimal primary state is the current config definition, current release lifecycle, effective config state, consumer-applied snapshot state, and routing/ownership state.
| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| ConfigDefinition | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | config_scope |
| ConfigReleaseState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | config_scope |
| EffectiveConfigState | hidden write target | Yes | keep as candidate | process | Yes | service | overwrite | instance | consumer_scope |
| ConsumerConfigSnapshot | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | instance | consumer_id or consumer_scope |
| ConsumerApplyState | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | instance | consumer_id + config_scope |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | config shards |
| ConfigStatusView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | tenant or service |
Important modeling choices #
ConfigDefinition #
Primary because:
- raw desired config is the central source of truth
ConfigReleaseState #
Primary because:
- active, staged, rolled-back, disabled versions are lifecycle state
EffectiveConfigState #
Primary because:
- consumers often need a resolved or merged config, not just the raw documents
ConsumerConfigSnapshot #
Primary because:
- hot-path serving reads this local or assigned snapshot
ConsumerApplyState #
Primary because:
- rollout control depends on knowing what version each consumer has applied
Minimal strict primary set #
The strongest minimal set is:
- ConfigDefinition
- ConfigReleaseState
- EffectiveConfigState
- ConsumerConfigSnapshot
- ConsumerApplyState
- PartitionOwnership
- PartitionMap
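To make the scope keys in the table concrete, here is a minimal sketch of the primary set as typed records. The object names and scope keys come from the table. The other fields (`doc`, `resolved`, `release_version`, `owner_id`, `fencing_token`) and the `owner_of` helper are hypothetical illustrations, not a fixed schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConfigDefinition:
    # Keyed by config_scope; overwritten on each update, ordered by version.
    config_scope: str
    version: int
    doc: dict


@dataclass
class ConfigReleaseState:
    # Keyed by config_scope; status is "staged", "active", "rolled_back" or "disabled".
    config_scope: str
    status: str
    target_version: int
    release_version: int  # hypothetical CAS token, bumped on every guarded transition


@dataclass
class EffectiveConfigState:
    # Keyed by consumer_scope; recomputed from definitions plus release state.
    consumer_scope: str
    version: int
    resolved: dict


@dataclass
class ConsumerConfigSnapshot:
    # Keyed by consumer_id (or consumer_scope); the copy the serving node reads.
    consumer_id: str
    version: int
    resolved: dict


@dataclass
class ConsumerApplyState:
    # Keyed by (consumer_id, config_scope); only ever moves to a higher version.
    consumer_id: str
    config_scope: str
    applied_version: int


@dataclass
class PartitionOwnership:
    # Keyed by shard_id; each ownership change carries a larger fencing token.
    shard_id: int
    owner_id: str
    fencing_token: int


@dataclass
class PartitionMap:
    # Collection over config shards: shard_id -> current owner_id.
    owners: Dict[int, str] = field(default_factory=dict)

    def owner_of(self, shard_id: int) -> str:
        return self.owners[shard_id]
```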
Step 4 - Hard Invariants #
For a distributed config-push system, the hard invariants are about one authoritative current config per scope, coherent effective config generation, and monotonic consumer snapshot application.
| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 create/update config | HARD | ordering | Config-definition revisions are ordered by a monotonic version within each config scope. |
| P2 roll back / disable config version | HARD | eligibility | Action advance_release_state is valid only if the current ConfigReleaseState allows the transition at decision time. |
| P3 compute effective config | HARD | accounting | EffectiveConfigState(scope, version) equals a deterministic function of the authoritative config inputs and release state for that scope. |
| P4 propagate snapshot | HARD | freshness | ConsumerConfigSnapshot reflects an authoritative EffectiveConfigState within configured propagation bounds and moves monotonically forward by version unless an explicit rollback transition is active. |
| P5 consumer local read | HARD | freshness | Serving-node config reads reflect the currently applied ConsumerConfigSnapshot for that node/scope. |
| P8 consumer apply acknowledgment | HARD | accounting | ConsumerApplyState reflects the highest config version actually applied by the consumer for that scope. |
| P6 route to shard owner | HARD | uniqueness | Each shard_id maps to at most one current authoritative owner. |
| P7 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if the current owner has failed or relinquished ownership, and the candidate owner is eligible and sufficiently caught up on shard_id at decision time. |
What matters most #
1. Config versions must be coherent #
Serving nodes should never read a partial mix of config fragments; every read sees one complete version.
2. Consumer snapshots must move monotonically #
Absent explicit rollback, a node must not regress to an older version.
3. Effective config must be deterministic #
Recomputing from the same inputs should yield the same effective snapshot.
4. Applied version is separate from published version #
Publishing a config is not the same as proving consumers are actually serving it.
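Here is a minimal sketch of item 3. It assumes effective config is a layered merge with global, then service, then instance overrides. The layer names, the deep-merge rule, and the digest scheme are hypothetical; these notes do not prescribe them. Determinism comes from two choices: a fixed merge order, and a canonical serialization. With both in place, a recompute from the same inputs yields the same resolved config and the same digest.

```python
import hashlib
import json
from typing import Dict, Tuple

# Hypothetical precedence: later layers override earlier ones.
LAYER_ORDER = ("global", "service", "instance")


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins; nested dicts are merged recursively."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out


def compute_effective(layers: Dict[str, dict], active_version: int) -> Tuple[dict, str]:
    """Pure function of (layers, active_version): same inputs give the same output."""
    resolved: dict = {}
    for name in LAYER_ORDER:  # fixed order, independent of how `layers` was built
        resolved = deep_merge(resolved, layers.get(name, {}))
    # Canonical JSON (sorted keys, no whitespace) keeps the digest stable.
    canonical = json.dumps(
        {"version": active_version, "config": resolved},
        sort_keys=True,
        separators=(",", ":"),
    )
    return resolved, hashlib.sha256(canonical.encode()).hexdigest()
```

Consumers and the control plane can compare digests to confirm they hold the same effective config for a version without diffing whole documents.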
Step 5 - Execution Context #
For the baseline distributed config-push system:
| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical config-control system with many config consumers |
| Write coordination scope | per object scope | correctness is per config scope, consumer snapshot, and shard ownership scope |
| Read consistency target | bounded stale allowed | serving nodes usually read local snapshots with explicit freshness/version discipline |
| Holder model | none | consumers do not hold exclusive mutable business ownership |
| Compensation acceptable? | No | wrong or mixed config can cause production impact and is not safely compensable afterward |
Derived implications #
- holder_may_crash = false: consumers may fail, but they do not own shared lock-like state
- cross_service_write = false: the baseline keeps config truth, release state, and snapshot distribution in one logical system
- bounded_staleness_allowed = true: local snapshot reads can tolerate bounded lag if it is explicit
- cross_service_atomicity_required = false: no multi-service transaction across unrelated services in the baseline
- exclusive_claim_required = true: shard ownership must be exclusive
- guarded_by_current_state = true: release and rollback transitions depend on current release state
What this implies #
This pushes us toward:
- one authoritative owner per config shard
- current-value config and release state
- derived effective snapshots
- monotonic local snapshot application on consumers
Step 6 - Deterministic Mechanism Selection #
| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P1 create/update config | overwrite current value | CAS on version | config version |
| P2 roll back / disable config version | guarded state transition | CAS on (state, version) | release version |
| P3 compute effective config | overwrite current value | single-writer control-plane recompute | effective-config version |
| P4 propagate snapshot | overwrite current value | single-writer snapshot publication | snapshot version |
| P5 consumer local read | read source | local snapshot read | applied version |
| P8 consumer apply acknowledgment | overwrite current value | monotonic overwrite | applied version |
| P6 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P7 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |
Why these fit #
Config definitions #
Current desired config is current-value state, so overwrite fits.
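Here is a minimal sketch of P1's CAS-on-version overwrite. It mirrors `PutConfig(scope, config_doc, expected_version?)` from the operation layer. The in-memory store, the lock, and the class and exception names are hypothetical. A real shard owner would also persist the write before acknowledging it.

```python
import threading
from typing import Dict, Optional, Tuple


class VersionConflict(Exception):
    """Raised when expected_version no longer matches the stored version."""


class ConfigDefinitionStore:
    """In-memory stand-in for a shard owner's ConfigDefinition store (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._docs: Dict[str, Tuple[int, dict]] = {}  # scope -> (version, doc)

    def put_config(self, scope: str, config_doc: dict,
                   expected_version: Optional[int] = None) -> int:
        # Compare and write under one lock so two concurrent admins cannot both win.
        with self._lock:
            current_version, _ = self._docs.get(scope, (0, {}))
            if expected_version is not None and expected_version != current_version:
                raise VersionConflict(
                    f"{scope}: expected v{expected_version}, found v{current_version}"
                )
            new_version = current_version + 1  # monotonic version per config scope
            self._docs[scope] = (new_version, dict(config_doc))
            return new_version

    def get(self, scope: str) -> Tuple[int, dict]:
        with self._lock:
            version, doc = self._docs.get(scope, (0, {}))
            return version, dict(doc)
```

An admin who edited a stale copy gets `VersionConflict`. The admin then re-reads and retries, which is the "stale update loses CAS" behavior in the failure table.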
Release lifecycle #
Promote, disable, and rollback depend on current state, so guarded transition fits.
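This is a minimal sketch of a guarded release transition with CAS on (state, version). It assumes a per-version lifecycle in which a version is staged, then active, then possibly rolled back or disabled. The transition table, `update_release_state`, and `InvalidTransition` are hypothetical names. The parameters mirror `UpdateReleaseState(scope, action, target_version, expected_release_version?)`.

```python
from typing import Dict, Tuple

# Hypothetical lifecycle: action -> (allowed source states, next state).
TRANSITIONS = {
    "promote": ({"staged"}, "active"),
    "rollback": ({"active"}, "rolled_back"),
    "disable": ({"staged", "active"}, "disabled"),
}


class InvalidTransition(Exception):
    """Raised when the state guard or the CAS check fails."""


def update_release_state(
    releases: Dict[int, Tuple[str, int]],
    action: str,
    target_version: int,
    expected_release_version: int,
) -> Tuple[str, int]:
    """Apply one guarded transition to releases[target_version] = (status, release_version).

    Assumes the caller is the shard owner holding its per-scope lock, so the
    checks and the write below cannot interleave with another transition.
    """
    status, release_version = releases[target_version]
    # CAS on release_version: reject callers acting on a state they no longer see.
    if expected_release_version != release_version:
        raise InvalidTransition(
            f"v{target_version}: expected release_version "
            f"{expected_release_version}, found {release_version}"
        )
    allowed_from, next_status = TRANSITIONS[action]
    # Guard on current state: e.g. only an active version can be rolled back.
    if status not in allowed_from:
        raise InvalidTransition(f"v{target_version}: cannot {action} from {status}")
    releases[target_version] = (next_status, release_version + 1)
    return releases[target_version]
```

A full rollback would also re-activate the previous good version in the same guarded step. This sketch leaves that pairing out.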
Effective config and consumer snapshots #
These are current resolved views, so overwrite fits.
Apply acknowledgment #
Consumers report their highest applied version, so monotonic overwrite fits.
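Here is a minimal sketch of the monotonic overwrite behind `AckAppliedConfig(scope, version, consumer_id)`. The store class is hypothetical. It assumes a single writer per key (the shard owner), so it takes no lock.

```python
from typing import Dict, Tuple


class ApplyStateStore:
    """Hypothetical store for ConsumerApplyState keyed by (consumer_id, config_scope)."""

    def __init__(self) -> None:
        self._applied: Dict[Tuple[str, str], int] = {}

    def ack_applied_config(self, scope: str, version: int, consumer_id: str) -> int:
        """Record an ack and return the stored highest applied version."""
        key = (consumer_id, scope)
        # Monotonic overwrite: a late, reordered, or retried lower report cannot move it back.
        self._applied[key] = max(self._applied.get(key, 0), version)
        return self._applied[key]
```

Using `max` makes retries idempotent. Resending the same ack, or an older one, leaves the stored value unchanged.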
Canonical substrate implied #
The baseline now points to:
- sharded config-control service
- one owner per config scope
- current-value config and release state
- derived effective config
- local consumer snapshots with monotonic rollout
Step 7 - Read Model / Source of Truth #
For a distributed config-push system, truth is mostly direct source state plus consumer snapshots. Status UIs are derived.
| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 desired config | ConfigDefinition | read source directly | authoritative config store |
| C2 active release lifecycle | ConfigReleaseState | read source directly | authoritative release-state store |
| C3 resolved effective config | EffectiveConfigState | read source directly | recompute from config definitions and release state |
| C4 consumer local snapshot | ConsumerConfigSnapshot | materialized view | rebuild from latest effective config |
| C5 consumer applied version | ConsumerApplyState | read source directly | authoritative apply-state store |
| C6 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C7 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
| C8 config rollout status | derived from definitions, releases, and apply state | materialized view | recompute from authoritative state |
Important point #
For the core semantics:
- control-plane truth lives in config definitions, release state, and effective config
- serving nodes read local snapshots
- rollout status is derived from consumer apply state
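Rollout status is a pure projection over apply state. This sketch derives per-scope progress from a copy of `ConsumerApplyState` values. The function name and output fields are hypothetical dashboard fields. Because it only reads apply state, it can run against a replica or a materialized view instead of the shard owner.

```python
from collections import Counter
from typing import Dict


def rollout_status(published_version: int, applied: Dict[str, int]) -> dict:
    """Derive rollout progress for one scope from consumer_id -> applied_version."""
    total = len(applied)
    caught_up = sum(1 for v in applied.values() if v >= published_version)
    return {
        "published_version": published_version,
        "consumers": total,
        "at_or_above_published": caught_up,
        # With no consumers there is nothing left to roll out, so report 1.0.
        "fraction": caught_up / total if total else 1.0,
        "version_histogram": dict(Counter(applied.values())),
        "lagging": sorted(c for c, v in applied.items() if v < published_version),
    }
```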
Step 8 - Failure Handling #
| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| P1 create/update config | retry with config version | stale update loses CAS | committed config survives crash if persisted | snapshot recompute may lag | stale shard owner blocked by fencing token |
| P2 release/rollback transition | retry with release version | stale transition loses guarded update | committed release state survives crash if persisted | consumer propagation may lag | stale shard owner blocked by fencing token |
| P3 effective-config recompute | recompute retry safe from source inputs | single recompute/version wins | recompute reruns after crash | consumer propagation may lag | n/a |
| P4 snapshot propagation | retry with versioned snapshot | older snapshot loses to newer version unless explicit rollback version is active | consumer keeps last good snapshot until refresh | failed push retried or pulled | n/a |
| P5 consumer local read | request retry safe | many consumers read concurrently from same local snapshot | consumer crash drops requests only | n/a | stale snapshot bounded by configured propagation freshness |
| P8 apply acknowledgment | retry with highest applied version | stale/lower version report loses monotonic overwrite | applied version survives crash if persisted | rollout UI may lag | n/a |
| P6 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P7 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
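Several "stale holder" cells in this table depend on fencing tokens. This is a minimal sketch under these assumptions:

- tokens are per-shard integers that grow by one on each reassignment
- the P7 eligibility checks (failed owner, caught-up candidate) run before `reassign` is called
- shard storage remembers the highest token it has accepted

All class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class StaleOwner(Exception):
    """Raised when a write carries a fencing token older than one already seen."""


class LeaseStore:
    """Hypothetical stand-in for the metadata quorum recording PartitionOwnership."""

    def __init__(self) -> None:
        self._owners: Dict[int, Tuple[str, int]] = {}  # shard_id -> (owner_id, token)

    def owner(self, shard_id: int) -> Tuple[Optional[str], int]:
        return self._owners.get(shard_id, (None, 0))

    def reassign(self, shard_id: int, new_owner: str, expected_token: int) -> Optional[int]:
        """CAS on the current token: only one reassignment from a given state wins."""
        _, current_token = self.owner(shard_id)
        if expected_token != current_token:
            return None
        new_token = current_token + 1  # strictly increasing per shard
        self._owners[shard_id] = (new_owner, new_token)
        return new_token


class ShardStorage:
    """Shard storage that rejects writes from owners holding an older token."""

    def __init__(self) -> None:
        self.highest_token_seen = 0
        self.data: Dict[str, dict] = {}

    def write(self, key: str, value: dict, fencing_token: int) -> None:
        if fencing_token < self.highest_token_seen:
            raise StaleOwner(f"token {fencing_token} < {self.highest_token_seen}")
        self.highest_token_seen = fencing_token
        self.data[key] = value
```

Fencing is enforced at storage, not by the old owner. A paused owner that wakes up still holding an old lease is rejected on its next write.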
What matters most #
1. Monotonic snapshot movement #
Consumers must not accidentally apply older config after a newer one, except under an explicit rollback model.
2. Coherent version boundaries #
Consumers should apply config snapshots atomically at version boundaries, not field by field.
3. Published versus applied version #
Operators need both:
- latest published version
- latest actually applied version per consumer
4. Rollback is a first-class lifecycle transition #
Rollback is not “stale propagation”; it is a deliberate new release-state transition.
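Here is a minimal consumer-side sketch of items 1, 2, and 4, under these hypothetical assumptions:

- each snapshot carries a `release_epoch` that only an explicit rollback increments
- ordering compares `(release_epoch, version)`
- the serving path reads through `LocalConfig.read()`

The whole-version switch relies on CPython rebinding a single attribute atomically, so a reader sees the old snapshot or the new one, never a mix.

```python
import threading
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Snapshot:
    release_epoch: int  # hypothetical: bumped only by an explicit rollback transition
    version: int        # effective-config version within the epoch
    values: Mapping[str, object]


def make_snapshot(release_epoch: int, version: int, values: dict) -> Snapshot:
    # Copy, then wrap read-only, so a published snapshot cannot be edited in place.
    return Snapshot(release_epoch, version, MappingProxyType(dict(values)))


class LocalConfig:
    """Holds the applied ConsumerConfigSnapshot for one serving process."""

    def __init__(self, initial: Snapshot) -> None:
        self._current = initial
        self._apply_lock = threading.Lock()  # serializes appliers; readers never block

    def read(self) -> Snapshot:
        # Hot path: one attribute load returns one complete, versioned snapshot.
        return self._current

    def apply(self, incoming: Snapshot) -> bool:
        with self._apply_lock:
            cur = self._current
            if (incoming.release_epoch, incoming.version) <= (cur.release_epoch, cur.version):
                return False  # stale or duplicate push: keep the last good snapshot
            self._current = incoming  # whole-version switch via one reference swap
            return True
```

After a successful `apply`, the consumer would report `snapshot.version` through `AckAppliedConfig`. That keeps the published version and the applied version as separate facts.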
Step 9 - Scale Adjustments #
| Hotspot | Type | First response |
|---|---|---|
| very large consumer fleets | fan-out hotspot | hierarchical config distribution or pull-after-notify |
| high config churn | contention hotspot | batch edits and incremental recompute by affected scope |
| large config snapshots | read/memory hotspot | shard config by service/scope and compress snapshots |
| rollout-status queries | read hotspot | serve from derived views over ConsumerApplyState |
| reconnect storms | contention hotspot | stagger re-syncs and use version-based delta fetch |
| mixed-version safety checks | control-plane hotspot | validate rollout invariants before publish and during progressive rollout |
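For the reconnect-storm row, one common way to stagger re-syncs combines two pieces:

- a stable per-consumer offset for fleet-wide reconnects
- full-jitter exponential backoff for retries

Both functions and their defaults (0.5 s base, 60 s cap, 30 s window) are hypothetical tuning choices, not values from this design.

```python
import hashlib
import random


def initial_resync_offset(consumer_id: str, window_s: float = 30.0) -> float:
    """Stable delay in [0, window_s) so a fleet-wide reconnect spreads evenly."""
    digest = hashlib.sha256(consumer_id.encode()).digest()
    h = int.from_bytes(digest[:8], "big") >> 11  # keep 53 bits so h / 2**53 < 1 exactly
    return (h / 2**53) * window_s


def resync_delay(attempt: int, base_s: float = 0.5, cap_s: float = 60.0, rng=random) -> float:
    """Full-jitter backoff: uniform in [0, min(cap_s, base_s * 2**attempt)]."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

On reconnect, a consumer would wait its offset and then send its current applied version. The server can then answer with a delta or "no change" instead of a full snapshot.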
What scales well #
This system scales by:
- sharding config scopes
- pushing/pulling versioned snapshots rather than source reads on every request
- incrementally recomputing only affected effective config
- separating status reporting from the hot serving path
What fails first #
Usually:
- fleet-wide fanout storms
- large snapshots
- rapid config churn
- rollout status reads hitting primary state
Canonical design conclusion #
The mechanical outcome is:
- primary state:
- ConfigDefinition
- ConfigReleaseState
- EffectiveConfigState
- ConsumerConfigSnapshot
- ConsumerApplyState
- PartitionOwnership
- PartitionMap
- critical invariants:
- one authoritative current config per scope
- deterministic effective config generation
- monotonic consumer snapshot application unless explicit rollback
- exclusive shard ownership for config truth
- mechanisms:
- overwrite current value for config definitions
- guarded release/rollback transitions
- overwrite effective snapshots
- monotonic applied-version tracking
- fenced shard ownership
- reads:
- hot path from local applied snapshots
- status and rollout views from derived projections
Polished interview answer #
I’d build the config system as a control-plane/data-plane service. The control plane owns authoritative config definitions and release lifecycle, computes a deterministic effective config for each scope, and publishes versioned snapshots to consumers. Serving nodes never fetch raw control-plane config on every request; they read local snapshots atomically and move forward monotonically by config version. Rollback is modeled as a first-class release-state transition, not as accidental stale propagation, and consumers report their applied versions back so rollout status is observable. The main scaling levers are sharding config scopes, hierarchical or delta-based distribution, incremental recompute, and keeping rollout dashboards off the serving hot path.
Concrete Substrate #
I’ll choose a control-plane/data-plane config system with authoritative config shards plus local consumer snapshots as the concrete baseline, because it matches the mechanics we derived:
- current-value config and release state
- derived effective snapshots
- monotonic snapshot publication
- one owner per shard
Concrete tech family:
- control plane in Go or Java
- authoritative state store:
- replicated DB or RocksDB-backed service state
- metadata/control:
- etcd or internal metadata quorum for shard ownership/routing
- distribution layer:
- watch stream, long poll, or push channel to consumers
Each shard owner stores:
- current config definitions
- current release state
- current effective config per consumer scope
- rollout/apply status from consumers
Consumers store:
- local ConsumerConfigSnapshot
- current applied version
Operation Layer #
1. Update config #
API
PutConfig(scope, config_doc, expected_version?)
Initiator
- admin
Entry point
- config API
Authoritative decider
- shard owner for config scope
Precondition
- if optimistic concurrency is used, expected_version matches the current config version
Transition
- overwrite ConfigDefinition
- trigger EffectiveConfigState recompute
2. Promote or roll back config #
API
UpdateReleaseState(scope, action, target_version, expected_release_version?)
Initiator
- admin
Entry point
- release API
Authoritative decider
- shard owner for config scope
Precondition
- current release state allows requested transition
Transition
- guarded update of ConfigReleaseState
- trigger snapshot propagation
3. Propagate snapshot #
API
- internal push or pull-after-notify flow
Initiator
- system
Entry point
- control plane / consumer
Authoritative decider
- snapshot publisher
Precondition
- newer effective config version exists
Transition
- overwrite ConsumerConfigSnapshot
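Here is a minimal pull-after-notify sketch for this operation. It assumes one in-process publisher per scope, and a long-poll fetch keyed by a publication sequence number that consumers start at 0. The class, its methods, and the 30 s timeout are hypothetical. The sequence only says whether something new was published. Ordering of config versions, including explicit rollback, is still decided by release state and the consumer's own monotonic check.

```python
import threading
from typing import Optional, Tuple


class SnapshotPublisher:
    """Hypothetical per-scope publisher: sequences publications and wakes long-polls."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._seq = 0  # publication sequence, +1 per publish; consumers start at 0
        self._snapshot: Optional[dict] = None

    def publish(self, snapshot: dict) -> int:
        with self._cond:
            self._seq += 1
            self._snapshot = dict(snapshot)
            self._cond.notify_all()  # "notify": wake every waiting consumer
            return self._seq

    def fetch_if_newer(self, known_seq: int, timeout_s: float = 30.0) -> Optional[Tuple[int, dict]]:
        """Long-poll "pull": return (seq, snapshot) once seq > known_seq, else None."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._seq > known_seq, timeout=timeout_s):
                return None  # timed out: consumer keeps serving its last good snapshot
            return self._seq, dict(self._snapshot)
```

If a notification is missed, nothing is lost. The next fetch with the consumer's known sequence returns the latest snapshot.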
4. Consumer apply and ack #
API
AckAppliedConfig(scope, version, consumer_id)
Initiator
- consumer
Entry point
- rollout status endpoint
Authoritative decider
- shard owner for config scope
Precondition
- version is applied locally
Transition
- monotonic overwrite of ConsumerApplyState
5. Consumer read on hot path #
API
- internal local read
Initiator
- consumer/service
Entry point
- local process
Authoritative decider
- local applied ConsumerConfigSnapshot
Precondition
- snapshot loaded and valid
Transition
- none
Entry Point vs Decider vs Responder #
| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| config update / release update | config API | config shard owner | API node | config system |
| snapshot propagation | control plane / consumer | snapshot publisher | control/data-plane node | config system |
| consumer local read | local process | local applied snapshot | local process | config system |
| apply ack | rollout endpoint | config shard owner | API node | config system |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | config system |
Concrete HLD #
Main components:
- config control-plane API
- receives config and release updates
- config shard owners
- authoritative owners for config truth and effective snapshot recompute
- distribution layer
- pushes or serves versioned snapshots to consumers
- consumer fleet
- reads local applied snapshots on hot path
- metadata/control service
- tracks shard ownership and routing
- rollout status pipeline
- serves config inventory and rollout status
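Here is a sketch of how a router might use the metadata/control service for P6. It assumes:

- scopes map to a fixed number of shards by a stable hash
- an owner that has lost its lease answers with a `NotOwner` error

`NUM_SHARDS = 64`, the function names, and the `send` / `refresh_map` callbacks are hypothetical. The sketch follows the failure-table rule for routing: on a wrong owner, refresh the shard map and retry.

```python
import hashlib
from typing import Callable, Dict

NUM_SHARDS = 64  # hypothetical fixed shard count


class NotOwner(Exception):
    """Raised by a node that is no longer the owner of the requested shard."""


def shard_for_scope(config_scope: str) -> int:
    # Stable hash (not Python's per-process salted hash()) so every router agrees.
    digest = hashlib.sha256(config_scope.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def route(
    config_scope: str,
    partition_map: Dict[int, str],
    send: Callable[[str, str], str],
    refresh_map: Callable[[], Dict[int, str]],
    max_attempts: int = 3,
) -> str:
    """Send to the current owner; on NotOwner, refresh PartitionMap and retry."""
    shard = shard_for_scope(config_scope)
    for _ in range(max_attempts):
        owner = partition_map[shard]
        try:
            return send(owner, config_scope)
        except NotOwner:
            partition_map = refresh_map()  # ownership moved: fetch the current map
    raise NotOwner(f"no stable owner for shard {shard}")
```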