

Configuration Management System (Distributed Config Push) #

This note models a distributed configuration management system in which operators update configuration centrally, the control plane computes the effective config, and versioned snapshots are safely pushed to, or pulled by, many serving nodes.


Step 1 - Normalize #

Assume the baseline prompt is:

  • design a distributed configuration management system
  • admins update config centrally
  • many services or nodes consume config
  • config changes should propagate quickly
  • serving nodes should use coherent config versions
  • system scales across many tenants, services, and config scopes

Normalize into state-affecting paths.

| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| Admin creates or updates config | Admin | overwrite state | S1, update target, ConfigDefinition | C1 |
| Admin rolls back or disables config version | Admin | state transition | S1, update target, ConfigReleaseState | C1 |
| System computes effective config snapshot | System | state transition | S1, update target, EffectiveConfigState | C1 |
| System propagates config snapshot to consumers | System | async process | S1, hidden write target, ConsumerConfigSnapshot | C1 |
| Consumer reads local config for request handling | Client | read source | S1, read source target, ConsumerConfigSnapshot | C1 |
| Consumer acknowledges applied config version | Client | overwrite state | S1, update target, ConsumerApplyState | R1 |
| User reads config inventory / rollout status | Client | read projection | S1, read projection target, ConfigStatusView | R2 |
| System routes config scope/shard to current owner | System | read source | S1, read source target, PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1, update target, PartitionOwnership | C1 |

Notes on normalization #

Important choices:

  • raw config edits are overwrite state
    • current desired config is current-value truth
  • release/rollback is a lifecycle transition
    • active version changes over time
  • effective config is a computed current view
  • propagation is async
  • consumer request handling reads local applied snapshots, not control-plane source state on every request

This system is fundamentally a control plane + data plane design with:

  • versioned config
  • monotonic rollout

Step 2 - Critical Path Selection #

| Requirement | Priority class | Why |
|---|---|---|
| Create / update config | C1 | config truth changes future behavior system-wide |
| Roll back / disable config version | C1 | rollback correctness affects safety and recovery |
| Compute effective config | C1 | consumers depend on coherent derived config, not arbitrary fragments |
| Propagate config snapshot | C1 | stale or mixed snapshots can break serving behavior |
| Consumer reads local config | C1 | this is the hot serving path |
| Consumer acknowledges applied version | R1 | useful for rollout control and debugging |
| Read config inventory / rollout status | R2 | operational only |
| Route to shard owner | C1 | wrong routing can split config truth |
| Reassign shard ownership | C1 | failover must preserve config correctness |

Baseline critical paths #

Main C1 paths:

  • P1 create/update config
  • P2 roll back / disable config version
  • P3 compute effective config snapshot
  • P4 propagate snapshot
  • P5 consumer local read
  • P6 route to shard owner
  • P7 reassign shard ownership

Main R1 path:

  • P8 consumer apply acknowledgment

This design is driven by:

  • one authoritative current config definition per scope
  • coherent effective config versions
  • monotonic consumer rollout

Step 3 - Primary State Extraction #

For a distributed config-push system, the minimal primary state is the current config definition, current release lifecycle, effective config state, consumer-applied snapshot state, and routing/ownership state.

| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| ConfigDefinition | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | config_scope |
| ConfigReleaseState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | config_scope |
| EffectiveConfigState | hidden write target | Yes | keep as candidate | process | Yes | service | overwrite | instance | consumer_scope |
| ConsumerConfigSnapshot | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | instance | consumer_id or consumer_scope |
| ConsumerApplyState | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | instance | consumer_id + config_scope |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | config shards |
| ConfigStatusView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | tenant or service |

Important modeling choices #

ConfigDefinition #

Primary because:

  • raw desired config is the central source of truth

ConfigReleaseState #

Primary because:

  • active, staged, rolled-back, disabled versions are lifecycle state

EffectiveConfigState #

Primary because:

  • consumers often need a resolved or merged config, not raw documents only

ConsumerConfigSnapshot #

Primary because:

  • hot-path serving reads this local or assigned snapshot

ConsumerApplyState #

Primary because:

  • rollout control depends on knowing what version each consumer has applied

Minimal strict primary set #

The strongest minimal set is:

  • ConfigDefinition
  • ConfigReleaseState
  • EffectiveConfigState
  • ConsumerConfigSnapshot
  • ConsumerApplyState
  • PartitionOwnership
  • PartitionMap
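The primary set above can be written down as plain record types. This is only a sketch: the field names (`document`, `release_version`, `target_config_version`, `resolved`, `owner_node`, `fencing_token`) and the `ReleaseStatus` values are assumptions chosen for illustration, loosely following the scope values in the extraction table.

```python
from dataclasses import dataclass
from enum import Enum


class ReleaseStatus(Enum):
    # Assumed lifecycle labels, taken from the "active, staged, rolled-back, disabled" wording.
    STAGED = "staged"
    ACTIVE = "active"
    ROLLED_BACK = "rolled_back"
    DISABLED = "disabled"


@dataclass(frozen=True)
class ConfigDefinition:
    config_scope: str
    version: int            # monotonic within config_scope
    document: dict          # raw desired config


@dataclass(frozen=True)
class ConfigReleaseState:
    config_scope: str
    release_version: int    # bumped on every lifecycle transition
    status: ReleaseStatus
    target_config_version: int


@dataclass(frozen=True)
class EffectiveConfigState:
    consumer_scope: str
    version: int
    resolved: dict          # merged config that consumers actually use


@dataclass(frozen=True)
class ConsumerConfigSnapshot:
    consumer_id: str
    version: int
    resolved: dict


@dataclass(frozen=True)
class ConsumerApplyState:
    consumer_id: str
    config_scope: str
    applied_version: int


@dataclass(frozen=True)
class PartitionOwnership:
    shard_id: str
    owner_node: str
    fencing_token: int


@dataclass(frozen=True)
class PartitionMap:
    shards: dict            # shard_id -> owner_node
```

ConfigStatusView is deliberately absent: it is derived, so it has no record of its own in the primary set.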

Step 4 - Hard Invariants #

For a distributed config-push system, the hard invariants are about one authoritative current config per scope, coherent effective config generation, and monotonic consumer snapshot application.

| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 create/update config | HARD | ordering | Config-definition revisions are ordered by monotonic version within config scope. |
| P2 roll back / disable config version | HARD | eligibility | Action advance_release_state is valid only if current ConfigReleaseState allows the transition at decision time. |
| P3 compute effective config | HARD | accounting | EffectiveConfigState(scope, version) equals the deterministic function of current authoritative config inputs and release state for that scope. |
| P4 propagate snapshot | HARD | freshness | ConsumerConfigSnapshot reflects an authoritative EffectiveConfigState within configured propagation bounds and moves monotonically forward by version unless an explicit rollback transition is active. |
| P5 consumer local read | HARD | freshness | Serving-node config reads reflect the currently applied ConsumerConfigSnapshot for that node/scope. |
| P8 consumer apply acknowledgment | HARD | accounting | ConsumerApplyState reflects the highest config version actually applied by the consumer for that scope. |
| P6 route to shard owner | HARD | uniqueness | Each shard_id maps to at most one current authoritative owner. |
| P7 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if current owner is failed or relinquished and candidate owner is eligible and sufficiently current on shard_id at decision time. |

What matters most #

1. Config versions must be coherent #

Serving nodes should not read partially mixed config fragments for the same version.

2. Consumer snapshots must move monotonically #

Absent explicit rollback, a node must not regress to an older version.

3. Effective config must be deterministic #

Recomputing from the same inputs should yield the same effective snapshot.
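One way to make determinism checkable is to merge config layers in a fixed order and fingerprint the result with a canonical serialization. The sketch below assumes a shallow "later layer wins" merge and a SHA-256 digest over sorted-key JSON; both the merge rule and the function names are illustrative assumptions, not something the baseline prescribes.

```python
import hashlib
import json


def compute_effective_config(layers: list[dict]) -> dict:
    """Merge layers in order; later layers override earlier ones (shallow merge)."""
    effective: dict = {}
    for layer in layers:
        effective.update(layer)
    return effective


def snapshot_digest(effective: dict) -> str:
    """Digest of a canonical form: sorted keys remove insertion-order effects."""
    canonical = json.dumps(effective, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"timeout_ms": 100, "retries": 3}
override = {"timeout_ms": 250}
first = compute_effective_config([base, override])
second = compute_effective_config([dict(reversed(list(base.items()))), override])
# Same inputs, different key order: the digests still match.
same = snapshot_digest(first) == snapshot_digest(second)
```

Comparing digests after a recompute gives a cheap check that a rerun after a crash produced the same snapshot as the original run.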

4. Applied version is separate from published version #

Publishing a config is not the same as proving consumers are actually serving it.


Step 5 - Execution Context #

For the baseline distributed config-push system:

| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical config-control system with many config consumers |
| Write coordination scope | per object scope | correctness is per config scope, consumer snapshot, and shard ownership scope |
| Read consistency target | bounded stale allowed | serving nodes usually read local snapshots with explicit freshness/version discipline |
| Holder model | none | consumers do not hold exclusive mutable business ownership |
| Compensation acceptable? | No | wrong or mixed config can cause production impact and is not safely compensable afterward |

Derived implications #

  • holder_may_crash = false

    • consumers may fail, but they do not own shared lock-like state
  • cross_service_write = false

    • baseline keeps config truth, release state, and snapshot distribution in one logical system
  • bounded_staleness_allowed = true

    • local snapshot reads can tolerate bounded lag if explicit
  • cross_service_atomicity_required = false

    • no multi-service transaction across unrelated services in baseline
  • exclusive_claim_required = true

    • shard ownership must be exclusive
  • guarded_by_current_state = true

    • release and rollback transitions depend on current release state
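The exclusive_claim_required implication is commonly met with a lease that hands out a growing fencing token. The following is a minimal in-memory sketch, assuming a hypothetical `LeaseTable` with an injected clock; a real deployment would keep this in a quorum store rather than in process memory.

```python
class LeaseTable:
    """Hypothetical lease table: one live owner per shard, token grows on each new grant."""

    def __init__(self, ttl_s, clock):
        self._ttl = ttl_s
        self._clock = clock          # injected so tests can control time
        self._leases = {}            # shard_id -> (owner, expires_at, token)

    def acquire(self, shard_id, node):
        now = self._clock()
        owner, expires_at, token = self._leases.get(shard_id, (None, 0.0, 0))
        if owner == node and expires_at > now:
            # Heartbeat: the current owner renews and keeps its token.
            self._leases[shard_id] = (node, now + self._ttl, token)
            return token
        if owner is not None and expires_at > now:
            return None              # another live owner holds the lease
        # Vacant or expired: grant with a strictly larger fencing token.
        token += 1
        self._leases[shard_id] = (node, now + self._ttl, token)
        return token


now = [0.0]
table = LeaseTable(ttl_s=10.0, clock=lambda: now[0])
token_a = table.acquire("shard-1", "node-a")   # granted
token_b = table.acquire("shard-1", "node-b")   # refused while node-a is live
```

The token matters more than the lease itself: a paused owner may still believe it holds the lease, and only the token lets downstream writes tell it apart from the new owner.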

What this implies #

This pushes us toward:

  • one authoritative owner per config shard
  • current-value config and release state
  • derived effective snapshots
  • monotonic local snapshot application on consumers

Step 6 - Deterministic Mechanism Selection #

| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P1 create/update config | overwrite current value | CAS on version | config version |
| P2 roll back / disable config version | guarded state transition | CAS on (state, version) | release version |
| P3 compute effective config | overwrite current value | single writer control-plane recompute | effective-config version |
| P4 propagate snapshot | overwrite current value | single writer snapshot publication | snapshot version |
| P5 consumer local read | read source | local snapshot read | applied version |
| P8 consumer apply acknowledgment | overwrite current value | monotonic overwrite | applied version |
| P6 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P7 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |

Why these fit #

Config definitions #

Current desired config is current-value state, so overwrite fits.
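A compare-and-set overwrite keyed on the config version might look like the sketch below. `ConfigStore`, `VersionConflict`, and the in-memory dictionary are hypothetical stand-ins for the shard owner's durable store; only the `put_config(scope, doc, expected_version)` shape follows the mechanism table.

```python
import threading


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class ConfigStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}  # scope -> (version, doc)

    def get(self, scope):
        with self._lock:
            return self._docs.get(scope, (0, None))

    def put_config(self, scope, doc, expected_version=None):
        with self._lock:
            current_version, _ = self._docs.get(scope, (0, None))
            # CAS check: a writer that read an older version loses.
            if expected_version is not None and expected_version != current_version:
                raise VersionConflict(
                    f"{scope}: expected {expected_version}, found {current_version}"
                )
            new_version = current_version + 1
            self._docs[scope] = (new_version, doc)
            return new_version


store = ConfigStore()
v1 = store.put_config("payments", {"timeout_ms": 100})
v2 = store.put_config("payments", {"timeout_ms": 250}, expected_version=v1)
```

Leaving `expected_version` unset gives last-writer-wins behavior, which is why the check is only enforced when the caller opts in.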

Release lifecycle #

Promote, disable, and rollback depend on current state, so guarded transition fits.
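A guarded transition can be sketched as a lookup in an allowed-transition table combined with a version check. The states, actions, and the `ALLOWED_TRANSITIONS` table below are assumptions for illustration; the source only says that promote, disable, and rollback depend on current state.

```python
class TransitionRejected(Exception):
    """Raised when the guard or the version check fails."""


# Assumed lifecycle: (current state, action) -> next state.
ALLOWED_TRANSITIONS = {
    ("staged", "promote"): "active",
    ("staged", "disable"): "disabled",
    ("active", "rollback"): "rolled_back",
    ("active", "disable"): "disabled",
}


def advance_release_state(current, action, expected_release_version):
    """CAS on (state, version): returns the new (state, release_version) pair."""
    state, release_version = current
    if release_version != expected_release_version:
        raise TransitionRejected("stale release version")
    next_state = ALLOWED_TRANSITIONS.get((state, action))
    if next_state is None:
        raise TransitionRejected(f"{action!r} not allowed from {state!r}")
    return (next_state, release_version + 1)


promoted = advance_release_state(("staged", 1), "promote", expected_release_version=1)
rolled_back = advance_release_state(promoted, "rollback", expected_release_version=2)
```

Because every accepted transition bumps the release version, two concurrent operators acting on the same observed state cannot both succeed.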

Effective config and consumer snapshots #

These are current resolved views, so overwrite fits.

Apply acknowledgment #

Consumers report their highest applied version, so monotonic overwrite fits.
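Monotonic overwrite can be sketched as "accept the report only if it is strictly newer". The `ApplyStateTable` class below is a hypothetical in-memory stand-in for the shard owner's apply-state store, and it does not model explicit rollback.

```python
class ApplyStateTable:
    def __init__(self):
        self._applied = {}  # (consumer_id, scope) -> highest applied version

    def ack_applied_config(self, scope, version, consumer_id):
        """Return True if the ack advanced the stored version, False if it was stale."""
        key = (consumer_id, scope)
        if version <= self._applied.get(key, 0):
            return False        # duplicate or out-of-order ack: ignore
        self._applied[key] = version
        return True

    def applied_version(self, scope, consumer_id):
        return self._applied.get((consumer_id, scope), 0)


acks = ApplyStateTable()
acks.ack_applied_config("payments", 3, "node-1")
late = acks.ack_applied_config("payments", 2, "node-1")   # delayed retry of an older ack
```

This makes retries safe: resending the same or an older ack never moves the recorded version backwards.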

Canonical substrate implied #

The baseline now points to:

  • sharded config-control service
  • one owner per config scope
  • current-value config and release state
  • derived effective config
  • local consumer snapshots with monotonic rollout

Step 7 - Read Model / Source of Truth #

For a distributed config-push system, truth is mostly direct source state plus consumer snapshots. Status UIs are derived.

| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 desired config | ConfigDefinition | read source directly | authoritative config store |
| C2 active release lifecycle | ConfigReleaseState | read source directly | authoritative release-state store |
| C3 resolved effective config | EffectiveConfigState | read source directly | recompute from config definitions and release state |
| C4 consumer local snapshot | ConsumerConfigSnapshot | materialized view | rebuild from latest effective config |
| C5 consumer applied version | ConsumerApplyState | read source directly | authoritative apply-state store |
| C6 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C7 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
| C8 config rollout status | derived from definitions, releases, and apply state | materialized view | recompute from authoritative state |

Important point #

For the core semantics:

  • control-plane truth lives in config definitions, release state, and effective config
  • serving nodes read local snapshots
  • rollout status is derived from consumer apply state
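Since rollout status is derived, it can be recomputed at any time from the published version and the per-consumer applied versions. A small sketch, assuming a hypothetical `rollout_status` function and a plain dictionary of applied versions as input:

```python
from collections import Counter


def rollout_status(published_version, applied_by_consumer):
    """Summarize how far each consumer is relative to the published version."""
    by_version = Counter(applied_by_consumer.values())
    lagging = sorted(
        consumer
        for consumer, version in applied_by_consumer.items()
        if version != published_version
    )
    return {
        "published_version": published_version,
        "total": len(applied_by_consumer),
        "converged": by_version.get(published_version, 0),
        "lagging": lagging,
        "by_version": dict(by_version),
    }


status = rollout_status(7, {"node-1": 7, "node-2": 6, "node-3": 7})
# status["converged"] is 2 and status["lagging"] is ["node-2"]
```

Nothing here is stored as truth, so the view can be dropped and rebuilt without affecting serving behavior.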

Step 8 - Failure Handling #

| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| P1 create/update config | retry with config version | stale update loses CAS | committed config survives crash if persisted | snapshot recompute may lag | stale shard owner blocked by fencing token |
| P2 release/rollback transition | retry with release version | stale transition loses guarded update | committed release state survives crash if persisted | consumer propagation may lag | stale shard owner blocked by fencing token |
| P3 effective-config recompute | recompute retry safe from source inputs | single recompute/version wins | recompute reruns after crash | consumer propagation may lag | n/a |
| P4 snapshot propagation | retry with versioned snapshot | older snapshot loses to newer version unless explicit rollback version is active | consumer keeps last good snapshot until refresh | failed push retried or pulled | n/a |
| P5 consumer local read | request retry safe | many consumers read concurrently from same local snapshot | consumer crash drops requests only | n/a | stale snapshot bounded by configured propagation freshness |
| P8 apply acknowledgment | retry with highest applied version | stale/lower version report loses monotonic overwrite | applied version survives crash if persisted | rollout UI may lag | n/a |
| P6 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P7 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
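The "stale holder" column relies on the storage layer rejecting writes that carry an old fencing token. The sketch below assumes a hypothetical `FencedShardStore` that remembers the highest token it has seen; the class and exception names are illustrative.

```python
class StaleOwnerError(Exception):
    """Raised when a write carries a fencing token older than one already seen."""


class FencedShardStore:
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, fencing_token, key, value):
        # A deposed owner still holds its old, smaller token and is rejected here.
        if fencing_token < self.highest_token:
            raise StaleOwnerError(
                f"token {fencing_token} is older than {self.highest_token}"
            )
        self.highest_token = fencing_token
        self.data[key] = value


shard = FencedShardStore()
shard.write(1, "payments", {"timeout_ms": 100})   # original owner
shard.write(2, "payments", {"timeout_ms": 250})   # new owner after failover
try:
    shard.write(1, "payments", {"timeout_ms": 999})  # old owner wakes up late
except StaleOwnerError:
    pass  # the stale write is dropped; the new owner's value stays in place
```

The check lives in the store rather than in the owner, because a stale owner cannot be trusted to know that it is stale.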

What matters most #

1. Monotonic snapshot movement #

Consumers must not accidentally apply older config after a newer one, except under an explicit rollback model.

2. Coherent version boundaries #

Consumers should apply config snapshots atomically at version boundaries, not field by field.
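Atomic application at version boundaries is often done by building the new snapshot off to the side and then swapping a single reference. A minimal sketch, assuming a hypothetical `LocalConfigHolder`; it relies on the fact that rebinding one attribute is a single reference update in CPython, and it uses a lock only to serialize writers.

```python
import threading
from types import MappingProxyType


class LocalConfigHolder:
    def __init__(self, version, values):
        # (version, read-only mapping) is replaced as one unit, never edited in place.
        self._current = (version, MappingProxyType(dict(values)))
        self._write_lock = threading.Lock()

    def read(self):
        """Hot path: one reference read, no lock; callers keep a consistent pair."""
        return self._current

    def apply(self, version, values):
        """Swap in a complete new snapshot; refuse anything not strictly newer."""
        candidate = (version, MappingProxyType(dict(values)))
        with self._write_lock:
            if version <= self._current[0]:
                return False
            self._current = candidate
            return True


holder = LocalConfigHolder(1, {"timeout_ms": 100, "retries": 3})
in_flight = holder.read()                      # a request grabs version 1
holder.apply(2, {"timeout_ms": 250, "retries": 5})
# in_flight still sees version 1 with both of its original fields.
```

A request that captured the old pair finishes on the old version, which is what keeps reads from mixing fields across versions.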

3. Published versus applied version #

Operators need both:

  • latest published version
  • latest actually applied version per consumer

4. Rollback is a first-class lifecycle transition #

Rollback is not “stale propagation”; it is a deliberate new release-state transition.
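One way to keep rollback distinct from stale delivery is to order snapshots by a release sequence number rather than by config version. In the sketch below, `release_seq` is a hypothetical counter assumed to be bumped on every release-state transition, including rollbacks; the source does not name such a field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    release_seq: int      # assumed: increases on every release transition, rollback included
    config_version: int   # which config version this snapshot serves


def should_apply(current: Snapshot, incoming: Snapshot) -> bool:
    """Consumers move forward by release_seq, so a rollback is still 'newer'."""
    return incoming.release_seq > current.release_seq


serving = Snapshot(release_seq=5, config_version=12)
rollback = Snapshot(release_seq=6, config_version=11)      # deliberate rollback to v11
stale_retry = Snapshot(release_seq=4, config_version=11)   # old delivery arriving late

apply_rollback = should_apply(serving, rollback)      # True: newer release decision
apply_stale = should_apply(serving, stale_retry)      # False: same content, older decision
```

Both incoming snapshots carry config version 11, yet only the one backed by a newer release decision is accepted.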


Step 9 - Scale Adjustments #

| Hotspot | Type | First response |
|---|---|---|
| very large consumer fleets | fan-out hotspot | hierarchical config distribution or pull-after-notify |
| high config churn | contention hotspot | batch edits and incremental recompute by affected scope |
| large config snapshots | read/memory hotspot | shard config by service/scope and compress snapshots |
| rollout-status queries | read hotspot | serve from derived views over ConsumerApplyState |
| reconnect storms | contention hotspot | stagger re-syncs and use version-based delta fetch |
| mixed-version safety checks | control-plane hotspot | validate rollout invariants before publish and during progressive rollout |
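Staggering re-syncs after a reconnect storm is commonly done with jittered exponential backoff. The sketch below uses the "full jitter" variant; the function name, the base of 1 second, and the 60 second cap are assumptions for illustration.

```python
import random


def resync_delay(attempt, base_s=1.0, cap_s=60.0, rng=random):
    """Pick a delay uniformly in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Each consumer draws its own delay, so reconnects spread out instead of aligning.
rng = random.Random(42)  # seeded only to make this example repeatable
delays = [resync_delay(attempt, rng=rng) for attempt in range(5)]
```

Combined with a version-based fetch, a consumer that is already current can skip the download entirely after its delay elapses.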

What scales well #

This system scales by:

  • sharding config scopes
  • pushing/pulling versioned snapshots rather than source reads on every request
  • incrementally recomputing only affected effective config
  • separating status reporting from the hot serving path

What fails first #

Usually:

  • fleet-wide fanout storms
  • large snapshots
  • rapid config churn
  • rollout status reads hitting primary state

Canonical design conclusion #

The mechanical outcome is:

  • primary state:
    • ConfigDefinition
    • ConfigReleaseState
    • EffectiveConfigState
    • ConsumerConfigSnapshot
    • ConsumerApplyState
    • PartitionOwnership
    • PartitionMap
  • critical invariants:
    • one authoritative current config per scope
    • deterministic effective config generation
    • monotonic consumer snapshot application unless explicit rollback
    • exclusive shard ownership for config truth
  • mechanisms:
    • overwrite current value for config definitions
    • guarded release/rollback transitions
    • overwrite effective snapshots
    • monotonic applied-version tracking
    • fenced shard ownership
  • reads:
    • hot path from local applied snapshots
    • status and rollout views from derived projections

Polished interview answer #

I’d build the config system as a control-plane/data-plane service. The control plane owns authoritative config definitions and release lifecycle, computes a deterministic effective config for each scope, and publishes versioned snapshots to consumers. Serving nodes never fetch raw control-plane config on every request; they read local snapshots atomically and move forward monotonically by config version. Rollback is modeled as a first-class release-state transition, not as accidental stale propagation, and consumers report their applied versions back so rollout status is observable. The main scaling levers are sharding config scopes, hierarchical or delta-based distribution, incremental recompute, and keeping rollout dashboards off the serving hot path.


Concrete Substrate #

I’ll choose a control-plane/data-plane config system with authoritative config shards plus local consumer snapshots as the concrete baseline, because it matches the mechanics we derived:

  • current-value config and release state
  • derived effective snapshots
  • monotonic snapshot publication
  • one owner per shard

Concrete tech family:

  • control plane in Go or Java
  • authoritative state store:
    • replicated DB or RocksDB-backed service state
  • metadata/control:
    • etcd or internal metadata quorum for shard ownership/routing
  • distribution layer:
    • watch stream, long poll, or push channel to consumers

Each shard owner stores:

  • current config definitions
  • current release state
  • current effective config per consumer scope
  • rollout/apply status from consumers

Consumers store:

  • local ConsumerConfigSnapshot
  • current applied version

Operation Layer #

1. Update config #

API

  • PutConfig(scope, config_doc, expected_version?)

Initiator

  • admin

Entry point

  • config API

Authoritative decider

  • shard owner for config scope

Precondition

  • config version matches if optimistic concurrency used

Transition

  • overwrite ConfigDefinition
  • trigger EffectiveConfigState recompute

2. Promote or roll back config #

API

  • UpdateReleaseState(scope, action, target_version, expected_release_version?)

Initiator

  • admin

Entry point

  • release API

Authoritative decider

  • shard owner for config scope

Precondition

  • current release state allows requested transition

Transition

  • guarded update of ConfigReleaseState
  • trigger snapshot propagation

3. Propagate snapshot #

API

  • internal push or pull-after-notify flow

Initiator

  • system

Entry point

  • control plane / consumer

Authoritative decider

  • snapshot publisher

Precondition

  • newer effective config version exists

Transition

  • overwrite ConsumerConfigSnapshot
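The pull-after-notify flow can be sketched as a notification that carries only a version number, followed by a fetch that the consumer initiates. `SnapshotPublisher`, `SnapshotConsumer`, and their methods are hypothetical names, and the in-process calls stand in for a watch stream or long poll.

```python
class SnapshotPublisher:
    def __init__(self):
        self.version = 0
        self.snapshot = {}
        self._subscribers = []

    def subscribe(self, consumer):
        self._subscribers.append(consumer)

    def publish(self, version, snapshot):
        self.version, self.snapshot = version, dict(snapshot)
        for consumer in self._subscribers:
            consumer.on_notify(version)   # small message: version only

    def fetch(self):
        return self.version, dict(self.snapshot)


class SnapshotConsumer:
    def __init__(self, publisher):
        self.publisher = publisher
        self.applied_version = 0
        self.snapshot = {}
        publisher.subscribe(self)

    def on_notify(self, version):
        if version <= self.applied_version:
            return                        # duplicate or stale notification
        fetched_version, snapshot = self.publisher.fetch()
        if fetched_version > self.applied_version:
            # Whole-snapshot overwrite at a version boundary.
            self.applied_version, self.snapshot = fetched_version, snapshot


publisher = SnapshotPublisher()
consumer = SnapshotConsumer(publisher)
publisher.publish(1, {"timeout_ms": 100})
publisher.publish(3, {"timeout_ms": 250})
consumer.on_notify(1)   # late duplicate is ignored
```

Because the consumer always fetches the latest snapshot, a missed notification is repaired by the next one, or by a periodic pull.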

4. Consumer apply and ack #

API

  • AckAppliedConfig(scope, version, consumer_id)

Initiator

  • consumer

Entry point

  • rollout status endpoint

Authoritative decider

  • shard owner for config scope

Precondition

  • version is applied locally

Transition

  • monotonic overwrite ConsumerApplyState

5. Consumer read on hot path #

API

  • internal local read

Initiator

  • consumer/service

Entry point

  • local process

Authoritative decider

  • local applied ConsumerConfigSnapshot

Precondition

  • snapshot loaded and valid

Transition

  • none

Entry Point vs Decider vs Responder #

| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| config update / release update | config API | config shard owner | API node | config system |
| snapshot propagation | control plane / consumer | snapshot publisher | control/data-plane node | config system |
| consumer local read | local process | local applied snapshot | local process | config system |
| apply ack | rollout endpoint | config shard owner | API node | config system |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | config system |

Concrete HLD #

Main components:

  • config control-plane API
    • receives config and release updates
  • config shard owners
    • authoritative owners for config truth and effective snapshot recompute
  • distribution layer
    • pushes or serves versioned snapshots to consumers
  • consumer fleet
    • reads local applied snapshots on hot path
  • metadata/control service
    • tracks shard ownership and routing
  • rollout status pipeline
    • serves config inventory and rollout status
