Secret Management Service (Vault-class) #

This note models a Vault-class secret management service in which clients authenticate, read or generate secrets, receive leased dynamic credentials, and renew or revoke them, while the system rotates the underlying secret material safely at scale.


Step 1 - Normalize #

Assume the baseline prompt is:

  • design a secret management service
  • clients authenticate and request secrets
  • static secrets are stored securely and versioned
  • dynamic credentials can be generated with leases
  • secrets and credentials can be rotated, renewed, revoked, and audited
  • system scales across many tenants, apps, and secret scopes

Normalize into state-affecting paths.

| Requirement | Actor | Operation | State touched | Priority |
| --- | --- | --- | --- | --- |
| Client authenticates to secret service | Client | state transition | S1, update target: AuthSessionState | C1 |
| Admin writes or updates static secret | Admin | overwrite state | S1, update target: SecretVersionState | C1 |
| Client reads secret | Client | read source | S1, read source target: SecretVersionState | C1 |
| Service issues dynamic leased credential | System | append event | S1, create target: LeasedCredential | C1 |
| Client renews leased credential | Client | state transition | S1, update target: LeaseState | C1 |
| Client or system revokes leased credential | Client or System | state transition | S1, update target: LeaseState | C1 |
| Admin rotates secret or backend root credential | Admin | state transition | S1, update target: SecretRotationState | C1 |
| Admin updates access policy | Admin | overwrite state | S1, update target: SecretAccessPolicy | C1 |
| System emits audit log for secret access or mutation | System | append event | S1, create target: SecretAuditEvent | C1 |
| Client reads secret inventory / lease status | Client | read projection | S1, read projection target: SecretStatusView | R2 |
| System routes tenant/path shard to current owner | System | read source | S1, read source target: PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1, update target: PartitionOwnership | C1 |

Notes on normalization #

Important choices:

  • auth establishes a current session/token lifecycle
  • static secret writes are current-value state with versioning
  • reads are security-critical source reads
  • dynamic credential issuance is an immutable issuance fact
  • lease renew/revoke are lifecycle transitions
  • rotation is explicit because secret version and backend validity change over time
  • audit is append-only and part of the core product

This system is a hybrid of:

  • secure current-value secret storage
  • lease-backed dynamic credentials
  • policy-gated access

Step 2 - Critical Path Selection #

| Requirement | Priority class | Why |
| --- | --- | --- |
| Authenticate to service | C1 | auth establishes who may read or mutate secrets |
| Write/update static secret | C1 | current secret truth changes future reads |
| Read secret | C1 | wrong allow/deny or wrong secret value is core correctness/security failure |
| Issue dynamic credential | C1 | leased credentials are core product output |
| Renew lease | C1 | stale or invalid renewals affect credential validity |
| Revoke lease | C1 | revocation must affect future usage |
| Rotate secret/backend credential | C1 | rotation correctness affects secrecy and downstream clients |
| Update access policy | C1 | future access and issuance depend on current policy |
| Emit audit log | C1 | auditable access is often a product requirement |
| Read inventory/status | R2 | operational only |
| Route to shard owner | C1 | wrong routing can split secret truth |
| Reassign shard ownership | C1 | failover must preserve secret/lease correctness |

Baseline critical paths #

Main C1 paths:

  • P1 authenticate
  • P2 write/update static secret
  • P3 read secret
  • P4 issue dynamic credential
  • P5 renew lease
  • P6 revoke lease
  • P7 rotate secret/backend credential
  • P8 update access policy
  • P9 emit audit event
  • P10 route to shard owner
  • P11 reassign shard ownership

This design is driven by:

  • authoritative current secret version
  • lease lifecycle for dynamic credentials
  • strict policy-gated access
  • durable audit trail

Step 3 - Primary State Extraction #

For a Vault-class system, the minimal primary state is the current auth session, current secret version, dynamic credential issuance and lease lifecycle, rotation state, access policy, audit trail, and routing/ownership state.

| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AuthSessionState | direct noun | Yes | keep as candidate | process | Yes | service | state machine | instance | session_id |
| SecretVersionState | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | secret_path |
| LeasedCredential | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | credential_id |
| LeaseState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | lease_id |
| SecretRotationState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | secret_path or backend_id |
| SecretAccessPolicy | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | policy_scope |
| SecretAuditEvent | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | audit_id |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | tenant/path shards |
| SecretStatusView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | tenant or path |

Important modeling choices #

SecretVersionState #

Primary because:

  • current secret truth is the core product for static secrets
  • versioning matters for rollback, history, and rotation

LeasedCredential #

Primary because:

  • dynamic credentials are an issuance fact and often need auditability

LeaseState #

Primary because:

  • current validity of a dynamic credential is lifecycle state
  • states like ACTIVE, EXPIRED, REVOKED, RENEWED
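The lifecycle can be sketched as a small transition table. This is a minimal sketch under assumptions: the state names come from the list above, but the allowed transitions (EXPIRED and REVOKED as terminal, RENEWED behaving like ACTIVE) and the helper `can_transition` are hypothetical choices made for illustration.

```python
from enum import Enum


class LeaseStatus(Enum):
    ACTIVE = "ACTIVE"
    RENEWED = "RENEWED"
    EXPIRED = "EXPIRED"
    REVOKED = "REVOKED"


# Assumed transition table: EXPIRED and REVOKED are terminal states.
ALLOWED = {
    LeaseStatus.ACTIVE: {LeaseStatus.RENEWED, LeaseStatus.EXPIRED, LeaseStatus.REVOKED},
    LeaseStatus.RENEWED: {LeaseStatus.RENEWED, LeaseStatus.EXPIRED, LeaseStatus.REVOKED},
    LeaseStatus.EXPIRED: set(),
    LeaseStatus.REVOKED: set(),
}


def can_transition(current: LeaseStatus, target: LeaseStatus) -> bool:
    """Return True if the assumed lifecycle permits current -> target."""
    return target in ALLOWED[current]
```

Keeping the table as data makes it straightforward to reject a renew or revoke against a lease that has already reached a terminal state.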

SecretRotationState #

Primary because:

  • rotations often span multiple steps:
    • next version created
    • propagated
    • old version retired
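The three steps above could be modeled as a guarded, one-step-at-a-time state machine. The sketch below is illustrative only: the step names, the `RotationState` record, and the `advance` helper are assumptions, not part of any real API.

```python
from dataclasses import dataclass

# Hypothetical step names mirroring the three bullets above.
STEPS = ["IDLE", "NEXT_CREATED", "PROPAGATED", "OLD_RETIRED"]


@dataclass
class RotationState:
    step: str = "IDLE"
    version: int = 0  # bumped on every committed transition


def advance(state: RotationState, target: str, expected_version: int) -> bool:
    """Advance by exactly one step if the caller saw the current version."""
    if target not in STEPS:
        return False  # unknown step name
    if expected_version != state.version:
        return False  # stale caller: another writer already advanced
    if STEPS.index(target) != STEPS.index(state.step) + 1:
        return False  # skipping or rewinding a step is not allowed
    state.step = target
    state.version += 1
    return True
```

Because each transition checks both the current step and the version, a retried or concurrent rotation request cannot retire the old version before the new one has been propagated.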

SecretAuditEvent #

Primary because:

  • access/mutation audit is usually first-class product behavior

Minimal strict primary set #

  • AuthSessionState
  • SecretVersionState
  • LeasedCredential
  • LeaseState
  • SecretRotationState
  • SecretAccessPolicy
  • SecretAuditEvent
  • PartitionOwnership
  • PartitionMap

Step 4 - Hard Invariants #

For a Vault-class secret management service, the hard invariants are about correct policy-gated secret access, one authoritative current secret version, valid lease lifecycle, and safe rotation/revocation.

| Path | Tier | Type | Invariant statement |
| --- | --- | --- | --- |
| P1 authenticate | HARD | eligibility | Action create_auth_session is valid only if presented auth method, identity, and current SecretAccessPolicy allow session issuance at decision time. |
| P2 write/update static secret | HARD | ordering | Secret-version revisions are ordered by monotonic version within secret path scope. |
| P3 read secret | HARD | eligibility | Decision read_secret(path) is valid only if current AuthSessionState is active and current SecretAccessPolicy allows access to the current SecretVersionState at decision time. |
| P4 issue dynamic credential | HARD | eligibility | Action issue_dynamic_credential is valid only if current session/policy allow issuance and backing secret/backend state is active at decision time. |
| P4 issue dynamic credential | HARD | uniqueness | Key credential_id maps to at most one logical outcome (one issued leased credential) within credential scope. |
| P5 renew lease | HARD | eligibility | Action renew_lease is valid only if current LeaseState is active and renewable under current policy and backend state at decision time. |
| P6 revoke lease | HARD | eligibility | Action revoke_lease is valid only if current LeaseState is revocable at decision time. |
| P7 rotate secret/backend credential | HARD | eligibility | Action advance_rotation_state is valid only if current SecretRotationState lifecycle allows the transition at decision time. |
| P8 update access policy | HARD | ordering | Access-policy revisions are ordered by monotonic version within policy scope. |
| P9 emit audit event | HARD | accounting | SecretAuditEvent corresponds to an actual secret read, write, issuance, renewal, revoke, or rotation action committed by the service. |
| P10 route to shard owner | HARD | uniqueness | Key shard_id maps to at most one logical outcome (the current authoritative owner) within shard_id scope. |
| P11 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if current owner is failed or relinquished and candidate owner is eligible and sufficiently current on shard_id at decision time. |

What matters most #

1. Secret reads are policy-gated current-state reads #

Wrong allow/deny or stale policy is a security failure.

2. Lease lifecycle is authoritative for dynamic credentials #

Renew/revoke/expire must reflect current state.

3. Rotation is a managed lifecycle #

New secret material must become current without losing auditability or exposing partial state.

4. Audit must reflect committed actions #

Audit records should correspond to actual secret access or mutation events, not attempted but uncommitted actions.


Step 5 - Execution Context #

For the baseline secret-management service:

| Field | Value | Why |
| --- | --- | --- |
| Topology | single service, distributed | one logical secret-management system spread across auth, secret, and lease nodes |
| Write coordination scope | per object scope | correctness is per secret path, lease, policy, and shard ownership scope |
| Read consistency target | strong only | stale secret or policy reads are security-critical |
| Holder model | client | clients temporarily hold auth sessions and leased credentials |
| Compensation acceptable? | No | wrong secret disclosure or stale credential validity cannot be safely repaired afterward |

Derived implications #

  • holder_may_crash = true

    • clients can disappear while holding leases or sessions
  • cross_service_write = false

    • baseline keeps auth, secret, lease, and policy state in one logical service
  • bounded_staleness_allowed = false

    • secret access and lease validation should use authoritative state
  • cross_service_atomicity_required = false

    • no multi-service transaction in baseline
  • exclusive_claim_required = true

    • shard ownership must be exclusive
  • guarded_by_current_state = true

    • auth, lease, revoke, and rotation all depend on current state
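Exclusive shard ownership is commonly enforced with a fencing token: each new owner receives a strictly larger token, and the storage layer rejects writes that carry an older one. The sketch below is a minimal illustration of that idea; `ShardStore` and its fields are hypothetical names, and a real system would persist the highest token durably.

```python
class ShardStore:
    """Storage that rejects writes carrying an older fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0  # largest token seen so far
        self.data = {}          # key -> value

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # a newer owner exists; the caller is stale
        self.highest_token = token
        self.data[key] = value
        return True
```

An old owner that resumes after a pause still holds its previous token, so its late writes are refused even if it has not yet noticed the failover.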

What this implies #

This pushes us toward:

  • one authoritative owner per tenant/path shard
  • current-value secret and policy state
  • lease-backed dynamic credentials
  • append-only audit records

Step 6 - Deterministic Mechanism Selection #

| Path | Write shape | Base mechanism | Required companions |
| --- | --- | --- | --- |
| P1 authenticate | guarded state transition | CAS on (state, version) or single writer per shard | auth token/session version |
| P2 write/update static secret | overwrite current value | CAS on version | secret version |
| P3 read secret | read source | direct source read | active session, policy version |
| P4 issue dynamic credential | append-only event guarded by current state | leased issuance under active backend state | lease id, expiry |
| P5 renew lease | guarded state transition | CAS on (state, version) | lease version |
| P6 revoke lease | guarded state transition | CAS on (state, version) | lease version |
| P7 rotate secret/backend credential | guarded state transition | lifecycle transition | rotation version/state |
| P8 update access policy | overwrite current value | CAS on version | policy version |
| P9 emit audit event | append-only event | append log | audit correlation id |
| P10 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P11 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |

Why these fit #

Static secrets and policies #

These are current-value state, so overwrite fits.

Dynamic credentials #

Issuance is an immutable event, but current validity is captured by LeaseState, so issue plus guarded lease lifecycle fits.

Rotation #

Rotation is lifecycle-managed current state, so guarded transition fits.

Audit #

Audit records are immutable facts, so append-only fits.
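One way to keep the audit trail append-only while tolerating retries is to key each record by its audit correlation id. The following is a rough in-memory sketch; the `AuditLog` class and its tuple layout are assumptions made for illustration, and durability is out of scope here.

```python
class AuditLog:
    """Append-only log; a retried append with the same id is a no-op."""

    def __init__(self) -> None:
        self._events = []   # list of (correlation_id, action, path)
        self._seen = set()  # correlation ids already recorded

    def append(self, correlation_id: str, action: str, path: str) -> bool:
        if correlation_id in self._seen:
            return False  # duplicate from a retry; nothing is rewritten
        self._seen.add(correlation_id)
        self._events.append((correlation_id, action, path))
        return True

    def events(self) -> tuple:
        return tuple(self._events)  # read-only snapshot for callers
```

Existing records are never modified or removed, so a retry after a lost acknowledgement cannot alter history or produce a second record for the same action.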

Canonical substrate implied #

The baseline now points to:

  • sharded secret-management service
  • one owner per tenant/path shard
  • current secret and policy state
  • leased dynamic credentials
  • append-only audit trail

Step 7 - Read Model / Source of Truth #

For a Vault-class system, truth is mostly direct source state. Inventory/status views are derived.

| Concept | Truth | Read path | Rebuild path |
| --- | --- | --- | --- |
| C1 auth/session validity | AuthSessionState | read source directly | authoritative session store |
| C2 current static secret version | SecretVersionState | read source directly | authoritative secret store |
| C3 issued dynamic credential history | LeasedCredential | read source directly | authoritative issuance store |
| C4 current lease lifecycle | LeaseState | read source directly | authoritative lease store |
| C5 current rotation lifecycle | SecretRotationState | read source directly | authoritative rotation store |
| C6 current access policy | SecretAccessPolicy | read source directly | authoritative policy store |
| C7 audit history | SecretAuditEvent | read source directly | authoritative audit store |
| C8 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C9 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
| C10 inventory / lease dashboards | derived from secret, lease, and audit state | materialized view | recompute from authoritative state |

Important point #

For the core semantics:

  • secret reads and lease actions use authoritative source state
  • audit is source truth, not a derived UI-only artifact
  • dashboards are projections

Step 8 - Failure Handling #

| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
| --- | --- | --- | --- | --- | --- |
| P1 authenticate | retry with auth flow/session version | stale auth transition loses guarded update | committed session survives crash if persisted | client may not receive token after creation | stale shard owner blocked by fencing token |
| P2 write secret | retry with secret version | stale update loses CAS | committed secret version survives crash if persisted | audit emission may lag but must reflect committed write | stale shard owner blocked by fencing token |
| P3 read secret | read retry safe | many readers coexist | node crash drops request only | audit append may retry | stale reads forbidden beyond consistency bound |
| P4 issue dynamic credential | retry may issue multiple credentials unless request correlation or UX contract handles it | issuance must be fenced by current session/policy/backend state | committed issuance survives crash if persisted | client may fail to receive returned credential | stale issuer blocked by ownership/version discipline |
| P5 renew lease | retry with lease version | stale renew loses guarded transition | committed renewal survives crash if persisted | audit emission may lag | old lease version rejected |
| P6 revoke lease | retry with lease version | stale revoke loses guarded transition | committed revoke survives crash if persisted | backend revocation side effect may retry | old lease version rejected |
| P7 rotate secret/backend credential | retry with rotation version | stale transition loses guarded update | committed rotation survives crash if persisted | downstream consumers may lag in pickup | n/a |
| P8 update policy | retry with policy version | stale update loses CAS | committed policy survives crash if persisted | evaluator caches may lag | n/a |
| P9 emit audit event | retry with audit correlation id | duplicates should be deduped or tolerated by audit sink | committed audit survives crash if durable | external sink lag acceptable if source audit persists | n/a |
| P10 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P11 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |

What matters most #

1. Secret disclosure is the irreversible action #

Wrong disclosure cannot be compensated later.

2. Dynamic credential issuance and delivery are separate #

The service may issue a leased credential even if the client never receives it.
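Because issuance can commit without the response reaching the client, a retry needs some way to find the original result. A minimal sketch using a client-supplied request id is shown below; the `Issuer` class, the `request_id` parameter, and the record fields are all hypothetical, and retaining the credential for replay is a design trade-off rather than a requirement.

```python
import secrets


class Issuer:
    def __init__(self) -> None:
        self._by_request = {}  # request_id -> issued credential record

    def issue(self, request_id: str, role: str, ttl_s: int, now: float) -> dict:
        # A retry with the same request_id returns the original issuance
        # instead of minting a second credential.
        if request_id in self._by_request:
            return self._by_request[request_id]
        record = {
            "lease_id": secrets.token_hex(8),
            "credential": secrets.token_urlsafe(16),
            "role": role,
            "expiry": now + ttl_s,
        }
        self._by_request[request_id] = record
        return record
```

If the original credential should never be replayed, an alternative is to return only the `lease_id` on retry so the client can revoke the orphaned lease and request a fresh one.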

3. Rotation needs staged lifecycle #

New material must be activated and old material retired safely.

4. Revocation side effects may involve external systems #

But the secret service’s source truth is still the authoritative LeaseState.


Step 9 - Scale Adjustments #

| Hotspot | Type | First response |
| --- | --- | --- |
| high secret-read QPS | read hotspot | shard by tenant/path and cache only with strict TTL/version bounds where allowed |
| dynamic lease churn | write throughput hotspot | batch renewals where possible and tune lease durations |
| audit volume | write throughput hotspot | separate append-only audit pipeline from hot secret path |
| mass rotation events | contention hotspot | rate-limit rotation rollout and stage by scope |
| policy churn | control-plane hotspot | scope policy changes narrowly and propagate incrementally |
| inventory/status queries | read hotspot | serve from derived views, not hot secret stores |
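For the first row, "strict TTL/version bounds" could mean that a cached value is served only while it is both fresh and still at the authoritative version. The sketch below illustrates that rule; `BoundedCache` is a hypothetical helper, and it assumes the caller can obtain the current version cheaply from the shard owner.

```python
class BoundedCache:
    def __init__(self, ttl_s: float) -> None:
        self.ttl_s = ttl_s
        self._entries = {}  # path -> (value, version, stored_at)

    def put(self, path, value, version, now):
        self._entries[path] = (value, version, now)

    def get(self, path, current_version, now):
        """Return the cached value, or None if the caller must read the source."""
        entry = self._entries.get(path)
        if entry is None:
            return None
        value, version, stored_at = entry
        if now - stored_at > self.ttl_s or version != current_version:
            del self._entries[path]  # expired or superseded by a newer version
            return None
        return value
```

Passing `now` explicitly keeps the sketch deterministic; a miss always falls back to the authoritative read path.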

What scales well #

This system scales by:

  • sharding secrets by tenant/path
  • keeping lease records compact
  • separating audit ingestion from secret read latency
  • using dynamic credentials to reduce long-lived static secret exposure

What fails first #

Usually:

  • very hot secret paths
  • lease-renewal storms
  • audit write amplification
  • poorly staged bulk rotations

Canonical design conclusion #

The mechanical outcome is:

  • primary state:
    • AuthSessionState
    • SecretVersionState
    • LeasedCredential
    • LeaseState
    • SecretRotationState
    • SecretAccessPolicy
    • SecretAuditEvent
    • PartitionOwnership
    • PartitionMap
  • critical invariants:
    • guarded auth/session access
    • one authoritative current secret version per path
    • dynamic credential issuance valid only under current policy/backend state
    • renew/revoke valid only for current lease state
    • rotation is a first-class lifecycle transition
    • audit corresponds to committed actions
  • mechanisms:
    • overwrite current value for static secrets and policy
    • append issuance and audit records
    • guarded lease and rotation transitions
    • fenced shard ownership
  • reads:
    • direct authoritative reads for secret and lease truth
    • derived views for inventory and status

Polished interview answer #

I’d build the secret-management system as a sharded strongly consistent service with one authoritative owner per tenant or secret-path shard. Static secrets are stored as current versioned values behind strict policy checks, while dynamic credentials are issued as leased immutable credentials backed by authoritative lease state. Reads, renewals, revocations, and rotations all operate on current policy, session, secret, and lease truth, and every committed action emits an append-only audit record. The main scaling levers are more shards, compact lease records, a separate audit pipeline, staged rotation workflows, and careful limits on caching for security-critical reads.


Concrete Substrate #

I’ll choose a sharded strongly consistent secret-management service with current secret state, lease-backed dynamic credentials, and append-only audit records as the concrete baseline, because it matches the mechanics we derived:

  • current-value secret and policy state
  • lease-backed dynamic credentials
  • guarded rotation and revocation
  • append-only audit
  • one owner per shard

Concrete tech family:

  • secret service in Go or Rust
  • authoritative state in a replicated metadata store or service-owned Raft state machine
  • secure storage/encryption:
    • envelope encryption with KMS/HSM-backed master keys
  • metadata/control:
    • built-in Raft consensus per shard or a small etcd-like control layer

Each shard leader stores:

  • current AuthSessionState
  • current SecretVersionState
  • LeasedCredential issuance references
  • current LeaseState
  • current SecretRotationState
  • current SecretAccessPolicy
  • append-only SecretAuditEvent

Operation Layer #

1. Write static secret #

API

  • PutSecret(path, secret_value, expected_version?)

Initiator

  • admin

Entry point

  • secret API

Authoritative decider

  • shard owner for path

Precondition

  • caller session active
  • policy allows write
  • version matches if optimistic concurrency used

Transition

  • overwrite SecretVersionState(path)
  • append audit event

Response

  • {version}
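The optimistic-concurrency part of this operation can be sketched as a compare-and-set on the stored version. This is a simplified in-memory illustration: `SecretStore` and `VersionConflict` are hypothetical names, and the session, policy, and audit steps are omitted to keep the focus on the version check.

```python
class VersionConflict(Exception):
    """Raised when expected_version does not match the stored version."""


class SecretStore:
    def __init__(self) -> None:
        self._current = {}  # path -> (value, version)

    def put_secret(self, path, value, expected_version=None):
        _, version = self._current.get(path, (None, 0))
        if expected_version is not None and expected_version != version:
            raise VersionConflict(
                f"{path}: expected {expected_version}, found {version}"
            )
        new_version = version + 1  # monotonic within the path scope
        self._current[path] = (value, new_version)
        return {"version": new_version}
```

A writer that read version 1 and lost a race sees `VersionConflict` instead of silently overwriting the newer value.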

2. Read secret #

API

  • GetSecret(path)

Initiator

  • client

Entry point

  • secret API

Authoritative decider

  • shard owner for path

Precondition

  • caller session active
  • policy allows read

Transition

  • none on source truth
  • append audit event

Response

  • {secret_value, version}
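A rough sketch of the read path follows: check the session, check the policy, then read and audit. The data shapes here (a session dict, a set of `(session_id, path)` grants, a plain list as the audit sink) are assumptions chosen for brevity, not a proposed schema.

```python
class Denied(Exception):
    """Raised when the session or policy precondition fails."""


def get_secret(session, policy, secrets_by_path, audit, path):
    """session: {"id", "active"}; policy: set of (session_id, path) read grants."""
    if not session["active"]:
        raise Denied("session is not active")
    if (session["id"], path) not in policy:
        raise Denied("policy does not allow read")
    value, version = secrets_by_path[path]
    # Audit only after the decision succeeded, so the record reflects
    # an actual read rather than an attempt.
    audit.append(("read", session["id"], path, version))
    return {"secret_value": value, "version": version}
```

Both checks run against the state passed in at call time, which mirrors the requirement that eligibility is evaluated at decision time on the shard owner.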

3. Issue dynamic credential #

API

  • IssueDynamicCredential(backend, role, ttl)

Initiator

  • client

Entry point

  • dynamic-secret API

Authoritative decider

  • shard owner for backend/path plus backend plugin state

Precondition

  • caller session active
  • policy allows issuance
  • backend active

Transition

  • append LeasedCredential
  • create LeaseState = ACTIVE(expiry)
  • append audit event

Response

  • {credential, lease_id, expiry}

4. Renew or revoke lease #

API

  • RenewLease(lease_id, ttl) / RevokeLease(lease_id)

Initiator

  • client or system

Entry point

  • lease API

Authoritative decider

  • shard owner for lease

Precondition

  • current lease state allows transition

Transition

  • guarded update of LeaseState
  • append audit event

Response

  • {expiry} or {revoked: true}

5. Rotate secret or backend root #

API

  • RotateSecret(path_or_backend, action, expected_version?)

Initiator

  • admin/system

Entry point

  • rotation API

Authoritative decider

  • shard owner for path/backend

Precondition

  • current rotation state allows transition

Transition

  • guarded update of SecretRotationState
  • update current SecretVersionState or backend credential linkage
  • append audit event

Entry Point vs Decider vs Responder #

| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
| --- | --- | --- | --- | --- |
| authenticate / secret read/write | secret API | shard owner | API node | secret service |
| dynamic credential issuance | dynamic-secret API | shard owner + backend state | API node | secret service |
| renew / revoke lease | lease API | shard owner | API node | secret service |
| rotate secret | rotation API | shard owner | control/API node | secret service |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | secret service |

Concrete HLD #

Main components:

  • auth/session frontend
    • authenticates callers and creates sessions/tokens
  • secret shard owners
    • authoritative owners for secret, lease, and policy truth
  • dynamic-secret backend plugins
    • generate leased credentials for databases/cloud systems/etc.
  • audit pipeline
    • stores append-only committed audit events
  • metadata/control service
    • tracks shard ownership and routing
  • encryption/KMS layer
    • protects stored secret material
