API Key Management Service #

This note models an API key management service where users or services create keys, secrets are shown once, keys are validated on request paths, and lifecycle operations like rotation, scope updates, revocation, and quota/policy enforcement are handled safely at scale.


Step 1 - Normalize #

Assume the baseline prompt is:

  • design an API key management service
  • users or service owners create API keys
  • raw key secrets are shown once and then only stored securely
  • request paths validate presented API keys quickly
  • keys can be scoped, rotated, disabled, or revoked
  • system scales across many tenants and applications

Normalize into state-affecting paths.

| Requirement | Actor | Operation | State touched | Priority |
| --- | --- | --- | --- | --- |
| User creates API key | Client | append event | S1, create target: ApiKeyCredential | C1 |
| Service stores or updates current key metadata | System | state transition | S1, update target: ApiKeyState | C1 |
| Caller validates presented API key | Client | read source | S1, read source target: ApiKeyState | C1 |
| Admin updates key scopes / policy | Admin | overwrite state | S1, update target: ApiKeyPolicy | C1 |
| User rotates key | Client | state transition | S1, update target: ApiKeyState | C1 |
| User disables or revokes key | Client | state transition | S1, update target: ApiKeyState | C1 |
| System propagates key-validation snapshot to gateways/evaluators | System | async process | S1, hidden write target: ApiKeyValidationSnapshot | C1 |
| User reads key inventory / audit history | Client | read projection | S1, read projection target: ApiKeyActivityView | R2 |
| System routes tenant/shard to current owner | System | read source | S1, read source target: PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1, update target: PartitionOwnership | C1 |

Notes on normalization #

Important choices:

  • raw key issuance is append event
    • secret creation is an immutable issuance fact
  • current key metadata is state transition
    • active, disabled, revoked, rotated lifecycle changes over time
  • validation is the hot read path
  • policy is current-value control state
  • snapshot propagation is explicit because gateways or validators usually should not hit the control plane on every request

This system is a hybrid of:

  • credential issuance
  • credential lifecycle state
  • fast request-path validation

Step 2 - Critical Path Selection #

| Requirement | Priority class | Why |
| --- | --- | --- |
| Create API key | C1 | secure issuance is the core product output |
| Store/update current key state | C1 | validation depends on authoritative metadata and lifecycle |
| Validate API key | C1 | wrong allow/deny is the core correctness and security failure |
| Update key scopes / policy | C1 | changes future authorization behavior |
| Rotate key | C1 | rotation correctness affects continuity and security |
| Disable / revoke key | C1 | revocation must affect future validation |
| Propagate validation snapshot | C1 | stale gateways/validators can enforce wrong permissions |
| Read key inventory / audit | R2 | operational/user-facing only |
| Route to shard owner | C1 | wrong routing can split key truth |
| Reassign shard ownership | C1 | failover must preserve key lifecycle correctness |

Baseline critical paths #

Main C1 paths:

  • P1 create API key
  • P2 update key metadata/state
  • P3 validate API key
  • P4 update policy/scopes
  • P5 rotate key
  • P6 disable/revoke key
  • P7 propagate validation snapshot
  • P8 route to shard owner
  • P9 reassign shard ownership

This design is driven by:

  • secure one-time secret issuance
  • one authoritative current lifecycle per key id
  • fast validation from authoritative or approved snapshots
  • revocation and rotation correctness

Step 3 - Primary State Extraction #

For an API key management service, the minimal primary state is the key issuance record, current key lifecycle state, policy/scopes, validation snapshots, and routing/ownership state.

| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ApiKeyCredential | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | key_id |
| ApiKeyState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | key_id |
| ApiKeyPolicy | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | key_id or policy_scope |
| ApiKeyValidationSnapshot | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | instance | validator_scope |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | tenant/shard map |
| ApiKeyActivityView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | tenant or owner |

Important modeling choices #

ApiKeyCredential #

Primary because:

  • key creation is an immutable issuance event
  • captures key id, secret-hash reference, owner, creation context, and audit metadata

ApiKeyState #

Primary because:

  • this is the central lifecycle object
  • captures states like ACTIVE, DISABLED, REVOKED, ROTATING, EXPIRED

ApiKeyPolicy #

Primary because:

  • validation and authorization depend on scopes, product limits, service permissions, and optional IP/app restrictions

ApiKeyValidationSnapshot #

Kept explicit because:

  • hot-path validators usually need local read-optimized data

Minimal strict primary set #

The strongest minimal set is:

  • ApiKeyCredential
  • ApiKeyState
  • ApiKeyPolicy
  • ApiKeyValidationSnapshot
  • PartitionOwnership
  • PartitionMap

Step 4 - Hard Invariants #

For an API key management service, the hard invariants are about secure issuance, one authoritative current key lifecycle, valid validation against current key/policy state, and safe rotation/revocation.

| Path | Tier | Type | Invariant statement |
| --- | --- | --- | --- |
| P1 create API key | HARD | uniqueness | Key key_id maps to at most one logical outcome (an issued API key credential) within key scope. |
| P1 create API key | HARD | accounting | Raw secret material is revealed only at issuance time; subsequent validation uses a secure stored representation. |
| P2 update key metadata/state | HARD | uniqueness | Key key_id maps to at most one logical outcome (the current authoritative key lifecycle state) within key scope. |
| P3 validate API key | HARD | eligibility | Decision validate_api_key(presented_key) is valid only if current ApiKeyState and ApiKeyPolicy allow usage for the request context at decision time. |
| P4 update policy/scopes | HARD | ordering | API-key policy revisions are ordered by monotonic version within policy scope. |
| P5 rotate key | HARD | eligibility | Action rotate_key is valid only if current ApiKeyState allows rotation and current owner/policy permits it at decision time. |
| P6 disable/revoke key | HARD | eligibility | Action revoke_key is valid only if current ApiKeyState allows revocation/disable at decision time. |
| P7 propagate validation snapshot | HARD | freshness | ApiKeyValidationSnapshot reflects authoritative key and policy state within configured propagation bounds and moves monotonically forward by version. |
| P8 route to shard owner | HARD | uniqueness | Key shard_id maps to at most one logical outcome (the current authoritative owner) within shard_id. |
| P9 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if the current owner has failed or relinquished ownership and the candidate owner is eligible and sufficiently current on shard_id at decision time. |

What matters most #

1. Raw secret is one-time output #

After issuance, the system should validate using a hash or secure derived representation, not the raw secret.
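
A minimal sketch of that split between the one-time raw secret and the stored representation, using only the Python standard library. The function names `issue_key` and `verify` are hypothetical, and plain SHA-256 is an assumption: it is a reasonable choice only because the secret is long and randomly generated, and the note itself does not name a hash function.

```python
import hashlib
import hmac
import secrets


def issue_key() -> tuple[str, str, str]:
    """Return (key_id, raw_secret, stored_hash); only the hash is persisted."""
    key_id = secrets.token_hex(8)           # hypothetical public identifier
    raw_secret = secrets.token_urlsafe(32)  # returned to the owner exactly once
    stored_hash = hashlib.sha256(raw_secret.encode()).hexdigest()
    return key_id, raw_secret, stored_hash


def verify(presented_secret: str, stored_hash: str) -> bool:
    """Recompute the hash of the presented secret and compare in constant time."""
    candidate = hashlib.sha256(presented_secret.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

The service would persist `key_id` and `stored_hash` and discard `raw_secret` after returning it, which is why a lost secret cannot be shown again.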

2. Current key lifecycle is authoritative #

Validation must use the current active/disabled/revoked state.

3. Snapshot monotonicity matters #

Validators must not move backward to older key/policy state.
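
One way to enforce this on the validator side is to compare versions before installing a snapshot. The sketch below assumes a single integer version per snapshot; the `Snapshot` and `Validator` classes are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    version: int
    entries: dict = field(default_factory=dict)  # secret hash -> key metadata


class Validator:
    def __init__(self) -> None:
        self.current = Snapshot(version=0)

    def apply(self, incoming: Snapshot) -> bool:
        """Install the snapshot only if it is strictly newer than the current one."""
        if incoming.version <= self.current.version:
            return False  # duplicate or out-of-order delivery; keep current state
        self.current = incoming
        return True
```

With this check, a delayed push of version 1 arriving after version 2 is ignored, so a revocation carried by version 2 is never undone.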

4. Revocation must affect future validation #

Once revoked, future request-path validation must deny.


Step 5 - Execution Context #

For the baseline API key management service:

| Field | Value | Why |
| --- | --- | --- |
| Topology | single service distributed | one logical key-management service with control plane and validator/gateway readers |
| Write coordination scope | per object scope | correctness is per key, policy, and shard ownership scope |
| Read consistency target | bounded stale allowed | hot-path validation often uses local snapshots with strict freshness bounds |
| Holder model | none | no lease-like client ownership is central to key correctness |
| Compensation acceptable? | No | wrong key validation or stale revocation cannot be repaired afterward |

Derived implications #

  • holder_may_crash = false

    • validators can fail, but they do not own mutable business state like workers
  • cross_service_write = false

    • baseline keeps key state, policy, and snapshots in one logical service
  • bounded_staleness_allowed = true

    • request-path validation can use bounded-stale local snapshots if explicit
  • cross_service_atomicity_required = false

    • no multi-service transaction across unrelated services in baseline
  • exclusive_claim_required = true

    • shard ownership must be exclusive
  • guarded_by_current_state = true

    • rotation and revocation depend on current key lifecycle

What this implies #

This pushes us toward:

  • one authoritative owner per tenant/key shard
  • current key and policy state in the control plane
  • fast local validator snapshots
  • monotonic config distribution

Step 6 - Deterministic Mechanism Selection #

| Path | Write shape | Base mechanism | Required companions |
| --- | --- | --- | --- |
| P1 create API key | append-only event | secure issuance record | secret hashing, one-time display token |
| P2 update key metadata/state | guarded state transition | CAS on (state, version) | lifecycle version |
| P3 validate API key | read source | direct source read or local snapshot read | snapshot version, secure hash lookup |
| P4 update policy/scopes | overwrite current value | CAS on version | policy version |
| P5 rotate key | guarded state transition plus append issuance | CAS on (state, version) | new secret issuance, overlap policy |
| P6 disable/revoke key | guarded state transition | CAS on (state, version) | lifecycle version |
| P7 propagate validation snapshot | overwrite current value | single writer snapshot publication | config version |
| P8 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P9 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |

Why these fit #

Key issuance #

Issuance creates an immutable credential fact, so append-style recording fits.

Key lifecycle changes #

Activation, disable, revoke, and rotate depend on current state, so guarded transition fits.
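
A guarded transition can be sketched as a compare-and-set on the lifecycle version plus a check against an allowed-transition table. The table below is an assumption (the note names the states but not the permitted edges), and the in-memory `store` dict stands in for an authoritative store that would perform the check and write atomically.

```python
from dataclasses import dataclass

# Assumed transition table, for illustration only.
ALLOWED = {
    "ACTIVE": {"DISABLED", "REVOKED", "ROTATING", "EXPIRED"},
    "DISABLED": {"ACTIVE", "REVOKED"},
    "ROTATING": {"ACTIVE", "REVOKED"},
    "REVOKED": set(),  # terminal
    "EXPIRED": set(),  # terminal
}


@dataclass
class KeyState:
    state: str
    version: int


class StaleWrite(Exception):
    """The caller's expected_version no longer matches the stored version."""


class IllegalTransition(Exception):
    """The current state does not permit the requested target state."""


def transition(store: dict, key_id: str, expected_version: int, target: str) -> KeyState:
    current = store[key_id]
    if current.version != expected_version:
        raise StaleWrite(f"expected v{expected_version}, found v{current.version}")
    if target not in ALLOWED[current.state]:
        raise IllegalTransition(f"{current.state} -> {target}")
    updated = KeyState(target, current.version + 1)
    store[key_id] = updated  # a real store would make the check and write atomic
    return updated
```

A writer that read version 1 and loses the race gets `StaleWrite` and must re-read before retrying, which is the "stale update loses" behavior the mechanism relies on.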

Validation #

Validation is a hot read path over authoritative or approved snapshot state.

Snapshot distribution #

Validators need current-value local config, so overwrite snapshot publication fits.

Canonical substrate implied #

The baseline now points to:

  • sharded key-management service
  • one owner per tenant/key shard
  • append-only issuance/audit records
  • current key lifecycle and policy state
  • local validation snapshots for gateways/evaluators

Step 7 - Read Model / Source of Truth #

For an API key management service, truth is mostly direct source state plus distributed validation snapshots. Activity views are derived.

| Concept | Truth | Read path | Rebuild path |
| --- | --- | --- | --- |
| C1 key issuance history | ApiKeyCredential | read source directly | authoritative issuance store |
| C2 current key lifecycle | ApiKeyState | read source directly | authoritative key-state store |
| C3 current key policy/scopes | ApiKeyPolicy | read source directly | authoritative policy store |
| C4 local validator state | ApiKeyValidationSnapshot | materialized view | rebuild from latest key and policy state |
| C5 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C6 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
| C7 key inventory / audit views | derived from issuance and lifecycle state | materialized view | recompute from authoritative state |

Important point #

For the core semantics:

  • authoritative truth lives in key lifecycle and policy state
  • validators usually read local snapshots for the hot path
  • inventory and audit UX are projections

Step 8 - Failure Handling #

| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
| --- | --- | --- | --- | --- | --- |
| P1 create API key | retry may issue multiple keys unless request id or UX contract handles duplicates | competing creates coexist | committed issuance survives crash if persisted | client may fail to receive displayed secret even after issuance | stale shard owner blocked by fencing token |
| P2 update key state | retry with lifecycle version | stale lifecycle update loses guarded transition | committed key state survives crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P3 validate API key | request retry safe | many validators can answer concurrently from same snapshot | validator crash drops request only | n/a | stale decision bounded by configured snapshot freshness |
| P4 update policy/scopes | retry with policy version | stale update loses CAS | committed policy survives crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P5 rotate key | retry may issue multiple replacement keys unless rotation flow is fenced | stale rotation loses guarded transition | committed new key survives crash if persisted | owner may fail to receive new raw secret after issuance | stale shard owner blocked by fencing token |
| P6 disable/revoke key | retry with lifecycle version | stale revoke loses guarded transition | committed revocation survives crash if persisted | validators may lag within freshness bound | n/a |
| P7 propagate snapshot | retry with versioned snapshot | older snapshot loses to newer version | validator keeps last good snapshot until refresh | failed push retried or pulled | n/a |
| P8 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P9 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
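
The "stale shard owner blocked by fencing token" entries can be illustrated with a storage layer that remembers the highest token it has seen. This is a sketch under the assumption that each ownership grant carries a strictly increasing integer token; `ShardStore` is a hypothetical name.

```python
class ShardStore:
    """Storage layer that rejects writes carrying an outdated fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: dict = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # writer holds a superseded ownership grant
        self.highest_token = token
        self.data[key] = value
        return True
```

If an old owner with token 1 wakes up after a failover that issued token 2, its write is refused, so it cannot overwrite a revocation committed by the new owner.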

What matters most #

1. One-time secret delivery #

If the client misses the returned secret, the system usually cannot show it again and must rotate/create a new key.

2. Validation freshness versus latency #

The main tradeoff is:

  • direct source read for every request
  • versus local snapshot with bounded revocation lag
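
The snapshot side of this tradeoff can be made explicit by refusing to answer from a snapshot that is older than the configured bound. The sketch below assumes the validator tracks when its snapshot was last confirmed current; the `validate` signature, the 30-second default, and the fail-closed choice are all illustrative assumptions.

```python
from typing import Optional


def validate(
    snapshot: dict,                 # secret hash -> lifecycle state
    refreshed_at: float,            # when the snapshot was last confirmed current
    secret_hash: str,
    now: float,
    max_staleness_s: float = 30.0,  # assumed freshness bound
) -> str:
    if now - refreshed_at > max_staleness_s:
        # Fail closed; a deployment could instead fall back to the source shard.
        return "deny:snapshot_stale"
    state: Optional[str] = snapshot.get(secret_hash)
    if state != "ACTIVE":
        return "deny"
    return "allow"
```

The bound is also the worst-case revocation lag: a revoked key can keep passing for at most `max_staleness_s` on a validator that has stopped receiving updates.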

3. Rotation overlap policy #

Some systems allow old and new keys to coexist briefly; others require immediate replacement.
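
Both policies can be expressed with one overlap parameter. In this sketch, `old_key_usable` is a hypothetical helper and `overlap_s` is an assumed policy setting, where zero means immediate replacement.

```python
def old_key_usable(rotated_at: float, now: float, overlap_s: float) -> bool:
    """The old credential stays valid for overlap_s seconds after rotation."""
    return now - rotated_at < overlap_s
```

A validator would consult this only for the credential that was rotated out; the new credential is usable as soon as its snapshot entry arrives.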

4. Secure lookup strategy #

Validation usually uses a hashed/prefixed lookup, not raw-secret storage.
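
A prefixed lookup might work as sketched here. The `prefix.secret` key format, the `make_key` and `lookup` helpers, and the record layout are assumptions for illustration; the idea is that a non-secret prefix selects the record and only the hash of the secret part is compared.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def make_key() -> tuple[str, str, str]:
    """Return (presented_key, prefix, secret_hash) for a new credential."""
    prefix = secrets.token_hex(4)       # non-secret, safe to index and log
    secret = secrets.token_urlsafe(32)  # never stored in raw form
    secret_hash = hashlib.sha256(secret.encode()).hexdigest()
    return f"{prefix}.{secret}", prefix, secret_hash


def lookup(index: dict, presented: str) -> Optional[str]:
    """Return the key_id for a presented key, or None if it does not match."""
    prefix, sep, secret = presented.partition(".")
    record = index.get(prefix)
    if not sep or record is None:
        return None
    candidate = hashlib.sha256(secret.encode()).hexdigest()
    if not hmac.compare_digest(candidate, record["secret_hash"]):
        return None
    return record["key_id"]
```

Indexing by prefix keeps the in-memory map small and lets the index be partitioned by prefix range, while a leaked index still does not reveal usable secrets.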


Step 9 - Scale Adjustments #

| Hotspot | Type | First response |
| --- | --- | --- |
| very high validation QPS | read hotspot | push local snapshots to gateways and keep validation in-memory |
| hot tenants with many keys | contention hotspot | shard by tenant and isolate large tenants |
| frequent policy or revocation churn | fan-out hotspot | batch snapshot propagation and scope updates more narrowly |
| audit/inventory queries | read hotspot | serve from projections, not hot validation path |
| rotation spikes | write throughput hotspot | rate-limit bulk rotations and queue background snapshot updates |
| prefix/hash index size | memory hotspot | use compact hashed index plus prefix partitioning |

What scales well #

This system scales by:

  • sharding key truth by tenant/key scope
  • validating from local snapshots
  • using compact hashed key indexes
  • keeping audit and inventory views off the request hot path

What fails first #

Usually:

  • validation QPS spikes
  • very large tenants with hot revocation churn
  • broad snapshot invalidation on mass policy changes
  • audit queries hitting primary stores

Canonical design conclusion #

The mechanical outcome is:

  • primary state:
    • ApiKeyCredential
    • ApiKeyState
    • ApiKeyPolicy
    • ApiKeyValidationSnapshot
    • PartitionOwnership
    • PartitionMap
  • critical invariants:
    • secure one-time key issuance
    • one authoritative current lifecycle per key id
    • validation against current key and policy state
    • revocation and rotation reflected in future validation
    • monotonic validator snapshot propagation
    • exclusive shard ownership for key truth
  • mechanisms:
    • append issuance records
    • guarded key lifecycle transitions
    • current-value policy state
    • versioned snapshot publication
    • fenced shard ownership
  • reads:
    • hot validation from authoritative or approved local snapshot
    • projections for inventory and audit views

Polished interview answer #

I’d build the API key system as a sharded credential-management service with local validator snapshots. Creating a key generates a one-time raw secret for the owner and stores only a secure derived representation plus immutable issuance metadata. The source of truth is current key lifecycle state and current key policy/scopes, and validators answer hot-path requests from versioned local snapshots so request latency stays low. Disable, revoke, and rotate are guarded lifecycle transitions, and snapshot propagation is monotonic so validators never move backward to older key state. The main scaling levers are more tenant shards, compact hashed key indexes, bounded-stale validator snapshots, and keeping audit/inventory reads off the hot validation path.


Concrete Substrate #

I’ll choose a control-plane/data-plane key-management system with authoritative key shards plus local validator snapshots as the concrete baseline, because it matches the mechanics we derived:

  • append-only key issuance records
  • current key lifecycle and policy state
  • monotonic validation snapshot publication
  • one owner per shard

Concrete tech family:

  • control plane in Go or Java
  • authoritative state store:
    • replicated DB or RocksDB-backed service state
  • metadata/control:
    • etcd or internal metadata quorum for shard ownership/routing
  • validators at API gateways, sidecars, or a central auth middleware fleet using in-memory snapshots

Each shard owner stores:

  • issuance history
  • current key state
  • current key policy/scopes
  • latest ApiKeyValidationSnapshot metadata per validator scope

Validators store:

  • in-memory ApiKeyValidationSnapshot
  • hashed/prefixed key lookup index

Operation Layer #

1. Create API key #

API

  • CreateApiKey(owner, scope, metadata, request_id?)

Initiator

  • user/client

Entry point

  • key-management API

Authoritative decider

  • shard owner for tenant/key scope

Precondition

  • owner authorized to create key
  • policy allows key creation

Transition

  • generate raw secret
  • append ApiKeyCredential
  • create ApiKeyState = ACTIVE
  • create or attach ApiKeyPolicy

Response

  • {key_id, raw_secret_once}
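
The optional `request_id` can make retries of this operation safe. The sketch below is one possible contract, not the only one: a retried create returns the same `key_id` but no secret, since the raw secret was never stored. `KeyService` and its in-memory dicts are hypothetical stand-ins for the shard owner and its store.

```python
import hashlib
import secrets


class KeyService:
    def __init__(self) -> None:
        self.by_request: dict = {}   # request_id -> key_id
        self.credentials: dict = {}  # key_id -> issuance record

    def create_api_key(self, owner: str, request_id: str) -> dict:
        if request_id in self.by_request:
            # Retry of a committed create: no second key is issued, and the
            # secret cannot be returned because only its hash was kept.
            return {"key_id": self.by_request[request_id], "raw_secret_once": None}
        key_id = secrets.token_hex(8)
        raw_secret = secrets.token_urlsafe(32)
        self.credentials[key_id] = {
            "owner": owner,
            "secret_hash": hashlib.sha256(raw_secret.encode()).hexdigest(),
            "state": "ACTIVE",
        }
        self.by_request[request_id] = key_id
        return {"key_id": key_id, "raw_secret_once": raw_secret}
```

A client that never received the first response learns the `key_id` from the retry and can then rotate it to obtain a fresh secret.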

2. Validate API key #

API

  • ValidateApiKey(presented_key, request_context)

Initiator

  • gateway / service / client

Entry point

  • validator / gateway

Authoritative decider

  • local ApiKeyValidationSnapshot, or source shard in strong mode

Precondition

  • validator snapshot version valid for tenant/scope

Transition

  • none on source truth

Response

  • {allow|deny, key_id, scopes, snapshot_version}

3. Revoke key #

API

  • RevokeApiKey(key_id, actor, expected_version?)

Initiator

  • user/client or admin

Entry point

  • key-management API

Authoritative decider

  • shard owner for key

Precondition

  • current key state is revocable

Transition

  • guarded update ApiKeyState -> REVOKED
  • trigger snapshot propagation

4. Rotate key #

API

  • RotateApiKey(key_id, actor, expected_version?)

Initiator

  • user/client

Entry point

  • key-management API

Authoritative decider

  • shard owner for key

Precondition

  • current key state allows rotation

Transition

  • issue new ApiKeyCredential
  • update ApiKeyState for old/new credentials per rotation policy
  • trigger snapshot propagation

Response

  • {new_key_id, new_raw_secret_once}

5. Propagate validation snapshot #

API

  • internal snapshot push/pull

Initiator

  • system

Entry point

  • control plane / validator

Authoritative decider

  • snapshot publisher

Precondition

  • newer key/policy version exists

Transition

  • overwrite ApiKeyValidationSnapshot

Entry Point vs Decider vs Responder #

| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
| --- | --- | --- | --- | --- |
| create / revoke / rotate key | key-management API | key shard owner | API node | key-management service |
| validate key | gateway / validator | local validator snapshot or source shard | gateway/validator node | key-management service |
| snapshot propagation | control plane / validator | snapshot publisher | control/data-plane node | key-management service |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | key-management service |

Concrete HLD #

Main components:

  • key-management control-plane API
    • handles create, revoke, rotate, and policy updates
  • key shard owners
    • authoritative owners for key lifecycle and policy truth
  • validator fleet or gateway plugin
    • validates presented keys from local snapshots
  • metadata/control service
    • tracks shard ownership and routing
  • audit/activity pipeline
    • serves key inventory and compliance views
