API Key Management Service #
This note models an API key management service where users or services create keys, secrets are shown once, keys are validated on request paths, and lifecycle operations like rotation, scope updates, revocation, and quota/policy enforcement are handled safely at scale.
Step 1 - Normalize #
Assume the baseline prompt is:
- design an API key management service
- users or service owners create API keys
- raw key secrets are shown once and then only stored securely
- request paths validate presented API keys quickly
- keys can be scoped, rotated, disabled, or revoked
- system scales across many tenants and applications
Normalize into state-affecting paths.
| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| User creates API key | Client | append event | S1 create target `ApiKeyCredential` | C1 |
| Service stores or updates current key metadata | System | state transition | S1 update target `ApiKeyState` | C1 |
| Caller validates presented API key | Client | read source | S1 read source target `ApiKeyState` | C1 |
| Admin updates key scopes / policy | Admin | overwrite state | S1 update target `ApiKeyPolicy` | C1 |
| User rotates key | Client | state transition | S1 update target `ApiKeyState` | C1 |
| User disables or revokes key | Client | state transition | S1 update target `ApiKeyState` | C1 |
| System propagates key-validation snapshot to gateways/evaluators | System | async process | S1 hidden write target `ApiKeyValidationSnapshot` | C1 |
| User reads key inventory / audit history | Client | read projection | S1 read projection target `ApiKeyActivityView` | R2 |
| System routes tenant/shard to current owner | System | read source | S1 read source target `PartitionMap` | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1 update target `PartitionOwnership` | C1 |
Notes on normalization #
Important choices:
- raw key issuance is `append event`
  - secret creation is an immutable issuance fact
- current key metadata is `state transition`
  - active, disabled, revoked, rotated lifecycle changes over time
- validation is the hot read path
- policy is current-value control state
- snapshot propagation is explicit because gateways or validators usually should not hit the control plane on every request
This system is a hybrid of:
- credential issuance
- credential lifecycle state
- fast request-path validation
Step 2 - Critical Path Selection #
| Requirement | Priority class | Why |
|---|---|---|
| Create API key | C1 | secure issuance is the core product output |
| Store/update current key state | C1 | validation depends on authoritative metadata and lifecycle |
| Validate API key | C1 | wrong allow/deny is the core correctness and security failure |
| Update key scopes / policy | C1 | changes future authorization behavior |
| Rotate key | C1 | rotation correctness affects continuity and security |
| Disable / revoke key | C1 | revocation must affect future validation |
| Propagate validation snapshot | C1 | stale gateways/validators can enforce wrong permissions |
| Read key inventory / audit | R2 | operational/user-facing only |
| Route to shard owner | C1 | wrong routing can split key truth |
| Reassign shard ownership | C1 | failover must preserve key lifecycle correctness |
Baseline critical paths #
Main C1 paths:
- P1: create API key
- P2: update key metadata/state
- P3: validate API key
- P4: update policy/scopes
- P5: rotate key
- P6: disable/revoke key
- P7: propagate validation snapshot
- P8: route to shard owner
- P9: reassign shard ownership
This design is driven by:
- secure one-time secret issuance
- one authoritative current lifecycle per key id
- fast validation from authoritative or approved snapshots
- revocation and rotation correctness
Step 3 - Primary State Extraction #
For an API key management service, the minimal primary state is the key issuance record, current key lifecycle state, policy/scopes, validation snapshots, and routing/ownership state.
| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| ApiKeyCredential | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | key_id |
| ApiKeyState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | key_id |
| ApiKeyPolicy | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | key_id or policy_scope |
| ApiKeyValidationSnapshot | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | instance | validator_scope |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | tenant/shard map |
| ApiKeyActivityView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | tenant or owner |
Important modeling choices #
ApiKeyCredential #
Primary because:
- key creation is an immutable issuance event
- captures key id, secret-hash reference, owner, creation context, and audit metadata
ApiKeyState #
Primary because:
- this is the central lifecycle object
- captures states like `ACTIVE`, `DISABLED`, `REVOKED`, `ROTATING`, `EXPIRED`
ApiKeyPolicy #
Primary because:
- validation and authorization depend on scopes, product limits, service permissions, and optional IP/app restrictions
ApiKeyValidationSnapshot #
Kept explicit because:
- hot-path validators usually need local read-optimized data
Minimal strict primary set #
The strongest minimal set is:
- `ApiKeyCredential`
- `ApiKeyState`
- `ApiKeyPolicy`
- `ApiKeyValidationSnapshot`
- `PartitionOwnership`
- `PartitionMap`
Step 4 - Hard Invariants #
For an API key management service, the hard invariants are about secure issuance, one authoritative current key lifecycle, valid validation against current key/policy state, and safe rotation/revocation.
| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 create API key | HARD | uniqueness | Each `key_id` maps to at most one issued API key credential within its key scope. |
| P1 create API key | HARD | accounting | Raw secret material is revealed only at issuance time; subsequent validation uses a securely stored representation. |
| P2 update key metadata/state | HARD | uniqueness | Each `key_id` maps to at most one current authoritative key lifecycle state within its key scope. |
| P3 validate API key | HARD | eligibility | Decision `validate_api_key(presented_key)` is valid only if the current `ApiKeyState` and `ApiKeyPolicy` allow usage for the request context at decision time. |
| P4 update policy/scopes | HARD | ordering | API-key policy revisions are ordered by a monotonic version within the policy scope. |
| P5 rotate key | HARD | eligibility | Action `rotate_key` is valid only if the current `ApiKeyState` allows rotation and the current owner/policy permits it at decision time. |
| P6 disable/revoke key | HARD | eligibility | Action `revoke_key` is valid only if the current `ApiKeyState` allows revocation/disable at decision time. |
| P7 propagate validation snapshot | HARD | freshness | `ApiKeyValidationSnapshot` reflects authoritative key and policy state within configured propagation bounds and moves monotonically forward by version. |
| P8 route to shard owner | HARD | uniqueness | Each `shard_id` maps to at most one current authoritative owner. |
| P9 reassign shard ownership | HARD | eligibility | Action `reassign_shard` is valid only if the current owner has failed or relinquished ownership and the candidate owner is eligible and sufficiently current on `shard_id` at decision time. |
What matters most #
1. Raw secret is one-time output #
After issuance, the system should validate using a hash or secure derived representation, not the raw secret.
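A minimal sketch of that rule, assuming the secret is a high-entropy random token so that a plain SHA-256 digest is an acceptable stored representation. The function names `issue_key` and `verify_secret` are illustrative, not part of the design above.

```python
import hashlib
import hmac
import secrets


def issue_key() -> tuple[str, str, str]:
    """Return (key_id, raw_secret, secret_hash); only the hash is meant to be persisted."""
    key_id = secrets.token_hex(8)           # public identifier, safe to log
    raw_secret = secrets.token_urlsafe(32)  # high-entropy secret, returned to the owner once
    secret_hash = hashlib.sha256(raw_secret.encode()).hexdigest()
    return key_id, raw_secret, secret_hash


def verify_secret(presented: str, stored_hash: str) -> bool:
    """Hash the presented secret and compare against the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

The caller would persist `key_id` and `secret_hash` and return `raw_secret` in the create response only. A keyed or slower hash may be preferable if secrets could ever be low-entropy.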
2. Current key lifecycle is authoritative #
Validation must use the current active/disabled/revoked state.
3. Snapshot monotonicity matters #
Validators must not move backward to older key/policy state.
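One way to express this on the validator side is a version check before installing a snapshot. The sketch below assumes each snapshot carries a single integer version; the `Snapshot` and `Validator` types are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    version: int
    keys: dict = field(default_factory=dict)  # key_id -> state/policy payload


class Validator:
    def __init__(self) -> None:
        self.current = Snapshot(version=0)

    def apply(self, incoming: Snapshot) -> bool:
        """Install the snapshot only if it is strictly newer; report whether it was installed."""
        if incoming.version <= self.current.version:
            return False  # stale or duplicate delivery: keep the newer local state
        self.current = incoming
        return True
```

With this check, a delayed push of version 1 arriving after version 2 is ignored, so a revocation carried by version 2 cannot be undone by reordering.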
4. Revocation must affect future validation #
Once revoked, future request-path validation must deny.
Step 5 - Execution Context #
For the baseline API key management service:
| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical key-management service with control plane and validator/gateway readers |
| Write coordination scope | per object scope | correctness is per key, policy, and shard ownership scope |
| Read consistency target | bounded stale allowed | hot-path validation often uses local snapshots with strict freshness bounds |
| Holder model | none | no lease-like client ownership is central to key correctness |
| Compensation acceptable? | No | wrong key validation or stale revocation cannot be repaired afterward |
Derived implications #
- `holder_may_crash = false`: validators can fail, but they do not own mutable business state like workers
- `cross_service_write = false`: baseline keeps key state, policy, and snapshots in one logical service
- `bounded_staleness_allowed = true`: request-path validation can use bounded-stale local snapshots if explicit
- `cross_service_atomicity_required = false`: no multi-service transaction across unrelated services in baseline
- `exclusive_claim_required = true`: shard ownership must be exclusive
- `guarded_by_current_state = true`: rotation and revocation depend on current key lifecycle
What this implies #
This pushes us toward:
- one authoritative owner per tenant/key shard
- current key and policy state in the control plane
- fast local validator snapshots
- monotonic config distribution
Step 6 - Deterministic Mechanism Selection #
| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P1 create API key | append-only event | secure issuance record | secret hashing, one-time display token |
| P2 update key metadata/state | guarded state transition | CAS on (state, version) | lifecycle version |
| P3 validate API key | read source | direct source read or local snapshot read | snapshot version, secure hash lookup |
| P4 update policy/scopes | overwrite current value | CAS on version | policy version |
| P5 rotate key | guarded state transition plus append issuance | CAS on (state, version) | new secret issuance, overlap policy |
| P6 disable/revoke key | guarded state transition | CAS on (state, version) | lifecycle version |
| P7 propagate validation snapshot | overwrite current value | single writer snapshot publication | config version |
| P8 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P9 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |
Why these fit #
Key issuance #
Issuance creates an immutable credential fact, so append-style recording fits.
Key lifecycle changes #
Activation, disable, revoke, and rotate depend on current state, so guarded transition fits.
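A guarded transition can be sketched as a compare-and-set on `(state, version)`. The note names the lifecycle states but not the permitted edges, so the `ALLOWED` table below is an assumption, and `KeyStateStore` is an in-memory stand-in for the authoritative store.

```python
import threading

# Hypothetical transition table: the states come from the note, the edges are assumed.
ALLOWED = {
    "ACTIVE": {"DISABLED", "REVOKED", "ROTATING", "EXPIRED"},
    "DISABLED": {"ACTIVE", "REVOKED"},
    "ROTATING": {"ACTIVE", "REVOKED"},
    "REVOKED": set(),  # terminal
    "EXPIRED": set(),  # terminal
}


class KeyStateStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: dict[str, tuple[str, int]] = {}  # key_id -> (state, version)

    def create(self, key_id: str) -> int:
        """Insert a new key as ACTIVE at version 1 and return that version."""
        with self._lock:
            self._rows[key_id] = ("ACTIVE", 1)
            return 1

    def get(self, key_id: str) -> tuple[str, int]:
        with self._lock:
            return self._rows[key_id]

    def transition(self, key_id: str, expected_version: int, new_state: str) -> bool:
        """Apply the change only if the version matches and the edge is allowed."""
        with self._lock:
            state, version = self._rows[key_id]
            if version != expected_version or new_state not in ALLOWED[state]:
                return False  # stale writer or illegal edge: caller re-reads and retries
            self._rows[key_id] = (new_state, version + 1)
            return True
```

A caller that loses the race gets `False`, re-reads the current `(state, version)`, and decides whether the action still makes sense.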
Validation #
Validation is a hot read path over authoritative or approved snapshot state.
Snapshot distribution #
Validators need current-value local config, so overwrite snapshot publication fits.
Canonical substrate implied #
The baseline now points to:
- sharded key-management service
- one owner per tenant/key shard
- append-only issuance/audit records
- current key lifecycle and policy state
- local validation snapshots for gateways/evaluators
Step 7 - Read Model / Source of Truth #
For an API key management service, truth is mostly direct source state plus distributed validation snapshots. Activity views are derived.
| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 key issuance history | `ApiKeyCredential` | read source directly | authoritative issuance store |
| C2 current key lifecycle | `ApiKeyState` | read source directly | authoritative key-state store |
| C3 current key policy/scopes | `ApiKeyPolicy` | read source directly | authoritative policy store |
| C4 local validator state | `ApiKeyValidationSnapshot` | materialized view | rebuild from latest key and policy state |
| C5 shard ownership | `PartitionOwnership` | read source directly | authoritative ownership store |
| C6 shard routing map | `PartitionMap` | read source directly | authoritative routing metadata |
| C7 key inventory / audit views | derived from issuance and lifecycle state | materialized view | recompute from authoritative state |
Important point #
For the core semantics:
- authoritative truth lives in key lifecycle and policy state
- validators usually read local snapshots for the hot path
- inventory and audit UX are projections
Step 8 - Failure Handling #
| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| P1 create API key | retry may issue multiple keys unless request id or UX contract handles duplicates | competing creates coexist | committed issuance survives crash if persisted | client may fail to receive displayed secret even after issuance | stale shard owner blocked by fencing token |
| P2 update key state | retry with lifecycle version | stale lifecycle update loses guarded transition | committed key state survives crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P3 validate API key | request retry safe | many validators can answer concurrently from same snapshot | validator crash drops request only | n/a | stale decision bounded by configured snapshot freshness |
| P4 update policy/scopes | retry with policy version | stale update loses CAS | committed policy survives crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P5 rotate key | retry may issue multiple replacement keys unless rotation flow is fenced | stale rotation loses guarded transition | committed new key survives crash if persisted | owner may fail to receive new raw secret after issuance | stale shard owner blocked by fencing token |
| P6 disable/revoke key | retry with lifecycle version | stale revoke loses guarded transition | committed revocation survives crash if persisted | validators may lag within freshness bound | n/a |
| P7 propagate snapshot | retry with versioned snapshot | older snapshot loses to newer version | validator keeps last good snapshot until refresh | failed push retried or pulled | n/a |
| P8 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P9 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
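The "stale owner" column relies on fencing tokens. A minimal sketch of the idea, assuming the lease service hands out a strictly increasing integer on every grant and the storage layer remembers the highest token it has seen; `LeaseService` and `FencedShardStore` are illustrative names.

```python
class LeaseService:
    """Hands out a strictly increasing token each time shard ownership is granted."""

    def __init__(self) -> None:
        self._next = 0

    def grant(self) -> int:
        self._next += 1
        return self._next


class FencedShardStore:
    """Rejects writes from owners holding an older fencing token."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.highest_token:
            return False  # stale owner: a newer lease holder has already written
        self.highest_token = token
        self.data[key] = value
        return True
```

An old owner that wakes up after failover still holds its old token, so its late write is refused instead of overwriting a revocation made by the new owner.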
What matters most #
1. One-time secret delivery #
If the client misses the returned secret, the system usually cannot show it again and must rotate/create a new key.
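A sketch of how a request id can make create retries safe without ever re-showing the secret. It assumes the service keeps a `request_id -> key_id` table; `KeyIssuer` and its return shape are hypothetical.

```python
import hashlib
import secrets
from typing import Optional


class KeyIssuer:
    def __init__(self) -> None:
        self._by_request: dict[str, str] = {}  # request_id -> key_id
        self._hashes: dict[str, str] = {}      # key_id -> stored secret hash

    def create(self, request_id: str) -> tuple[str, Optional[str]]:
        """Return (key_id, raw_secret); raw_secret is None when the request is a replay."""
        if request_id in self._by_request:
            # Replay: the key already exists, and the raw secret was never stored.
            return self._by_request[request_id], None
        key_id = secrets.token_hex(8)
        raw_secret = secrets.token_urlsafe(32)
        self._hashes[key_id] = hashlib.sha256(raw_secret.encode()).hexdigest()
        self._by_request[request_id] = key_id
        return key_id, raw_secret
```

A client that retries after a lost response learns the `key_id` but gets no secret, which is its signal to rotate or create a fresh key.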
2. Validation freshness versus latency #
The main tradeoff is:
- direct source read for every request
- versus local snapshot with bounded revocation lag
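The snapshot side of this tradeoff can be made explicit with a staleness bound. The sketch below assumes the validator fails closed once its snapshot is older than the bound; the `SnapshotView` class and the injectable `clock` parameter are illustrative.

```python
import time
from typing import Callable, Optional


class SnapshotView:
    def __init__(self, max_staleness_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.max_staleness_s = max_staleness_s
        self._clock = clock
        self._active: set[str] = set()
        self._refreshed_at: Optional[float] = None

    def refresh(self, active_key_ids: set[str]) -> None:
        """Replace the local view and record when it was refreshed."""
        self._active = set(active_key_ids)
        self._refreshed_at = self._clock()

    def is_fresh(self) -> bool:
        if self._refreshed_at is None:
            return False
        return self._clock() - self._refreshed_at <= self.max_staleness_s

    def allow(self, key_id: str) -> bool:
        """Fail closed: deny when the snapshot is missing or older than the bound."""
        return self.is_fresh() and key_id in self._active
```

Under these assumptions the worst-case revocation lag is `max_staleness_s` plus the propagation delay. Whether to deny or fall back to a source read when the snapshot is stale is a policy choice.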
3. Rotation overlap policy #
Some systems allow old and new keys to coexist briefly; others require immediate replacement.
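Both policies fit one small model if the old credential carries an optional overlap deadline. This is a sketch under that assumption; the `valid_until` field and the `rotate` / `is_usable` helpers are not part of the note.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Credential:
    key_id: str
    state: str                            # "ACTIVE", "ROTATING", or "REVOKED"
    valid_until: Optional[float] = None   # end of the overlap window for a rotated-out key


def rotate(old: Credential, new_key_id: str, now: float, overlap_s: float) -> Credential:
    """Issue a replacement; overlap_s == 0 means immediate replacement."""
    if overlap_s > 0:
        old.state = "ROTATING"
        old.valid_until = now + overlap_s
    else:
        old.state = "REVOKED"
    return Credential(key_id=new_key_id, state="ACTIVE")


def is_usable(cred: Credential, now: float) -> bool:
    """ACTIVE keys are usable; ROTATING keys are usable until their deadline."""
    if cred.state == "ACTIVE":
        return True
    if cred.state == "ROTATING":
        return cred.valid_until is not None and now < cred.valid_until
    return False
```

For example, `rotate(old, "k2", now=1000.0, overlap_s=60.0)` leaves the old key usable until time 1060, while `overlap_s=0.0` cuts it off at once.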
4. Secure lookup strategy #
Validation usually uses a hashed/prefixed lookup, not raw-secret storage.
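A sketch of a prefixed lookup, assuming presented keys have the form `<prefix>.<secret>`, where the prefix is a non-secret index key and only the secret part is hashed. The key format, `IndexEntry`, and `validate` are assumptions made for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexEntry:
    key_id: str
    secret_hash: str
    state: str
    scopes: frozenset


def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()


def validate(index: dict, presented_key: str, required_scope: str) -> tuple[bool, Optional[str]]:
    """Return (allowed, key_id); key_id is None when the key cannot be identified."""
    prefix, sep, secret = presented_key.partition(".")
    entry = index.get(prefix)
    if not sep or entry is None:
        return False, None  # malformed key or unknown prefix
    if not hmac.compare_digest(sha256_hex(secret), entry.secret_hash):
        return False, None  # wrong secret: do not reveal which key was targeted
    if entry.state != "ACTIVE" or required_scope not in entry.scopes:
        return False, entry.key_id  # identified but not permitted
    return True, entry.key_id
```

The prefix turns lookup into a single dictionary access, and the raw secret never needs to be stored in the index.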
Step 9 - Scale Adjustments #
| Hotspot | Type | First response |
|---|---|---|
| very high validation QPS | read hotspot | push local snapshots to gateways and keep validation in-memory |
| hot tenants with many keys | contention hotspot | shard by tenant and isolate large tenants |
| frequent policy or revocation churn | fan-out hotspot | batch snapshot propagation and scope updates more narrowly |
| audit/inventory queries | read hotspot | serve from projections, not hot validation path |
| rotation spikes | write throughput hotspot | rate-limit bulk rotations and queue background snapshot updates |
| prefix/hash index size | memory hotspot | use compact hashed index plus prefix partitioning |
What scales well #
This system scales by:
- sharding key truth by tenant/key scope
- validating from local snapshots
- using compact hashed key indexes
- keeping audit and inventory views off the request hot path
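Tenant sharding with isolation of large tenants can be sketched as a stable hash plus an override table. `NUM_SHARDS`, the `DEDICATED` map, and the tenant names are illustrative values, not taken from the note.

```python
import hashlib

NUM_SHARDS = 16                   # illustrative shard count
DEDICATED = {"tenant-big": 100}   # hypothetical override isolating one large tenant


def shard_for(tenant_id: str) -> int:
    """Map a tenant to a shard: dedicated shard if overridden, else a stable hash bucket."""
    if tenant_id in DEDICATED:
        return DEDICATED[tenant_id]
    # Use a cryptographic digest rather than hash(), which is salted per process.
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

Plain modulo placement moves most tenants when `NUM_SHARDS` changes, so a real deployment would more likely resolve shards through the `PartitionMap` than recompute them this way.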
What fails first #
Usually:
- validation QPS spikes
- very large tenants with hot revocation churn
- broad snapshot invalidation on mass policy changes
- audit queries hitting primary stores
Canonical design conclusion #
The mechanical outcome is:
- primary state:
  - `ApiKeyCredential`
  - `ApiKeyState`
  - `ApiKeyPolicy`
  - `ApiKeyValidationSnapshot`
  - `PartitionOwnership`
  - `PartitionMap`
- critical invariants:
  - secure one-time key issuance
  - one authoritative current lifecycle per key id
  - validation against current key and policy state
  - revocation and rotation reflected in future validation
  - monotonic validator snapshot propagation
  - exclusive shard ownership for key truth
- mechanisms:
  - append issuance records
  - guarded key lifecycle transitions
  - current-value policy state
  - versioned snapshot publication
  - fenced shard ownership
- reads:
  - hot validation from authoritative or approved local snapshot
  - projections for inventory and audit views
Polished interview answer #
I’d build the API key system as a sharded credential-management service with local validator snapshots. Creating a key generates a one-time raw secret for the owner and stores only a secure derived representation plus immutable issuance metadata. The source of truth is current key lifecycle state and current key policy/scopes, and validators answer hot-path requests from versioned local snapshots so request latency stays low. Disable, revoke, and rotate are guarded lifecycle transitions, and snapshot propagation is monotonic so validators never move backward to older key state. The main scaling levers are more tenant shards, compact hashed key indexes, bounded-stale validator snapshots, and keeping audit/inventory reads off the hot validation path.
Concrete Substrate #
I’ll choose a control-plane/data-plane key-management system with authoritative key shards plus local validator snapshots as the concrete baseline, because it matches the mechanics we derived:
- append-only key issuance records
- current key lifecycle and policy state
- monotonic validation snapshot publication
- one owner per shard
Concrete tech family:
- control plane in `Go` or `Java`
- authoritative state store:
  - replicated DB or `RocksDB`-backed service state
- metadata/control:
  - `etcd` or internal metadata quorum for shard ownership/routing
- validators at API gateways, sidecars, or a central auth middleware fleet using in-memory snapshots
Each shard owner stores:
- issuance history
- current key state
- current key policy/scopes
- latest `ApiKeyValidationSnapshot` metadata per validator scope
Validators store:
- in-memory `ApiKeyValidationSnapshot`
- hashed/prefixed key lookup index
Operation Layer #
1. Create API key #
API
CreateApiKey(owner, scope, metadata, request_id?)
Initiator
- user/client
Entry point
- key-management API
Authoritative decider
- shard owner for tenant/key scope
Precondition
- owner authorized to create key
- policy allows key creation
Transition
- generate raw secret
- append `ApiKeyCredential`
- create `ApiKeyState = ACTIVE`
- create or attach `ApiKeyPolicy`
Response
{key_id, raw_secret_once}
2. Validate API key #
API
ValidateApiKey(presented_key, request_context)
Initiator
- gateway / service / client
Entry point
- validator / gateway
Authoritative decider
- local `ApiKeyValidationSnapshot`, or source shard in strong mode
Precondition
- validator snapshot version valid for tenant/scope
Transition
- none on source truth
Response
{allow|deny, key_id, scopes, snapshot_version}
3. Revoke key #
API
RevokeApiKey(key_id, actor, expected_version?)
Initiator
- user/client or admin
Entry point
- key-management API
Authoritative decider
- shard owner for key
Precondition
- current key state is revocable
Transition
- guarded update `ApiKeyState -> REVOKED`
- trigger snapshot propagation
4. Rotate key #
API
RotateApiKey(key_id, actor, expected_version?)
Initiator
- user/client
Entry point
- key-management API
Authoritative decider
- shard owner for key
Precondition
- current key state allows rotation
Transition
- issue new `ApiKeyCredential`
- update `ApiKeyState` for old/new credentials per rotation policy
- trigger snapshot propagation
Response
{new_key_id, new_raw_secret_once}
5. Propagate validation snapshot #
API
- internal snapshot push/pull
Initiator
- system
Entry point
- control plane / validator
Authoritative decider
- snapshot publisher
Precondition
- newer key/policy version exists
Transition
- overwrite `ApiKeyValidationSnapshot`
Entry Point vs Decider vs Responder #
| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| create / revoke / rotate key | key-management API | key shard owner | API node | key-management service |
| validate key | gateway / validator | local validator snapshot or source shard | gateway/validator node | key-management service |
| snapshot propagation | control plane / validator | snapshot publisher | control/data-plane node | key-management service |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | key-management service |
Concrete HLD #
Main components:
- key-management control-plane API
  - handles create, revoke, rotate, and policy updates
- key shard owners
  - authoritative owners for key lifecycle and policy truth
- validator fleet or gateway plugin
  - validates presented keys from local snapshots
- metadata/control service
  - tracks shard ownership and routing
- audit/activity pipeline
  - serves key inventory and compliance views