Authentication Service / SSO / SAML Provider #

This note models an authentication service / SSO / SAML identity provider where users authenticate, sessions are established, signed assertions or tokens are issued to relying parties, and lifecycle operations like MFA, logout, and key rotation are handled safely at scale.


Step 1 - Normalize #

Assume the baseline prompt is:

  • design an authentication service / SSO / SAML provider
  • users authenticate to an identity provider
  • relying parties redirect users for login
  • service issues signed assertions or tokens
  • sessions can be reused, revoked, or logged out
  • MFA and policy may apply
  • system scales across many tenants and apps

Normalize into state-affecting paths.

| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| User starts login / auth flow | Client | append event | S1; create target; AuthRequest | C1 |
| Service verifies credential / factor | System | state transition | S1; update target; AuthenticationAttemptState | C1 |
| Service creates or refreshes user session | System | state transition | S1; update target; SessionState | C1 |
| Service issues SAML assertion / OAuth token | System | append event | S1; create target; IssuedCredential | C1 |
| Service validates relying-party config / trust relationship | System | read source | S1; read source target; ServiceProviderConfig | C1 |
| Admin updates identity / access policy | Admin | overwrite state | S1; update target; AuthPolicy | C1 |
| Admin updates service-provider / app config | Admin | overwrite state | S1; update target; ServiceProviderConfig | C1 |
| User logs out or admin revokes session | Client | state transition | S1; update target; SessionState | C1 |
| System rotates signing keys / certs | System | state transition | S1; update target; SigningKeyState | C1 |
| User or relying party introspects / validates token or session | Client | read source | S1; read source target; SessionState | R1 |
| User reads account/session activity | Client | read projection | S1; read projection target; IdentityActivityView | R2 |
| System routes tenant/shard to current owner | System | read source | S1; read source target; PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1; update target; PartitionOwnership | C1 |

Notes on normalization #

Important choices:

  • auth flow start is append event
    • login request is an immutable interaction fact
  • credential verification is state transition
    • auth attempt moves through challenge/success/failure states
  • session creation is state transition
    • current authenticated session lifecycle changes
  • assertion/token issuance is append event
    • each issued credential is a fact, even if short-lived
  • policy and SP config are current-value control state
  • logout/revocation is a current session lifecycle transition
  • signing-key rotation is explicit because trust and verification depend on it
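The difference between an append event and a state transition can be sketched in a few lines. This is a minimal illustration, not the note's actual data model: the field names, the `AuthRequestLog` class, and the `transition` helper are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AuthRequest:
    """Append event: written once and never updated afterwards."""
    auth_request_id: str
    app_id: str
    state_token: str


@dataclass(frozen=True)
class SessionState:
    """Lifecycle state: one current value per session_id, with a version."""
    session_id: str
    status: str  # hypothetical values such as "ACTIVE" or "REVOKED"
    version: int


class AuthRequestLog:
    """Append-only container: it exposes append and length, no update or delete."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def __len__(self):
        return len(self._events)


def transition(session, new_status):
    # A state transition produces the next current value and bumps the version.
    return replace(session, status=new_status, version=session.version + 1)
```

The append log only ever grows, while a session has exactly one current value whose version increases with every lifecycle change.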

This system is a hybrid of:

  • identity and credential verification
  • session lifecycle state
  • signed credential issuance

Step 2 - Critical Path Selection #

| Requirement | Priority class | Why |
|---|---|---|
| Start login / auth flow | C1 | login intent and flow correlation must be preserved |
| Verify credential / factor | C1 | wrong auth decision is a security failure |
| Create / refresh session | C1 | session truth drives reuse, logout, and revocation |
| Issue assertion / token | C1 | signed credentials are the core product output |
| Read SP/app trust config | C1 | wrong relying-party config breaks trust boundaries |
| Update auth policy | C1 | policy changes affect future auth decisions |
| Update SP/app config | C1 | trust metadata and ACS/redirect settings must be correct |
| Logout / revoke session | C1 | revocation correctness affects security |
| Rotate signing keys / certs | C1 | old/new trust windows must be managed safely |
| Validate/introspect token or session | R1 | core serving path |
| Account/session activity | R2 | operational/user-facing only |
| Route to shard owner | C1 | wrong routing can split session or policy truth |
| Reassign shard ownership | C1 | failover must preserve auth/session correctness |

Baseline critical paths #

Main C1 paths:

  • P1 start auth flow
  • P2 verify credential / factor
  • P3 create or refresh session
  • P4 issue assertion / token
  • P5 read/update SP config and auth policy
  • P6 logout / revoke session
  • P7 rotate signing keys
  • P8 route to shard owner
  • P9 reassign shard ownership

Main R1 path:

  • P10 validate or introspect token/session

This design is driven by:

  • authoritative current session state
  • guarded auth-attempt transitions
  • safe issuance of signed credentials
  • current trust and policy configuration

Step 3 - Primary State Extraction #

For an authentication/SSO/SAML system, the minimal primary state is the auth request, auth attempt lifecycle, current session state, issued credential record, app/SP config, auth policy, signing key lifecycle, and routing/ownership state.

| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| AuthRequest | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | auth_request_id |
| AuthenticationAttemptState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | attempt_id |
| SessionState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | session_id |
| IssuedCredential | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | assertion_id or token_id |
| ServiceProviderConfig | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | app_id or sp_entity_id |
| AuthPolicy | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | tenant_id or policy_scope |
| SigningKeyState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | key_id |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | tenant/shard map |
| IdentityActivityView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | user_id or tenant |

Important modeling choices #

AuthenticationAttemptState #

Primary because:

  • auth often has multi-step lifecycle:
    • challenge
    • MFA pending
    • success
    • failure
    • locked
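One way to make this lifecycle explicit is a transition table that every update must pass through. The sketch below uses the five states listed above; the specific edges in `ALLOWED` are an assumption for illustration, since the note does not enumerate them.

```python
from enum import Enum


class AttemptState(Enum):
    CHALLENGE = "challenge"
    MFA_PENDING = "mfa_pending"
    SUCCESS = "success"
    FAILURE = "failure"
    LOCKED = "locked"


# Assumed edges: terminal states (success, failure, locked) allow no further moves.
ALLOWED = {
    AttemptState.CHALLENGE: {
        AttemptState.MFA_PENDING,
        AttemptState.SUCCESS,
        AttemptState.FAILURE,
        AttemptState.LOCKED,
    },
    AttemptState.MFA_PENDING: {
        AttemptState.SUCCESS,
        AttemptState.FAILURE,
        AttemptState.LOCKED,
    },
    AttemptState.SUCCESS: set(),
    AttemptState.FAILURE: set(),
    AttemptState.LOCKED: set(),
}


def advance(current, target):
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the edges in data rather than scattered `if` statements makes it easy to review which moves a stale or replayed factor response could attempt.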

SessionState #

Primary because:

  • session lifecycle drives SSO reuse, logout, revocation, timeout, and introspection

IssuedCredential #

Primary because:

  • each assertion/token issuance is an immutable fact
  • useful for audit, replay protection, and token metadata

SigningKeyState #

Primary because:

  • key lifecycle matters:
    • active
    • next
    • retired
    • revoked

Minimal strict primary set #

The strongest minimal set is:

  • AuthRequest
  • AuthenticationAttemptState
  • SessionState
  • IssuedCredential
  • ServiceProviderConfig
  • AuthPolicy
  • SigningKeyState
  • PartitionOwnership
  • PartitionMap

Step 4 - Hard Invariants #

For an auth/SSO/SAML provider, the hard invariants are about correct credential verification, one authoritative session lifecycle, valid token/assertion issuance under current trust config and keys, and safe revocation/key rotation.

| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 start auth flow | HARD | uniqueness | Key auth_request_id maps to at most one logical outcome (the recorded authentication request) within auth-flow scope. |
| P2 verify credential / factor | HARD | eligibility | Action advance_auth_attempt is valid only if current AuthenticationAttemptState, current AuthPolicy, and supplied factors satisfy the transition at decision time. |
| P3 create or refresh session | HARD | eligibility | Action create_session is valid only if current AuthenticationAttemptState is in a successful issuable state and current policy allows session creation at decision time. |
| P3 create or refresh session | HARD | uniqueness | Key session_id maps to at most one logical outcome (the current authoritative session lifecycle) within session scope. |
| P4 issue assertion / token | HARD | eligibility | Action issue_credential is valid only if current session/auth state is valid, current ServiceProviderConfig allows issuance, and selected SigningKeyState is active at decision time. |
| P4 issue assertion / token | HARD | accounting | IssuedCredential contains claims/assertion fields consistent with authoritative identity, session, app config, and signing-key state at issuance time. |
| P5 update policy / SP config | HARD | ordering | Policy and SP-config revisions are ordered by monotonic version within their scopes. |
| P6 logout / revoke session | HARD | eligibility | Action revoke_session is valid only if current SessionState allows revocation/logout at decision time. |
| P7 rotate signing keys | HARD | eligibility | Action advance_signing_key_state is valid only if current SigningKeyState lifecycle and trust-distribution rules allow the transition at decision time. |
| P8 route to shard owner | HARD | uniqueness | Key shard_id maps to at most one logical outcome (the current authoritative owner) within shard_id scope. |
| P9 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if current owner is failed or relinquished and candidate owner is eligible and sufficiently current on shard_id at decision time. |
| P10 validate / introspect token or session | HARD | freshness | Validation/introspection reflects authoritative session, policy, and key state within configured consistency bound. |

What matters most #

1. Session is the current auth truth #

For SSO reuse and revocation, current SessionState is central.

2. Issuance is guarded by current trust config #

Wrong app/SP config or wrong key state breaks trust boundaries.

3. Key rotation is lifecycle-managed #

New keys must become trusted before old keys retire.

4. Revocation must affect future validation #

If a session is revoked, introspection and new issuance must reflect it.


Step 5 - Execution Context #

For the baseline auth/SSO platform:

| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical identity provider spread across auth, session, and config nodes |
| Write coordination scope | per object scope | correctness is per auth attempt, session, app config, key lifecycle, and shard ownership scope |
| Read consistency target | strong only | auth, issuance, and introspection are security-critical |
| Holder model | client | user/browser session is represented by current server-side session state or its equivalent |
| Compensation acceptable? | No | wrong auth or stale issuance cannot be safely repaired afterward |

Derived implications #

  • holder_may_crash = true

    • clients can disappear mid-auth flow, and nodes can fail mid-session lifecycle updates
  • cross_service_write = false

    • baseline keeps auth, session, trust config, and key state in one logical service
  • bounded_staleness_allowed = false

    • security-critical reads should use authoritative state
  • cross_service_atomicity_required = false

    • no multi-service transaction across unrelated services in baseline
  • exclusive_claim_required = true

    • shard ownership must be exclusive
  • guarded_by_current_state = true

    • auth, session, revocation, and key rotation all depend on current state

What this implies #

This pushes us toward:

  • one authoritative owner per tenant/session shard
  • append-oriented auth-request and issued-credential records
  • current-value policy/SP config
  • guarded auth/session/key lifecycle transitions

Step 6 - Deterministic Mechanism Selection #

| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P1 start auth flow | append-only event | append log | correlation id, CSRF/state token |
| P2 verify credential / factor | guarded state transition | CAS on (state, version) or single writer per shard | MFA challenge token, attempt version |
| P3 create / refresh session | guarded state transition | CAS on (state, version) | session id, auth-attempt version |
| P4 issue assertion / token | append-only event guarded by current state | signed issuance under active key | nonce/audience/replay protection |
| P5 update policy / SP config | overwrite current value | CAS on version | policy/config version |
| P6 logout / revoke session | guarded state transition | CAS on (state, version) | session version |
| P7 rotate signing keys | guarded state transition | lifecycle state transition | key version, trust-publication epoch |
| P8 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P9 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |

Why these fit #

Auth attempt and session lifecycle #

These depend on current state, factors, and policy, so guarded transitions fit.

Issuance #

Issuing an assertion/token is an immutable fact, but only valid under current auth/session/SP/key state.

Policy and config #

These are current-value control state, so overwrite fits.

Key rotation #

This is lifecycle-managed current state, so guarded transition fits.
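The guarded transitions above all reduce to a compare-and-set on (state, version). A minimal in-memory sketch follows; `CasStore` and its method names are hypothetical, and a real shard owner would back this with its replicated store rather than a process-local lock.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    state: str
    version: int


class CasStore:
    """In-memory stand-in for a store that supports CAS on (state, version)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def create(self, key, state):
        with self._lock:
            if key in self._rows:
                raise KeyError(f"{key} already exists")
            self._rows[key] = Versioned(state, 1)
            return self._rows[key]

    def get(self, key):
        with self._lock:
            return self._rows[key]

    def compare_and_set(self, key, expected, new_state):
        """Apply new_state only if the row still equals the value the caller read."""
        with self._lock:
            current = self._rows.get(key)
            if current != expected:
                return False  # a competing writer got there first
            self._rows[key] = Versioned(new_state, current.version + 1)
            return True
```

Two writers that read the same version race on `compare_and_set`; exactly one succeeds and the other must re-read and re-evaluate its precondition, which is how a stale logout or a duplicate MFA response loses.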

Canonical substrate implied #

The baseline now points to:

  • sharded identity-provider service
  • one owner per tenant/session shard
  • current session and auth-attempt state
  • append-only issuance/audit records
  • current trust config and signing-key lifecycle

Step 7 - Read Model / Source of Truth #

For an auth/SSO/SAML system, truth is mostly direct source state. Activity views are derived.

| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 auth flow initiation | AuthRequest | read source directly | authoritative auth-request store |
| C2 current auth attempt lifecycle | AuthenticationAttemptState | read source directly | authoritative attempt-state store |
| C3 current session lifecycle | SessionState | read source directly | authoritative session store |
| C4 issued assertion/token history | IssuedCredential | read source directly | authoritative issuance/audit store |
| C5 current SP/app trust config | ServiceProviderConfig | read source directly | authoritative config store |
| C6 current auth policy | AuthPolicy | read source directly | authoritative policy store |
| C7 signing key lifecycle | SigningKeyState | read source directly | authoritative key store |
| C8 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C9 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
| C10 activity dashboards / user views | derived from requests, sessions, and issuance | materialized view | recompute from authoritative state |

Important point #

For the core semantics:

  • auth and issuance read authoritative policy, config, and key state
  • introspection reads authoritative session state
  • activity views are projections
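Because the activity view is a projection, it can be dropped and rebuilt from authoritative records at any time. The sketch below assumes a hypothetical event shape (`user_id`, `kind`, `at`); the note does not specify one.

```python
from collections import defaultdict


def build_activity_view(events):
    """Rebuild a per-user timeline from authoritative events.

    Each event is assumed to be a dict with "user_id", "kind", and "at" keys.
    """
    view = defaultdict(list)
    for event in sorted(events, key=lambda e: e["at"]):
        view[event["user_id"]].append((event["at"], event["kind"]))
    return dict(view)
```

Nothing on the auth path reads this structure, so it can lag or be recomputed without affecting correctness.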

Step 8 - Failure Handling #

| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| P1 start auth flow | retry safe with correlation/state token | parallel attempts coexist per user | committed auth request survives crash if persisted | browser redirect may retry | stale shard owner blocked by fencing token |
| P2 verify credential / factor | retry safe with attempt version/challenge token | stale or duplicate challenge response loses guarded transition | committed attempt-state transition survives crash if persisted | external MFA/send step may retry | stale shard owner blocked by fencing token |
| P3 create / refresh session | retry with session/attempt version | concurrent session creation resolved by guarded transition and policy | committed session survives crash if persisted | cookie delivery may fail even after session exists | stale shard owner blocked by fencing token |
| P4 issue assertion / token | retry may create multiple valid issued credentials unless nonce/replay controls used | issuance must be fenced by current session/auth state and one active signing key state | committed issuance survives crash if audit persisted | browser post/redirect may retry | stale issuer blocked by ownership/version discipline |
| P5 update policy / SP config | retry with config version | stale update loses CAS | committed config survives crash if persisted | config propagation may lag | n/a |
| P6 logout / revoke session | retry with session version | stale revoke loses guarded transition | committed revocation survives crash if persisted | client cookie cleanup may lag | n/a |
| P7 rotate signing keys | retry with key version/lifecycle epoch | stale rotation loses guarded transition | committed key-state transition survives crash if persisted | relying-party metadata refresh may lag | n/a |
| P8 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P9 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
| P10 introspect / validate | read retry safe | many readers coexist | node crash drops request only | relying party may retry | stale validation forbidden beyond consistency bound |
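The "stale holder" column relies on fencing tokens. The sketch below shows the core idea under simplifying assumptions: `LeaseManager` and `ShardStorage` are hypothetical names, and lease expiry and heartbeats are left out so that only the token comparison is visible.

```python
class LeaseManager:
    """Hands out a strictly increasing fencing token with each ownership grant."""

    def __init__(self):
        self._token = 0
        self.owner = None

    def grant(self, node_id):
        self._token += 1
        self.owner = node_id
        return self._token


class ShardStorage:
    """Rejects any write whose token is older than the newest token it has seen."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            return False  # writer lost ownership after obtaining its token
        self.highest_token = token
        self.data[key] = value
        return True
```

An old owner that pauses, loses its lease, and then wakes up still holds its earlier token, so its write is refused once the new owner has written with a higher one.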

What matters most #

1. Auth/session transitions must be fenced by current state #

Otherwise stale MFA results or stale logout logic can create or preserve invalid sessions.

2. Issuance and browser delivery are separate #

A token/assertion may be issued successfully even if the redirect/post back fails. Retry behavior must be explicit.
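One explicit retry policy is to key each issuance on the request that caused it, so a retried browser post returns the credential already recorded instead of minting another. This is a sketch of that one option, with a hypothetical `Issuer` class and an assumed (attempt_id, app_id, nonce) key; other policies, such as always minting a fresh short-lived credential, are equally plausible.

```python
import uuid


class Issuer:
    """Deduplicates issuance per (attempt_id, app_id, nonce)."""

    def __init__(self):
        self._by_request = {}  # request key -> credential_id
        self.issued = []       # append-only issuance history

    def issue(self, attempt_id, app_id, nonce):
        key = (attempt_id, app_id, nonce)
        if key in self._by_request:
            # Retry of a request that already committed: return the same record.
            return self._by_request[key]
        credential_id = uuid.uuid4().hex
        self.issued.append(
            {"credential_id": credential_id, "attempt_id": attempt_id, "app_id": app_id}
        )
        self._by_request[key] = credential_id
        return credential_id
```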

3. Key rotation needs overlap #

New trust material must be available before old keys retire, or relying parties break.

4. Revocation semantics depend on token type #

Server-side sessions introspect cleanly. Self-contained tokens may need short TTLs, revocation lists, or introspection.
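For self-contained tokens, the short-TTL and revocation-list options can be combined in the validation step. The sketch below assumes a hypothetical claim layout (`session_id`, `iat`, `exp`) and deliberately omits signature verification, which a real validator would perform first.

```python
def issue_claims(session_id, ttl_seconds, now):
    """Build claims for a short-lived token (claim names are illustrative)."""
    return {"session_id": session_id, "iat": now, "exp": now + ttl_seconds}


def is_valid(claims, now, revoked_session_ids):
    # Expiry bounds how long a revoked session can keep working if the
    # revocation list has not reached this validator yet.
    if now >= claims["exp"]:
        return False
    # The revocation list closes the remaining window for sessions known revoked.
    if claims["session_id"] in revoked_session_ids:
        return False
    return True
```

The shorter the TTL, the less the system depends on the revocation list being distributed quickly.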


Step 9 - Scale Adjustments #

| Hotspot | Type | First response |
|---|---|---|
| login spikes / peak auth events | write throughput hotspot | shard by tenant/user/session scope and add more auth nodes |
| session-store hot tenants | contention hotspot | isolate large tenants and partition session state more finely |
| key/config reads on hot path | read hotspot | cache config and key material with strict versioning and short refresh bounds |
| SAML metadata / trust updates across many apps | fan-out hotspot | incremental config propagation and tenant/app scoping |
| audit/activity queries | read hotspot | serve from projections, not hot auth path |
| introspection volume | read hotspot | use fast session shards or short-lived signed tokens where appropriate |

What scales well #

This system scales by:

  • sharding session and tenant state
  • separating hot auth/session reads from audit projections
  • caching policy/SP metadata carefully under strict version control
  • using short-lived issued credentials to reduce revocation pressure
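Caching under strict version control can be sketched as a cache that re-reads the source after a short bound and never moves to an older version. `VersionedCache` and its `loader` callback are hypothetical; the loader is assumed to return a `(value, version)` pair from the authoritative store.

```python
import time


class VersionedCache:
    """Caches (value, version) pairs and refreshes them after max_age seconds."""

    def __init__(self, loader, max_age, clock=time.monotonic):
        self._loader = loader    # assumed: key -> (value, version)
        self._max_age = max_age
        self._clock = clock
        self._entries = {}       # key -> (value, version, fetched_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[2] < self._max_age:
            return entry[0]
        value, version = self._loader(key)
        if entry is not None and version < entry[1]:
            # A lagging replica answered; keep the newer value already cached.
            value, version = entry[0], entry[1]
        self._entries[key] = (value, version, now)
        return value
```

The refresh bound caps how long a changed policy or SP config can go unnoticed, and the version check prevents a stale replica from rolling the cache backwards.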

What fails first #

Usually:

  • login spikes
  • very large tenants with hot session stores
  • mismanaged key rotation
  • introspection load from many relying parties

Canonical design conclusion #

The mechanical outcome is:

  • primary state:
    • AuthRequest
    • AuthenticationAttemptState
    • SessionState
    • IssuedCredential
    • ServiceProviderConfig
    • AuthPolicy
    • SigningKeyState
    • PartitionOwnership
    • PartitionMap
  • critical invariants:
    • guarded auth-attempt and session lifecycle
    • one authoritative current session state per session id
    • issuance valid only under current session, app config, policy, and active key state
    • revocation and logout reflected in future validation
    • exclusive shard ownership for auth/session truth
  • mechanisms:
    • append log
    • guarded auth/session/key transitions
    • current-value policy and app config
    • signed credential issuance
    • fenced shard ownership
  • reads:
    • direct authoritative reads for auth, session, and trust decisions
    • projections for activity views and audit UX

Polished interview answer #

I’d build the authentication and SSO system as a sharded identity-provider service with one authoritative owner per tenant or session shard. Login requests are recorded as auth-flow facts, credential and MFA verification advance a guarded authentication-attempt state machine, and successful attempts create or refresh authoritative session state. Assertions or tokens are then issued as signed immutable credentials, but only if current session state, policy, app trust config, and signing-key state all allow issuance. Logout and revocation are guarded session transitions, and key rotation is a managed lifecycle so new trust material becomes valid before old keys retire. The main scaling levers are more tenant/session shards, careful caching of trust config and keys, short-lived issued credentials, and keeping audit/activity views off the hot auth path.


Concrete Substrate #

I’ll choose a sharded identity-provider service with durable session/auth state plus separate signing-key and app-config stores as the concrete baseline, because it matches the mechanics we derived:

  • append-only auth requests and issuance history
  • guarded auth-attempt and session lifecycle
  • current-value policy and SP/app config
  • managed signing-key lifecycle
  • one owner per shard

Concrete tech family:

  • identity service in Go, Java, or Rust
  • durable metadata/state storage:
    • replicated DB or RocksDB-backed service state
  • shard replication:
    • Raft or leader-follower replication with commit index
  • signing:
    • HSM/KMS-backed keys or managed signing service
  • metadata/control:
    • etcd or internal metadata quorum for shard ownership/routing

Each shard owner stores:

  • auth-request log for owned scope
  • current auth-attempt state
  • current session state
  • issuance history / audit references
  • current policy and app/SP config cache
  • key-reference metadata for active signing material

Operation Layer #

1. Start auth flow #

API

  • StartLogin(app_id, redirect_uri, state, client_context)

Initiator

  • user/browser / relying party client

Entry point

  • auth frontend

Authoritative decider

  • shard owner for tenant/app/user context

Precondition

  • app/SP config exists and redirect/assertion consumer settings are valid

Transition

  • append AuthRequest
  • create AuthenticationAttemptState = STARTED

Response

  • challenge/redirect page or next auth step
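A minimal sketch of this operation, assuming SP config is a lookup from `app_id` to an allow-list of exact redirect URIs. The `SP_CONFIG` contents, the record field names, and the use of plain dicts are illustrative only, and `client_context` is omitted for brevity.

```python
import secrets

# Hypothetical relying-party configuration.
SP_CONFIG = {
    "app-1": {"redirect_uris": {"https://app.example.com/callback"}},
}


def start_login(app_id, redirect_uri, state, auth_request_log):
    config = SP_CONFIG.get(app_id)
    if config is None:
        raise LookupError(f"unknown app {app_id}")
    # Precondition: exact match against registered URIs, no prefix matching.
    if redirect_uri not in config["redirect_uris"]:
        raise ValueError("redirect_uri is not registered for this app")

    request = {
        "auth_request_id": secrets.token_urlsafe(16),
        "app_id": app_id,
        "redirect_uri": redirect_uri,
        "state": state,
    }
    auth_request_log.append(request)  # append AuthRequest
    attempt = {
        "attempt_id": secrets.token_urlsafe(16),
        "auth_request_id": request["auth_request_id"],
        "state": "STARTED",
        "version": 1,
    }
    return request, attempt
```

Rejecting the request before anything is appended keeps unregistered redirect targets out of the auth-request log entirely.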

2. Verify credential / MFA #

API

  • VerifyFactor(attempt_id, factor_response, expected_version?)

Initiator

  • user/browser

Entry point

  • auth frontend

Authoritative decider

  • shard owner for auth attempt

Precondition

  • current attempt state accepts this factor
  • current policy requires or accepts the supplied factor

Transition

  • guarded update of AuthenticationAttemptState
  • possibly:
    • PASSWORD_OK -> MFA_PENDING
    • MFA_PENDING -> SUCCESS

3. Create session and issue credential #

API

  • CompleteLogin(attempt_id, app_id, expected_version?)

Initiator

  • system after successful auth attempt

Entry point

  • auth frontend / issuer

Authoritative decider

  • shard owner for attempt/session plus active key state

Precondition

  • current attempt state is issuable
  • app config and policy allow issuance
  • signing key is active

Transition

  • create or refresh SessionState
  • append IssuedCredential
  • sign and return SAML assertion or token
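The ordering of checks and writes in this step can be sketched as follows. Everything here is illustrative: records are plain dicts with assumed field names, and HMAC-SHA256 over JSON stands in for real SAML or token signing, which would normally use asymmetric keys held in an HSM or KMS.

```python
import hashlib
import hmac
import json
import secrets


def complete_login(attempt, sp_config, key, sessions, issued_log):
    # Preconditions: all three are checked before anything is written.
    if attempt["state"] != "SUCCESS":
        raise PermissionError("attempt is not in an issuable state")
    if not sp_config.get("enabled", False):
        raise PermissionError("app config does not allow issuance")
    if key["status"] != "active":
        raise PermissionError("signing key is not active")

    # Create SessionState.
    session_id = secrets.token_urlsafe(16)
    sessions[session_id] = {"user_id": attempt["user_id"], "status": "ACTIVE", "version": 1}

    # Build claims from authoritative identity, session, app, and key state.
    claims = {
        "sub": attempt["user_id"],
        "aud": sp_config["app_id"],
        "sid": session_id,
        "kid": key["key_id"],
    }
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(key["secret"], payload, hashlib.sha256).hexdigest()

    # Append IssuedCredential, then return the signed result.
    issued_log.append({"credential_id": secrets.token_urlsafe(16), "claims": claims})
    return {"claims": claims, "signature": signature}
```

Because the preconditions run before any write, a retired key or disabled app leaves neither a session nor an issuance record behind.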

4. Logout / revoke session #

API

  • RevokeSession(session_id, actor, expected_version?)

Initiator

  • user/client or admin

Entry point

  • session API

Authoritative decider

  • shard owner for session

Precondition

  • current session state is revocable

Transition

  • guarded update SessionState -> REVOKED or LOGGED_OUT

5. Rotate signing key #

API

  • internal admin/key-management flow

Initiator

  • system/admin

Entry point

  • key-management API

Authoritative decider

  • key-state owner

Precondition

  • new key material available
  • trust publication rules satisfied

Transition

  • SigningKeyState: NEXT -> ACTIVE
  • prior active key moves to retiring/retired later
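The rotation preconditions and the overlap window can be sketched as a small key-set model. The `KeySet` class, the lowercase status strings, and the `published` set (standing in for "trust publication rules satisfied") are assumptions made for illustration.

```python
class KeySet:
    """Tracks key lifecycle and which keys verifiers should currently trust."""

    def __init__(self):
        self.status = {}        # key_id -> "next" | "active" | "retiring" | "retired"
        self.published = set()  # key ids already present in published trust metadata

    def add_next(self, key_id):
        self.status[key_id] = "next"

    def publish(self, key_id):
        self.published.add(key_id)

    def rotate(self, new_key_id):
        if self.status.get(new_key_id) != "next":
            raise ValueError("only a NEXT key can become ACTIVE")
        if new_key_id not in self.published:
            raise ValueError("new key must be published before activation")
        for key_id, state in self.status.items():
            if state == "active":
                self.status[key_id] = "retiring"  # still trusted for verification
        self.status[new_key_id] = "active"

    def retire(self, key_id):
        if self.status.get(key_id) != "retiring":
            raise ValueError("only a RETIRING key can be retired")
        self.status[key_id] = "retired"

    def verification_keys(self):
        return {k for k, s in self.status.items() if s in {"next", "active", "retiring"}}
```

Between `rotate` and `retire`, both the old and new keys remain in the verification set, which is the overlap that keeps relying parties working while credentials signed by the old key are still in circulation.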

Entry Point vs Decider vs Responder #

| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| start login | auth frontend | auth/session shard owner | frontend node | identity provider |
| verify factor | auth frontend | auth-attempt shard owner | frontend node | identity provider |
| issue credential | issuer/auth frontend | session shard owner + active key state | frontend node | identity provider |
| revoke/logout | session API | session shard owner | API node | identity provider |
| rotate key | key-management API | key-state owner | control-plane node | identity provider |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | identity provider |

Concrete HLD #

Main components:

  • auth frontend
    • handles browser redirects, forms, and callbacks
  • auth/session shard owners
    • authoritative owners for auth attempts and session lifecycle
  • policy/app-config service
    • stores tenant policies and relying-party config
  • signing/key service
    • manages active/next/retired signing material
  • metadata/control service
    • tracks shard ownership and routing
  • audit/activity pipeline
    • serves activity views and compliance reporting
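How the frontend finds the right shard owner through the metadata/control service can be sketched as a hash of the tenant id into a shard, followed by an owner lookup. Hash-modulo placement and the `PartitionMap` interface shown here are assumptions; the note does not prescribe a partitioning scheme.

```python
import hashlib


def shard_for(tenant_id, shard_count):
    """Map a tenant id to a shard number with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class PartitionMap:
    """Current shard -> owner assignment, as tracked by the control service."""

    def __init__(self, owners):
        self.owners = dict(owners)  # shard_id -> node id

    def route(self, tenant_id):
        shard_id = shard_for(tenant_id, len(self.owners))
        return shard_id, self.owners[shard_id]

    def reassign(self, shard_id, new_owner):
        self.owners[shard_id] = new_owner
```

After a failover, callers that refresh the map are routed to the new owner, while the tenant-to-shard mapping itself stays unchanged.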
