Authentication Service / SSO / SAML Provider
This note models an authentication service / SSO / SAML identity provider where users authenticate, sessions are established, signed assertions or tokens are issued to relying parties, and lifecycle operations like MFA, logout, and key rotation are handled safely at scale.
Step 1 - Normalize #
Assume the baseline prompt is:
- design an authentication service / SSO / SAML provider
- users authenticate to an identity provider
- relying parties redirect users for login
- service issues signed assertions or tokens
- sessions can be reused, revoked, or logged out
- MFA and policy may apply
- system scales across many tenants and apps
Normalize into state-affecting paths.
| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| User starts login / auth flow | Client | append event | S1 create target AuthRequest | C1 |
| Service verifies credential / factor | System | state transition | S1 update target AuthenticationAttemptState | C1 |
| Service creates or refreshes user session | System | state transition | S1 update target SessionState | C1 |
| Service issues SAML assertion / OAuth token | System | append event | S1 create target IssuedCredential | C1 |
| Service validates relying-party config / trust relationship | System | read source | S1 read source target ServiceProviderConfig | C1 |
| Admin updates identity / access policy | Admin | overwrite state | S1 update target AuthPolicy | C1 |
| Admin updates service-provider / app config | Admin | overwrite state | S1 update target ServiceProviderConfig | C1 |
| User logs out or admin revokes session | Client | state transition | S1 update target SessionState | C1 |
| System rotates signing keys / certs | System | state transition | S1 update target SigningKeyState | C1 |
| User or relying party introspects / validates token or session | Client | read source | S1 read source target SessionState | R1 |
| User reads account/session activity | Client | read projection | S1 read projection target IdentityActivityView | R2 |
| System routes tenant/shard to current owner | System | read source | S1 read source target PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1 update target PartitionOwnership | C1 |
Notes on normalization #
Important choices:
- auth flow start is an append event: the login request is an immutable interaction fact
- credential verification is a state transition: the auth attempt moves through challenge/success/failure states
- session creation is a state transition: the current authenticated session lifecycle changes
- assertion/token issuance is an append event: each issued credential is a fact, even if short-lived
- policy and SP config are current-value control state
- logout/revocation is a current session lifecycle transition
- signing-key rotation is explicit because trust and verification depend on it
This system is a hybrid of:
- identity and credential verification
- session lifecycle state
- signed credential issuance
Step 2 - Critical Path Selection #
| Requirement | Priority class | Why |
|---|---|---|
| Start login / auth flow | C1 | login intent and flow correlation must be preserved |
| Verify credential / factor | C1 | wrong auth decision is a security failure |
| Create / refresh session | C1 | session truth drives reuse, logout, and revocation |
| Issue assertion / token | C1 | signed credentials are the core product output |
| Read SP/app trust config | C1 | wrong relying-party config breaks trust boundaries |
| Update auth policy | C1 | policy changes affect future auth decisions |
| Update SP/app config | C1 | trust metadata and ACS/redirect settings must be correct |
| Logout / revoke session | C1 | revocation correctness affects security |
| Rotate signing keys / certs | C1 | old/new trust windows must be managed safely |
| Validate/introspect token or session | R1 | core serving path |
| Account/session activity | R2 | operational/user-facing only |
| Route to shard owner | C1 | wrong routing can split session or policy truth |
| Reassign shard ownership | C1 | failover must preserve auth/session correctness |
Baseline critical paths #
Main C1 paths:
- P1: start auth flow
- P2: verify credential / factor
- P3: create or refresh session
- P4: issue assertion / token
- P5: read/update SP config and auth policy
- P6: logout / revoke session
- P7: rotate signing keys
- P8: route to shard owner
- P9: reassign shard ownership
Main R1 path:
- P10: validate or introspect token/session
This design is driven by:
- authoritative current session state
- guarded auth-attempt transitions
- safe issuance of signed credentials
- current trust and policy configuration
Step 3 - Primary State Extraction #
For an authentication/SSO/SAML system, the minimal primary state is the auth request, auth attempt lifecycle, current session state, issued credential record, app/SP config, auth policy, signing key lifecycle, and routing/ownership state.
| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| AuthRequest | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | auth_request_id |
| AuthenticationAttemptState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | attempt_id |
| SessionState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | session_id |
| IssuedCredential | direct noun | Yes | keep as candidate | event | Yes | service | append-only | instance | assertion_id or token_id |
| ServiceProviderConfig | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | app_id or sp_entity_id |
| AuthPolicy | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | tenant_id or policy_scope |
| SigningKeyState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | key_id |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | tenant/shard map |
| IdentityActivityView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | user_id or tenant |
Important modeling choices #
AuthenticationAttemptState #
Primary because:
- auth often has multi-step lifecycle:
- challenge
- MFA pending
- success
- failure
- locked
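The lifecycle listed above can be written down as an explicit transition table. The sketch below is only an illustration: the state names mirror the list, but the allowed edges (for example, that a challenge may go straight to success when no MFA is required) are assumptions, and `AttemptState`, `ALLOWED`, and `can_advance` are hypothetical names.

```python
from enum import Enum


class AttemptState(Enum):
    CHALLENGE = "challenge"
    MFA_PENDING = "mfa_pending"
    SUCCESS = "success"
    FAILURE = "failure"
    LOCKED = "locked"


# Assumed edges; a real policy may add or remove transitions.
ALLOWED = {
    AttemptState.CHALLENGE: {
        AttemptState.MFA_PENDING,
        AttemptState.SUCCESS,
        AttemptState.FAILURE,
        AttemptState.LOCKED,
    },
    AttemptState.MFA_PENDING: {
        AttemptState.SUCCESS,
        AttemptState.FAILURE,
        AttemptState.LOCKED,
    },
    # Terminal states: no outgoing transitions in this sketch.
    AttemptState.SUCCESS: set(),
    AttemptState.FAILURE: set(),
    AttemptState.LOCKED: set(),
}


def can_advance(current: AttemptState, target: AttemptState) -> bool:
    """Return True if the assumed lifecycle permits current -> target."""
    return target in ALLOWED[current]
```

Keeping the edges in one table makes it easy to reject a late or replayed factor response: once an attempt is in a terminal state, no further transition is accepted.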
SessionState #
Primary because:
- session lifecycle drives SSO reuse, logout, revocation, timeout, and introspection
IssuedCredential #
Primary because:
- each assertion/token issuance is an immutable fact
- useful for audit, replay protection, and token metadata
SigningKeyState #
Primary because:
- key lifecycle matters:
- active
- next
- retired
- revoked
Minimal strict primary set #
The strongest minimal set is:
- AuthRequest
- AuthenticationAttemptState
- SessionState
- IssuedCredential
- ServiceProviderConfig
- AuthPolicy
- SigningKeyState
- PartitionOwnership
- PartitionMap
Step 4 - Hard Invariants #
For an auth/SSO/SAML provider, the hard invariants are about correct credential verification, one authoritative session lifecycle, valid token/assertion issuance under current trust config and keys, and safe revocation/key rotation.
| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 start auth flow | HARD | uniqueness | Key auth_request_id maps to at most one recorded authentication request within auth-flow scope. |
| P2 verify credential / factor | HARD | eligibility | Action advance_auth_attempt is valid only if current AuthenticationAttemptState, current AuthPolicy, and supplied factors satisfy the transition at decision time. |
| P3 create or refresh session | HARD | eligibility | Action create_session is valid only if current AuthenticationAttemptState is in a successful issuable state and current policy allows session creation at decision time. |
| P3 create or refresh session | HARD | uniqueness | Key session_id maps to at most one current authoritative session lifecycle within session scope. |
| P4 issue assertion / token | HARD | eligibility | Action issue_credential is valid only if current session/auth state is valid, current ServiceProviderConfig allows issuance, and selected SigningKeyState is active at decision time. |
| P4 issue assertion / token | HARD | accounting | IssuedCredential contains claims/assertion fields consistent with authoritative identity, session, app config, and signing-key state at issuance time. |
| P5 update policy / SP config | HARD | ordering | Policy and SP-config revisions are ordered by monotonic version within their scopes. |
| P6 logout / revoke session | HARD | eligibility | Action revoke_session is valid only if current SessionState allows revocation/logout at decision time. |
| P7 rotate signing keys | HARD | eligibility | Action advance_signing_key_state is valid only if current SigningKeyState lifecycle and trust-distribution rules allow the transition at decision time. |
| P8 route to shard owner | HARD | uniqueness | Key shard_id maps to at most one current authoritative owner within shard scope. |
| P9 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if current owner is failed or relinquished and candidate owner is eligible and sufficiently current on shard_id at decision time. |
| P10 validate / introspect token or session | HARD | freshness | Validation/introspection reflects authoritative session, policy, and key state within configured consistency bound. |
What matters most #
1. Session is the current auth truth #
For SSO reuse and revocation, current SessionState is central.
2. Issuance is guarded by current trust config #
Wrong app/SP config or wrong key state breaks trust boundaries.
3. Key rotation is lifecycle-managed #
New keys must become trusted before old keys retire.
4. Revocation must affect future validation #
If a session is revoked, introspection and new issuance must reflect it.
Step 5 - Execution Context #
For the baseline auth/SSO platform:
| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical identity provider spread across auth, session, and config nodes |
| Write coordination scope | per object scope | correctness is per auth attempt, session, app config, key lifecycle, and shard ownership scope |
| Read consistency target | strong only | auth, issuance, and introspection are security-critical |
| Holder model | client | user/browser session is represented by current server-side session state or its equivalent |
| Compensation acceptable? | No | wrong auth or stale issuance cannot be safely repaired afterward |
Derived implications #
- holder_may_crash = true: clients can disappear mid-auth flow, and nodes can fail mid-session lifecycle updates
- cross_service_write = false: baseline keeps auth, session, trust config, and key state in one logical service
- bounded_staleness_allowed = false: security-critical reads should use authoritative state
- cross_service_atomicity_required = false: no multi-service transaction across unrelated services in baseline
- exclusive_claim_required = true: shard ownership must be exclusive
- guarded_by_current_state = true: auth, session, revocation, and key rotation all depend on current state
What this implies #
This pushes us toward:
- one authoritative owner per tenant/session shard
- append-oriented auth-request and issued-credential records
- current-value policy/SP config
- guarded auth/session/key lifecycle transitions
Step 6 - Deterministic Mechanism Selection #
| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P1 start auth flow | append-only event | append log | correlation id, CSRF/state token |
| P2 verify credential / factor | guarded state transition | CAS on (state, version) or single writer per shard | MFA challenge token, attempt version |
| P3 create / refresh session | guarded state transition | CAS on (state, version) | session id, auth-attempt version |
| P4 issue assertion / token | append-only event guarded by current state | signed issuance under active key | nonce/audience/replay protection |
| P5 update policy / SP config | overwrite current value | CAS on version | policy/config version |
| P6 logout / revoke session | guarded state transition | CAS on (state, version) | session version |
| P7 rotate signing keys | guarded state transition | lifecycle state transition | key version, trust-publication epoch |
| P8 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P9 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |
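The lease-plus-fencing-token pairing in the P8 and P9 rows can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `LeaseTable` stands in for the metadata quorum, `ShardStorage` for the shard's durable store, and lease expiry and heartbeats are left out entirely.

```python
class LeaseTable:
    """Hypothetical metadata store that hands out shard leases."""

    def __init__(self) -> None:
        self._token = {}   # shard_id -> most recently issued fencing token
        self._owner = {}   # shard_id -> node currently holding the lease

    def acquire(self, shard_id: str, node_id: str) -> int:
        # Each new lease gets a strictly larger token than any earlier one.
        token = self._token.get(shard_id, 0) + 1
        self._token[shard_id] = token
        self._owner[shard_id] = node_id
        return token


class ShardStorage:
    """Hypothetical storage layer that rejects writes from stale owners."""

    def __init__(self) -> None:
        self._highest_seen = {}  # shard_id -> highest token accepted so far
        self.writes = []

    def write(self, shard_id: str, fencing_token: int, payload: str) -> bool:
        if fencing_token < self._highest_seen.get(shard_id, 0):
            return False  # an owner with a newer lease has already written
        self._highest_seen[shard_id] = fencing_token
        self.writes.append((shard_id, fencing_token, payload))
        return True
```

The point of the design is that the storage layer, not the old owner, enforces exclusivity: a paused node that still believes it holds the lease is rejected as soon as a newer token has been seen.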
Why these fit #
Auth attempt and session lifecycle #
These depend on current state, factors, and policy, so guarded transitions fit.
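A guarded transition of this kind is commonly expressed as a compare-and-set on (state, version). The following is a minimal single-process sketch; `SessionRecord` and `SessionStore` are hypothetical names, the state strings are assumed, and a real store would need to make the check and the write atomic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionRecord:
    session_id: str
    state: str      # assumed values such as "ACTIVE" or "REVOKED"
    version: int


class SessionStore:
    """In-memory stand-in for the authoritative session store."""

    def __init__(self) -> None:
        self._rows = {}

    def put(self, record: SessionRecord) -> None:
        self._rows[record.session_id] = record

    def get(self, session_id: str) -> Optional[SessionRecord]:
        return self._rows.get(session_id)

    def compare_and_set(self, session_id: str, expected_state: str,
                        expected_version: int, new_state: str) -> bool:
        """Apply the transition only if state and version both still match."""
        current = self._rows.get(session_id)
        if (current is None
                or current.state != expected_state
                or current.version != expected_version):
            return False  # a competing writer got there first, or no such row
        self._rows[session_id] = SessionRecord(
            session_id, new_state, current.version + 1)
        return True
```

A caller that loses the race re-reads the record and decides again from the new state, which is how a stale MFA result or a duplicate logout is kept from overwriting a newer decision.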
Issuance #
Issuing an assertion/token is an immutable fact, but only valid under current auth/session/SP/key state.
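As a rough sketch of "check current state, then append and sign", the function below refuses to issue unless all three guards hold. It is illustrative only: `IssuanceContext` and `issue_credential` are hypothetical names, the state strings are assumed, and HMAC-SHA256 from the standard library is a stand-in for the asymmetric signature a SAML assertion or signed token would normally carry.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IssuanceContext:
    session_state: str        # assumed values: "ACTIVE", "REVOKED", ...
    sp_allows_issuance: bool  # result of checking ServiceProviderConfig
    key_state: str            # assumed values: "ACTIVE", "NEXT", "RETIRED"
    key_id: str


def issue_credential(ctx: IssuanceContext, claims: dict,
                     secret: bytes, audit_log: list) -> dict:
    """Check the guards, sign the claims, and append the issuance record."""
    if ctx.session_state != "ACTIVE":
        raise PermissionError("session is not active")
    if not ctx.sp_allows_issuance:
        raise PermissionError("service provider config does not allow issuance")
    if ctx.key_state != "ACTIVE":
        raise PermissionError("signing key is not active")

    # Canonical JSON so the same claims always produce the same bytes.
    payload = json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()
    # HMAC is a placeholder for a real asymmetric signature.
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()

    record = {"key_id": ctx.key_id, "claims": claims, "signature": signature}
    audit_log.append(record)  # append-only: records are never edited in place
    return record
```

Nothing is appended when a guard fails, so the issuance log only ever contains credentials that were valid under the state observed at decision time.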
Policy and config #
These are current-value control state, so overwrite fits.
Key rotation #
This is lifecycle-managed current state, so guarded transition fits.
Canonical substrate implied #
The baseline now points to:
- sharded identity-provider service
- one owner per tenant/session shard
- current session and auth-attempt state
- append-only issuance/audit records
- current trust config and signing-key lifecycle
Step 7 - Read Model / Source of Truth #
For an auth/SSO/SAML system, truth is mostly direct source state. Activity views are derived.
| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 auth flow initiation | AuthRequest | read source directly | authoritative auth-request store |
| C2 current auth attempt lifecycle | AuthenticationAttemptState | read source directly | authoritative attempt-state store |
| C3 current session lifecycle | SessionState | read source directly | authoritative session store |
| C4 issued assertion/token history | IssuedCredential | read source directly | authoritative issuance/audit store |
| C5 current SP/app trust config | ServiceProviderConfig | read source directly | authoritative config store |
| C6 current auth policy | AuthPolicy | read source directly | authoritative policy store |
| C7 signing key lifecycle | SigningKeyState | read source directly | authoritative key store |
| C8 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C9 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
| C10 activity dashboards / user views | derived from requests, sessions, and issuance | materialized view | recompute from authoritative state |
Important point #
For the core semantics:
- auth and issuance read authoritative policy, config, and key state
- introspection reads authoritative session state
- activity views are projections
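To make the "activity views are projections" point concrete, here is a small sketch of rebuilding such a view by folding over authoritative records. The event shape (dicts with `type` and `user_id`) and the event type names are assumptions for illustration, not something the note specifies.

```python
from collections import defaultdict


def rebuild_activity_view(events):
    """Fold authoritative records into a per-user activity summary.

    Each event is a dict with assumed keys "type" and "user_id".
    """
    view = defaultdict(
        lambda: {"logins": 0, "credentials_issued": 0, "revocations": 0})
    counters = {
        "auth_request": "logins",
        "issued_credential": "credentials_issued",
        "session_revoked": "revocations",
    }
    for event in events:
        counter = counters.get(event["type"])
        if counter is not None:  # ignore event types the view does not track
            view[event["user_id"]][counter] += 1
    return dict(view)
```

Because the function depends only on its input records, the view can be dropped and recomputed at any time without touching the auth path.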
Step 8 - Failure Handling #
| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| P1 start auth flow | retry safe with correlation/state token | parallel attempts coexist per user | committed auth request survives crash if persisted | browser redirect may retry | stale shard owner blocked by fencing token |
| P2 verify credential / factor | retry safe with attempt version/challenge token | stale or duplicate challenge response loses guarded transition | committed attempt-state transition survives crash if persisted | external MFA/send step may retry | stale shard owner blocked by fencing token |
| P3 create / refresh session | retry with session/attempt version | concurrent session creation resolved by guarded transition and policy | committed session survives crash if persisted | cookie delivery may fail even after session exists | stale shard owner blocked by fencing token |
| P4 issue assertion / token | retry may create multiple valid issued credentials unless nonce/replay controls used | issuance must be fenced by current session/auth state and one active signing key state | committed issuance survives crash if audit persisted | browser post/redirect may retry | stale issuer blocked by ownership/version discipline |
| P5 update policy / SP config | retry with config version | stale update loses CAS | committed config survives crash if persisted | config propagation may lag | n/a |
| P6 logout / revoke session | retry with session version | stale revoke loses guarded transition | committed revocation survives crash if persisted | client cookie cleanup may lag | n/a |
| P7 rotate signing keys | retry with key version/lifecycle epoch | stale rotation loses guarded transition | committed key-state transition survives crash if persisted | relying-party metadata refresh may lag | n/a |
| P8 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P9 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
| P10 introspect / validate | read retry safe | many readers coexist | node crash drops request only | relying party may retry | stale validation forbidden beyond consistency bound |
What matters most #
1. Auth/session transitions must be fenced by current state #
Otherwise stale MFA results or stale logout logic can create or preserve invalid sessions.
2. Issuance and browser delivery are separate #
A token/assertion may be issued successfully even if the redirect/post back fails. Retry behavior must be explicit.
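One way to make that retry behavior explicit is to key issuance on the auth request, so a retried delivery returns the credential that was already recorded instead of minting a second one. This is only one possible policy, sketched under assumptions: `Issuer` is a hypothetical class and signing is omitted.

```python
import secrets


class Issuer:
    """Hypothetical issuer that is idempotent per auth request."""

    def __init__(self) -> None:
        self._by_request = {}  # auth_request_id -> issued credential

    def issue(self, auth_request_id: str, subject: str) -> dict:
        existing = self._by_request.get(auth_request_id)
        if existing is not None:
            # Delivery is being retried: hand back the recorded credential.
            return existing
        credential = {
            "credential_id": secrets.token_hex(8),
            "auth_request_id": auth_request_id,
            "subject": subject,
        }
        self._by_request[auth_request_id] = credential
        return credential
```

The alternative policy, issuing a fresh credential on every retry, is also workable as long as relying parties enforce nonce or replay checks; the point is to pick one and state it.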
3. Key rotation needs overlap #
New trust material must be available before old keys retire, or relying parties break.
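The overlap rule can be sketched as a key ring that refuses to promote a key relying parties have not yet been told to trust, and keeps the old key verifiable until it is explicitly retired. `KeyRing` and its method names are hypothetical, and "published" here is just a set membership rather than real metadata distribution.

```python
class KeyRing:
    """Hypothetical signing-key lifecycle with a trust overlap window."""

    def __init__(self, active_key_id: str) -> None:
        self.active = active_key_id
        self.next_key = None
        self.trusted = {active_key_id}  # key ids published to relying parties

    def stage_next(self, key_id: str) -> None:
        self.next_key = key_id

    def publish_next(self) -> None:
        if self.next_key is None:
            raise ValueError("no next key staged")
        self.trusted.add(self.next_key)

    def promote(self) -> str:
        """Make the next key active; return the key that is now retiring."""
        if self.next_key is None or self.next_key not in self.trusted:
            raise RuntimeError("next key must be published before activation")
        retiring = self.active
        self.active = self.next_key
        self.next_key = None
        return retiring  # still trusted, so outstanding credentials verify

    def retire(self, key_id: str) -> None:
        if key_id == self.active:
            raise RuntimeError("cannot retire the active key")
        self.trusted.discard(key_id)
```

Between `promote` and `retire` both keys are in the trusted set, which is the overlap window during which credentials signed by either key still verify.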
4. Revocation semantics depend on token type #
Server-side sessions introspect cleanly. Self-contained tokens may need short TTLs, revocation lists, or introspection.
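For self-contained tokens, the combination of a short TTL and a revocation list might look like the check below. It is a deliberately small sketch: the token is a plain dict with assumed fields `session_id` and `expires_at`, and signature verification is assumed to have happened already.

```python
def validate_token(token: dict, now: float, revoked_sessions: set) -> bool:
    """Accept a token only if it is unexpired and its session is not revoked.

    Signature verification is assumed to have been done by the caller.
    """
    if now >= token["expires_at"]:
        return False  # short TTL bounds how long a revoked token can live
    return token["session_id"] not in revoked_sessions
```

With this shape, the worst-case exposure after revocation is bounded by whichever is smaller: the token TTL or the delay before the revocation list reaches the validator.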
Step 9 - Scale Adjustments #
| Hotspot | Type | First response |
|---|---|---|
| login spikes / peak auth events | write throughput hotspot | shard by tenant/user/session scope and add more auth nodes |
| session-store hot tenants | contention hotspot | isolate large tenants and partition session state more finely |
| key/config reads on hot path | read hotspot | cache config and key material with strict versioning and short refresh bounds |
| SAML metadata / trust updates across many apps | fan-out hotspot | incremental config propagation and tenant/app scoping |
| audit/activity queries | read hotspot | serve from projections, not hot auth path |
| introspection volume | read hotspot | use fast session shards or short-lived signed tokens where appropriate |
What scales well #
This system scales by:
- sharding session and tenant state
- separating hot auth/session reads from audit projections
- caching policy/SP metadata carefully under strict version control
- using short-lived issued credentials to reduce revocation pressure
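Caching policy and SP metadata "under strict version control" could be sketched as a cache that reloads after a short age bound and refuses to move backwards in version. The names `VersionedCache` and `loader` are hypothetical, and the loader is assumed to return a `(version, value)` pair from the authoritative store.

```python
import time


class VersionedCache:
    """Hypothetical read-through cache with an age bound and version check."""

    def __init__(self, loader, max_age_seconds: float, clock=time.monotonic):
        self._loader = loader          # assumed: key -> (version, value)
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry["loaded_at"] < self._max_age:
            return entry["value"]  # still within the refresh bound
        version, value = self._loader(key)
        if entry is not None and version < entry["version"]:
            # Never replace a newer config with an older one.
            raise RuntimeError("loader returned an older version than cached")
        self._entries[key] = {
            "loaded_at": now, "version": version, "value": value}
        return value
```

The age bound is the staleness the design is willing to tolerate on this path; setting it to zero degrades the cache to a direct authoritative read.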
What fails first #
Usually:
- login spikes
- very large tenants with hot session stores
- mismanaged key rotation
- introspection load from many relying parties
Canonical design conclusion #
The mechanical outcome is:
- primary state:
  - AuthRequest
  - AuthenticationAttemptState
  - SessionState
  - IssuedCredential
  - ServiceProviderConfig
  - AuthPolicy
  - SigningKeyState
  - PartitionOwnership
  - PartitionMap
- critical invariants:
  - guarded auth-attempt and session lifecycle
  - one authoritative current session state per session id
  - issuance valid only under current session, app config, policy, and active key state
  - revocation and logout reflected in future validation
  - exclusive shard ownership for auth/session truth
- mechanisms:
  - append log
  - guarded auth/session/key transitions
  - current-value policy and app config
  - signed credential issuance
  - fenced shard ownership
- reads:
  - direct authoritative reads for auth, session, and trust decisions
  - projections for activity views and audit UX
Polished interview answer #
I’d build the authentication and SSO system as a sharded identity-provider service with one authoritative owner per tenant or session shard. Login requests are recorded as auth-flow facts, credential and MFA verification advance a guarded authentication-attempt state machine, and successful attempts create or refresh authoritative session state. Assertions or tokens are then issued as signed immutable credentials, but only if current session state, policy, app trust config, and signing-key state all allow issuance. Logout and revocation are guarded session transitions, and key rotation is a managed lifecycle so new trust material becomes valid before old keys retire. The main scaling levers are more tenant/session shards, careful caching of trust config and keys, short-lived issued credentials, and keeping audit/activity views off the hot auth path.
Concrete Substrate #
I’ll choose a sharded identity-provider service with durable session/auth state plus separate signing-key and app-config stores as the concrete baseline, because it matches the mechanics we derived:
- append-only auth requests and issuance history
- guarded auth-attempt and session lifecycle
- current-value policy and SP/app config
- managed signing-key lifecycle
- one owner per shard
Concrete tech family:
- identity service in Go, Java, or Rust
- durable metadata/state storage:
  - replicated DB or RocksDB-backed service state
- shard replication:
  - Raft or leader-follower replication with commit index
- signing:
  - HSM/KMS-backed keys or managed signing service
- metadata/control:
  - etcd or internal metadata quorum for shard ownership/routing
Each shard owner stores:
- auth-request log for owned scope
- current auth-attempt state
- current session state
- issuance history / audit references
- current policy and app/SP config cache
- key-reference metadata for active signing material
Operation Layer #
1. Start auth flow #
API
StartLogin(app_id, redirect_uri, state, client_context)
Initiator
- user/browser / relying party client
Entry point
- auth frontend
Authoritative decider
- shard owner for tenant/app/user context
Precondition
- app/SP config exists and redirect/assertion consumer settings are valid
Transition
- append AuthRequest
- create AuthenticationAttemptState = STARTED
Response
- challenge/redirect page or next auth step
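A rough sketch of this operation, under stated assumptions: the precondition is modeled as an exact-match check of `redirect_uri` against an allow-list on the app config, and the `auth_request_id` format is invented for illustration. `ServiceProviderConfig` here is a simplified stand-in, not the full config object, and `client_context` is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProviderConfig:
    app_id: str
    allowed_redirect_uris: frozenset  # assumed exact-match allow-list


def start_login(configs: dict, app_id: str, redirect_uri: str,
                state: str, auth_log: list) -> dict:
    """Validate the precondition, append an AuthRequest, start an attempt."""
    config = configs.get(app_id)
    if config is None:
        raise LookupError("unknown app_id")
    if redirect_uri not in config.allowed_redirect_uris:
        raise ValueError("redirect_uri is not registered for this app")

    request = {
        "auth_request_id": f"req-{len(auth_log) + 1}",  # illustrative id only
        "app_id": app_id,
        "redirect_uri": redirect_uri,
        "state": state,  # echoed back later so the client can correlate
    }
    auth_log.append(request)  # append-only AuthRequest record
    return {"auth_request_id": request["auth_request_id"],
            "attempt_state": "STARTED"}
```

Rejecting before the append keeps the auth-request log free of flows that could never complete against the registered trust config.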
2. Verify credential / MFA #
API
VerifyFactor(attempt_id, factor_response, expected_version?)
Initiator
- user/browser
Entry point
- auth frontend
Authoritative decider
- shard owner for auth attempt
Precondition
- current attempt state accepts this factor
- policy requires/satisfies this factor set
Transition
- guarded update of AuthenticationAttemptState
- possibly:
  - PASSWORD_OK -> MFA_PENDING
  - MFA_PENDING -> SUCCESS
3. Create session and issue credential #
API
CompleteLogin(attempt_id, app_id, expected_version?)
Initiator
- system after successful auth attempt
Entry point
- auth frontend / issuer
Authoritative decider
- shard owner for attempt/session plus active key state
Precondition
- current attempt state is issuable
- app config and policy allow issuance
- signing key is active
Transition
- create or refresh SessionState
- append IssuedCredential
- sign and return SAML assertion or token
4. Logout / revoke session #
API
RevokeSession(session_id, actor, expected_version?)
Initiator
- user/client or admin
Entry point
- session API
Authoritative decider
- shard owner for session
Precondition
- current session state is revocable
Transition
- guarded update: SessionState -> REVOKED or LOGGED_OUT
5. Rotate signing key #
API
- internal admin/key-management flow
Initiator
- system/admin
Entry point
- key-management API
Authoritative decider
- key-state owner
Precondition
- new key material available
- trust publication rules satisfied
Transition
- SigningKeyState: NEXT -> ACTIVE
- prior active key moves to retiring/retired later
Entry Point vs Decider vs Responder #
| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| start login | auth frontend | auth/session shard owner | frontend node | identity provider |
| verify factor | auth frontend | auth-attempt shard owner | frontend node | identity provider |
| issue credential | issuer/auth frontend | session shard owner + active key state | frontend node | identity provider |
| revoke/logout | session API | session shard owner | API node | identity provider |
| rotate key | key-management API | key-state owner | control-plane node | identity provider |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | identity provider |
Concrete HLD #
Main components:
- auth frontend
- handles browser redirects, forms, and callbacks
- auth/session shard owners
- authoritative owners for auth attempts and session lifecycle
- policy/app-config service
- stores tenant policies and relying-party config
- signing/key service
- manages active/next/retired signing material
- metadata/control service
- tracks shard ownership and routing
- audit/activity pipeline
- serves activity views and compliance reporting