RBAC / ABAC Policy Engine #
This note models an RBAC/ABAC policy engine where callers ask whether a subject may perform an action on a resource, admins update roles, bindings, attributes, and policies, and the system evaluates decisions consistently at scale.
Step 1 - Normalize #
Assume the baseline prompt is:
- design an RBAC / ABAC policy engine
- callers ask authorization questions like "can user U do action A on resource R"
- policies can reference roles, bindings, subject/resource attributes, and contextual attributes
- admins update policies and role bindings over time
- policy evaluation must be fast and correct across many services and tenants
Normalize into state-affecting paths.
| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| Caller evaluates authorization decision | Client | read source | S1: read source, target PolicyState | C1 |
| Admin updates policy definition | Admin | overwrite state | S1: update, target PolicyState | C1 |
| Admin updates role definition | Admin | overwrite state | S1: update, target RoleDefinition | C1 |
| Admin grants or revokes role binding | Admin | state transition | S1: update, target RoleBindingState | C1 |
| System updates subject attributes | System | overwrite state | S1: update, target SubjectAttributeState | C1 |
| System updates resource attributes | System | overwrite state | S1: update, target ResourceAttributeState | C1 |
| System computes or refreshes effective policy snapshot | System | state transition | S1: update, target PolicySnapshot | C1 |
| System propagates policy snapshot to evaluators | System | async process | S1: hidden write, target EvaluatorConfigSnapshot | C1 |
| User reads audit / access activity | Client | read projection | S1: read projection, target AuthorizationAuditView | R2 |
| System routes tenant/shard to current owner | System | read source | S1: read source, target PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1: update, target PartitionOwnership | C1 |
Notes on normalization #
Important choices:
- authorization evaluation is a hot read path
- policy and role definitions are current-value control state
- role grants/revokes are lifecycle transitions
- attribute updates are current-value state
- snapshot generation/propagation is explicit because evaluators usually should not hit the control plane on every request
This system is a hybrid of:
- policy and relationship state
- hot-path evaluation
- control plane to evaluator snapshot distribution
Step 2 - Critical Path Selection #
| Requirement | Priority class | Why |
|---|---|---|
| Evaluate authorization decision | C1 | wrong allow/deny is the core correctness failure |
| Update policy definition | C1 | changes future authorization decisions |
| Update role definition | C1 | changes derived permissions |
| Grant / revoke role binding | C1 | changes subject access immediately or near-immediately |
| Update subject attributes | C1 | ABAC conditions depend on current subject properties |
| Update resource attributes | C1 | ABAC conditions depend on current resource properties |
| Compute effective policy snapshot | C1 | hot-path evaluators depend on correct derived state |
| Propagate snapshot to evaluators | C1 | stale evaluators can enforce wrong permissions |
| Read audit / activity | R2 | operational/compliance only |
| Route to shard owner | C1 | wrong routing can split policy truth |
| Reassign shard ownership | C1 | failover must preserve policy/state correctness |
Baseline critical paths #
Main C1 paths:
- P1: evaluate authorization
- P2: update policy
- P3: update role definition
- P4: grant/revoke binding
- P5: update subject attributes
- P6: update resource attributes
- P7: compute policy snapshot
- P8: propagate evaluator snapshot
- P9: route to shard owner
- P10: reassign shard ownership
This design is driven by:
- authoritative current policy, bindings, and attributes
- fast evaluator reads
- bounded-stale but monotonic config distribution
Step 3 - Primary State Extraction #
For an RBAC/ABAC engine, the minimal primary state is policy definitions, role definitions, role-binding lifecycle, subject/resource attributes, derived policy snapshot state, and routing/ownership state.
| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| PolicyState | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | tenant_id or policy_scope |
| RoleDefinition | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | role_id |
| RoleBindingState | lifecycle object | Yes | keep as candidate | process | Yes | service | state machine | instance | subject_id + role_id + scope |
| SubjectAttributeState | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | subject_id |
| ResourceAttributeState | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | resource_id |
| PolicySnapshot | hidden write target | Yes | keep as candidate | process | Yes | service | overwrite | instance | tenant_id or evaluator_scope |
| EvaluatorConfigSnapshot | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | instance | evaluator_id |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | tenant/shard map |
| AuthorizationAuditView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | tenant or subject |
Important modeling choices #
PolicyState #
Primary because:
- policy language/rules are authoritative control truth
RoleBindingState #
Primary because:
- grants and revokes are not just facts; current active/inactive lifecycle matters
SubjectAttributeState / ResourceAttributeState #
Primary because:
- ABAC decisions depend on current attributes
PolicySnapshot #
Primary because:
- many production engines derive an effective evaluation model from raw policies, roles, and bindings
EvaluatorConfigSnapshot #
Kept explicit because:
- hot-path evaluators often run from local snapshots, not synchronous control-plane reads
Minimal strict primary set #
The strongest minimal set is:
- PolicyState
- RoleDefinition
- RoleBindingState
- SubjectAttributeState
- ResourceAttributeState
- PolicySnapshot
- EvaluatorConfigSnapshot
- PartitionOwnership
- PartitionMap
Step 4 - Hard Invariants #
For an RBAC/ABAC policy engine, the hard invariants are about one authoritative current policy/binding/attribute state, correct derived snapshots, and authorization decisions being evaluated against the intended current version.
| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 evaluate authorization | HARD | eligibility | Decision authorize(subject, action, resource, context) is valid only if it is evaluated against current authoritative or approved snapshot versions of PolicyState, RoleBindingState, SubjectAttributeState, and ResourceAttributeState for the request scope. |
| P2 update policy | HARD | ordering | Policy revisions are ordered by monotonic version within policy scope. |
| P3 update role definition | HARD | ordering | Role-definition revisions are ordered by monotonic version within role scope. |
| P4 grant / revoke binding | HARD | uniqueness | Key (subject_id, role_id, scope) maps to at most one current authoritative binding state within binding scope. |
| P5 update subject attributes | HARD | ordering | Subject-attribute revisions are ordered by monotonic version within subject scope. |
| P6 update resource attributes | HARD | ordering | Resource-attribute revisions are ordered by monotonic version within resource scope. |
| P7 compute policy snapshot | HARD | accounting | PolicySnapshot equals the deterministic function of current policy, roles, bindings, and attribute schema/input state for its evaluation scope. |
| P8 propagate evaluator snapshot | HARD | freshness | EvaluatorConfigSnapshot(evaluator_id) reflects an authoritative PolicySnapshot within configured propagation bounds and moves monotonically forward by version. |
| P9 route to shard owner | HARD | uniqueness | Key shard_id maps to at most one current authoritative owner. |
| P10 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if the current owner has failed or relinquished ownership and the candidate owner is eligible and sufficiently current on shard_id at decision time. |
What matters most #
1. One authoritative current binding state #
Role grants and revokes must not race into conflicting current truth.
2. Snapshot correctness #
Evaluators can be local and fast, but their snapshot must correspond to real policy truth.
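One way to make "corresponds to real policy truth" checkable is to build the snapshot with a pure function and attach a content digest, so two builds from the same inputs can be compared. The sketch below is illustrative only: `build_snapshot`, its argument shapes, and the digest field are assumptions for this note, not part of any particular engine.

```python
import hashlib
import json


def build_snapshot(policies, role_permissions, bindings):
    """Pure function: flatten roles and bindings into subject -> sorted permissions.

    policies:         iterable of policy identifiers (hypothetical shape)
    role_permissions: dict of role -> iterable of permission strings
    bindings:         iterable of (subject, role) pairs that are currently active
    """
    effective = {}
    for subject, role in sorted(bindings):
        perms = effective.setdefault(subject, set())
        perms.update(role_permissions.get(role, ()))
    body = {
        "policies": sorted(policies),
        "permissions": {s: sorted(p) for s, p in sorted(effective.items())},
    }
    # Canonical encoding: sorted keys and sorted lists make the digest
    # independent of input ordering.
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return {"body": body, "digest": hashlib.sha256(encoded).hexdigest()}


first = build_snapshot(
    ["p1", "p2"],
    {"editor": ["doc:write", "doc:read"]},
    [("alice", "editor"), ("bob", "editor")],
)
second = build_snapshot(
    ["p2", "p1"],
    {"editor": ["doc:read", "doc:write"]},
    [("bob", "editor"), ("alice", "editor")],
)
same_truth = first["digest"] == second["digest"]
```

Because the function has no hidden inputs, a recompute after a crash yields the same digest for the same source state, which gives evaluators and auditors something concrete to compare.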
3. Monotonic evaluator config #
An evaluator must not move backward to an older policy version.
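A minimal sketch of that rule, assuming each snapshot carries an integer version that only increases within its scope. The `Snapshot` and `Evaluator` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    version: int   # assumed to increase monotonically per scope
    rules: tuple   # opaque payload for this sketch


class Evaluator:
    """Holds the last installed snapshot for one scope."""

    def __init__(self) -> None:
        self.current: Optional[Snapshot] = None

    def install(self, incoming: Snapshot) -> bool:
        """Install only strictly newer versions; return True if installed."""
        if self.current is not None and incoming.version <= self.current.version:
            return False  # stale or duplicate push: keep the last good snapshot
        self.current = incoming
        return True


evaluator = Evaluator()
installed_v2 = evaluator.install(Snapshot(version=2, rules=("allow:read",)))
installed_v1 = evaluator.install(Snapshot(version=1, rules=("allow:*",)))
```

Here the late-arriving version 1 is ignored, so a delayed or retried push cannot roll the evaluator back.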
4. Freshness is a deliberate tradeoff #
If bounded-stale local evaluation is allowed, that must be explicit.
Step 5 - Execution Context #
For the baseline RBAC/ABAC policy engine:
| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical authorization-policy system with control plane and evaluator fleet |
| Write coordination scope | per object scope | correctness is per policy, role, binding, attribute, and shard ownership scope |
| Read consistency target | bounded stale allowed | hot-path evaluation often uses local snapshots with strict version discipline |
| Holder model | none | no lease-like client ownership is central to per-decision correctness |
| Compensation acceptable? | No | wrong allow/deny decisions cannot be repaired afterward |
Derived implications #
- holder_may_crash = false: evaluators can fail, but they do not hold mutable business ownership like queue workers
- cross_service_write = false: baseline keeps policy, bindings, attributes, and snapshots in one logical service
- bounded_staleness_allowed = true: hot-path local evaluation can tolerate bounded lag if explicit
- cross_service_atomicity_required = false: no multi-service transaction across unrelated services in baseline
- exclusive_claim_required = true: shard ownership must be exclusive
- guarded_by_current_state = true: binding grants/revokes and snapshot updates depend on current state
What this implies #
This pushes us toward:
- one authoritative owner per tenant/policy shard
- current-value policy and attribute state
- local evaluator snapshots distributed from control plane
- monotonic versioned evaluation
Step 6 - Deterministic Mechanism Selection #
| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P1 evaluate authorization | read source | direct source read or local snapshot read | snapshot version |
| P2 update policy | overwrite current value | CAS on version | policy version |
| P3 update role definition | overwrite current value | CAS on version | role version |
| P4 grant / revoke binding | guarded state transition | CAS on (state, version) | binding version |
| P5 update subject attributes | overwrite current value | CAS on version | attribute version |
| P6 update resource attributes | overwrite current value | CAS on version | attribute version |
| P7 compute policy snapshot | overwrite current value | single writer control-plane recompute | snapshot version |
| P8 propagate evaluator snapshot | overwrite current value | single writer snapshot publication | config version |
| P9 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P10 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |
Why these fit #
Policies, roles, and attributes #
These are current-value control state, so overwrite fits.
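A versioned overwrite can be sketched as a compare-and-set on the stored version. This is an in-memory illustration only: `PolicyStore` and `VersionConflict` are hypothetical names, the expected version is required here rather than optional, and a real store would do the comparison inside its own transaction or conditional write.

```python
import threading
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when the caller's expected version is not the current one."""


@dataclass(frozen=True)
class Versioned:
    value: str
    version: int


class PolicyStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items = {}

    def get(self, scope):
        return self._items.get(scope)

    def put(self, scope, value, expected_version):
        """Overwrite the value only if expected_version matches (0 = not yet written)."""
        with self._lock:
            current = self._items.get(scope)
            current_version = current.version if current else 0
            if expected_version != current_version:
                raise VersionConflict(
                    f"{scope}: expected {expected_version}, found {current_version}"
                )
            new_version = current_version + 1
            self._items[scope] = Versioned(value, new_version)
            return new_version


store = PolicyStore()
v1 = store.put("tenant-a", "allow editors", expected_version=0)
v2 = store.put("tenant-a", "allow editors and viewers", expected_version=v1)
try:
    store.put("tenant-a", "stale edit", expected_version=v1)
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True
```

The stale writer loses and must re-read before retrying, which is the "stale update loses CAS" behavior assumed for policies, roles, and attributes.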
Role bindings #
Bindings have lifecycle and current-state transitions, so guarded transition fits.
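The guarded transition can be sketched as a compare-and-set on the whole (state, version) pair plus a table of allowed moves. The state names and the `BindingStore` class are assumptions for illustration; the sketch is single-threaded and leaves out persistence.

```python
from dataclasses import dataclass
from enum import Enum


class BindingState(Enum):
    ABSENT = "absent"
    ACTIVE = "active"
    REVOKED = "revoked"


# Allowed lifecycle moves (assumed for this sketch).
ALLOWED = {
    (BindingState.ABSENT, BindingState.ACTIVE),
    (BindingState.ACTIVE, BindingState.REVOKED),
    (BindingState.REVOKED, BindingState.ACTIVE),
}


@dataclass(frozen=True)
class Binding:
    state: BindingState
    version: int


class BindingStore:
    def __init__(self) -> None:
        self._bindings = {}

    def get(self, key) -> Binding:
        return self._bindings.get(key, Binding(BindingState.ABSENT, 0))

    def transition(self, key, expected: Binding, target: BindingState) -> bool:
        """Apply the move only if the caller saw the current (state, version)."""
        current = self.get(key)
        if current != expected:
            return False  # caller acted on stale state and lost the race
        if (current.state, target) not in ALLOWED:
            return False  # not a legal lifecycle move
        self._bindings[key] = Binding(target, current.version + 1)
        return True


store = BindingStore()
key = ("alice", "editor", "tenant-a")      # (subject_id, role_id, scope)
stale_view = store.get(key)                # ABSENT, version 0
granted = store.transition(key, stale_view, BindingState.ACTIVE)
revoked = store.transition(key, store.get(key), BindingState.REVOKED)
# A delayed grant that still carries the old view must not undo the revoke.
stale_grant = store.transition(key, stale_view, BindingState.ACTIVE)
```

After these calls the binding stays revoked at version 2, because the delayed grant was computed against version 0.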
Snapshot build and propagation #
These are derived current views published to evaluators, so overwrite fits.
Routing #
One owner per shard is required for correctness of authoritative updates, so exclusive claim fits.
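The fencing-token companion to the lease can be sketched as follows. This is a simplified model: `LeaseTable.acquire` grants ownership unconditionally (standing in for a grant after the previous lease expired), heartbeats and expiry are omitted, and all class names are hypothetical.

```python
class StaleOwner(Exception):
    """Raised when a write carries a fencing token older than one already seen."""


class LeaseTable:
    """Issues a strictly increasing fencing token on every ownership grant."""

    def __init__(self) -> None:
        self._owner = {}  # shard_id -> (owner_id, token)

    def acquire(self, shard_id, owner_id) -> int:
        _, last_token = self._owner.get(shard_id, (None, 0))
        token = last_token + 1
        self._owner[shard_id] = (owner_id, token)
        return token


class ShardStorage:
    """Accepts a write only if its token is at least the highest seen for the shard."""

    def __init__(self) -> None:
        self._highest = {}
        self.data = {}

    def write(self, shard_id, token, key, value) -> None:
        if token < self._highest.get(shard_id, 0):
            raise StaleOwner(f"{shard_id}: token {token} is fenced off")
        self._highest[shard_id] = token
        self.data[(shard_id, key)] = value


leases = LeaseTable()
storage = ShardStorage()
old_token = leases.acquire("shard-1", "node-a")
new_token = leases.acquire("shard-1", "node-b")   # failover to node-b
storage.write("shard-1", new_token, "policy", "v2 from node-b")
try:
    storage.write("shard-1", old_token, "policy", "late write from node-a")
    old_owner_fenced = False
except StaleOwner:
    old_owner_fenced = True
```

The storage layer, not the old owner, enforces exclusivity, so a paused or partitioned node cannot overwrite state after it has been replaced.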
Canonical substrate implied #
The baseline now points to:
- sharded policy-control service
- one owner per tenant or policy shard
- current policy, binding, and attribute state
- derived snapshots pushed to evaluators
- bounded-stale but monotonic local decisions
Step 7 - Read Model / Source of Truth #
For an RBAC/ABAC engine, truth is mostly direct source state plus derived evaluator snapshots. Audit views are derived.
| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 policy definitions | PolicyState | read source directly | authoritative policy store |
| C2 role definitions | RoleDefinition | read source directly | authoritative role store |
| C3 role-binding lifecycle | RoleBindingState | read source directly | authoritative binding store |
| C4 subject attributes | SubjectAttributeState | read source directly | authoritative subject-attribute store |
| C5 resource attributes | ResourceAttributeState | read source directly | authoritative resource-attribute store |
| C6 derived effective policy | PolicySnapshot | read source directly | recompute from policy, roles, bindings, and attributes |
| C7 local evaluator config | EvaluatorConfigSnapshot | materialized view | rebuild from latest PolicySnapshot |
| C8 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C9 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
| C10 audit / access activity | derived from decisions and config state | materialized view | recompute from authoritative state and decision logs |
Important point #
For the core semantics:
- authoritative truth lives in policy, binding, and attribute state
- evaluators usually read local snapshots for the hot path
- audit views are projections
Step 8 - Failure Handling #
| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| P2 update policy | retry with policy version | stale update loses CAS | committed policy survives crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P3 update role definition | retry with role version | stale update loses CAS | committed role survives crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P4 grant / revoke binding | retry with binding version | stale grant/revoke loses guarded transition | committed binding state survives crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P5 update subject attributes | retry with attribute version | stale update loses CAS | committed subject attributes survive crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P6 update resource attributes | retry with attribute version | stale update loses CAS | committed resource attributes survive crash if persisted | snapshot propagation may lag | stale shard owner blocked by fencing token |
| P7 compute policy snapshot | recompute retry safe from source inputs | single recompute/version wins | recompute reruns after crash | evaluator snapshot may lag | n/a |
| P8 propagate evaluator snapshot | retry with versioned snapshot | older snapshot loses to newer version | evaluator keeps last good snapshot until refresh | failed push retried or pulled | n/a |
| P1 evaluate authorization | request retry safe | many evaluators can answer concurrently from same snapshot | evaluator crash drops request only | n/a | stale decision bounded by configured snapshot freshness |
| P9 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P10 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
What matters most #
1. Snapshot monotonicity #
Evaluators must not accept older config after newer config is installed.
2. Freshness versus latency #
The main architectural tradeoff is:
- direct source read for every authorization
- versus local snapshot with bounded lag
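The tradeoff can be made explicit per request with a staleness bound. In this sketch the evaluator answers locally only while its snapshot was confirmed recently enough, and otherwise falls back to a source read. The `refreshed_at` field, the `max_staleness_s` parameter, and the fallback choice are assumptions; a stricter deployment might fail closed instead of falling back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalSnapshot:
    version: int
    refreshed_at: float  # seconds; when the evaluator last confirmed this snapshot


def choose_read_path(
    snapshot: Optional[LocalSnapshot], now: float, max_staleness_s: float
) -> str:
    """Return "local" when the snapshot is within the bound, else "source"."""
    if snapshot is None:
        return "source"  # nothing installed yet
    age = now - snapshot.refreshed_at
    return "local" if age <= max_staleness_s else "source"


snapshot = LocalSnapshot(version=5, refreshed_at=100.0)
within_bound = choose_read_path(snapshot, now=102.0, max_staleness_s=5.0)
past_bound = choose_read_path(snapshot, now=110.0, max_staleness_s=5.0)
```

Passing `now` in as an argument keeps the rule easy to test and makes the bound a visible configuration value rather than an accident of propagation timing.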
3. Binding lifecycle correctness #
Revokes must not be lost behind stale grants.
4. Decision logs are optional for correctness #
Audit logging matters for compliance, but decision correctness depends first on policy truth and evaluator freshness.
Step 9 - Scale Adjustments #
| Hotspot | Type | First response |
|---|---|---|
| very high authorization QPS | read hotspot | push local snapshots to evaluators and keep hot path in-memory |
| hot tenants with many policy updates | fan-out hotspot | shard by tenant and batch snapshot recomputes |
| large subject-resource graph / many bindings | memory hotspot | scope bindings, compress snapshots, and isolate large tenants |
| expensive ABAC attribute fetches | read hotspot | materialize needed attributes into policy snapshots or side caches |
| audit-query load | read hotspot | serve from projections and logs, not evaluator hot path |
| snapshot churn | contention hotspot | incremental recompute and per-scope propagation |
What scales well #
This system scales by:
- sharding policy truth by tenant/scope
- evaluating from local snapshots
- limiting dynamic attribute fetches on the hot path
- incrementally recomputing only affected policy scopes
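Incremental recompute can be sketched as dirty-scope tracking: writes mark a scope as changed, and a batch run recomputes each dirty scope once, however many writes touched it. `SnapshotScheduler` and the `recompute` callback are hypothetical names for this illustration.

```python
class SnapshotScheduler:
    """Coalesces changes per scope and recomputes only the scopes that changed."""

    def __init__(self, recompute) -> None:
        self._recompute = recompute   # callable taking a scope name
        self._dirty = set()
        self.versions = {}            # scope -> snapshot version

    def mark_changed(self, scope) -> None:
        self._dirty.add(scope)

    def run_batch(self):
        """Recompute every dirty scope once and bump its snapshot version."""
        batch = sorted(self._dirty)
        self._dirty.clear()
        for scope in batch:
            self._recompute(scope)
            self.versions[scope] = self.versions.get(scope, 0) + 1
        return batch


recomputed_scopes = []
scheduler = SnapshotScheduler(recomputed_scopes.append)
scheduler.mark_changed("tenant-a")   # policy edit
scheduler.mark_changed("tenant-a")   # binding revoke in the same window
scheduler.mark_changed("tenant-b")
first_batch = scheduler.run_batch()
second_batch = scheduler.run_batch()
```

Two changes to `tenant-a` cost one recompute, and untouched tenants cost nothing, which is the behavior that keeps policy churn from turning into global snapshot churn.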
What fails first #
Usually:
- very large binding graphs
- high-frequency policy churn
- attribute fetches embedded in every auth decision
- overly broad global snapshots
Canonical design conclusion #
The mechanical outcome is:
- primary state:
  - PolicyState
  - RoleDefinition
  - RoleBindingState
  - SubjectAttributeState
  - ResourceAttributeState
  - PolicySnapshot
  - EvaluatorConfigSnapshot
  - PartitionOwnership
  - PartitionMap
- critical invariants:
  - one authoritative current policy/binding/attribute state
  - correct derived snapshots for evaluator scopes
  - monotonic evaluator config propagation
  - authorization decisions evaluated against intended snapshot/source version
  - exclusive shard ownership for policy truth
- mechanisms:
  - overwrite current value for policy/roles/attributes
  - guarded transitions for binding lifecycle
  - snapshot recompute and versioned propagation
  - fenced shard ownership
- reads:
  - hot path from authoritative or approved local snapshot
  - projections for audit and access activity
Polished interview answer #
I’d build the RBAC/ABAC engine as a sharded policy-control service with local evaluator snapshots. The source of truth is current policy definitions, role definitions, role bindings, and subject/resource attributes. Control-plane owners recompute an effective policy snapshot for each tenant or evaluator scope and propagate those snapshots monotonically to evaluators, so the hot authorization path can run in memory without querying the control plane on every request. Grants and revokes are guarded binding-state transitions, while policy and attribute updates are versioned overwrites. The main scaling levers are more tenant shards, incremental snapshot recompute, tight bounds on dynamic attribute lookups, and keeping audit views off the decision hot path.
Concrete Substrate #
I’ll choose a control-plane/data-plane authorization system with authoritative policy shards plus local evaluator snapshots as the concrete baseline, because it matches the mechanics we derived:
- current-value policy, role, binding, and attribute state
- derived policy snapshots
- monotonic snapshot publication to evaluators
- one owner per shard
Concrete tech family:
- control plane in Go or Java
- authoritative state store:
  - replicated DB or RocksDB-backed service state
- metadata/control:
  - etcd or internal metadata quorum for shard ownership/routing
- evaluator sidecars/libraries or central PDP fleet using in-memory snapshots
Each shard owner stores:
- current policies
- current roles and bindings
- current subject/resource attributes
- latest PolicySnapshot per scope
Evaluators store:
- in-memory EvaluatorConfigSnapshot
- optional decision cache keyed by (subject, action, resource, context hash, snapshot version)
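Including the snapshot version in the cache key means a newly installed snapshot never serves decisions computed under the old one. The sketch below assumes the context is a JSON-serializable dict and leaves out eviction; `DecisionCache` is a hypothetical name.

```python
import hashlib
import json


class DecisionCache:
    """Caches decisions per (subject, action, resource, context hash, snapshot version)."""

    def __init__(self) -> None:
        self._entries = {}

    @staticmethod
    def _key(subject, action, resource, context, snapshot_version):
        # Sorted keys give equal contexts the same hash regardless of dict order.
        encoded = json.dumps(context, sort_keys=True).encode("utf-8")
        context_hash = hashlib.sha256(encoded).hexdigest()
        return (subject, action, resource, context_hash, snapshot_version)

    def get(self, subject, action, resource, context, snapshot_version):
        key = self._key(subject, action, resource, context, snapshot_version)
        return self._entries.get(key)

    def put(self, subject, action, resource, context, snapshot_version, decision):
        key = self._key(subject, action, resource, context, snapshot_version)
        self._entries[key] = decision


cache = DecisionCache()
cache.put("alice", "read", "doc-1", {"ip": "10.0.0.1", "mfa": True}, 7, "allow")
hit = cache.get("alice", "read", "doc-1", {"mfa": True, "ip": "10.0.0.1"}, 7)
miss_after_upgrade = cache.get("alice", "read", "doc-1", {"mfa": True, "ip": "10.0.0.1"}, 8)
```

Entries for old versions simply stop being looked up, so a production cache would still need size or age-based eviction to reclaim them.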
Operation Layer #
1. Update role binding #
API
PutRoleBinding(subject_id, role_id, scope, desired_state, expected_version?)
Initiator
- admin
Entry point
- policy API
Authoritative decider
- shard owner for policy/binding scope
Precondition
- current binding version matches if optimistic concurrency used
Transition
- guarded update of RoleBindingState
- trigger affected snapshot recompute
2. Update policy #
API
PutPolicy(scope, policy_doc, expected_version?)
Initiator
- admin
Entry point
- policy API
Authoritative decider
- shard owner for policy scope
Precondition
- policy version matches if optimistic concurrency used
Transition
- overwrite PolicyState
- trigger affected snapshot recompute
3. Evaluate authorization #
API
Authorize(subject, action, resource, context)
Initiator
- client/service
Entry point
- evaluator / PDP
Authoritative decider
- local EvaluatorConfigSnapshot, or authoritative policy shard for strong mode
Precondition
- evaluator snapshot version valid for tenant/scope
Transition
- none on source truth
Response
{allow|deny, reason, snapshot_version}
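A minimal sketch of how an evaluator might answer this call from a local snapshot, combining an RBAC role check with ABAC conditions and denying by default. The snapshot layout, the attribute names, and the `same_department` condition are all assumptions for illustration, not a prescribed policy model.

```python
from dataclasses import dataclass, field


@dataclass
class EvalSnapshot:
    version: int
    subject_roles: dict = field(default_factory=dict)     # subject id -> set of roles
    role_permissions: dict = field(default_factory=dict)  # role -> {(action, resource type)}
    conditions: list = field(default_factory=list)        # [(name, predicate)]


def authorize(snapshot, subject, action, resource, context):
    """Return {decision, reason, snapshot_version}; deny unless everything passes."""

    def result(decision, reason):
        return {
            "decision": decision,
            "reason": reason,
            "snapshot_version": snapshot.version,
        }

    # RBAC: some role bound to the subject must grant the action on this resource type.
    roles = snapshot.subject_roles.get(subject["id"], set())
    wanted = (action, resource["type"])
    if not any(wanted in snapshot.role_permissions.get(r, set()) for r in roles):
        return result("deny", "no role grants this action")

    # ABAC: every condition must hold for the subject, resource, and context.
    for name, predicate in snapshot.conditions:
        if not predicate(subject, resource, context):
            return result("deny", f"condition failed: {name}")
    return result("allow", "role grant and all conditions satisfied")


snapshot = EvalSnapshot(
    version=12,
    subject_roles={"alice": {"editor"}},
    role_permissions={"editor": {("write", "document")}},
    conditions=[
        ("same_department", lambda s, r, c: s["department"] == r["department"]),
    ],
)
alice = {"id": "alice", "department": "finance"}
allowed = authorize(
    snapshot, alice, "write", {"type": "document", "department": "finance"}, {}
)
denied = authorize(
    snapshot, alice, "write", {"type": "document", "department": "legal"}, {}
)
```

Returning the snapshot version with every decision lets callers and audit pipelines tie an allow or deny back to the exact policy state that produced it.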
4. Recompute snapshot #
API
- internal control-plane recompute flow
Initiator
- system
Entry point
- shard owner
Authoritative decider
- shard owner
Precondition
- source policy/role/binding/attribute state changed
Transition
- recompute PolicySnapshot
- bump version
5. Propagate snapshot #
API
- internal snapshot push/pull
Initiator
- system
Entry point
- control plane / evaluator
Authoritative decider
- control plane snapshot publisher
Precondition
- newer PolicySnapshot version exists
Transition
- overwrite EvaluatorConfigSnapshot
Entry Point vs Decider vs Responder #
| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| update policy / binding | policy API | policy shard owner | API node | policy engine |
| evaluate authorization | evaluator / PDP | local evaluator snapshot or source shard | evaluator node | policy engine |
| recompute snapshot | shard owner | shard owner | control-plane node | policy engine |
| propagate snapshot | control plane / evaluator | snapshot publisher | control/data-plane node | policy engine |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | policy engine |
Concrete HLD #
Main components:
- policy control-plane API
  - receives policy, role, binding, and attribute updates
- policy shard owners
  - authoritative owners for policy truth and snapshot recompute
- evaluator fleet or sidecars
  - answer hot authorization queries from local snapshots
- metadata/control service
  - tracks shard ownership and routing
- audit/activity pipeline
  - serves decision logs and compliance views