Policy / Capability

who may do what, to which resource, under which context?

policy      = rule for deciding authority
capability  = credential that carries authority
enforcement = the place where the decision is applied

Role in the catalog: this block is boundary.md’s crossing discipline, promoted to its own module. boundary.md owns where the lines are; this file owns what happens at the gate.

Central tension (this is axis 1’s tradeoff, stated up front):

local, fast, available decisions
        vs
fresh, revocable, globally consistent decisions

Design Axes (the core module)

Axis 1 — Where Authority Lives (the structural cleave)

lookup:  authority is a fact in a store, evaluated at decision time
         (RBAC bindings, IAM policies, Zanzibar tuples)
token:   authority is carried in the credential, verified locally
         (presigned URL, bearer token, macaroon, x.509 SAN)

This choice changes the shape of the interface, not just the policy:

lookup -> decision needs the store: fresh, revocable, centrally auditable,
          but a runtime dependency on every request path
token  -> verification needs only a key: fast, partition-tolerant, offline,
          but revocation becomes the hard problem and audit moves to issuance

Delegation/attenuation is not a type — it is the signature operation on token-based authority (macaroons: add caveats, never remove; each hop can only shrink the grant).
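A minimal sketch of first-party caveat chaining in the macaroon style, assuming a hypothetical `key = value` caveat syntax and plain dicts for tokens (real macaroons also support third-party caveats, which this omits). Each caveat re-keys the HMAC with the previous signature, so a holder can append a caveat without the root key but cannot remove one without forging the chain.

```python
import hashlib
import hmac


def _chain(key: bytes, msg: str) -> bytes:
    # Each step keys an HMAC with the previous signature.
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Issuer creates a capability: identifier plus a signature under the root key."""
    return {"id": identifier, "caveats": [], "sig": _chain(root_key, identifier)}


def attenuate(token: dict, caveat: str) -> dict:
    """Any holder can add a caveat without knowing the root key.

    The new signature is HMAC(old_sig, caveat); dropping a caveat would require
    recovering the earlier signature, so holders can only narrow the grant.
    """
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _chain(token["sig"], caveat),
    }


def verify(root_key: bytes, token: dict, request: dict) -> bool:
    """Verifier recomputes the chain and checks every caveat against the request."""
    sig = _chain(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _chain(sig, caveat)
        key, _, value = caveat.partition(" = ")
        if request.get(key) != value:
            return False  # unknown or unsatisfied caveat: fail closed
    return hmac.compare_digest(sig, token["sig"])
```

A verifier that skips a caveat it does not recognize recreates the "unchecked caveat" failure; here an unknown key simply fails to match the request.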

Interrogation:

Is the decision a lookup or a verification?
What breaks when the authority store is unreachable?
Who was the PDP for a token? (answer: the issuer, at issuance time)
Can a holder attenuate before passing on, or only forward the full grant?

Axis 2 — Decision Model (how the decision is computed)

Substitutable evaluation languages, freely composed in practice:

role lookup (RBAC):        principal -> role -> permitted actions
                           cheap, legible; pays: role explosion, scope reuse
attribute predicate (ABAC): f(principal attrs, resource attrs, action, context)
                           expressive; pays: missing/spoofed attributes,
                           rule-interaction opacity
graph reachability (ReBAC): does a path exist in the relationship graph?
                           natural for sharing/hierarchy; pays: traversal cost,
                           inherited access that no one can explain

Compositions are the norm: IAM = ABAC in role clothing; Zanzibar caveats = ABAC embedded in ReBAC.
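The three evaluation languages can be sketched side by side. All sample data and names here are hypothetical, not any product's semantics; the last function shows the composition pattern of a relationship path gated by an attribute predicate, loosely analogous to a caveated relationship.

```python
# RBAC: principal -> role -> permitted actions (hypothetical sample data).
ROLE_BINDINGS = {"alice": {"editor"}, "bob": {"viewer"}}
ROLE_PERMS = {"editor": {"read", "write"}, "viewer": {"read"}}


def rbac_allows(principal: str, action: str) -> bool:
    roles = ROLE_BINDINGS.get(principal, set())
    return any(action in ROLE_PERMS.get(role, set()) for role in roles)


# ABAC: a predicate over principal attrs, resource attrs, action, and context.
def abac_allows(p_attrs: dict, r_attrs: dict, action: str, ctx: dict) -> bool:
    # A missing attribute means "no match", never a wildcard.
    return (
        p_attrs.get("dept") is not None
        and p_attrs.get("dept") == r_attrs.get("dept")
        and (action == "read" or ctx.get("business_hours") is True)
    )


# ReBAC: (object, relation) -> subjects; a subject may itself be a group.
TUPLES = {
    ("doc:plan", "viewer"): {"group:eng"},
    ("group:eng", "member"): {"carol", "group:sre"},
    ("group:sre", "member"): {"dave"},
}


def rebac_reachable(obj: str, relation: str, principal: str, max_depth: int = 8) -> bool:
    """Does a path exist from (obj, relation) to principal? Depth-bounded, cycle-safe."""
    frontier, seen = [(obj, relation, 0)], set()
    while frontier:
        node, rel, depth = frontier.pop()
        if (node, rel) in seen or depth > max_depth:
            continue
        seen.add((node, rel))
        for subject in TUPLES.get((node, rel), set()):
            if subject == principal:
                return True
            if subject.startswith("group:"):
                frontier.append((subject, "member", depth + 1))
    return False


# Composition: a ReBAC path gated by an ABAC condition on request context.
def can_view_plan(principal: str, ctx: dict) -> bool:
    return rebac_reachable("doc:plan", "viewer", principal) and ctx.get("mfa") is True
```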

Interrogation:

Can the decision be explained? (who granted this, via what path/rule?)
Who asserts each attribute, and can the requester influence it?
Is deny explicit, or only the default? Do rules combine as deny-overrides (any deny wins) or allow-overrides (any allow wins)?
For graphs: is depth bounded? Are cycles legal?
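The combining question can be made concrete with two common algorithms, similar to XACML's deny-overrides and permit-overrides. The rule functions below are hypothetical; both algorithms default to deny when no rule applies.

```python
from enum import Enum
from typing import Callable, List, Optional


class Effect(Enum):
    ALLOW = "allow"
    DENY = "deny"


# A rule returns an Effect when it applies, or None when it does not.
Rule = Callable[[dict], Optional[Effect]]


def deny_overrides(rules: List[Rule], request: dict) -> Effect:
    """Any matching deny wins; with no matching rule, default deny."""
    allowed = False
    for rule in rules:
        effect = rule(request)
        if effect is Effect.DENY:
            return Effect.DENY  # explicit deny short-circuits
        if effect is Effect.ALLOW:
            allowed = True
    return Effect.ALLOW if allowed else Effect.DENY


def allow_overrides(rules: List[Rule], request: dict) -> Effect:
    """Any matching allow wins; explicit denies matter only when nothing allows."""
    return Effect.ALLOW if any(r(request) is Effect.ALLOW for r in rules) else Effect.DENY


# Hypothetical rules.
def staff_can_read(req: dict) -> Optional[Effect]:
    return Effect.ALLOW if req.get("role") == "staff" and req.get("action") == "read" else None


def block_quarantined(req: dict) -> Optional[Effect]:
    return Effect.DENY if req.get("resource_tag") == "quarantine" else None
```

The same rule set and request can produce opposite answers under the two algorithms, which is why the combining rule has to be stated rather than inferred.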

Axis 3 — Decision Topology (where evaluation runs)

in-process, local data:     evaluator inside the server        (k8s RBAC in apiserver)
local engine, pushed data:  sidecar/library + policy bundles   (OPA, Istio AuthorizationPolicy)
remote central PDP:         Check() per request                (ext_authz, custom authz service)
precomputed into token:     issuer was the PDP; runtime is
                            verification only                  (presigned URL, JWT scopes)

The pushed-bundle case is control-plane/data-plane (boundary.md #13) wearing policy clothes — same machinery and same failure modes as xDS:

bundle version skew across enforcers = xDS staleness
last-good-policy on distribution failure = last-good-config
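A sketch of an enforcer that keeps its last good bundle when distribution fails and returns the policy version with each decision, so skew between enforcers becomes observable. `Bundle`, `Enforcer`, and the `(principal, action)` rule shape are hypothetical; a real engine such as OPA has its own bundle format and status reporting.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Bundle:
    version: str
    rules: FrozenSet[Tuple[str, str]]  # hypothetical: allowed (principal, action) pairs


class Enforcer:
    """Keeps the last good bundle and reports which version made each decision."""

    def __init__(self, fetch: Callable[[], Bundle]) -> None:
        self._fetch = fetch
        self.active: Optional[Bundle] = None
        self.last_error: Optional[str] = None

    def refresh(self) -> None:
        try:
            bundle = self._fetch()
        except Exception as exc:  # distribution failure: keep enforcing last-good
            self.last_error = repr(exc)
            return
        self.active = bundle
        self.last_error = None

    def check(self, principal: str, action: str) -> Tuple[bool, Optional[str]]:
        if self.active is None:
            return False, None  # never received any bundle: fail closed
        return (principal, action) in self.active.rules, self.active.version
```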

Central PDP adds the gate’s own availability question:

PDP outage: fail open (availability, security hole)
            or fail closed (secure, everything stops)?
decision caches make the same trade as tokens: speed for staleness
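A minimal PEP sketch around a remote Check(), assuming the PDP client raises `ConnectionError` or `TimeoutError` on outage. The fail mode is an explicit constructor argument, and cached allows carry a named TTL, so the staleness the cache introduces is bounded and visible. The class and parameters are hypothetical, not Envoy's ext_authz configuration.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (principal, action, resource)


class PEP:
    """Enforcement point in front of a remote PDP (`check` stands in for Check())."""

    def __init__(self, check: Callable[[str, str, str], bool], *, fail_open: bool,
                 cache_ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check
        self.fail_open = fail_open   # decided explicitly, per enforcement point
        self.cache_ttl = cache_ttl   # named staleness bound for cached allows
        self._clock = clock
        self._cache: Dict[Key, float] = {}  # key -> time the allow was cached

    def authorize(self, principal: str, action: str, resource: str) -> bool:
        key = (principal, action, resource)
        cached_at = self._cache.get(key)
        if cached_at is not None and self._clock() - cached_at <= self.cache_ttl:
            return True  # may be stale by up to cache_ttl
        try:
            allowed = self._check(principal, action, resource)
        except (ConnectionError, TimeoutError):
            return self.fail_open  # the gate's own outage is a decision
        if allowed:
            self._cache[key] = self._clock()
        else:
            self._cache.pop(key, None)
        return allowed
```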

Interrogation:

Where is policy stored / evaluated / enforced? (three different places, usually)
Latency budget: can the request path afford a remote Check()?
What version of policy did this enforcer use, and how would you know?
Fail-open or fail-closed — decided explicitly, per enforcement point?

Axis 4 — Enforcement Position (when in the lifecycle the gate sits)

admission-time:    before state enters the system   (k8s webhooks, PSA, quota admission)
connection-time:   before bytes flow                (mTLS, NetworkPolicy, security groups)
request-time:      per call                         (ext_authz, API authz)
data-access-time:  per row/column/field             (row-level security, column masking)
consumption-time:  metered as capacity is used      (ResourceQuota, rate limit descriptors)
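Consumption-time metering is often a token bucket per key; the sketch below is one such scheme, with a hypothetical class name and an injected clock for testing. Which key the bucket is indexed by (tenant, principal, route) is itself a policy decision, and choosing badly is the "wrong key" failure listed for quota in the configurations table.

```python
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Each key earns `rate` tokens per second, capped at `burst`."""

    def __init__(self, rate: float, burst: float, clock: Callable[[], float]) -> None:
        self.rate, self.burst, self._clock = rate, burst, clock
        self._state: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_refill)

    def allow(self, key: str, cost: float = 1.0) -> bool:
        now = self._clock()
        tokens, last = self._state.get(key, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)  # refill
        if tokens >= cost:
            self._state[key] = (tokens - cost, now)  # consume
            return True
        self._state[key] = (tokens, now)
        return False
```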

Positions differ in what they can see and how long the decision stays true:

admission-time validates a mutation once while the world keeps moving —
"policy race with later controller action" is scheduler.md's
view-vs-reality* at the policy layer.
connection-time decisions last as long as the connection, not as long as the policy
(a long-lived mTLS connection survives a policy change: revocation again).
earlier positions are cheaper and coarser; later positions see more and cost more.
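The connection-time staleness problem can be sketched with a policy epoch: the gate records which epoch admitted a connection and re-evaluates only when the epoch moves. All names are hypothetical; how often `still_valid` is called (per request, or on each policy push) is what sets the revocation window for long-lived connections.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple

# (epoch, allowed (client, server) pairs); the epoch bumps on every policy change.
Policy = Tuple[int, FrozenSet[Tuple[str, str]]]


@dataclass
class Connection:
    client: str
    server: str
    authorized_at_epoch: int


class ConnectionGate:
    def __init__(self, current_policy: Callable[[], Policy]) -> None:
        self._policy = current_policy

    def accept(self, client: str, server: str) -> Optional[Connection]:
        """Connection-time decision: evaluated once, at handshake."""
        epoch, allowed = self._policy()
        if (client, server) in allowed:
            return Connection(client, server, epoch)
        return None

    def still_valid(self, conn: Connection) -> bool:
        """Re-check an open connection only when the policy epoch has moved."""
        epoch, allowed = self._policy()
        if epoch == conn.authorized_at_epoch:
            return True
        if (conn.client, conn.server) in allowed:
            conn.authorized_at_epoch = epoch
            return True
        return False  # caller should drain or close the connection
```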

Interrogation:

What can this position actually observe? (admission can't see runtime behavior)
How long does a decision made here remain in force?
What later actor can invalidate the assumption this gate checked?
Defense in depth: which positions back this one up?

Axis 5 — Principal Substrate (who is asking — the foundation)

Workload identity is not a policy type; it is what every other axis stands on. You cannot authorize what you cannot name.

human principals:    OIDC subject, group membership
workload principals: SPIFFE ID, mTLS SAN, ServiceAccount token, IAM role
attestation:         how the credential got bound to the right workload
                     (SPIRE node+workload attestation; projected, audience-bound SA tokens)
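A simplified sketch of authorizing a workload principal, assuming the caller's SPIFFE ID has already been taken from the URI SAN of a peer certificate verified against the trust bundle. The `spiffe://<trust-domain>/<path>` shape is from the SPIFFE standard, but this parser is deliberately loose and does not implement the full specification; certificate verification and attestation are out of scope.

```python
import re
from typing import NamedTuple, Optional, Set

# Simplified trust-domain charset; the SPIFFE spec has further rules.
_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")


class SpiffeID(NamedTuple):
    trust_domain: str
    path: str


def parse_spiffe_id(uri: str) -> Optional[SpiffeID]:
    prefix = "spiffe://"
    if not uri.startswith(prefix):
        return None
    rest = uri[len(prefix):]
    if "?" in rest or "#" in rest:
        return None  # no query or fragment allowed
    trust_domain, sep, path = rest.partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        return None
    return SpiffeID(trust_domain, sep + path)


def authorized_caller(peer_uri_san: str, trust_domain: str, allowed_paths: Set[str]) -> bool:
    """Authorize on the URI SAN of an already-verified peer cert, never on a header."""
    sid = parse_spiffe_id(peer_uri_san)
    return sid is not None and sid.trust_domain == trust_domain and sid.path in allowed_paths
```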

Failure here poisons everything above it:

wrong workload obtains identity    -> every policy correctly authorizes the wrong party
credential theft                   -> identity and authority conflated (deep lesson row 1)
trust bundle skew / rotation break -> valid peers rejected, or dead roots trusted
NAT/proxy strips identity          -> policy evaluates against the wrong principal

Interrogation:

Who issued the principal's credential, rooted in what trust bundle?
Is the credential bound (audience, proof-of-possession) or bearer?
How was the workload attested — could another pod obtain this identity?
Does identity survive every hop of the actual traffic path?

Technical Bottleneck: Revocation — the Freshness of Granted Authority*

the further authority travels from its source —
into a cache, a bundle, a bearer token, a live connection —
the faster checking becomes, and the harder taking it back becomes.

Essential, with no general solution: every point on axes 1 and 3 is a stance toward it. Count how many of this document's failure modes are this one problem:

token leaked, revocation hard        stale decision cache
old bundle still enforcing           stale group membership
stale relationship cache             long-lived connection outlives policy change

Known recipes (bounded, composable, none universal):

short expiry            rent authority instead of granting it —
                        the lease (queue/scheduler/state-machine blocks),
                        applied to permission
introspection / CRL     reintroduce the lookup you tried to escape (hybrid)
push invalidation       control-plane distribution, with its own skew window
zookie (Zanzibar)       consistency token: "how stale may this decision be"
                        becomes an explicit per-request parameter, not an
                        ambient property — the flagship recipe

The canonical statement of the bottleneck is Zanzibar’s “new enemy” problem:

1. remove viewer from ACL     2. add secret content
a stale decision that reorders these shows the revoked viewer the new secret —
revocation freshness and content freshness must be causally linked
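A toy model of the new-enemy problem and the zookie fix, assuming a single global integer revision stands in for Zanzibar's commit timestamps and a lagging replica stands in for a stale cache. `ACLLog` and `Replica` are hypothetical names. The content write captures the current revision as its zookie; a check that presents it cannot be answered from an older snapshot.

```python
from typing import List, Optional, Tuple


class ACLLog:
    """Append-only ACL history; each write gets the next global revision."""

    def __init__(self) -> None:
        self.rev = 0
        self.entries: List[Tuple[int, str, str, bool]] = []  # (rev, doc, user, granted)

    def write(self, doc: str, user: str, granted: bool) -> int:
        self.rev += 1
        self.entries.append((self.rev, doc, user, granted))
        return self.rev

    def allowed_at(self, doc: str, user: str, snapshot: int) -> bool:
        """Evaluate the ACL as of `snapshot` (latest entry wins, default deny)."""
        result = False
        for rev, d, u, granted in self.entries:
            if rev > snapshot:
                break
            if (d, u) == (doc, user):
                result = granted
        return result


class Replica:
    """A lagging reader that has applied ACL writes only up to `applied`."""

    def __init__(self, log: ACLLog, applied: int) -> None:
        self.log, self.applied = log, applied

    def check(self, doc: str, user: str, zookie: Optional[int] = None) -> bool:
        if zookie is not None and self.applied < zookie:
            # Stand-in for waiting on replication: never answer from a
            # snapshot older than the zookie the caller presented.
            self.applied = zookie
        return self.log.allowed_at(doc, user, self.applied)
```

Without the zookie, the lagging replica returns the pre-removal answer; with it, the check is forced onto a snapshot that already includes the removal.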

A strong design says explicitly:

who grants authority,
what authority is represented as,
where the decision is made and enforced,
how stale a decision may be (named, bounded, per path),
how authority is revoked within that bound,
and how every decision is audited.

Gate Protocol (the crossing-point spec — keep)

General:

authenticate principal
collect request context
resolve policy data
evaluate decision
enforce allow/deny
record audit event
cache decision, if allowed (with named staleness bound)
refresh / revoke / expire authority
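The general protocol can be sketched as a single gate object with the steps as comments. The authentication and evaluation functions are injected and hypothetical; only allows are cached, each under the named `max_staleness` bound, and every outcome (including unauthenticated requests and cache hits) produces an audit record.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# evaluate(principal, action, resource, context) -> (allowed, reason)
Evaluate = Callable[[str, str, str, dict], Tuple[bool, str]]


class Gate:
    def __init__(self, authenticate: Callable[[dict], Optional[str]], evaluate: Evaluate,
                 max_staleness: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.authenticate = authenticate
        self.evaluate = evaluate            # resolves policy data and evaluates
        self.max_staleness = max_staleness  # named staleness bound for cached allows
        self.clock = clock
        self.audit: List[dict] = []
        self._cache: Dict[Tuple[str, str, str], float] = {}

    def handle(self, request: dict) -> bool:
        # authenticate principal
        principal = self.authenticate(request)
        if principal is None:
            return self._record(None, request, False, "unauthenticated")
        # collect request context
        context = {"source_ip": request.get("source_ip")}
        key = (principal, request["action"], request["resource"])
        # cache hit within the named bound: decision may be up to max_staleness old
        cached_at = self._cache.get(key)
        if cached_at is not None and self.clock() - cached_at <= self.max_staleness:
            return self._record(principal, request, True, "cached allow")
        # resolve policy data and evaluate decision
        allowed, reason = self.evaluate(principal, request["action"], request["resource"], context)
        # cache decision, if allowed
        if allowed:
            self._cache[key] = self.clock()
        # enforce allow/deny and record audit event
        return self._record(principal, request, allowed, reason)

    def revoke(self, principal: str) -> None:
        """Drop cached allows so the next request is re-evaluated."""
        for key in [k for k in self._cache if k[0] == principal]:
            del self._cache[key]

    def _record(self, principal: Optional[str], request: dict, allowed: bool, reason: str) -> bool:
        self.audit.append({"principal": principal, "action": request.get("action"),
                           "resource": request.get("resource"), "allowed": allowed,
                           "reason": reason, "at": self.clock()})
        return allowed
```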

Capability lifecycle:

issuer creates scoped credential (issuance = the decision)
holder presents; verifier checks signature, audience, expiry, caveats
resource server enforces
credential expires or is revoked (see bottleneck*)
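A sketch of the capability lifecycle with an HMAC-signed, presigned-URL-style credential. The token format, the JWT-like claim names, and the exact-match scope check are assumptions for illustration; this is not JWT or any provider's presigned-URL scheme. Note that `verify` never consults a store: the decision was made in `issue`, and the only built-in revocation is `exp`.

```python
import hashlib
import hmac
import json
from base64 import urlsafe_b64decode, urlsafe_b64encode
from typing import Optional


def issue(key: bytes, *, subject: str, audience: str, scope: str, ttl: float, now: float) -> str:
    """Issuer: the authorization decision happens here, baked into the claims."""
    claims = {"sub": subject, "aud": audience, "scope": scope, "exp": now + ttl}
    body = urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify(key: bytes, token: str, *, audience: str, scope: str, now: float) -> Optional[dict]:
    """Verifier: signature, audience, scope, expiry. No lookup, hence no early revocation."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(urlsafe_b64decode(body.encode()))
    if claims["aud"] != audience or claims["scope"] != scope or now >= claims["exp"]:
        return None  # wrong audience, wrong scope, or expired
    return claims
```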

Central PDP:

PEP receives request
PEP calls Check(principal, action, resource, context [, zookie])
PDP evaluates policy + data
PDP returns allow/deny + reason
PEP enforces and audits

Named Configurations (lookup table)

Vector = {authority home, decision model, topology, position, principal}.

Name	Vector	Canonical study object	Signature failure
RBAC	lookup, roles, in-process, request-time, human/SA	k8s RBAC + SubjectAccessReview	role explosion; overbroad admin; stale membership
ABAC	lookup, attributes, in-process or PDP, request-time, any	IAM condition evaluation	missing/spoofed attribute; rule-interaction opacity
ReBAC	lookup, graph, central + caches + zookies, request-time, human	Zanzibar	traversal cost; stale cache; unexplainable inherited access
Bearer capability	token, precomputed scopes, verification-only, request-time, holder	presigned URL; OAuth2 bearer	leak = access; revocation hard; wrong audience accepts
Attenuated capability	token + caveat ops, precomputed, verification-only, request-time, delegation chain	Macaroons	unchecked caveat; over-delegation; confused deputy
Workload identity	(substrate for all), —, —, —, attested workload	SPIFFE/SPIRE + Envoy SDS	wrong workload attested; rotation failure; bundle skew
Central PDP	lookup, any model, remote Check, request-time, any	Envoy ext_authz	PDP outage; fail-open leak; per-request latency; stale cache
Policy-as-code bundle	lookup, attributes/rules, local engine + pushed data, request-time, workload	OPA bundles; Cedar	version skew across enforcers; rollout breaks clients
Admission policy	lookup, rules, webhook called by apiserver, admission-time, user/SA	k8s validating/mutating webhooks	webhook outage blocks cluster; race with later controllers; fail-open
Network/service policy	lookup, identity+L4/L7 rules, pushed to proxies, connection-time, workload	Istio AuthorizationPolicy + mTLS	default-allow surprise; identity lost at proxy/NAT; policy ≠ traffic path
Data governance	lookup, classification attrs, engine-embedded, data-access-time, human/service	row/column policy; catalog governance	PII in logs; forbidden-region replica; backup ignores deletion
Quota/resource	lookup, counters, in-process admission, consumption-time, tenant	ResourceQuota; rate-limit service	undercount; wrong key; burst bypass; unmetered shared resource

Vocabulary

principal  subject  resource  action  context
policy  role  attribute  relationship  tuple
capability  credential  token  claim  scope  audience  expiry
caveat  attenuation  delegation  confused deputy
trust root  attestation  rotation
PDP  PEP  decision  reason  audit
revocation  introspection  zookie  new enemy
fail-open  fail-closed  default deny

Deep Lesson

Policy bugs come from confusing pairs on different axes:

identity            vs  authority              (axis 5 vs axes 1–4: naming ≠ permitting)
name                vs  principal              (axis 5: a string is not an attested party)
authentication      vs  authorization          (substrate vs decision)
token possession    vs  valid permission       (axis 1: bearer ≠ still-authorized — revocation*)
role                vs  resource-specific access (axis 2: grant scope ≠ evaluation scope)
cache hit           vs  fresh decision         (revocation*: staleness must be named)
network boundary    vs  trust boundary         (boundary.md: mechanism ≠ concern)
fail-open           vs  availability           (axis 3: the gate's own outage is a decision)

Design procedure: attest the principal, choose where authority lives, pick the decision model, place evaluation and enforcement, then name the staleness bound and the revocation path for every cached grant. The named types are recognition shortcuts, not the design space.