Feature Flag / A-B Testing Delivery Analysis Note
This note captures the full step-by-step analysis for a feature flag and A/B testing delivery system: flag config, targeting rules, experiment allocation, config propagation, local evaluation, and optional exposure logging.
Step 1 — Normalize
Assume the baseline prompt is:
- design a feature flag / A-B testing delivery system
- applications evaluate flags locally or through a low-latency service
- admins update flags, rollout percentages, and targeting rules
- experiments assign users to variants deterministically
- system scales across many services/clients
| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| Client evaluates flag for request/user context | Client | read source | S1 read source target: EvaluationState | C1 |
| Admin creates/updates flag config | Admin | overwrite state | S1 update target: FlagConfig | C1 |
| Admin updates rollout/targeting rules | Admin | overwrite state | S1 update target: TargetingPolicy | C1 |
| System computes effective evaluation state | System | state transition | S1 update target: EvaluationState | C1 |
| System propagates config snapshot to SDKs/edge nodes | System | async process | S1 hidden write target: ConfigSnapshot | C1 |
| System records exposure event | System | append event | S1 create target: ExposureEvent | R2 |
| Client reads flag/admin status | Client | read projection | S1 read projection target: FlagStatusView | R2 |
Notes on normalization:
- evaluation is a read path against current effective evaluation state
- flag and targeting config are overwrite-state control-plane objects
- effective evaluation state is a control-plane derived object
- config propagation is async dissemination
- exposure logging is append-only, but secondary unless explicitly required for billing/analytics correctness
This is primarily a Control Plane + Data Plane system, with optional secondary analytics/event-stream behavior.
Step 2 — Critical Path Selection
| Requirement | Priority class | Why |
|---|---|---|
| Evaluate flag for request/user | C1 | wrong evaluation breaks rollout correctness and experiment treatment |
| Update flag config | C1 | changes future evaluation truth |
| Update targeting/rollout policy | C1 | changes future routing into variants |
| Compute effective evaluation state | C1 | control-plane to data-plane correctness bridge |
| Propagate config snapshot | C1 | stale SDK/edge nodes can evaluate wrongly |
| Record exposure event | R2 | important for analytics and experiment analysis, but not core evaluation correctness in baseline |
| Read flag/admin status | R2 | operational only |
Critical paths:
- P1 evaluate flag
- P2 update flag config
- P3 update targeting policy
- P4 compute effective evaluation state
- P5 propagate config snapshot
Optional/secondary:
- P6 exposure logging
Step 3 — Primary State Extraction
| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| FlagConfig | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | flag_id |
| TargetingPolicy | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | flag_id or segment_scope |
| EvaluationState | hidden write target | Yes | keep as candidate | process | Yes | service | overwrite | instance | flag_id or environment_scope |
| ConfigSnapshot | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | instance | client_id or node_id |
| ExposureEvent | hidden write target | No | keep as candidate | event | No | derived | append-only | collection | flag_id + subject_id + request_id |
| FlagStatusView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | environment_id |
Minimal primary set:
- FlagConfig
- TargetingPolicy
- EvaluationState
- ConfigSnapshot
Important modeling choices:
EvaluationState
This is worth making explicit because:
- it is the effective compiled form used by the hot path
- it can include normalized targeting rules, percentage rollout config, variant weights, kill-switch flags, prerequisites, etc.
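To make the "compiled form" idea concrete, here is a minimal sketch of a pure compile step. All type and field names (`FlagConfig`, `TargetingPolicy`, `rollout_bps`, `allowed_segments`, `cumulative_weights`) are hypothetical and chosen only for illustration; weights are assumed to be expressed in basis points summing to 10,000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagConfig:
    flag_id: str
    version: int
    enabled: bool
    variants: tuple  # ((variant_name, weight_bps), ...), assumed to sum to 10_000


@dataclass(frozen=True)
class TargetingPolicy:
    flag_id: str
    version: int
    rollout_bps: int            # share of subjects in the rollout, in basis points
    allowed_segments: frozenset  # empty set is taken to mean "no restriction"


@dataclass(frozen=True)
class EvaluationState:
    flag_id: str
    source_versions: tuple      # (config_version, policy_version) it was built from
    kill_switch: bool
    rollout_bps: int
    allowed_segments: frozenset
    cumulative_weights: tuple   # ((variant_name, upper_bound_bps), ...)


def compile_evaluation_state(config: FlagConfig, policy: TargetingPolicy) -> EvaluationState:
    """Pure function: the same inputs always produce the same compiled state."""
    if config.flag_id != policy.flag_id:
        raise ValueError("config and policy refer to different flags")
    if sum(weight for _, weight in config.variants) != 10_000:
        raise ValueError("variant weights must sum to 10_000 basis points")

    # Precompute cumulative upper bounds so the hot path only does a linear scan.
    running = 0
    cumulative = []
    for name, weight in config.variants:
        running += weight
        cumulative.append((name, running))

    return EvaluationState(
        flag_id=config.flag_id,
        source_versions=(config.version, policy.version),
        kill_switch=not config.enabled,
        rollout_bps=min(max(policy.rollout_bps, 0), 10_000),
        allowed_segments=policy.allowed_segments,
        cumulative_weights=tuple(cumulative),
    )
```

Because the function has no side effects, rerunning it after a crash or retry yields an identical state for the same input versions.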
ConfigSnapshot
Like other control-plane/data-plane systems:
- local SDKs or edge nodes should evaluate from a versioned snapshot
- not query control plane for every request
ExposureEvent
Usually secondary in baseline correctness:
- important for experiment analytics
- not needed to decide variant selection if assignment is deterministic from config + context
Step 4 — Hard Invariants
| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 evaluate flag | HARD | eligibility | evaluate_flag is valid only if result is derived from current eligible flag config, targeting policy, and rollout rules for the request context within the configured propagation bound. |
| P2 update flag config | HARD | ordering | Flag-config revisions are ordered by monotonic version within flag_id. |
| P3 update targeting policy | HARD | ordering | Targeting-policy revisions are ordered by monotonic version within policy scope. |
| P4 compute effective evaluation state | HARD | accounting | Effective EvaluationState(flag_id) equals function of authoritative flag config and targeting policy. |
| P5 propagate config snapshot | HARD | freshness | ConfigSnapshot(client_or_node) reflects authoritative evaluation state within configured propagation bound. |
If experiment assignment is deterministic, add:
| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 evaluate flag | HARD | uniqueness | For a fixed (flag_id, subject_key, config_version), variant assignment maps to at most one deterministic outcome within that evaluation scope. |
What matters most:
- clients must not evaluate against arbitrarily stale config
- deterministic bucketing must be stable for same subject and config version
- config versions must move monotonically
- effective state must be a faithful compiled form of authoritative rules
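The deterministic-bucketing invariant can be illustrated with a hash-based sketch. This is one common approach, not necessarily the one a given system uses: the function names, the SHA-256 choice, the `salt` parameter, and the 10,000-bucket granularity are all assumptions made for the example.

```python
import hashlib


def bucket(flag_id: str, subject_key: str, salt: str = "") -> int:
    """Map (flag_id, subject_key) to a stable bucket in [0, 10_000)."""
    payload = f"{salt}:{flag_id}:{subject_key}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Use the first 8 bytes as an unsigned integer; hashlib output is stable
    # across processes and machines, unlike Python's built-in hash().
    return int.from_bytes(digest[:8], "big") % 10_000


def assign_variant(flag_id: str, subject_key: str, weights, salt: str = "") -> str:
    """weights: list of (variant_name, weight_bps) pairs summing to 10_000."""
    if sum(weight for _, weight in weights) != 10_000:
        raise ValueError("variant weights must sum to 10_000 basis points")
    position = bucket(flag_id, subject_key, salt)
    upper_bound = 0
    for variant, weight in weights:
        upper_bound += weight
        if position < upper_bound:
            return variant
    raise AssertionError("unreachable when weights sum to 10_000")
```

Since the result depends only on the inputs, any evaluator holding the same weights returns the same variant for the same subject. Changing the hypothetical `salt` would reshuffle assignments, so in this sketch it should change only when a re-randomization is intended.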
Step 5 — Execution Context
| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical flag-delivery system with many SDKs/edge nodes |
| Write coordination scope | per object scope | correctness is per flag/policy/environment/client snapshot scope |
| Read consistency target | bounded stale allowed | hot path typically uses local cached/snapshotted config |
| Holder model | none | no central lease-like ownership on hot path |
| Compensation acceptable? | No | wrong experiment treatment or feature exposure cannot be undone for correctness purposes |
Derived:
- bounded_staleness_allowed = true
- exclusive_claim_required = false
- guarded_by_current_state = true
This implies:
- authoritative control plane
- versioned snapshot propagation to evaluators
- local evaluation on hot path
Step 6 — Deterministic Mechanism Selection
| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P2 update flag config | overwrite current value | CAS on version | config version |
| P3 update targeting policy | overwrite current value | CAS on version | policy version |
| P4 compute effective evaluation state | overwrite current value | single writer control-plane recompute | compiled-state version |
| P5 propagate config snapshot | overwrite current value | single writer snapshot publication | config version |
| P6 exposure logging | append-only event | append log | event id or request id dedup if needed |
Hot path P1 is a read path.
Why these fit:
- flag config and targeting are versioned current-state config
- effective evaluation state is a recomputed current view
- snapshots are versioned overwrite propagation
- exposure logging, if included, is append-only
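The "CAS on version" mechanism for P2 and P3 can be sketched with an in-memory store. This is a simplified stand-in for a real config database: `ConfigStore`, `Versioned`, and `VersionConflict` are hypothetical names, and the convention that a missing key has version 0 is an assumption of the example.

```python
import threading
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when the caller's expected_version is stale."""


@dataclass(frozen=True)
class Versioned:
    version: int
    value: dict


class ConfigStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def get(self, key):
        with self._lock:
            return self._items.get(key)

    def put(self, key, value, expected_version):
        """Overwrite only if expected_version matches the stored version."""
        with self._lock:
            current = self._items.get(key)
            current_version = current.version if current is not None else 0
            if expected_version != current_version:
                raise VersionConflict(
                    f"{key}: expected {expected_version}, found {current_version}"
                )
            # Versions move by exactly one step per accepted write, so they
            # stay monotonic within the key.
            updated = Versioned(current_version + 1, dict(value))
            self._items[key] = updated
            return updated
```

Two admins editing the same flag from version 1 would both send `expected_version=1`; the first write produces version 2 and the second fails with `VersionConflict`, which is the "stale update loses CAS" behavior.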
Step 7 — Read Model / Source of Truth
| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 flag config | FlagConfig | read source directly | authoritative config store |
| C2 targeting policy | TargetingPolicy | read source directly | authoritative policy store |
| C3 effective evaluation state | EvaluationState | read source directly | recompute from flag config + policy |
| C4 evaluator snapshot | ConfigSnapshot | materialized view | rebuild from latest effective evaluation state |
| C5 flag/admin status | derived | materialized view | recompute from primary state |
| C6 exposure analytics | ExposureEvent if retained | append/event analytics path | replay from event stream |
Important point:
For the hot evaluation path:
- SDK/edge/service reads local ConfigSnapshot
- not control-plane config source on every request
That is the key control-plane/data-plane split for feature-flag delivery.
Step 8 — Failure Handling
| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| flag/policy update | retry with config version | stale update loses CAS | committed config survives control-plane crash if persisted | snapshot propagation may lag | n/a |
| effective-state recompute | retry safe from primary inputs | single recompute/version wins | recompute reruns after crash | snapshot propagation may lag | n/a |
| snapshot propagation | retry with versioned snapshot | older snapshot loses to newer version | SDK/edge keeps last good snapshot until refresh | failed push retried or pulled | n/a |
| flag evaluation | retries are application-level | many evaluators can serve concurrently using same snapshot version | one evaluator crash affects only local request handling | n/a | stale snapshot bounded by version/TTL refresh |
| exposure logging | retry with event id/request id | duplicate events coexist unless dedup applied | committed event survives if persisted | async analytics publication may lag | n/a |
What matters most:
- evaluators reject older config versions
- same subject + config version yields stable assignment
- stale snapshots are bounded
- exposure logging, if used, is idempotent enough for analytics
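The first point, evaluators rejecting older config versions, can be sketched as a small snapshot holder on the SDK or edge side. `SnapshotHolder` and the shape of `ConfigSnapshot` here are hypothetical, intended only to show the version guard.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSnapshot:
    version: int
    flags: dict


class SnapshotHolder:
    """Keeps the last good snapshot and only ever moves forward in version."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = None

    def apply(self, incoming: ConfigSnapshot) -> bool:
        """Install the snapshot if it is newer; return whether it was accepted."""
        with self._lock:
            if self._current is not None and incoming.version <= self._current.version:
                # Late or duplicate delivery: keep serving the newer snapshot.
                return False
            self._current = incoming
            return True

    def current(self):
        with self._lock:
            return self._current
```

With this guard, retried or reordered pushes are harmless: a delayed version 1 arriving after version 2 is simply dropped, and a failed refresh leaves the last good snapshot in place.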
Step 9 — Scale Adjustments
| Hotspot | Type | First response |
|---|---|---|
| very high evaluation QPS | read hotspot | push evaluation fully local to SDK/edge and keep snapshots compact |
| config churn from frequent flag updates | fan-out hotspot | batch updates and publish incremental snapshots |
| large targeting rule sets | read hotspot | compile rules into efficient evaluation structures and shard config by environment/app |
| huge client fleet reconnects | fan-out hotspot | backoff reconnects and support pull-on-version-miss |
| exposure-event volume | write throughput hotspot | keep exposure logging asynchronous and separate from evaluation hot path |
| admin/status reads | read hotspot | derived views only |
What scales well:
- hot path is local evaluation from snapshot
- control plane is narrow and versioned
- propagation is incremental
What fails first:
- config fanout storms
- large rule trees
- exposure analytics overload if tied too closely to the hot path
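For the fan-out storm case, the "backoff reconnects and support pull-on-version-miss" response from the table might look like the sketch below. It uses exponential backoff with full jitter, which is one common choice and not something the note prescribes; the function names and the default `base` and `cap` values are assumptions.

```python
import random


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 60.0, rng=random) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`, and the actual delay is
    drawn uniformly below it so a large fleet does not reconnect in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def needs_pull(local_version: int, advertised_version: int) -> bool:
    """Pull a fresh snapshot only when the advertised version is ahead of ours."""
    return advertised_version > local_version
```

A client would sleep for `reconnect_delay(attempt)` between attempts and, once connected, fetch a snapshot only if `needs_pull` returns true, so clients that are already current add no load.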
Canonical design conclusion:
- archetype: Control Plane + Data Plane
- primary truth: FlagConfig, TargetingPolicy, EvaluationState, ConfigSnapshot
- hot path: local snapshot read + deterministic evaluation/bucketing
- control plane: authoritative config + compiled effective state + snapshot publication
Concrete Substrate
- control plane in Go/Java
- authoritative config store in etcd, Postgres, or another strongly consistent config DB
- config distribution via watch streams / streaming SDK updates / CDN-backed snapshot pull
- local evaluators in SDKs, edge nodes, or a low-latency gateway service
- optional exposure event pipeline via Kafka/PubSub/ClickHouse path
Operation Layer
Evaluate(flag_id, context, subject_key)
- entry point: SDK / edge evaluator / local client library
- authoritative decider: local ConfigSnapshot
- transition: none on source truth
- response: {enabled, variant, config_version, reason}
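A minimal sketch of how `Evaluate` could produce that response from a local snapshot follows. The rule order (kill switch, then targeting, then rollout), the `reason` strings, and the `CompiledFlag` fields including the country-based targeting rule are assumptions made for illustration, not part of the original design.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CompiledFlag:
    enabled: bool                 # False acts as a kill switch
    rollout_bps: int              # rollout share in basis points, 0..10_000
    variant: str                  # variant served to subjects inside the rollout
    allowed_countries: frozenset  # hypothetical targeting rule; empty = no restriction


@dataclass(frozen=True)
class Snapshot:
    version: int
    flags: dict  # flag_id -> CompiledFlag


def _bucket(flag_id: str, subject_key: str) -> int:
    digest = hashlib.sha256(f"{flag_id}:{subject_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def evaluate(snapshot: Snapshot, flag_id: str, context: dict, subject_key: str) -> dict:
    """Read-only evaluation against the local snapshot; no network calls."""

    def result(enabled, variant, reason):
        return {
            "enabled": enabled,
            "variant": variant,
            "config_version": snapshot.version,
            "reason": reason,
        }

    flag = snapshot.flags.get(flag_id)
    if flag is None:
        return result(False, None, "flag_not_found")
    if not flag.enabled:
        return result(False, None, "kill_switch")
    if flag.allowed_countries and context.get("country") not in flag.allowed_countries:
        return result(False, None, "targeting_miss")
    if _bucket(flag_id, subject_key) >= flag.rollout_bps:
        return result(False, None, "outside_rollout")
    return result(True, flag.variant, "rollout_match")
```

Returning `config_version` and `reason` with every result makes it possible to debug a surprising evaluation after the fact without querying the control plane.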
PutFlagConfig(flag_id, config, expected_version?)
- entry point: control-plane API
- authoritative decider: config store owner
- transition: overwrite FlagConfig
PutTargetingPolicy(scope, config, expected_version?)
- entry point: control-plane API
- authoritative decider: policy store owner
- transition: overwrite TargetingPolicy
- internal recompute
  - recompute EvaluationState from flag config + policy
- snapshot propagation
  - publish latest ConfigSnapshot(version) to SDKs/edge nodes
RecordExposure(flag_id, subject_key, variant, config_version, request_id?)
- entry point: async event pipeline
- authoritative decider: analytics/event pipeline, not hot-path evaluator
- transition: append ExposureEvent
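One way to keep `RecordExposure` off the hot path is a bounded in-process buffer that never blocks the caller, with deduplication applied downstream. The sketch below assumes that dropping events under overload is acceptable for analytics and that the dedup key is (flag_id, subject_key, request_id); `ExposureBuffer`, `drain`, and `dedup` are hypothetical names.

```python
import queue


class ExposureBuffer:
    """Non-blocking buffer between the evaluator and the event pipeline."""

    def __init__(self, max_pending: int = 1000):
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def record(self, flag_id, subject_key, variant, config_version, request_id=None) -> bool:
        event = {
            "flag_id": flag_id,
            "subject_key": subject_key,
            "variant": variant,
            "config_version": config_version,
            "request_id": request_id,
        }
        try:
            self._queue.put_nowait(event)  # never wait on the evaluation path
            return True
        except queue.Full:
            self.dropped += 1  # count the loss instead of slowing the caller
            return False

    def drain(self, max_events: int = 100) -> list:
        """Called by a background sender to collect a batch for shipping."""
        batch = []
        while len(batch) < max_events:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch


def dedup(events: list) -> list:
    """Drop repeated events that share (flag_id, subject_key, request_id)."""
    seen = set()
    unique = []
    for event in events:
        if event["request_id"] is None:
            unique.append(event)  # no dedup key available, keep as-is
            continue
        key = (event["flag_id"], event["subject_key"], event["request_id"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

In a real deployment the drained batches would go to the event pipeline named earlier (for example a Kafka topic), and deduplication could equally be done in the analytics store.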
Entry Point vs Decider vs Responder
| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| evaluate flag | SDK / local evaluator / edge node | local config snapshot | local evaluator | feature-flag service |
| flag/policy update | control-plane API | config/policy store owner | control-plane node | feature-flag service |
| effective-state recompute | control plane | recompute worker | internal | feature-flag service |
| snapshot propagation | SDK / edge node / control plane | snapshot publisher | control/data-plane | feature-flag service |
| exposure logging | async ingestion endpoint | event pipeline | ingestion node | feature-flag analytics subsystem |
Concrete HLD
Main components:
- control-plane API
- flag + policy state store
- effective-state compiler/recompute worker
- snapshot distribution layer
- SDK / edge evaluators
- optional exposure analytics pipeline
Short interview version
“I’d design the feature-flag and A/B testing system as a control-plane/data-plane setup. The control plane stores flag definitions and targeting rules, compiles them into an effective evaluation state, and publishes versioned snapshots to SDKs or edge evaluators. The hot path does not query the control plane; it evaluates locally from the snapshot and uses deterministic bucketing so the same user gets the same variant for a given config version. Exposure logging is asynchronous and separate from the evaluation hot path.”