Bridging Intent and Commitment
The central tension in application-level coordination is throughput vs correctness. Millions of users must express their intent (claim a seat, place a bid, send a payment) at low latency. A small number of those intents must be committed with strong guarantees (no double charges, no double bookings).
These two requirements are structurally incompatible in a single system:
- High throughput requires optimistic operations, in-memory state, minimal coordination, low latency.
- Strong correctness requires consensus, durable writes, coordination overhead, higher latency.
The architectural response is a two-layer system with a fast, optimistic intent layer and a slow, authoritative commitment layer, connected by bridge mechanisms that ensure the gap between them does not create inconsistency.
The Two Layers #
Intent layer (fast path):
- Redis, in-memory state, or a caching tier
- Optimistic operations: SET NX, atomic counters, sorted sets
- Errors are expected and acceptable
- State is temporary: bounded by TTL
- Throughput: hundreds of thousands of operations per second
- Latency: single-digit milliseconds
Commitment layer (slow path):
- Durable database (PostgreSQL, MySQL, CockroachDB)
- Pessimistic operations: transactions, unique constraints, row locks
- Errors are catastrophic: double charges, double bookings
- State is permanent: survives crashes
- Throughput: thousands of operations per second
- Latency: 10–100ms
The bridge mechanisms are the set of patterns that ensure consistency between these layers. Each bridge addresses a specific failure mode in the gap.
Bridge 1: TTL — Bounding the Cost of Intent Failures #
Problem: the intent layer holds state (a seat reservation, a lock, an inventory hold) that is not yet committed. If the user abandons the flow, the state must be released.
Solution: every intent operation has a TTL. If the commitment does not arrive within the TTL, the intent is automatically released.
# Seat reservation intent: 10-minute TTL
redis.set(f"seat:{seat_id}:hold", user_id, nx=True, ex=600)
# Inventory hold: 15-minute TTL
redis.set(f"inventory:{sku}:hold:{quantity}", order_id, nx=True, ex=900)
# Bid hold during auction: 5-minute TTL
# (EXPIRE applies to the whole sorted set, so every new bid resets it)
redis.zadd("auction:123:bids", {user_id: bid_amount})
redis.expire("auction:123:bids", 300)
TTL determines the maximum duration of inconsistency between the intent layer and the real world. A seat held for 10 minutes means that seat is unavailable to other users for up to 10 minutes after the user abandons checkout. The TTL must balance:
- Too short: user is midway through checkout when hold expires; another user grabs the seat; first user’s payment is rejected.
- Too long: abandoned holds prevent legitimate purchases for too long.
Heartbeat extension: for long-form flows (checkout, multi-page form), the client can extend the TTL with a heartbeat:
# Client heartbeat every 2 minutes to extend 10-minute hold
# Assumes an asyncio Redis client created with decode_responses=True,
# so GET returns str rather than bytes
async def extend_hold(seat_id: str, user_id: str) -> bool:
    key = f"seat:{seat_id}:hold"
    # Only extend if current holder is this user
    current_holder = await redis.get(key)
    if current_holder == user_id:
        # GET followed by EXPIRE is not atomic: the hold can expire and be
        # re-acquired in between. A Lua script closes that window.
        await redis.expire(key, 600)  # Reset to 10 minutes
        return True
    return False  # Hold expired or was taken by another user
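The hold, expiry, and heartbeat rules above can be exercised without a Redis server. The following is a minimal in-memory sketch, not the Redis implementation: the class name HoldStore and its methods are hypothetical, and the injectable clock exists only so that expiry can be tested deterministically.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class HoldStore:
    """In-memory stand-in for SET key value NX EX ttl (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (holder, expires_at)
        self._holds: Dict[str, Tuple[str, float]] = {}

    def holder(self, key: str) -> Optional[str]:
        """Return the current holder, or None if the hold is absent or expired."""
        entry = self._holds.get(key)
        if entry is None or entry[1] <= self._clock():
            self._holds.pop(key, None)  # lazily drop expired holds
            return None
        return entry[0]

    def acquire(self, key: str, holder: str, ttl: float) -> bool:
        """Claim the key only if nobody currently holds it (NX semantics)."""
        if self.holder(key) is not None:
            return False
        self._holds[key] = (holder, self._clock() + ttl)
        return True

    def extend(self, key: str, holder: str, ttl: float) -> bool:
        """Heartbeat: reset the TTL only if `holder` still owns the key."""
        if self.holder(key) != holder:
            return False
        self._holds[key] = (holder, self._clock() + ttl)
        return True
```

Because the ownership check and the TTL reset happen inside one method call, this model has no gap between them; against a real Redis server the same property would need a server-side script or transaction.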
The commitment check: when the user completes the intent phase and tries to commit, the commitment layer must verify the intent still holds:
-- Commitment check: verify intent is still valid, then commit
BEGIN;
SELECT 1 FROM redis_sync WHERE seat_id = $1 AND holder = $2;
-- If not found: intent expired, reject
INSERT INTO seat_purchases (seat_id, user_id, order_id)
VALUES ($1, $2, $3)
ON CONFLICT (seat_id) DO NOTHING;
COMMIT;
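In practice, the commitment layer does not query Redis directly; the redis_sync lookup above stands in for that check. The application layer reads the intent status from Redis before committing to the database. Because the hold can expire between that read and the database commit, the unique constraint on seat_id remains the final guard against a double booking.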
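As a rough illustration of that application-layer flow, the sketch below checks the hold first and then relies on the database's unique constraint. It makes several assumptions: a plain mapping stands in for the Redis hold keys, SQLite stands in for the durable database, and the function name commit_purchase and its return values are hypothetical.

```python
import sqlite3
from typing import Mapping


def commit_purchase(
    holds: Mapping[str, str],  # stand-in for Redis: hold key -> holder
    db: sqlite3.Connection,
    seat_id: str,
    user_id: str,
    order_id: str,
) -> str:
    # Step 1: application-level intent check (a Redis GET in production).
    if holds.get(f"seat:{seat_id}:hold") != user_id:
        return "hold_expired"

    # Step 2: durable commit. The PRIMARY KEY on seat_id is the final guard
    # in case the hold expired after the check in step 1.
    try:
        with db:  # commits on success, rolls back on exception
            db.execute(
                "INSERT INTO seat_purchases (seat_id, user_id, order_id) "
                "VALUES (?, ?, ?)",
                (seat_id, user_id, order_id),
            )
    except sqlite3.IntegrityError:
        return "already_sold"
    return "committed"
```

The intent check keeps most stale requests away from the database, while the constraint handles the rare request that slips through the gap.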
Bridge 2: Idempotency Key — Connecting Intent to Exactly One Commitment #
Problem: the commitment layer (charging a payment, booking a flight, shipping an order) must execute exactly once. Network retries and application retries cause the same commitment to be attempted multiple times.
Solution: the client generates an idempotency key for each logical intent. The commitment layer uses the key to deduplicate: if the key has been seen before, return the cached result.
(Covered in detail in Chapter 8.)
The bridge role of the idempotency key: it connects the user’s intent (the specific user action: “purchase this seat”) to a single commitment (one payment charge, one booking record). No matter how many times the commitment is attempted, the same idempotency key produces the same result.
Key design for intent-commitment bridging:
# The idempotency key encodes the intent:
# user X wanting to book seat Y in event Z
idempotency_key = f"{user_id}:{event_id}:{seat_id}"
# The key is deterministic: if the user retries (page refresh, network retry),
# the same key is used. The commitment layer deduplicates.
response = payment_service.charge(
    amount=ticket_price,
    idempotency_key=idempotency_key,
)
Scope: the idempotency key must be scoped to the specific intent. A user purchasing two different seats in the same event needs two different keys (the idempotency key encodes seat_id). A user retrying the same seat purchase uses the same key.
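The deduplication on the receiving side can be sketched in a few lines. This is an in-memory illustration only: the class name IdempotentExecutor is hypothetical, and a real commitment layer would keep the key-to-result mapping in durable storage and insert it atomically.

```python
from typing import Callable, Dict


class IdempotentExecutor:
    """Runs an operation at most once per idempotency key (illustrative)."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}

    def run(self, key: str, operation: Callable[[], object]) -> object:
        # Key seen before: return the cached result, do not re-execute.
        if key in self._results:
            return self._results[key]
        result = operation()
        self._results[key] = result
        return result


if __name__ == "__main__":
    charges = []

    def charge() -> dict:
        charges.append(100)
        return {"status": "charged", "amount": 100}

    executor = IdempotentExecutor()
    executor.run("user1:event9:seatA1", charge)
    executor.run("user1:event9:seatA1", charge)  # retry: no second charge
    executor.run("user1:event9:seatA2", charge)  # different seat: new charge
    print(len(charges))
```

The retry with the same key reuses the first result, while the key for a different seat triggers a separate charge, matching the scoping rule above.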
Bridge 3: CAS — Committing Without Holding a Lock #
Problem: the intent layer uses an optimistic claim (Redis SET NX). The commitment layer must verify that the claim is still valid when committing, without requiring a lock to be held across the intent-to-commitment transition.
Solution: Compare-And-Swap (CAS) at the commitment layer checks that the state has not changed since the intent was established.
-- CAS commitment: only succeed if state matches intent
UPDATE seats
SET status = 'sold', buyer_id = $user_id, order_id = $order_id
WHERE seat_id = $seat_id
AND status = 'held' -- Intent state matches
AND holder_id = $user_id; -- This user holds the intent
-- If 0 rows updated: intent was overwritten; reject
Version-based CAS:
-- Optimistic lock with version counter
UPDATE inventory
SET quantity = quantity - $requested,
version = version + 1
WHERE sku = $sku
AND quantity >= $requested -- Sufficient stock
AND version = $expected_version; -- No concurrent modification
-- If 0 rows: concurrent update occurred; retry from read
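The "retry from read" step can be made concrete with a small loop. The sketch below assumes SQLite as a stand-in database and an inventory table with sku, quantity, and version columns as in the statement above; the function name reserve_stock and the max_attempts parameter are hypothetical.

```python
import sqlite3


def reserve_stock(
    db: sqlite3.Connection, sku: str, requested: int, max_attempts: int = 3
) -> bool:
    """Version-based CAS: read, then update only if the version is unchanged."""
    for _ in range(max_attempts):
        row = db.execute(
            "SELECT quantity, version FROM inventory WHERE sku = ?", (sku,)
        ).fetchone()
        if row is None or row[0] < requested:
            return False  # unknown SKU or insufficient stock
        expected_version = row[1]
        with db:  # commit the update as its own transaction
            cur = db.execute(
                "UPDATE inventory "
                "SET quantity = quantity - ?, version = version + 1 "
                "WHERE sku = ? AND quantity >= ? AND version = ?",
                (requested, sku, requested, expected_version),
            )
        if cur.rowcount == 1:
            return True
        # 0 rows updated: a concurrent writer changed the row; re-read and retry.
    return False
```

No lock is held between the read and the write; a conflict only costs one extra read and one more update attempt.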
CAS and idempotency: CAS is not inherently idempotent. If the same CAS is retried after success, the version will have changed (incremented by the first execution), causing the retry to fail. To make a CAS-based operation idempotent, combine CAS with an idempotency key:
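BEGIN;
-- Check idempotency key (idempotency_key should carry a unique constraint
-- so that two concurrent first attempts cannot both record a result)
SELECT result FROM commitments WHERE idempotency_key = $key;
-- If exists: return cached result (do not re-execute CAS)
-- Pseudocode: the branch below runs in application code or PL/pgSQL
IF NOT FOUND THEN
    -- Execute CAS
    UPDATE seats SET status = 'sold'
    WHERE seat_id = $seat_id AND status = 'held' AND holder_id = $user_id;
    -- Record the actual outcome with the idempotency key:
    -- 'sold' if 1 row was updated, 'rejected' if 0 rows were updated
    INSERT INTO commitments (idempotency_key, result) VALUES ($key, $outcome);
END IF;
COMMIT;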
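A runnable version of this combination might look like the following sketch. It assumes SQLite as a stand-in database, a seats table with a holder_id column, and a commitments table keyed by idempotency_key; the function name commit_seat and the 'rejected' result value are hypothetical.

```python
import sqlite3


def commit_seat(db: sqlite3.Connection, key: str, seat_id: str, user_id: str) -> str:
    """CAS guarded by an idempotency key; retries return the recorded result."""
    with db:  # the UPDATE and INSERT below commit together
        row = db.execute(
            "SELECT result FROM commitments WHERE idempotency_key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # seen before: do not re-execute the CAS

        cur = db.execute(
            "UPDATE seats SET status = 'sold' "
            "WHERE seat_id = ? AND status = 'held' AND holder_id = ?",
            (seat_id, user_id),
        )
        # Record what actually happened, not an assumed success.
        result = "sold" if cur.rowcount == 1 else "rejected"
        db.execute(
            "INSERT INTO commitments (idempotency_key, result) VALUES (?, ?)",
            (key, result),
        )
    return result
```

A retry after success no longer fails the CAS condition, because the lookup on the idempotency key short-circuits before the UPDATE runs.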
Bridge 4: Fencing Token — Preventing Stale Intent Holders #
Problem: a lease-based intent holder may pause (GC, network stall) beyond the TTL. The intent expires and a new holder takes over. The original holder resumes and attempts to commit, believing it still holds the intent.
Solution: the intent layer issues a monotonically increasing token with each grant. The commitment layer rejects commits from holders with outdated tokens. (Covered in depth in Chapter 3.)
The fencing token bridges the intent layer (Redis hold) to the commitment layer (database write) by encoding the token in the write:
# Intent acquisition: Redis issues fencing token
lease_acquired, token = redis_lua_script(
    "acquire_with_token", seat_id, user_id, ttl=600
)
# Commitment: include fencing token in write
# (INSERT ... SELECT is used because INSERT ... VALUES cannot take a WHERE clause)
db.execute("""
    INSERT INTO seat_purchases (seat_id, user_id, order_id, fencing_token)
    SELECT $1, $2, $3, $4
    WHERE NOT EXISTS (
        SELECT 1 FROM seat_purchases WHERE seat_id = $1 AND fencing_token >= $4
    )
""", seat_id, user_id, order_id, token)
Fencing in etcd (infrastructure): the creation revision of the lease key serves as the fencing token. The protected resource (the database, the file system) must check the token:
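// Write to storage with fencing check
storage.WriteWithFence(ctx, data, fencingToken)

// Storage implementation:
func (s *Storage) WriteWithFence(ctx context.Context, data []byte, token int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Accept only tokens strictly newer than the last accepted one.
	// Note: <= also rejects a second write under the same token; use <
	// if a single holder may write more than once per lease.
	if token <= s.lastSeenToken {
		return ErrStaleToken
	}
	s.lastSeenToken = token
	return s.writeInternal(data)
}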
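The stale-holder scenario from the problem statement can be replayed end to end in a few lines. This is an in-memory sketch under stated assumptions: TokenIssuer stands in for the intent layer, FencedStorage for the protected resource, and both names are hypothetical. The check mirrors the token <= lastSeenToken comparison used in the Go example.

```python
import itertools
from typing import List, Tuple


class StaleTokenError(Exception):
    """Raised when a write carries a token that is not newer than the last one."""


class TokenIssuer:
    """Intent layer stand-in: hands out a strictly increasing token per grant."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def grant(self) -> int:
        return next(self._counter)


class FencedStorage:
    """Commitment layer stand-in: rejects writes with stale tokens."""

    def __init__(self) -> None:
        self.last_seen_token = 0
        self.writes: List[Tuple[int, str]] = []

    def write(self, token: int, data: str) -> None:
        if token <= self.last_seen_token:
            raise StaleTokenError(f"token {token} <= {self.last_seen_token}")
        self.last_seen_token = token
        self.writes.append((token, data))


if __name__ == "__main__":
    issuer, storage = TokenIssuer(), FencedStorage()
    token_a = issuer.grant()  # holder A acquires, then pauses past its TTL
    token_b = issuer.grant()  # the hold expires; holder B acquires
    storage.write(token_b, "booking by B")
    try:
        storage.write(token_a, "booking by A")  # A resumes with a stale token
    except StaleTokenError as exc:
        print("rejected:", exc)
```

The important property is that the check lives in the storage layer: holder A cannot know its lease expired, so only the resource being written to can reject it.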
Bridge 5: Compensation — Undoing Committed Intent #
Problem: in a multi-step saga (Chapter 4), a later step fails after earlier steps have already committed. The committed steps must be undone.
Solution: each saga step has a corresponding compensation operation. If the saga fails at step k, steps 1 through k-1 are compensated in reverse order.
Compensation is a bridge because it connects the commitment of an earlier step to the “uncommitted” (rolled back) state that should result from the saga’s failure:
# Compensation example: hotel + flight booking saga
saga_steps = [
    (book_hotel, cancel_hotel_booking),
    (book_flight, cancel_flight_booking),
    (charge_payment, refund_payment),
]
completed_steps = []
for step, compensation in saga_steps:
    try:
        result = step()
        completed_steps.append((compensation, result))
    except Exception as e:
        # Compensate all completed steps in reverse order
        for comp, comp_result in reversed(completed_steps):
            comp(comp_result)  # Must be idempotent
        raise SagaFailed(str(e)) from e
Key constraints on compensation:
- Compensation must be idempotent: if the compensating action is retried (orchestrator crash during compensation), it must be a no-op.
def cancel_hotel_booking(booking_id: str):
    # Idempotent: safe to call multiple times
    result = hotel_api.cancel(booking_id)
    if result.status == "already_cancelled":
        return  # No-op
    if result.status != "cancelled":
        raise CompensationFailed(f"Could not cancel {booking_id}")
- Not all operations can be compensated: an email that was sent cannot be unsent. A fax that was transmitted cannot be untransmitted. For non-compensatable operations, the saga must account for this, typically by making the operation the last step (so it only runs when everything else has succeeded) or by accepting that some side effects are irrevocable.
- Compensation may involve business logic: refunding a payment is not just a database rollback. It may involve a refund fee, a partial refund policy, or a time limit. Compensation is a business operation, not a technical rollback.
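To see the reverse-order rule in action, the saga loop can be wrapped in a function and run against stub steps. This is a minimal sketch: the function name run_saga and the Step alias are hypothetical, and it does not handle a compensation that itself fails, which a production orchestrator would need to retry or escalate.

```python
from typing import Callable, List, Tuple

# Each step pairs an action with the compensation that undoes it.
Step = Tuple[Callable[[], str], Callable[[str], None]]


class SagaFailed(Exception):
    """Raised after completed steps have been compensated."""


def run_saga(steps: List[Step]) -> List[str]:
    completed: List[Tuple[Callable[[str], None], str]] = []
    results: List[str] = []
    for action, compensate in steps:
        try:
            result = action()
        except Exception as exc:
            # Undo completed steps, most recent first.
            for comp, comp_arg in reversed(completed):
                comp(comp_arg)
            raise SagaFailed(str(exc)) from exc
        completed.append((compensate, result))
        results.append(result)
    return results


if __name__ == "__main__":
    log: List[str] = []

    def fail_payment() -> str:
        raise RuntimeError("card declined")

    steps: List[Step] = [
        (lambda: "hotel-1", lambda ref: log.append(f"cancel {ref}")),
        (lambda: "flight-1", lambda ref: log.append(f"cancel {ref}")),
        (fail_payment, lambda ref: log.append(f"refund {ref}")),
    ]
    try:
        run_saga(steps)
    except SagaFailed:
        print(log)  # flight cancelled first, then hotel
```

The failing step itself is never compensated, since it did not commit anything; only the steps that completed before it are undone.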
Bridge 6: Sequence Number — Ordering Events Across the Gap #
Problem: the intent layer processes events in high-throughput order (by arrival time, by Redis FIFO). The commitment layer needs to process them in a specific order (by bid amount for an auction, by submission time for a job scheduler). The two orderings may diverge.
Solution: assign a sequence number at the intent layer that encodes the ordering semantics. The commitment layer uses sequence numbers to process events in the correct order.
# Auction bids: sequence number encodes bid priority (amount)
def place_bid(auction_id: str, user_id: str, amount: int):
    # Intent: add bid to sorted set (score = amount for ordering)
    # Note: a later bid from the same user overwrites the earlier score
    redis.zadd(f"auction:{auction_id}:bids", {user_id: amount})
    # The Redis sorted set score IS the sequence number for commitment ordering

# Runs once when the auction closes, not on every bid
# (close_auction is an illustrative name)
def close_auction(auction_id: str):
    # Commitment: when auction closes, take highest score (highest bid)
    winning_bid = redis.zrevrange(f"auction:{auction_id}:bids", 0, 0, withscores=True)
    return commit_auction_result(auction_id, winning_bid)
For FIFO ordering (first-come-first-served job scheduling):
# Intent: assign monotonic sequence number at arrival
sequence = redis.incr("job_queue:sequence")
redis.zadd("job_queue", {f"{job_id}:{sequence}": sequence})
# Commitment: process jobs in sequence order (lowest first)
jobs = redis.zrange("job_queue", 0, 99, withscores=True)  # FIFO
for member, seq_num in jobs:
    # Member is "job_id:sequence"; assumes decode_responses=True (str members)
    job_id = member.rsplit(":", 1)[0]
    if process_job(job_id, int(seq_num)):  # Includes idempotency check
        redis.zrem("job_queue", member)  # Remove by the full member string
Sequence numbers as fencing tokens for job scheduling: the sequence number assigned at intent time doubles as a fencing token. If a job processor crashes and recovers with an old sequence, the commitment layer rejects it (sequence < current_committed_sequence).
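On the commitment side, sequence numbers allow events that arrive out of order to be committed in order, and stale ones to be rejected. The sketch below assumes gap-free sequence numbers starting at 1, as an INCR-based counter would produce; the class name InOrderCommitter and its fields are hypothetical.

```python
from typing import Dict, List


class InOrderCommitter:
    """Buffers out-of-order events and commits them in sequence order."""

    def __init__(self) -> None:
        self.committed_seq = 0             # highest sequence committed so far
        self.committed: List[str] = []     # job ids in commit order
        self._pending: Dict[int, str] = {}

    def receive(self, seq: int, job_id: str) -> bool:
        # Reject stale sequences (already committed) and duplicates.
        if seq <= self.committed_seq or seq in self._pending:
            return False
        self._pending[seq] = job_id
        # Commit the longest contiguous run starting at committed_seq + 1.
        while self.committed_seq + 1 in self._pending:
            self.committed_seq += 1
            self.committed.append(self._pending.pop(self.committed_seq))
        return True
```

If sequence 2 arrives before sequence 1, it waits in the buffer; once 1 arrives, both are committed in order, and any later replay of 1 or 2 is rejected as stale.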
Combining Bridges #
In practice, robust systems combine multiple bridges. The Ticketmaster seat reservation example uses all six:
| Phase | Bridge mechanism | Purpose |
|---|---|---|
| Seat hold (Redis SET NX) | TTL | Hold expires if user abandons checkout |
| Seat hold | Fencing token (Redis WATCH / revision) | Stale hold rejected at commitment |
| Checkout (payment attempt) | Idempotency key | Payment charged exactly once |
| Payment + booking | CAS (INSERT WHERE NOT EXISTS) | No double booking despite retries |
| Multi-service saga | Compensation | Rollback if fulfillment fails after payment |
| Queue (virtual waiting room) | Sequence number | Fair FIFO access to seat selection |
Each bridge addresses one failure mode in the gap between intent and commitment. Removing any bridge creates a category of failure: TTL removed → abandoned holds lock seats indefinitely. Idempotency key removed → network retries cause double charges.
The Bridge Selection Guide #
| Failure mode | Bridge mechanism |
|---|---|
| Intent abandoned, resource locked indefinitely | TTL |
| Duplicate commits from retries | Idempotency key |
| Concurrent writes overwrite each other | CAS |
| Stale lock holder writes after TTL expiry | Fencing token |
| Later saga step fails after earlier committed | Compensation |
| Out-of-order processing | Sequence number |