Queue #

queue = policy-governed waiting set

A queue is the statement:

work exists now,
execution happens later,
under some discipline.

It is where time, work, ownership, fairness, and failure meet.

Design Axes (the core module) #

A “queue type” is not a primitive. It is a point in a five-axis parameter space. Design by choosing a value on each axis; recognize systems by reading their vector.

Axis 1 — Retention Semantics (the structural cleave) #

remove-on-ack:   item deleted globally once acknowledged
retained-log:    items kept; each consumer advances its own offset

This is the only axis that changes the interface shape, not just policy:

remove-on-ack -> shared mutable ack/visibility state at the crossing point (thick)
retained-log  -> progress tracking pushed entirely to consumer (thin)

Consequences:

remove-on-ack: one logical consumer group per item lifecycle; redelivery machinery
retained-log:  replay, multiple independent consumers, retention window becomes a contract
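The difference in interface shape can be sketched in a few lines. This is a minimal in-memory illustration, not any real broker's API; the class and method names (`RemoveOnAckQueue`, `RetainedLog`, `receive`, `read`) are hypothetical.

```python
class RemoveOnAckQueue:
    """Items are deleted globally once acknowledged; the queue owns in-flight state."""

    def __init__(self):
        self._ready = []          # items waiting for delivery
        self._in_flight = {}      # receipt -> item (the shared ack state, "thick")
        self._next_receipt = 0

    def publish(self, item):
        self._ready.append(item)

    def receive(self):
        if not self._ready:
            return None
        item = self._ready.pop(0)
        receipt = self._next_receipt
        self._next_receipt += 1
        self._in_flight[receipt] = item
        return receipt, item

    def ack(self, receipt):
        # Deleting here removes the item for every consumer.
        del self._in_flight[receipt]


class RetainedLog:
    """Items are kept; the log knows nothing about consumer progress ("thin")."""

    def __init__(self):
        self._entries = []

    def append(self, item):
        self._entries.append(item)
        return len(self._entries) - 1     # offset of the new entry

    def read(self, offset, max_items=10):
        return self._entries[offset:offset + max_items]


# Each consumer of the log keeps its own offset, so replay is just a smaller number.
log = RetainedLog()
for event in ("a", "b", "c"):
    log.append(event)
offsets = {"billing": 0, "search": 0}
batch = log.read(offsets["billing"])
offsets["billing"] += len(batch)          # billing advanced; search is unaffected
```

Note that `RemoveOnAckQueue` needs a receipt table to exist at all, while `RetainedLog` has no per-consumer state; that asymmetry is the thick/thin distinction above.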

Interrogation:

Is work removed globally or retained for replay?
When is progress committed, and by whom?
What deletes history, and is anyone still depending on it?

Axis 2 — Selection Policy #

FIFO:            arrival order / approximate fairness
priority:        importance, cost, deadline
fair/weighted:   per-class or per-tenant share (WFQ, DRR)
coalescing:      keyed latest-wins; dedupe obsolete work

Selection policy is substitutable independently of retention. A Kafka partition is FIFO-selected retained-log; a controller workqueue is coalescing-selected remove-on-ack.
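As one illustration of a selection policy, keyed latest-wins coalescing can be sketched with an ordered dictionary. This is a simplified model, not the Kubernetes workqueue implementation; `CoalescingQueue` and its methods are hypothetical names.

```python
from collections import OrderedDict


class CoalescingQueue:
    """Keyed latest-wins queue: at most one pending entry per key."""

    def __init__(self):
        self._pending = OrderedDict()     # key -> latest payload, in first-arrival order

    def enqueue(self, key, payload):
        # Overwriting keeps the key's original position but replaces obsolete work.
        self._pending[key] = payload

    def dequeue(self):
        if not self._pending:
            return None
        return self._pending.popitem(last=False)   # oldest key first

    def __len__(self):
        return len(self._pending)
```

Three enqueues for two keys leave two pending entries; the intermediate payload for the repeated key is gone, which is exactly the "lost intermediate event" risk if that event mattered.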

Interrogation:

Is order required or approximate?
Can priority override arrival order?
Can one source starve others?
Is repeated work on the same key wasteful? (-> coalesce)
Does aging exist to prevent starvation?
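The starvation question can be made concrete with deficit round robin (DRR). The sketch below computes a service order for a fixed set of per-flow backlogs; it assumes each item carries a known cost and each flow has a positive quantum. The function name `drr_order` and the data layout are illustrative choices.

```python
from collections import deque


def drr_order(flows, quanta):
    """Return the service order for `flows` under deficit round robin.

    flows:  {flow_name: [(item, cost), ...]} in arrival order per flow
    quanta: {flow_name: positive credit added each round}; larger means a larger share
    """
    queues = {name: deque(items) for name, items in flows.items()}
    deficit = dict.fromkeys(queues, 0)
    served = []
    while any(queues.values()):
        for name, q in queues.items():
            if not q:
                continue
            deficit[name] += quanta[name]
            # Serve head items while the flow has enough credit for them.
            while q and q[0][1] <= deficit[name]:
                item, cost = q.popleft()
                deficit[name] -= cost
                served.append(item)
            if not q:
                deficit[name] = 0         # an emptied flow does not bank credit
    return served


flows = {
    "noisy": [("n1", 1), ("n2", 1), ("n3", 1), ("n4", 1)],
    "quiet": [("q1", 1)],
}
equal_share = drr_order(flows, {"noisy": 1, "quiet": 1})
```

With equal quanta the quiet flow is served second instead of waiting behind the whole noisy backlog, as it would under a single FIFO.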

Axis 3 — Eligibility / Timing #

immediate:        eligible on enqueue
delayed-until-T:  timer/due-time (scheduled jobs, visibility delay, Temporal timers)
delayed-until-X:  capacity or condition (waitlist, pending pods, accept backlog)
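The delayed-until-T case can be sketched as a min-heap keyed on due time. This is an in-memory illustration only (it does not survive a crash, so the missed-timer question remains open); `DelayQueue` is a hypothetical name, and the caller passes `now` explicitly so that the choice of clock is visible.

```python
import heapq
import itertools


class DelayQueue:
    """Items become eligible only once their due time has passed."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()     # tie-breaker: equal due times stay FIFO

    def schedule(self, due_at, item):
        heapq.heappush(self._heap, (due_at, next(self._seq), item))

    def pop_due(self, now):
        """Return every item whose due time is <= now, earliest first."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, _, item = heapq.heappop(self._heap)
            due.append(item)
        return due
```

A delayed-until-X queue would replace the `due_at <= now` test with a capacity or condition check, which usually cannot be indexed by a heap.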

Interrogation:

Can items be delayed? By clock or by condition?
What happens to a due item after a crash? (missed timer)
Can a waiting request go stale before promotion?
Clock skew: whose clock decides "due"?

Axis 4 — Topology Position (failure routing) #

Retry queues and DLQs are not types; they are positions in a graph:

main path -> [fail] -> retry queue (delay + attempt metadata) -> main path
                    -> [attempts exhausted / poison] -> dead letter queue

A retry queue is a delay queue whose items carry {attempt, backoff, error}. A DLQ is any queue at the terminal-failure edge: quarantine + audit + redrive.
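The routing decision at the failure edge can be sketched as a single function. The attempt budget, base delay, and cap below are assumed values, and `Envelope`, `PoisonError`, and `route_failure` are hypothetical names; the backoff shown is exponential with full jitter.

```python
import random
from dataclasses import dataclass

MAX_ATTEMPTS = 3       # assumed attempt budget
BASE_DELAY_S = 1.0     # assumed base backoff
MAX_DELAY_S = 60.0     # assumed backoff cap


@dataclass
class Envelope:
    payload: str
    attempt: int = 0
    last_error: str = ""


class PoisonError(Exception):
    """Raised by a handler for work that can never succeed."""


def backoff_delay(attempt, rng=random):
    # Full jitter: uniform in [0, min(cap, base * 2^attempt)].
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return rng.uniform(0, ceiling)


def route_failure(env, error):
    """Pick the next edge for a failed item: ('retry', delay) or ('dead_letter', None)."""
    env.attempt += 1
    # Record only the error type; full messages may carry PII or secrets.
    env.last_error = type(error).__name__
    if isinstance(error, PoisonError) or env.attempt >= MAX_ATTEMPTS:
        return "dead_letter", None
    return "retry", backoff_delay(env.attempt)
```

Classifying poison work before counting attempts avoids spending the retry budget on failures that cannot recover.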

Interrogation:

How many attempts before terminal?
Backoff + jitter, or retry storm?
Is the failure recoverable at all? (poison classification)
Who watches the DLQ? (ignored DLQ = silent data loss)
Does redrive just repeat the failure?
PII/secrets in failure payloads?

Axis 5 — Capacity Policy #

unbounded:  backlog absorbs everything; latency unbounded
bounded:    overload becomes an explicit signal
            -> backpressure (producer stalls)
            -> rejection   (drop / shed / timeout)
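Both overload responses can be shown with the standard library's bounded `queue.Queue`. The helper name `offer` and its `wait_s` parameter are illustrative: `wait_s=None` rejects immediately, while a positive `wait_s` stalls the producer up to a deadline before giving up.

```python
import queue


def offer(q, item, wait_s=None):
    """Try to enqueue; return False if the queue stayed full.

    wait_s=None  -> rejection: fail immediately when full
    wait_s=float -> backpressure: block the producer for up to wait_s seconds
    """
    try:
        if wait_s is None:
            q.put_nowait(item)
        else:
            q.put(item, timeout=wait_s)
        return True
    except queue.Full:
        return False


bounded = queue.Queue(maxsize=2)
accepted = [offer(bounded, n) for n in range(3)]   # third offer hits the bound
```

Either way the caller receives an explicit signal; an unbounded queue would have accepted all three offers and hidden the overload.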

Interrogation:

Is the queue bounded?
How is overload expressed, and to whom?
Is backlog observable? (depth, age of oldest item)
Does a large buffer hide a capacity problem? (backlog != capacity)
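Depth and age of the oldest item are cheap to expose if the enqueue time is stored with each item. A minimal sketch, with `ObservedQueue` as a hypothetical name and timestamps passed in by the caller:

```python
from collections import deque


class ObservedQueue:
    """FIFO queue that records enqueue time so backlog age can be reported."""

    def __init__(self):
        self._items = deque()             # (enqueued_at, item), oldest on the left

    def put(self, item, now):
        self._items.append((now, item))

    def get(self):
        _, item = self._items.popleft()
        return item

    def depth(self):
        return len(self._items)

    def oldest_age(self, now):
        """Seconds the head item has waited; 0.0 when the queue is empty."""
        if not self._items:
            return 0.0
        return now - self._items[0][0]
```

Age of the oldest item is often the more useful alarm: a shallow queue with an old head means consumers are stuck, which depth alone does not show.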

The Ownership Protocol (cuts across all axes) #

Every queue with side-effecting consumers runs some subset of:

publish/enqueue
receive/pull
claim/lease          <- ownership begins
process
ack/commit/delete    <- ownership ends (and, under remove-on-ack, the item lifecycle)
nack/retry/extend
expire/redeliver     <- lease death; ownership recycled
dead-letter
observe backlog
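The claim, ack, and expire steps of this protocol can be sketched as a lease table with a visibility timeout. This is an in-memory model under simplifying assumptions (single process, caller-supplied clock, FIFO redelivery); `LeaseQueue` and its lease tokens are hypothetical, loosely modeled on receipt-handle designs.

```python
import itertools


class LeaseQueue:
    def __init__(self, visibility_timeout):
        self._timeout = visibility_timeout
        self._ready = []                  # item ids eligible for delivery
        self._items = {}                  # item id -> payload
        self._leases = {}                 # item id -> (token, expires_at)
        self._ids = itertools.count()
        self._tokens = itertools.count()

    def publish(self, payload):
        item_id = next(self._ids)
        self._items[item_id] = payload
        self._ready.append(item_id)
        return item_id

    def claim(self, now):
        """Lease the next ready item; ownership begins here."""
        self._expire(now)
        if not self._ready:
            return None
        item_id = self._ready.pop(0)
        token = next(self._tokens)
        self._leases[item_id] = (token, now + self._timeout)
        return item_id, token, self._items[item_id]

    def ack(self, item_id, token, now):
        """Delete the item if the caller still holds a live lease; else return False."""
        lease = self._leases.get(item_id)
        if lease is None or lease[0] != token or lease[1] <= now:
            return False
        del self._leases[item_id]
        del self._items[item_id]
        return True

    def _expire(self, now):
        # Lease death: ownership is recycled and the item is redelivered.
        for item_id, (_, expires_at) in list(self._leases.items()):
            if expires_at <= now:
                del self._leases[item_id]
                self._ready.append(item_id)
```

A worker whose lease expired gets `False` from `ack`, but by then it may already have performed the side effect, and a second worker holds the same item. The lease bounds how long work can be stuck; it does not prevent duplicate execution.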

Interrogation:

Who owns an in-flight item?
Is ownership leased? What is the lease/visibility timeout?
What happens when the worker dies mid-lease?
Can two workers hold the same item? (yes, transiently — design for it)

Technical Bottleneck: the Commit Point #

commit point = the moment progress is durably acknowledged,
               relative to the side effect
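The effect of placing the commit before or after the side effect can be simulated directly. The sketch below models a consumer reading a list as its log, with a crash injected between the two steps at a chosen offset; `run_consumer` and its parameters are illustrative, not any client library's API.

```python
def run_consumer(log, committed, effects, commit_first, crash_at=None):
    """Process `log` from offset `committed`; return the last committed offset.

    effects:      list that records each side effect as it happens
    commit_first: True  -> commit, then effect
                  False -> effect, then commit
    crash_at:     offset at which the worker dies between its two steps
    """
    for offset in range(committed, len(log)):
        if commit_first:
            committed = offset + 1
            if offset == crash_at:
                return committed          # died after commit, before effect
            effects.append(log[offset])
        else:
            effects.append(log[offset])
            if offset == crash_at:
                return committed          # died after effect, before commit
            committed = offset + 1
    return committed


log = ["a", "b", "c"]

lossy = []
resume = run_consumer(log, 0, lossy, commit_first=True, crash_at=1)
run_consumer(log, resume, lossy, commit_first=True)      # restart after the crash

dup = []
resume = run_consumer(log, 0, dup, commit_first=False, crash_at=1)
run_consumer(log, resume, dup, commit_first=False)       # restart after the crash
```

After restart, `lossy` is missing `"b"` and `dup` contains it twice. Neither ordering is safe on its own; the choice only decides which failure the rest of the design has to absorb.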

This problem is essential rather than accidental, and no general solution exists, only per-case recipes. Nearly every classic queue bug is downstream of it:

worker crashes after side effect, before ack   -> duplicate execution
offset committed before processing             -> lost work (crash between commit and effect)
offset committed after processing              -> duplicate work (crash between effect and commit)
redelivery storm                               -> commit too slow / lease too short
"exactly once"                                 -> commit point + dedupe boundary, nothing more

Known recipes (each bounded, none universal):

idempotent consumer        (dedupe key at the effect)
transactional outbox       (state change + outgoing message in one local transaction; a relay publishes it)
exactly-once-within-scope  (Kafka txn: only inside the log's own boundary)
at-least-once + idempotency as the honest default
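The idempotent-consumer recipe can be sketched with SQLite: the dedupe key and the effect are written in one local transaction, so a redelivered message is detected at the effect rather than at the queue. The schema, table names, and `apply_credit` function are assumptions made for illustration.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE balance (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)")
    db.execute("INSERT INTO balance VALUES ('acct-1', 0)")
    db.commit()
    return db


def apply_credit(db, message_id, account, amount):
    """Apply the credit at most once per message_id; return True if applied now."""
    try:
        with db:  # one transaction: dedupe marker and effect commit or roll back together
            db.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
            db.execute(
                "UPDATE balance SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: the effect was already recorded


db = make_db()
first = apply_credit(db, "m-1", "acct-1", 50)
second = apply_credit(db, "m-1", "acct-1", 50)   # simulated redelivery
```

This only works because the effect and the dedupe table share a transaction boundary. An effect outside the database (an email, a call to a third party) needs its own dedupe key honored by the receiving side.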

A strong design states explicitly:

when work becomes visible,
who owns it,
when it is complete,
what happens if completion is unknown,
and how overload is bounded.

Named Configurations (lookup table) #

Famous points in the parameter space. Vector = {retention, selection, eligibility, position, capacity}.

Name	Vector	Canonical systems	Signature failure
FIFO queue	remove-on-ack, FIFO, immediate, main, varies	SQS FIFO, RabbitMQ queue	head-of-line blocking; order breaks under parallelism
Work queue	remove-on-ack, FIFO-ish, immediate, main, varies	Celery, SQS tasks, k8s workqueue, thread pool	crash-after-effect-before-ack; stuck in-flight
Broker/message queue	remove-on-ack, routing+FIFO, immediate, main, varies	RabbitMQ/AMQP, NATS queue group, Pub/Sub subscription	loss via ack/persistence misconfig; slow subscriber backlog
Durable log	retained-log, FIFO per partition, immediate, main, retention-bounded	Kafka partition, Pulsar, Kinesis shard	consumer lag; bad offset commit; rebalance pause; retention deletes needed history
Delay/timer queue	either, timestamp-priority, delayed-until-T, main, varies	scheduled job table, SQS delay, Redis zset, Temporal timers	clock skew; timer storm; missed timer after crash
Priority queue	remove-on-ack, priority, immediate, main, varies	scheduler queues, k8s scheduling queue	starvation; priority inversion; priority abuse
Retry queue	remove-on-ack, FIFO, delayed-until-T, retry edge, varies	backoff topics, Temporal retry, DLQ redrive	retry storm; retrying unrecoverable work; duplicate effects
Dead letter queue	remove-on-ack, FIFO, immediate, terminal edge, unbounded-ish	SQS DLQ, Pub/Sub dead-letter topic, RabbitMQ DLX	ignored; second unbounded backlog; secrets in payloads
Bounded queue	either, FIFO, immediate, main, bounded	thread pool queue, socket buffer, bounded channel	too large -> latency; too small -> rejection; producer stall
Fair/multi-queue	remove-on-ack, weighted/DRR, immediate, main, per-class bounds	per-tenant queues, packet schedulers	unfair weights; starvation; utilization loss under idle tenants
Coalescing queue	remove-on-ack, keyed latest-wins, immediate, main, small	k8s controller workqueue, UI event loop, reindex queue	lost intermediate event that mattered; wrong dedupe key; hot-key starvation
Waitlist	remove-on-ack, priority/FIFO, delayed-until-X, main, bounded by policy	pending pods, GPU job queue, accept backlog	stale promotion; overbooking; infinite wait

Vocabulary #

enqueue dequeue peek
claim lease visibility deadline
ack nack redelivery attempt backoff jitter
offset checkpoint replay retention
priority aging preemption fairness deficit-round-robin
head-of-line backpressure capacity load-shedding
dedupe coalescing latest-wins
poison DLQ quarantine redrive

Deep Lesson #

Queue bugs come from confusing pairs that sit on different axes:

delivery            vs  processing        (ownership protocol vs effect)
ack                 vs  successful effect (commit point)
FIFO                vs  global ordering   (selection vs retention/partitioning)
queue               vs  log               (axis 1)
retry               vs  idempotency       (topology vs commit point)
backlog             vs  capacity          (axis 5)
visibility timeout  vs  correctness       (lease is liveness, not safety)

Design procedure: walk the five axes, state the commit point, bound the overload. The named types are recognition shortcuts, not the design space.