Archetype 10: Frontier + Claimable Run
What this archetype is
The system tracks a frontier of not-yet-covered work, claims slices of that frontier, and checkpoints covered progress.
Examples: crawler, migration scanner, batch backfill.
We will use a URL crawl frontier as the running example.
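Layer 1: Entities and Postgres table design
Three entities, one table each:
- FrontierState (crawl_frontier): one row per discovered item, unique within its partition
- BatchRunState (crawl_batch_runs): one row per claimed batch, with its owner and lease expiry
- CheckpointState (crawl_checkpoints): one row per partition, holding the last key known to be safely covered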
create table crawl_frontier (
    frontier_id   bigserial primary key,
    partition_id  int not null,
    item_key      text not null,
    status        text not null default 'DISCOVERED',
    discovered_at timestamptz not null default now(),
    unique (partition_id, item_key)
);

create table crawl_batch_runs (
    batch_id         uuid primary key,
    partition_id     int not null,
    claimed_by       text not null,
    lease_expires_at timestamptz not null,
    status           text not null default 'CLAIMED',
    created_at       timestamptz not null default now()
);

create table crawl_checkpoints (
    partition_id  int primary key,
    last_safe_key text,
    updated_at    timestamptz not null default now()
);
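The claim query in Layer 2 filters on partition_id and status and orders by frontier_id, so a partial index over only the claimable rows may keep that scan cheap as covered rows accumulate. A minimal sketch; the index name is hypothetical and not part of the original design:

```sql
-- Hypothetical index name. Only rows still in the DISCOVERED state are
-- indexed, so the index shrinks as items are claimed.
create index crawl_frontier_claimable_idx
    on crawl_frontier (partition_id, frontier_id)
    where status = 'DISCOVERED';
```

Whether this pays off depends on how large the share of non-claimable rows becomes, so it is worth checking with explain on real data.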
Layer 2: Write path mechanics
Claim work
select frontier_id, item_key
from crawl_frontier
where partition_id = $1
  and status = 'DISCOVERED'
order by frontier_id
limit 100
for update skip locked;
Then, in the same transaction, mark the selected rows as claimed (here $1 is the array of frontier_id values returned by the select):
update crawl_frontier
set status = 'CLAIMED'
where frontier_id = any($1);
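The select and the update only protect each other when they run in one transaction, because the row locks taken by for update are released at commit. One way to make that hard to get wrong is to fold both steps into a single statement. A sketch of that variant, not necessarily how the original system does it:

```sql
-- Select up to 100 claimable rows and mark them CLAIMED in one statement.
-- Rows locked by a concurrent claimer are skipped instead of waited on.
with picked as (
    select frontier_id
    from crawl_frontier
    where partition_id = $1
      and status = 'DISCOVERED'
    order by frontier_id
    limit 100
    for update skip locked
)
update crawl_frontier f
set status = 'CLAIMED'
from picked
where f.frontier_id = picked.frontier_id
returning f.frontier_id, f.item_key;
```

The returned rows are the slice the worker now owns. The matching crawl_batch_runs row (claimed_by, lease_expires_at) would presumably be inserted in the same transaction so that the claim and its lease appear together.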
Advance checkpoint
insert into crawl_checkpoints (partition_id, last_safe_key)
values ($1, $2)
on conflict (partition_id) do update
set last_safe_key = excluded.last_safe_key,
    updated_at = now();
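As written, the upsert overwrites last_safe_key unconditionally, so a late or replayed writer could move the checkpoint backwards. A guarded variant is sketched below. It assumes keys compare, under the column's collation, in the same order the partition is processed, which the original does not state:

```sql
-- Only move the checkpoint forward. The where clause on the update branch
-- turns a stale write into a no-op instead of a regression.
insert into crawl_checkpoints (partition_id, last_safe_key)
values ($1, $2)
on conflict (partition_id) do update
set last_safe_key = excluded.last_safe_key,
    updated_at = now()
where crawl_checkpoints.last_safe_key is null
   or crawl_checkpoints.last_safe_key < excluded.last_safe_key;
```

When the guard rejects the write, the statement affects zero rows, which a caller can treat as "checkpoint already at or past this key".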
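Layer 3: Fault tolerance
Failure modes to guard against:
- frontier advanced too far: the checkpoint moves past items that were never completed
- same frontier item claimed twice: two workers end up processing the same item
- progress not advanced after success: work completes but the checkpoint stays put, so the work is redone after a restart
- uncovered work skipped: items behind the checkpoint are never processed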
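Two of these failures, a checkpoint that ran ahead and uncovered work left behind it, can be detected with a periodic audit query. The sketch below rests on two assumptions not found in the original: that finished items are given the status 'DONE', and that item_key order matches checkpoint order.

```sql
-- Items at or before the checkpoint that were never finished.
-- Any row returned means the checkpoint claims more coverage than exists.
-- Assumption: 'DONE' is the terminal status for a frontier item.
select f.partition_id, f.frontier_id, f.item_key, f.status
from crawl_frontier f
join crawl_checkpoints c on c.partition_id = f.partition_id
where f.item_key <= c.last_safe_key
  and f.status <> 'DONE'
order by f.partition_id, f.frontier_id;
```

An empty result is the healthy case. A non-empty one suggests either rewinding the checkpoint or requeueing the listed items.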
Layer 4: Scale
Default hotspots:
- frontier hot row / hot partition
- batch-claim bursts
- skewed range/work distribution
- checkpoint lag
Common mitigations:
- partition frontier aggressively
- lease expiry and reclamation
- done sets / dedup tables for replay safety
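Lease reclamation and a done set can be sketched against the tables above, with some additions. The schema as given has no link from a frontier row to the batch that claimed it, so the sketch adds a batch_id column to crawl_frontier that the claim step would populate. That column, the 'EXPIRED' status, and the crawl_done table are all hypothetical and not part of the original design.

```sql
-- Assumption: claimed frontier rows record which batch owns them.
alter table crawl_frontier
    add column batch_id uuid references crawl_batch_runs (batch_id);

-- Reclaim: expire batches whose lease has lapsed and return their
-- still-claimed items to the frontier in the same statement.
with expired as (
    update crawl_batch_runs
    set status = 'EXPIRED'
    where status = 'CLAIMED'
      and lease_expires_at < now()
    returning batch_id
)
update crawl_frontier f
set status = 'DISCOVERED',
    batch_id = null
from expired e
where f.batch_id = e.batch_id
  and f.status = 'CLAIMED';

-- Done set: one row per completed item, so a replayed item is a no-op.
create table crawl_done (
    partition_id int not null,
    item_key     text not null,
    completed_at timestamptz not null default now(),
    primary key (partition_id, item_key)
);

-- Zero rows affected means the item was already recorded as done.
insert into crawl_done (partition_id, item_key)
values ($1, $2)
on conflict (partition_id, item_key) do nothing;
```

A worker whose lease expired mid-run may still finish its items after they have been reclaimed. The done set is what keeps that overlap from producing duplicate effects.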