Random Tech Excursions

Archetype 10: Frontier + Claimable Run

What this archetype is

The system tracks a frontier of not-yet-covered work, claims slices of that frontier, and checkpoints covered progress.

Examples: crawler, migration scanner, batch backfill.

We will use a URL crawl frontier as the running example.

Layer 1: Entities and Postgres table design

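FrontierState: one row per discovered item (crawl_frontier)
BatchRunState: one row per claimed batch and its lease (crawl_batch_runs)
CheckpointState: one row per partition recording covered progress (crawl_checkpoints)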

create table crawl_frontier (
  frontier_id bigserial primary key,
  partition_id int not null,
  item_key text not null,
  status text not null default 'DISCOVERED',
  discovered_at timestamptz not null default now(),
  unique (partition_id, item_key)
);

create table crawl_batch_runs (
  batch_id uuid primary key,
  partition_id int not null,
  claimed_by text not null,
  lease_expires_at timestamptz not null,
  status text not null default 'CLAIMED',
  created_at timestamptz not null default now()
);

create table crawl_checkpoints (
  partition_id int primary key,
  last_safe_key text,
  updated_at timestamptz not null default now()
);

Layer 2: Write path mechanics

Claim work

select frontier_id, item_key
from crawl_frontier
where partition_id = $1
  and status = 'DISCOVERED'
order by frontier_id
limit 100
for update skip locked;

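Then, in the same transaction, mark the selected rows as claimed. The row locks taken by for update skip locked last only until the transaction ends, so the status change has to commit with them: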

update crawl_frontier
set status = 'CLAIMED'
where frontier_id = any($2);
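The claim step can be illustrated without a database. The sketch below is an in-memory stand-in, not the Postgres path itself: `InMemoryFrontier` and `claim_batch` are hypothetical names, and a single mutex replaces row-level locking. It shows only the property that matters here, which is that selecting and marking happen as one atomic unit, so two claims never return the same item. Unlike skip locked, a mutex makes concurrent claimers wait rather than skip past locked rows.

```python
import threading
from dataclasses import dataclass


@dataclass
class FrontierItem:
    frontier_id: int
    item_key: str
    status: str = "DISCOVERED"


class InMemoryFrontier:
    """Hypothetical stand-in for one partition of crawl_frontier."""

    def __init__(self, keys):
        self._lock = threading.Lock()
        # frontier_id is assigned in insertion order, like a bigserial column.
        self._items = [FrontierItem(i, k) for i, k in enumerate(keys, start=1)]

    def claim_batch(self, limit=100):
        # Select and mark under one lock, mirroring the select and the update
        # running inside a single transaction.
        with self._lock:
            picked = [it for it in self._items if it.status == "DISCOVERED"][:limit]
            for it in picked:
                it.status = "CLAIMED"
            return [(it.frontier_id, it.item_key) for it in picked]
```

Calling `claim_batch(limit=2)` twice on a three-item frontier returns the first two items, then the remaining one, and an empty list after that.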

Advance checkpoint

insert into crawl_checkpoints (partition_id, last_safe_key)
values ($1, $2)
on conflict (partition_id) do update
set last_safe_key = excluded.last_safe_key,
    updated_at = now();
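The upsert above writes whatever value it is given, so the caller has to decide what is safe to write. One possible rule, sketched here as an assumption rather than as the author's method, is to advance only to the end of the contiguous run of finished items and never move backward. The function name `next_safe_position` is hypothetical, and the sketch tracks progress by frontier_id for simplicity, whereas the table stores a text last_safe_key.

```python
def next_safe_position(ordered_ids, done_ids, current=None):
    """Return the highest id such that it and every earlier id is done.

    ordered_ids must be sorted ascending. The result is never lower than
    `current`, so the checkpoint cannot regress.
    """
    safe = current
    for fid in ordered_ids:
        if current is not None and fid <= current:
            continue  # already covered by the existing checkpoint
        if fid not in done_ids:
            break  # first gap: nothing past this point is safe yet
        safe = fid
    return safe
```

With ids 1 to 4 and only 1, 2, and 4 finished, the result is 2: item 4 is done, but item 3 is not, so the checkpoint must stop before the gap.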

Layer 3: Fault tolerance

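Failure modes to guard against:

frontier advanced too far (checkpoint moved past items that are not yet covered)
same frontier item claimed twice
progress not advanced after success (covered work is repeated after a restart)
uncovered work skipped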
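A double claim or a replay after a lost checkpoint is less harmful when processing is idempotent. The following is a minimal sketch under that assumption: `process_batch`, `done`, and `handler` are hypothetical names, and the in-memory set stands in for whatever durable record of completed keys the system keeps.

```python
def process_batch(item_keys, done, handler):
    """Run handler once per key, skipping keys already recorded in done."""
    processed = []
    for key in item_keys:
        if key in done:
            continue  # replayed or doubly claimed item: nothing to do
        handler(key)
        done.add(key)  # record completion only after the handler returns
        processed.append(key)
    return processed
```

Because completion is recorded after the handler returns, a crash between the two steps leads to the item being handled again. This gives at-least-once behaviour, so the handler itself should tolerate a repeat.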

Layer 4: Scale

Default hotspots:

frontier hot row / hot partition
batch-claim bursts
skewed range/work distribution
checkpoint lag

Common mitigations:

partition frontier aggressively
lease expiry and reclamation
done sets / dedup tables for replay safety
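Lease expiry and reclamation can be sketched as a periodic sweep over batch runs. This is an illustration under assumptions, not a definitive implementation: `BatchRun` mirrors the columns of crawl_batch_runs, while `reclaim_expired` and the 'EXPIRED' status value are hypothetical. The schema as shown does not link frontier rows to a batch, so returning the affected items to 'DISCOVERED' would need an extra column such as a batch reference, which is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BatchRun:
    batch_id: str
    claimed_by: str
    lease_expires_at: datetime
    status: str = "CLAIMED"


def reclaim_expired(runs, now):
    """Mark CLAIMED runs whose lease has passed and return their batch ids."""
    reclaimed = []
    for run in runs:
        if run.status == "CLAIMED" and run.lease_expires_at <= now:
            run.status = "EXPIRED"  # hypothetical status value
            reclaimed.append(run.batch_id)
    return reclaimed
```

Running the sweep twice with the same clock reclaims each expired batch only once, because the status check excludes runs that are no longer 'CLAIMED'.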