
Bit.ly: System Design #

Derived using the 20-step derivation framework. Every step produces an explicit output artifact. No steps are skipped or abbreviated.


Ordering Principle #

Product requirements
  → normalize into operations over state          (Step 1)
  → extract primary objects                       (Step 2)
  → assign ownership, ordering, evolution         (Step 3)
  → extract invariants                            (Step 4)
  → derive minimal DPs from invariants            (Step 5)
  → select concrete mechanisms                    (Step 6)
  → validate independence and source-of-truth     (Step 7)
  → specify exact algorithms                      (Step 8)
  → define logical data model                     (Step 9)
  → map to technology landscape                   (Step 10)
  → define deployment topology                    (Step 11)
  → classify consistency per path                 (Step 12)
  → identify scaling dimensions and hotspots      (Step 13)
  → enumerate failure modes                       (Step 14)
  → define SLOs                                   (Step 15)
  → define operational parameters                 (Step 16)
  → write runbooks                                (Step 17)
  → define observability                          (Step 18)
  → estimate costs                                (Step 19)
  → plan evolution                                (Step 20)

Context and Scale #

Bit.ly is a URL shortening service. The core proposition: long URLs become short codes (bit.ly/AbCd3F), which redirect to the original. Analytics are a premium feature.

Traffic asymmetry is the defining characteristic. Redirects outnumber creations by well over an order of magnitude in aggregate (roughly 30:1 on the reference numbers below), and by far more for any popular link. Creating a short URL is the rare event. Following one is the core function. Any design that does not start from this asymmetry will fail at scale.

Reference scale:

  • 300 million short URLs created per month → ~115 writes/sec
  • 10 billion redirects per month → ~3,800 redirects/sec average, peaks 10–100x that
  • P99 redirect latency budget: 10ms end-to-end (cache hit), 100ms (cold)
  • Analytics: click events must not block the redirect path
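The per-second figures above follow from simple division; a rough sanity check (assuming a 30-day month):

```python
# Back-of-envelope check of the reference scale figures above.
SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59M seconds

writes_per_sec = 300_000_000 / SECONDS_PER_MONTH
redirects_per_sec = 10_000_000_000 / SECONDS_PER_MONTH

print(f"writes/sec    ~ {writes_per_sec:.0f}")     # ~116
print(f"redirects/sec ~ {redirects_per_sec:.0f}")  # ~3858 average, before peaks
```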

Step 1 — Problem Normalization #

Goal: Rewrite each functional requirement as (actor, operation, state).

| Original Requirement | Actor | Operation | State Touched |
|---|---|---|---|
| User creates a short URL from a long URL | User / API client | Create record (append immutable mapping) | ShortLink record |
| User follows a short URL and is redirected | HTTP client (browser/bot) | Read redirect target; append click event | ShortLink.target_url (read); ClickEvent (append) |
| User sees click analytics (count, geo, referrer) | Authenticated user | Read derived aggregate | ClickAggregate (derived from ClickEvents) |
| User claims a custom slug | Authenticated user | Conditional insert (claim unique name) | ShortLink.slug (uniqueness domain) |
| Short URL expires after configured TTL | Scheduler / system | Transition state of ShortLink to EXPIRED | ShortLink.status |
| Admin disables / deletes a short URL | Admin actor | Overwrite ShortLink.status to DISABLED | ShortLink.status |
| User views their dashboard of short URLs | Authenticated user | Read projection over ShortLink records owned by user | UserLinkIndex (derived) |

Key observations from normalization:

  1. The redirect is a read of ShortLink.target_url followed by an append of a ClickEvent. It is not a write to any mutable state. This is critical for scaling: redirects are pure reads.
  2. ClickAggregate is not state — it is a derived view computed from ClickEvents. Treating it as primary state would create a contention-heavy write path on every redirect.
  3. Custom slug creation is a conditional insert — the operation must fail if the slug is already taken. This is an atomic uniqueness check, not a simple insert.
  4. URL expiry is a state machine transition from ACTIVE to EXPIRED. It can be driven by a TTL check on read (lazy expiry) or a scheduler (eager expiry). These have different tradeoffs.
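The lazy-expiry option from observation 4 can be sketched as a read-path check; no scheduler is required for correctness, since an eager scheduler only reclaims storage. Field names follow the ShortLink model in this document; the helper itself is illustrative.

```python
from datetime import datetime, timezone
from typing import Optional

# Lazy expiry sketch: the TTL check happens on every read, so a ShortLink
# whose expires_at has passed behaves as EXPIRED even if no scheduler has
# flipped the stored status yet.

def effective_status(status: str, expires_at: Optional[datetime]) -> str:
    """Return the status a reader should observe, applying lazy expiry."""
    if status == "ACTIVE" and expires_at is not None:
        if expires_at <= datetime.now(timezone.utc):
            return "EXPIRED"  # the redirect path must answer 410 for this
    return status
```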

Step 2 — Object Extraction #

Goal: Identify primary objects, classify each, and apply the four purity tests.

Primary Objects #

1. ShortLink #

Class: Stable entity (long-lived record with identity)

Description: The canonical mapping from a short code (slug) to a long URL. Created once, read billions of times.

Fields: slug (PK), target_url, owner_user_id, created_at, expires_at (nullable), status (ACTIVE | EXPIRED | DISABLED), custom_domain (nullable), alias (human-readable name for dashboard)

Purity tests:

  • Ownership: Single writer after creation (owner and admin only). Slug assignment happens once at creation.
  • Evolution: State machine (ACTIVE → EXPIRED, ACTIVE → DISABLED). Target URL is immutable after creation (append-only by convention; overwriting would break caches).
  • Ordering: Total order within slug domain (each slug has exactly one owner; slug assignment is atomic and final).
  • Derivability: Not derivable from any other object. ShortLink IS the source of truth for the redirect target.

Verdict: Primary object. Not derivable. Cannot be merged.


2. ClickEvent #

Class: Immutable event (append-only fact)

Description: A single redirect event — one click on one short URL. Contains the slug, timestamp, geo (country/city inferred from IP), referrer header, user-agent.

Fields: event_id (UUID), slug, occurred_at, country, city, referrer, user_agent, ip_hash (privacy-preserving)

Purity tests:

  • Ownership: The redirect service is the sole writer. The user initiating the redirect does not own this record.
  • Evolution: Append-only. A click that happened cannot un-happen. Immutable.
  • Ordering: Total order by occurred_at within slug (for analytics queries). Partial order across slugs (no cross-slug ordering needed).
  • Derivability: Not derivable. ClickEvents are the source facts. All analytics are derived from them.

Verdict: Primary object. Event class. Source of truth for all click analytics.


3. ClickAggregate #

Class: Derived view (computed from ClickEvents)

Description: Pre-computed counts and breakdowns (by day, country, referrer) for a slug. Serves dashboard queries without scanning billions of raw events.

Purity tests:

  • Ownership: Written by the analytics pipeline, not by any user action.
  • Evolution: Merge-friendly (counts are commutative: adding a new click increments a counter).
  • Ordering: No meaningful order on the aggregate itself.
  • Derivability: Fully derivable from ClickEvents by SELECT slug, date_trunc('day', occurred_at), country, count(*) FROM ClickEvent GROUP BY .... Can be rebuilt from scratch.

Verdict: NOT a primary object. Derived view. Must not be treated as source of truth.


4. User #

Class: Stable entity

Description: Account that owns ShortLinks. Relevant for authentication, ownership enforcement, and dashboard queries.

Fields: user_id, email, plan (FREE | PRO | ENTERPRISE), created_at

Verdict: Primary object. Standard account entity. Not special to the core redirect path.


5. SlugNamespace #

Class: Relationship / uniqueness domain

Description: The global set of slug strings that are already claimed. Not a table per se, but the uniqueness invariant enforced across all ShortLink records.

Purity tests:

  • Derivability: Derivable from ShortLink records (the set of all slugs in use). However, the uniqueness invariant must be enforced atomically at write time, not derived. The namespace IS the constraint surface.

Verdict: Not a standalone object. The uniqueness constraint belongs on ShortLink.slug as a unique index. The enforcement mechanism is a DP, not an object.


6. UserLinkIndex #

Class: Derived view (projection)

Description: The list of ShortLinks belonging to a user, ordered by created_at descending. Powers the user dashboard.

Derivability: Fully derivable from ShortLink records filtered by owner_user_id. Rebuild: SELECT * FROM ShortLink WHERE owner_user_id = ? ORDER BY created_at DESC.

Verdict: Derived view. Not a primary object. A secondary index on ShortLink.owner_user_id suffices.


Object Summary Table #

| Object | Class | Primary? | Source of Truth For |
|---|---|---|---|
| ShortLink | Stable entity | Yes | Redirect target, ownership, status |
| ClickEvent | Immutable event | Yes | All click analytics |
| ClickAggregate | Derived view | No | Nothing — projection only |
| User | Stable entity | Yes | Authentication, plan |
| SlugNamespace | Uniqueness constraint | No (constraint on ShortLink) | Enforced by unique index |
| UserLinkIndex | Derived view | No | Nothing — secondary index |

Step 3 — Axis Assignment #

Goal: For each primary object, assign ownership (who writes?), evolution (append/overwrite/state-machine/merge), and ordering (total/partial/none, bound to scope).

ShortLink #

Ownership:   Multi-writer at creation (any authenticated user may create);
             single writer after creation (owner or admin modifies status).
             Slug assignment: system assigns auto-increment ID → Base62, or
             user claims custom slug (one winner per slug — CAS semantic).

Evolution:   State machine.
             Valid transitions:
               (none) → ACTIVE  [at creation]
               ACTIVE → EXPIRED [by scheduler when expires_at < now()]
               ACTIVE → DISABLED [by admin or owner]
               DISABLED → ACTIVE [by owner, premium only]
             target_url is immutable after creation (no valid transition modifies it).

Ordering:    Total order within slug (each slug is assigned once, atomically).
             Causal lifecycle order for status transitions (ACTIVE must precede EXPIRED).
             No cross-slug ordering needed.

ClickEvent #

Ownership:   Single writer (redirect service instance that handled the request).
             Multiple redirect service instances may write concurrently —
             but each individual event has a single, unambiguous writer.

Evolution:   Append-only. Events are immutable facts.

Ordering:    Total order by occurred_at within slug (for per-slug analytics).
             Approximate total order (clock skew across redirect nodes < 1ms
             acceptable for analytics — not a strict invariant).
             No ordering required across slugs.

User #

Ownership:   Single writer (the user themselves for profile; system for plan changes).

Evolution:   Overwrite for mutable fields (email, plan). Append-only for audit log.

Ordering:    No meaningful order among users.

Step 4 — Invariant Extraction #

Goal: Derive precise, testable, concurrency-aware invariants from the normalized requirements.

Invariant List #

I1. [Uniqueness] Slug uniqueness across ShortLinks. For any two distinct ShortLink records L1 and L2, L1.slug ≠ L2.slug. This holds at all times, including under concurrent creation requests.

I2. [Uniqueness / Idempotency] Duplicate creation requests produce the same ShortLink. If the same client submits the same creation request N times (same idempotency key), exactly one ShortLink is created. Subsequent submissions return the previously created record, not an error and not a second record.

I3. [Eligibility] Redirect returns the target URL only for ACTIVE ShortLinks. A redirect request for slug S returns HTTP 302 only if ShortLink(S).status == ACTIVE AND (expires_at IS NULL OR expires_at > now()). For EXPIRED or DISABLED slugs, the correct response is HTTP 410.

I4. [Eligibility] Custom slug creation succeeds only if the slug is unclaimed. A custom slug claim for slug S succeeds only if no ShortLink with slug == S currently exists. If another actor claims the same slug concurrently, exactly one claim succeeds.

I5. [Ordering] Status transitions follow the valid state machine. ShortLink.status may only transition via valid edges. Specifically: DISABLED → EXPIRED is forbidden. EXPIRED → DISABLED is forbidden without explicit re-activation first. Any transition that is not an edge in the defined state machine must be rejected.

I6. [Accounting] ClickAggregate(slug, window) = count(ClickEvents where slug = S and occurred_at in window). The aggregate click count for any slug and time window exactly equals the number of ClickEvent records in that window. The aggregate may be stale by at most ε = 60 seconds under normal operation.

I7. [Propagation] A newly created or status-changed ShortLink is reflected in the redirect path within ε. After a ShortLink is created or its status changes, the redirect service must return the updated result within ε_redirect = 5 seconds. (Cache TTL bound.)

I8. [Uniqueness] Each ClickEvent is processed exactly once in analytics aggregation. A ClickEvent that is written to the event stream is counted exactly once in ClickAggregate. Duplicate deliveries (from retry or at-least-once delivery) must be deduplicated before incrementing the aggregate.

I9. [Access-control] Only the ShortLink owner or admin may modify ShortLink.status or ShortLink.alias. Reads of target_url during redirect are unauthenticated. Writes to any ShortLink field require proof of ownership (owner_user_id match) or admin role.

I10. [Accounting] Auto-generated slugs are globally unique without retry. The slug generation process for auto-generated (non-custom) slugs must produce a collision-free slug deterministically, without requiring optimistic retry. (Base62 of auto-incremented ID satisfies this; random generation does not without collision checking.)


Step 5 — DP Derivation #

Goal: Identify the minimal enforcing mechanism (design parameter) per invariant cluster.

A DP is not a technology name. It is the minimal runtime capability required to make the invariant cluster enforceable.

| Invariant Cluster | DP | Reasoning |
|---|---|---|
| I1, I4 — Slug uniqueness (system-assigned and custom) | Atomic conditional insert on slug as unique key | Only one mechanism can guarantee that exactly one writer wins: an atomic insert that fails if the key already exists. No application-level check-then-insert is sufficient (race condition). |
| I2 — Idempotent creation | Idempotency key store | A dedup table (idempotency_key → shortlink_id) with conditional insert. First write creates; subsequent writes with same key return cached result. |
| I3 — Redirect eligibility check | Low-latency key-value lookup with TTL-aware cache | The redirect path reads ShortLink by slug. Must be sub-millisecond for cache hit. Requires a cache that can serve millions of reads/sec. |
| I5 — State machine enforcement | CAS (compare-and-swap) on (status, version) | A status transition is valid only from a specific source state. Concurrent transitions to conflicting states must fail. CAS on (current_status, version) enforces this atomically. |
| I6, I8 — Click counting with exactly-once semantics | Append-only event log + idempotent consumer | Click events are appended to a durable log. Consumers read the log and aggregate. Dedup on event_id ensures exactly-once processing despite at-least-once delivery. |
| I7 — Redirect cache freshness | TTL-bounded cache with invalidation | Cache entries for ShortLink records expire after ε_redirect = 5 seconds. On status change, explicit cache invalidation reduces lag to near-zero for known-changed keys. |
| I9 — Access control | Token-gated write path | Every write request carries an authenticated session token. Service layer checks owner_user_id == authenticated_user_id before executing any mutation. |
| I10 — Collision-free auto-slug | Monotonic global counter → Base62 encoding | A single globally ordered counter (database sequence or distributed counter) guarantees uniqueness without collision probability. Base62 encoding produces short codes. |

Step 6 — Mechanism Selection #

Goal: The mechanical bridge from DPs to concrete mechanisms. Apply the full discrimination procedure for the three key paths.

6.1 DP Classification by Invariant Type #

| DP | Invariant Type | Mechanism Family |
|---|---|---|
| Atomic conditional insert (slug uniqueness) | Uniqueness | Locking / CAS family |
| Idempotency key store | Uniqueness / Idempotency | Conditional insert / dedup table |
| Low-latency KV lookup | Propagation | Cache family |
| CAS on status transitions | Eligibility + Ordering | CAS / optimistic locking family |
| Append-only event log | Accounting | Log / stream family |
| Idempotent consumer | Accounting | Dedup + aggregation family |
| TTL-bounded cache | Propagation | Cache family |
| Token-gated write path | Access-control | Auth middleware family |
| Monotonic counter + Base62 | Uniqueness | Sequential assignment family |

6.2 Ownership × Evolution Table #

| Object | Ownership | Evolution | Table Result |
|---|---|---|---|
| ShortLink (creation) | Multi-writer, one winner per slug | State machine | → CAS on (slug, version) |
| ShortLink (status update) | Single writer (owner/admin) | State machine | → CAS on (status, version) to prevent concurrent conflicting transitions |
| ClickEvent | Multi-writer (many redirect nodes), all succeed | Append-only | → No CAS needed; append to partitioned log |
| ClickAggregate | Single writer (analytics consumer) | Merge (commutative count) | → Idempotent increment with dedup (CRDT G-Counter semantics) |

6.3 Detailed Mechanical Derivation #

Derivation A: Slug Uniqueness Enforcement (Invariant I1, I4, I10) #

Step 6.3.1 — Q1 (Scope): The uniqueness constraint is within one service (the ShortLink creation service). It is not cross-region partitioned (a slug must be globally unique, not per-region unique). Therefore scope = within service → distributed CAS.

But “distributed CAS” requires a serialization point. Two options:

  • Option A: Database unique index (the DB serializes inserts on the unique key). The database becomes the arbiter.
  • Option B: Distributed lock (acquire lock on slug string before inserting). Adds network roundtrip and failure surface.

Option A (DB unique index) is strictly superior for slug uniqueness because:

  • The slug IS the database key. No separate lock namespace.
  • The insert IS the CAS. If it fails (duplicate key error), the caller knows exactly one other writer won.
  • No lock timeout or holder crash to handle.

Step 6.3.2 — Q2 (Failure): Crash of the writer during creation: the row either committed or did not. No partial state. Network partition: the creation request fails; the client retries with an idempotency key.

Q2 → Idempotency Key required. Crash of the creation service after the DB insert but before returning to the client means the client retries. Without an idempotency key, retry would hit the unique index (duplicate key error) and the client would incorrectly report failure. With an idempotency key:

  • First attempt: INSERT INTO idempotency_keys (key, shortlink_id) first, then INSERT ShortLink.
  • Retry: SELECT from idempotency_keys returns existing shortlink_id → return success.

Step 6.3.3 — Q3 (Data): The slug for auto-generated links is computed as Base62(auto_increment_id). The auto-increment is the source of truth for ordering and uniqueness. Base62 encoding is deterministic and bijective over the integer domain used. No collision probability.

For custom slugs: the slug string is user-supplied. The unique index is the collision mechanism.

Step 6.3.4 — Required combination: CAS (unique index INSERT) always requires an Idempotency Key. Applied.

Final mechanism for slug uniqueness:

  1. Database sequence generates monotonically increasing link_id.
  2. slug = Base62(link_id) — deterministic, no collision, no retry needed.
  3. For custom slugs: INSERT INTO shortlinks (slug, ...) ON CONFLICT (slug) DO NOTHING — returns affected rows; if 0, slug was taken.
  4. Idempotency key table: idempotency_keys(key TEXT PK, shortlink_id BIGINT, created_at TIMESTAMP).
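Step 2's Base62 derivation can be sketched as follows. The alphabet order (digits, then lowercase, then uppercase) matches the convention used in the Step 8 pseudocode, though any fixed 62-character alphabet works:

```python
# Deterministic, bijective Base62 encoding of a non-negative integer ID.
# Because the input comes from a monotonic DB sequence, two distinct IDs can
# never produce the same slug: uniqueness holds with no collision checking.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first

# Capacity note: a 6-character slug covers 62**6 ≈ 56.8 billion IDs.
```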

Derivation B: Click Event Pipeline (Invariants I6, I8) #

Step 6.3.1 — Q1 (Scope): Click events are written by many redirect service instances across many servers. This is multi-writer, append-only. The destination is a durable log. Scope = cross-service (redirect service → analytics service). This eliminates in-transaction handling. The mechanism family is async guaranteed delivery.

Step 6.3.2 — Q5 (Coupling): The redirect path must not block on analytics writes (10ms latency budget). Therefore coupling is async, guaranteed delivery → either the Outbox pattern or a direct append to a durable log.

Outbox pattern (write to DB, relay to stream) is appropriate when the event must be durably captured in the same transaction as the primary state change. But for click events:

  • The primary state change IS the click (redirect happened). There is no transactional coupling to a DB row.
  • The redirect service already responded 302 to the client.
  • Therefore: direct append to a durable partitioned log (e.g., Kafka) is correct. No outbox needed.

Write-ahead logging (CDC) would require a DB write on every redirect — that DB write IS the bottleneck we are trying to avoid.

Step 6.3.3 — Q2 (Failure): Crash of redirect node after responding 302 but before appending to log: click is lost. Acceptable — analytics is eventually consistent (I6 allows ε = 60 seconds of staleness; losing a small fraction of clicks is an acceptable analytics approximation at this scale). If tighter accounting is required, the redirect node can initiate the Kafka append (still asynchronously) before responding 302, which narrows but does not close the loss window.

Duplicate delivery from Kafka at-least-once: I8 requires exactly-once counting. Mechanism: each ClickEvent has a event_id = UUID. The analytics consumer maintains a dedup window: processed_event_ids (Bloom filter or Redis SET with TTL). Before incrementing ClickAggregate, check event_id not in dedup set.
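A minimal sketch of the dedup-then-increment consumer, with a plain in-memory set standing in for the Bloom filter or Redis SET described above (the ClickAggregator name and event shape are illustrative):

```python
from collections import defaultdict

# Idempotent consumer sketch (I8): at-least-once delivery from the log is
# converted into exactly-once counting by deduplicating on event_id before
# the commutative increment. In production the seen-set would be bounded
# (Bloom filter or TTL'd Redis SET), not an unbounded Python set.

class ClickAggregator:
    def __init__(self) -> None:
        self.seen: set[str] = set()      # dedup window
        self.counts = defaultdict(int)   # slug -> click count (G-Counter semantics)

    def consume(self, event: dict) -> bool:
        """Apply one delivered event exactly once. Returns True if counted."""
        eid = event["event_id"]
        if eid in self.seen:
            return False                 # duplicate delivery: drop silently
        self.seen.add(eid)
        self.counts[event["slug"]] += 1  # commutative, order-independent
        return True
```

Redelivering the same event is a no-op, so Kafka retries cannot inflate the aggregate.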

For ClickHouse (columnar OLAP): use ReplacingMergeTree or AggregatingMergeTree with event_id dedup key to handle at-least-once delivery natively.

Step 6.3.4 — Q3 (Data): Click counts are commutative and associative (count += 1). This is G-Counter CRDT semantics. Each analytics consumer shard can independently accumulate partial counts and merge. No serialization point needed for counting.

Step 6.3.5 — Q4 (Access): The aggregation layer absorbs a massive event write stream (10B events/month) but serves comparatively few dashboard reads, and those reads must be cheap. The aggregate is a pre-computed materialized view. Pattern: CQRS — the ClickEvent log is the write model; ClickAggregate tables are the read model.

Final mechanism for click event pipeline:

Redirect service
  → fire-and-forget async write to Kafka topic: click_events
  → partition key: slug (ensures per-slug ordering within partition)

Kafka consumer (analytics worker)
  → reads click_events topic at-least-once
  → deduplicates on event_id (Bloom filter in memory, 5-minute window)
  → batches 10,000 events
  → bulk-inserts ClickEvent rows to ClickHouse
  → ClickHouse ReplacingMergeTree merges in background

ClickHouse aggregate table
  → materialized view: SELECT slug, toDate(occurred_at), country, count()
    FROM click_events GROUP BY slug, date, country
  → refreshed every 60 seconds (satisfies ε = 60s from I6)

Derivation C: Redirect Lookup Optimization (Invariant I3, I7) #

Step 6.3.1 — Q1 (Scope): The redirect lookup is a read of ShortLink by slug. With 10B redirects/month (~3,800 req/sec average, 100K+ req/sec peak for viral URLs), this is read-heavy. Pattern: Cache-Aside.

Step 6.3.2 — Q4 (Access): Read » Write by orders of magnitude. The correct pattern is a multi-layer cache:

  • Layer 1: CDN edge cache (geo-distributed) — serves redirects at the PoP nearest to the user, eliminating 80%+ of traffic before it reaches origin. Cache-Control: max-age should be 30–300 seconds depending on expected change frequency (matching I7's ε_redirect = 5 seconds exactly is too aggressive for a CDN; new links rarely change, so longer TTLs are safe when paired with explicit purge on status change).
  • Layer 2: Redis in-memory cache — serves slugs not in CDN cache. Sub-millisecond. Cache TTL = 60 seconds for hot slugs. Capacity: 100M active slugs × 200 bytes = 20GB.
  • Layer 3: Postgres — authoritative source. Hit only on cache miss.

Step 6.3.3 — Hotspot problem: A viral URL (slug abc123) may receive millions of redirects/sec. All hit the same Redis key. This is the thundering herd / hotspot problem.

Mechanisms:

  1. CDN absorbs 95%+ of viral traffic — viral URLs have extremely high cache hit rates because many users in many regions access the same URL. CDN caches at edge.
  2. Local in-process cache in redirect service — each redirect service instance maintains a local LRU cache (10K entries, 1-second TTL). Viral slugs are served from process memory without Redis roundtrip.
  3. Redis read replicas — Redis Cluster with multiple read replicas per shard. Viral slug key can be read from any replica.
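Mechanism 2 above, the per-process LRU with a short TTL, might look like this sketch; the class name and method shapes are assumptions, while the 10K-entry / 1-second figures come from the text:

```python
import time
from collections import OrderedDict

# Per-process LRU cache with TTL. A 1-second TTL means a viral slug costs at
# most one Redis round trip per second per instance, regardless of request rate.

class LocalTTLCache:
    def __init__(self, capacity: int = 10_000, ttl: float = 1.0):
        self.capacity, self.ttl = capacity, ttl
        self.data: "OrderedDict[str, tuple[float, object]]" = OrderedDict()

    def get(self, key: str):
        entry = self.data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self.data[key]            # expired: next read falls through to Redis
            return None
        self.data.move_to_end(key)        # mark as recently used
        return value

    def set(self, key: str, value) -> None:
        self.data[key] = (time.monotonic(), value)
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least-recently-used entry
```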

Step 6.3.4 — Q2 (Failure): Cache miss on viral URL going cold (TTL expires at CDN and Redis simultaneously):

  • Thundering herd: thousands of requests flood Redis/DB simultaneously.
  • Mechanism: probabilistic early re-caching (extend TTL before expiry for hot keys), OR request coalescing at Redis layer (SETNX-based mutex: first miss acquires “I am fetching” lock, others wait briefly then get result).

Step 6.3.5 — Cache invalidation on status change: When ShortLink.status changes (ACTIVE → DISABLED), cached entries must be invalidated:

  • Send invalidation message to Redis: DEL shortlink:{slug}.
  • CDN purge API call: PURGE https://bit.ly/{slug}.
  • Residual staleness = CDN purge propagation time (typically ~1–3 seconds).
  • Acceptable per I7 (ε_redirect = 5 seconds).

Step 6.3.6 — Required combination: Cache-Aside always needs a TTL (prevents stale-forever entries if invalidation fails). Applied: Redis TTL = 60 seconds, CDN max-age = 300 seconds (or purged on change).

Final mechanism for redirect lookup:

Client request: GET https://bit.ly/{slug}
  → CDN edge (layer 1):
      HIT: return 302, target_url, Cache-Control: max-age=300
      MISS: forward to redirect service

  → Redirect service (layer 2 — local process cache, 1s TTL LRU):
      HIT: return 302
      MISS: forward to Redis

  → Redis (layer 3 — 60s TTL):
      HIT: populate local cache, return 302
      MISS: coalesce concurrent fetches (SETNX lock)

  → Postgres (authoritative):
      SELECT target_url, status, expires_at FROM shortlinks WHERE slug = ?
      If status != ACTIVE or expires_at < now(): return 410
      Return 302, populate Redis, return to caller

A cache miss at this layer is a failure condition, not a normal path: the redirect path must never be allowed to degrade into a DB read path under load. Redis must be provisioned to handle 100% of redirect traffic if the CDN is bypassed.


Step 7 — Axiomatic Validation #

Source-of-Truth Table #

| State Domain | Source of Truth | Location |
|---|---|---|
| Slug → target_url mapping | ShortLink table | Postgres (primary) |
| ShortLink.status | ShortLink table | Postgres (primary) |
| Click facts | ClickEvent table | ClickHouse (append-only) |
| Click aggregates | ClickAggregate materialized view | ClickHouse (derived) |
| User accounts | User table | Postgres |
| Hot slug cache | Redis key shortlink:{slug} | Redis (cache — not source of truth) |
| Idempotency keys | idempotency_keys table | Postgres |

Dependency table:

| Derived Object | Depends On | Rebuild Path |
|---|---|---|
| ClickAggregate | ClickEvent | INSERT INTO click_agg SELECT slug, date, country, count(*) FROM click_events GROUP BY ... |
| Redis cache entry | ShortLink (Postgres) | On cache miss: fetch from Postgres, repopulate Redis |
| UserLinkIndex | ShortLink | SELECT * FROM shortlinks WHERE owner_user_id = ? (secondary index) |
| CDN cached redirect | ShortLink (Postgres, via redirect service) | Purge CDN + TTL expiry; rebuild on next request |

Projections with rebuild paths:

  1. ClickAggregate rebuild: If ClickHouse is corrupted, replay ClickEvent log from Kafka (retain 30 days) → rebuild all aggregates. Duration: hours for full history, minutes for recent windows.

  2. Redis cache rebuild: If Redis is wiped, cache warms organically on first miss per slug. No explicit rebuild needed. Hot slugs repopulate within seconds under live traffic.

  3. UserLinkIndex: It is a secondary index on Postgres. If the index is dropped, it can be rebuilt in minutes with CREATE INDEX CONCURRENTLY.

Independence check:

  • ClickEvent is not derived from ShortLink. Click events reference slug as a foreign key (denormalized — slug may be deleted but click events are retained for historical analytics). This is correct: events are immutable facts; deleting the ShortLink does not delete the analytics history.
  • ClickAggregate is fully derivable from ClickEvent. No circular dependency.
  • Redis cache is fully derivable from Postgres. No circular dependency.

Step 8 — Algorithm Design #

Write Path 1: Short URL Creation (Auto-generated Slug) #

function createShortLink(request: CreateRequest, idempotency_key: string) -> ShortLink:

  // Step 1: Idempotency check
  existing = db.query(
    "SELECT shortlink_id FROM idempotency_keys WHERE key = $1",
    [idempotency_key]
  )
  if existing:
    return db.query("SELECT * FROM shortlinks WHERE id = $1", [existing.shortlink_id])

  // Step 2: Validate input
  if not isValidURL(request.target_url):
    raise InvalidURLError

  if request.expires_at != null and request.expires_at < now() + 60s:
    raise InvalidExpiryError  // must expire at least 60 seconds in the future

  // Step 3: Generate slug from auto-increment ID
  // The DB sequence guarantees monotonicity and uniqueness.
  // Base62 encoding: 0-9 = '0'-'9', 10-35 = 'a'-'z', 36-61 = 'A'-'Z'
  link_id = db.nextval('shortlinks_id_seq')
  slug = base62_encode(link_id)  // deterministic, no collision possible

  // Step 4: Insert ShortLink (cannot fail on slug uniqueness for auto-generated)
  db.execute("""
    INSERT INTO shortlinks (id, slug, target_url, owner_user_id, created_at, expires_at, status)
    VALUES ($1, $2, $3, $4, now(), $5, 'ACTIVE')
  """, [link_id, slug, request.target_url, request.user_id, request.expires_at])

  // Step 5: Record idempotency key atomically
  db.execute("""
    INSERT INTO idempotency_keys (key, shortlink_id, created_at)
    VALUES ($1, $2, now())
  """, [idempotency_key, link_id])

  // Step 6: Return created record
  return ShortLink{id: link_id, slug: slug, ...}

Idempotency: The idempotency key is checked (step 1) before any side effect. But if the service crashes after step 4 and before step 5, the shortlink exists while the idempotency key is not recorded. On retry, step 1 finds nothing, step 3 draws a fresh sequence value, and step 4 inserts a second ShortLink — violating I2 and orphaning the first row. Mitigation: wrap steps 3–5 in a single DB transaction.

Revised (correct) version:

BEGIN TRANSACTION
  link_id = nextval('shortlinks_id_seq')
  slug = base62_encode(link_id)
  INSERT INTO shortlinks (id, slug, ...)
  INSERT INTO idempotency_keys (key, shortlink_id, ...)
COMMIT

If the transaction rolls back, neither row is written. The client retries with the same idempotency key. The idempotency check at step 1 returns nothing, and the transaction is retried cleanly.


Write Path 2: Custom Slug Claim #

function claimCustomSlug(request: CustomSlugRequest, idempotency_key: string) -> ShortLink:

  // Step 1: Idempotency check (same as above)
  existing = checkIdempotency(idempotency_key)
  if existing: return existing

  // Step 2: Validate slug format
  if not isValidSlugFormat(request.slug):  // alphanum + hyphen, 3-50 chars
    raise InvalidSlugError

  // Step 3: Attempt atomic insert (CAS on slug uniqueness)
  BEGIN TRANSACTION
    result = db.execute("""
      INSERT INTO shortlinks (id, slug, target_url, owner_user_id, status, created_at)
      VALUES (nextval('shortlinks_id_seq'), $1, $2, $3, 'ACTIVE', now())
      ON CONFLICT (slug) DO NOTHING
      RETURNING id, slug
    """, [request.slug, request.target_url, request.user_id])

    if result.rowcount == 0:
      ROLLBACK
      raise SlugAlreadyTakenError(slug=request.slug)  // I4 enforced

    link_id = result.rows[0].id
    INSERT INTO idempotency_keys (key, shortlink_id, ...) VALUES (idempotency_key, link_id, now())
  COMMIT

  return ShortLink{slug: request.slug, ...}

CAS semantics: ON CONFLICT (slug) DO NOTHING is the CAS. If two requests race for the same slug, Postgres serializes them at the unique index. Exactly one succeeds; the other sees rowcount == 0.


Write Path 3: Status Transition (Disable / Enable) #

function updateStatus(slug: string, new_status: Status, actor: User) -> ShortLink:

  // Step 1: Fetch current record
  link = db.query("SELECT id, status, version, owner_user_id FROM shortlinks WHERE slug = $1", [slug])
  if not link:
    raise NotFoundError

  // Step 2: Access control (I9)
  if link.owner_user_id != actor.user_id and not actor.is_admin:
    raise UnauthorizedError

  // Step 3: Validate state machine transition (I5)
  valid_transitions = {
    'ACTIVE': ['EXPIRED', 'DISABLED'],
    'DISABLED': ['ACTIVE'],
    'EXPIRED': []  // EXPIRED is terminal unless manually overridden by admin
  }
  if new_status not in valid_transitions[link.status]:
    raise InvalidTransitionError(from=link.status, to=new_status)

  // Step 4: CAS update on (status, version) to prevent concurrent conflicting writes
  result = db.execute("""
    UPDATE shortlinks
    SET status = $1, version = version + 1, updated_at = now()
    WHERE slug = $2 AND status = $3 AND version = $4
  """, [new_status, slug, link.status, link.version])

  if result.rowcount == 0:
    // Concurrent update; retry from Step 1 (optimistic locking)
    raise ConflictError  // caller should retry

  // Step 5: Invalidate cache
  redis.del(f"shortlink:{slug}")
  cdn.purge(f"https://bit.ly/{slug}")  // async, best-effort

  return fetchUpdated(slug)

Retry on conflict: The caller (service layer) retries up to 3 times with exponential backoff. Slug status changes are rare; conflict probability is negligible.
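The service-layer retry described above fits in a small wrapper. A sketch, where ConflictError, the attempt count, and the backoff constants are illustrative and mirror the 3-attempt policy in the text:

```python
import random
import time

class ConflictError(Exception):
    """Raised when the CAS UPDATE matches zero rows (a concurrent writer won)."""

def with_retries(operation, max_attempts: int = 3, base_delay: float = 0.05):
    """Run an optimistic-concurrency operation, retrying on conflict
    with exponential backoff plus jitter to de-correlate retriers."""
    for attempt in range(max_attempts):
        try:
            return operation()          # re-reads current (status, version) each time
        except ConflictError:
            if attempt == max_attempts - 1:
                raise                   # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Calling with_retries(lambda: updateStatus(slug, 'DISABLED', actor)) keeps the conflict handling out of every call site.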


Read Path: Redirect #

function redirect(slug: string) -> HTTPResponse:

  // Layer 1: Process-local cache (LRU, 10K entries, 1-second TTL)
  cached = localCache.get(slug)
  if cached:
    if cached == NEGATIVE_CACHE_SENTINEL:
      return HTTP_404
    if cached.status != 'ACTIVE' or cached.is_expired():
      return HTTP_410
    emit_click_event_async(slug, request)  // fire-and-forget to Kafka
    return HTTP_302(Location=cached.target_url)

  // Layer 2: Redis (60-second TTL)
  cached = redis.get(f"shortlink:{slug}")
  if cached:
    localCache.set(slug, cached, ttl=1s)
    if cached == NEGATIVE_CACHE_SENTINEL:
      return HTTP_404
    if cached.status != 'ACTIVE' or cached.is_expired():
      return HTTP_410
    emit_click_event_async(slug, request)  // fire-and-forget to Kafka
    return HTTP_302(Location=cached.target_url)

  // Layer 3: Postgres (cache miss — this is a failure condition under load)
  // Use coalescing to prevent thundering herd
  lock_key = f"fetching:{slug}"
  acquired = redis.set(lock_key, '1', NX=true, EX=1)  // 1-second lock

  if not acquired:
    // Another request is fetching; wait briefly and check Redis again
    sleep(10ms)
    cached = redis.get(f"shortlink:{slug}")
    if cached:
      // Successfully coalesced: serve from the freshly populated entry
      if cached == NEGATIVE_CACHE_SENTINEL:
        return HTTP_404
      localCache.set(slug, cached, ttl=1s)
      if cached.status != 'ACTIVE' or cached.is_expired():
        return HTTP_410
      emit_click_event_async(slug, request)
      return HTTP_302(Location=cached.target_url)
    // Coalescing timed out; fall through to the DB anyway

  link = db.query("""
    SELECT target_url, status, expires_at FROM shortlinks WHERE slug = $1
  """, [slug])

  if acquired:
    redis.del(lock_key)  // only the lock holder releases it

  if not link:
    redis.set(f"shortlink:{slug}", NEGATIVE_CACHE_SENTINEL, EX=60)  // negative cache
    return HTTP_404

  // Populate caches
  redis.set(f"shortlink:{slug}", serialize(link), EX=60)
  localCache.set(slug, link, ttl=1s)

  if link.status != 'ACTIVE' or (link.expires_at and link.expires_at < now()):
    return HTTP_410

  emit_click_event_async(slug, request)
  return HTTP_302(Location=link.target_url)

Negative caching: Non-existent slugs also get cached (with a sentinel value) to prevent DB hammering for 404s from bots.
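The negative-cache interaction can be modeled in a few lines. A sketch, where TTLCache and FakeDB are illustrative stand-ins for Redis and Postgres, and the sentinel plays the role of NEGATIVE_CACHE_SENTINEL:

```python
import time

NEGATIVE = object()  # sentinel meaning "this slug is known not to exist"

class TTLCache:
    """Minimal per-key TTL cache standing in for Redis."""
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

class FakeDB:
    """Counts lookups so we can prove the DB is hit only once per miss."""
    def __init__(self, rows):
        self.rows, self.calls = rows, 0

    def get(self, slug):
        self.calls += 1
        return self.rows.get(slug)

def lookup(cache, db, slug):
    cached = cache.get(slug)
    if cached is NEGATIVE:
        return None                     # cached 404: no DB traffic
    if cached is not None:
        return cached
    row = db.get(slug)                  # miss: one DB round-trip
    cache.set(slug, row if row is not None else NEGATIVE, ttl=60)
    return row

db = FakeDB({"AbCd3F": "https://example.com/long"})
cache = TTLCache()
first, second = lookup(cache, db, "nope"), lookup(cache, db, "nope")
```

A bot hammering a non-existent slug costs the database exactly one read per 60-second window.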


Async Path: Click Event Emission #

// Fire-and-forget from redirect handler (runs in separate goroutine/thread)
function emit_click_event_async(slug: string, request: HTTPRequest):
  event = ClickEvent{
    event_id: uuid4(),       // globally unique, used for dedup in consumer
    slug: slug,
    occurred_at: now_utc(),
    country: geoip_country(request.client_ip),
    city: geoip_city(request.client_ip),
    referrer: request.headers.get('Referer', '')[:500],  // truncated
    user_agent: request.headers.get('User-Agent', '')[:500],
    ip_hash: sha256(request.client_ip)[:16]  // privacy-preserving
  }

  // Kafka append — partition by slug to preserve per-slug ordering
  kafka.produce(
    topic='click_events',
    key=slug,                // partition key
    value=protobuf_encode(event),
    acks='1'                 // leader ack only — async throughput over durability
  )
  // If Kafka produce fails, log and drop — analytics loss is acceptable
  // Do NOT block the redirect response on this
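One way to honor the never-block contract is a bounded in-process queue drained by a background producer thread. A sketch of the drop-on-overflow policy (a real deployment would lean on the Kafka client's own async buffer; the names and queue size here are illustrative):

```python
import queue
import threading

events = queue.Queue(maxsize=10_000)  # holds click-event dicts; None = shutdown

def emit_click_event_async(event: dict) -> None:
    """Enqueue without blocking; on overflow, drop (analytics loss is acceptable)."""
    try:
        events.put_nowait(event)
    except queue.Full:
        pass  # never block the redirect response

def producer_loop(produce) -> None:
    """Background thread: drain the queue into the broker."""
    while True:
        event = events.get()
        if event is None:
            break                # shutdown signal
        try:
            produce(event)       # e.g. Kafka produce with acks=1
        except Exception:
            pass                 # log and drop in production

received = []
worker = threading.Thread(target=producer_loop, args=(received.append,), daemon=True)
worker.start()
for i in range(3):
    emit_click_event_async({"slug": "AbCd3F", "n": i})
events.put(None)                 # flush and stop the demo worker
worker.join(timeout=5)
```

The redirect handler only ever pays the cost of a put_nowait; broker latency and failures stay on the background thread.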

State Machine Diagram #

              CREATE
                │
                ▼
           ┌──────────┐
           │  ACTIVE  │◄────────────────────┐
           └──────────┘                     │ (admin re-activate)
               │   │                        │
     expires_at│   │owner/admin             │
       reached │   │ disables               │
               ▼   ▼                        │
           ┌─────────┐   ┌─────────────────┐│
           │ EXPIRED │   │    DISABLED     ├┘
           └─────────┘   └─────────────────┘
            (terminal   (owner can re-activate
             for users)  if plan allows)

Step 9 — Logical Data Model #

CREATE TABLE shortlinks (
    id              BIGINT          PRIMARY KEY DEFAULT nextval('shortlinks_id_seq'),
    slug            VARCHAR(64)     NOT NULL,
    target_url      TEXT            NOT NULL,           -- max 8192 chars
    owner_user_id   BIGINT          NOT NULL REFERENCES users(id),
    status          VARCHAR(16)     NOT NULL DEFAULT 'ACTIVE'
                    CHECK (status IN ('ACTIVE', 'EXPIRED', 'DISABLED')),
    created_at      TIMESTAMPTZ     NOT NULL DEFAULT now(),
    expires_at      TIMESTAMPTZ,                        -- NULL = never expires
    updated_at      TIMESTAMPTZ     NOT NULL DEFAULT now(),
    version         BIGINT          NOT NULL DEFAULT 1, -- for CAS on status updates
    is_custom_slug  BOOLEAN         NOT NULL DEFAULT false,
    alias           VARCHAR(256),                       -- human-readable name
    custom_domain   VARCHAR(256),                       -- premium feature

    CONSTRAINT uq_slug UNIQUE (slug),                  -- I1, I4 enforcement
    CONSTRAINT uq_custom_domain_slug UNIQUE (custom_domain, slug)
);

-- Secondary index for user dashboard (I9, UserLinkIndex projection)
CREATE INDEX idx_shortlinks_owner ON shortlinks(owner_user_id, created_at DESC);

-- Index for expiry scheduler (TTL enforcement)
CREATE INDEX idx_shortlinks_expires ON shortlinks(expires_at) WHERE expires_at IS NOT NULL AND status = 'ACTIVE';

Partition key: slug — all redirect lookups are by slug. The unique index on slug IS the partition key for the redirect path.

Dedup key: id (sequence) for auto-generated slugs; slug (unique index) for custom slugs.


Table: idempotency_keys #

CREATE TABLE idempotency_keys (
    key             VARCHAR(128)    PRIMARY KEY,        -- client-supplied idempotency key
    shortlink_id    BIGINT          NOT NULL REFERENCES shortlinks(id),
    created_at      TIMESTAMPTZ     NOT NULL DEFAULT now()
);

-- TTL: rows older than 7 days can be deleted (cron job)
CREATE INDEX idx_idem_created ON idempotency_keys(created_at);

Table: users #

CREATE TABLE users (
    id              BIGINT          PRIMARY KEY DEFAULT nextval('users_id_seq'),
    email           VARCHAR(320)    NOT NULL UNIQUE,
    plan            VARCHAR(16)     NOT NULL DEFAULT 'FREE'
                    CHECK (plan IN ('FREE', 'PRO', 'ENTERPRISE')),
    created_at      TIMESTAMPTZ     NOT NULL DEFAULT now(),
    password_hash   TEXT            NOT NULL
);

Table: click_events (ClickHouse) #

-- ClickHouse DDL
CREATE TABLE click_events (
    event_id        UUID,
    slug            String,
    occurred_at     DateTime64(3, 'UTC'),
    country         LowCardinality(String),
    city            String,
    referrer        String,
    user_agent      String,
    ip_hash         FixedString(16)
)
ENGINE = ReplacingMergeTree(occurred_at)
PARTITION BY toYYYYMM(occurred_at)
ORDER BY (slug, occurred_at, event_id);
-- ReplacingMergeTree deduplicates on (slug, occurred_at, event_id) — I8 enforcement

Table: click_aggregates (ClickHouse Materialized View) #

CREATE MATERIALIZED VIEW click_agg_daily
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (slug, date, country)
AS SELECT
    slug,
    toDate(occurred_at)     AS date,
    country,
    countState()            AS click_count
FROM click_events
GROUP BY slug, date, country;

-- Query view:
CREATE VIEW click_agg_daily_view AS
SELECT slug, date, country, countMerge(click_count) AS clicks
FROM click_agg_daily
GROUP BY slug, date, country;

Step 10 — Technology Landscape #

Mapping procedure: capability → shape → specific product.

| DP | Capability Required | Shape | Selected Product | Justification |
|---|---|---|---|---|
| Atomic conditional insert, CAS updates, idempotency store | Serializable ACID transactions, unique indexes | Relational OLTP | PostgreSQL 16 | Proven unique index semantics; ON CONFLICT CAS; nextval() for sequence generation; wide ecosystem |
| Low-latency KV cache for redirect lookup | Sub-ms read, TTL, ~20GB dataset, high concurrency | In-memory KV store | Redis 7 (Redis Cluster) | <1ms p99 reads; TTL per key; cluster for horizontal read scaling; wide client support |
| Click event pipeline — durable ordered log | High-throughput append, ordered per partition, at-least-once delivery, replay | Partitioned durable log | Apache Kafka | 1M+ events/sec per partition; log retention for replay; partition-by-slug for ordering; mature ecosystem |
| Analytics storage — click events and aggregates | Append-heavy writes, column-scan aggregation queries, time-series partitioning | Columnar OLAP | ClickHouse | 1B+ rows/day insertions; ReplacingMergeTree for dedup; materialized views for pre-aggregation; 10–100x faster than Postgres for analytics |
| CDN edge redirect serving | Geo-distributed caching, HTTP redirect serving, cache purge API | CDN | Cloudflare (or Fastly) | 300+ PoPs; <5ms to 95% of world population; cache purge API; Cloudflare Workers for edge logic |
| Geo-IP resolution for click events | IP → country/city mapping, <1ms latency, ~1GB dataset | In-process library with mmdb | MaxMind GeoLite2 (mmdb) | In-process lookup, no network roundtrip; updated weekly; covers 99%+ of IPs |
| Process-local cache in redirect service | LRU eviction, 1s TTL, in-memory | In-process LRU | go-cache / Caffeine (language-specific) | Zero network roundtrip; fits in L2/L3 CPU cache for hot slugs |
| Expiry scheduler | Periodic TTL check and status transition | Cron + DB query | pg_cron (Postgres extension) or dedicated Go worker | UPDATE shortlinks SET status='EXPIRED' WHERE expires_at < now() AND status='ACTIVE' — runs every 60 seconds |

Step 11 — Deployment Topology #

Service Boundaries #

┌─────────────────────────────────────────────────────────────────────┐
│  CDN Edge (Cloudflare)                                              │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  Redirect cache (max-age=300, surrogate-key={slug})          │   │
│  │  Cloudflare Workers: edge slug validation + cache miss proxy │   │
│  └─────────────────────────────────────────────────────────────┘   │
└──────────────────────────┬──────────────────────────────────────────┘
                           │ Cache miss only (~1-5% of traffic)
                           ▼
┌─────────────────────────────────────────────────────────────────────┐
│  Region: us-east-1 (Primary)        Region: eu-west-1 (Secondary)  │
│                                                                     │
│  ┌─────────────────┐   ┌─────────────────────────────────────┐    │
│  │  Redirect Service│   │  Write Service                      │    │
│  │  (stateless)    │   │  (ShortLink creation, status update) │    │
│  │  50 instances   │   │  10 instances                        │    │
│  │  Auto-scales    │   │                                      │    │
│  └────────┬────────┘   └──────────────────────┬──────────────┘    │
│           │                                    │                    │
│           ▼                                    ▼                    │
│  ┌─────────────────┐   ┌─────────────────────────────────────┐    │
│  │  Redis Cluster  │   │  PostgreSQL (Primary + 2 replicas)  │    │
│  │  6 shards       │   │  Primary: writes                    │    │
│  │  3 replicas each│   │  Replicas: dashboard reads          │    │
│  └────────┬────────┘   └──────────────────────────────────────┘   │
│           │                                                         │
│           ▼                                                         │
│  ┌─────────────────┐   ┌─────────────────────────────────────┐    │
│  │  Kafka Cluster  │──►│  Analytics Workers                  │    │
│  │  12 brokers     │   │  (Kafka → ClickHouse pipeline)      │    │
│  │  click_events   │   │  5 consumer instances               │    │
│  │  (slug-partitioned)│  └──────────────────────┬──────────────┘   │
│  └─────────────────┘                            ▼                  │
│                          ┌─────────────────────────────────────┐   │
│                          │  ClickHouse Cluster                 │   │
│                          │  3 shards, 2 replicas each          │   │
│                          └─────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Partition Topology #

| Component | Partition Strategy | Rationale |
|---|---|---|
| Postgres shortlinks | Single primary + read replicas (no sharding at initial scale) | 300M rows × 500 bytes = 150GB — fits one Postgres instance |
| Redis Cluster | 6 shards by slug (consistent hashing) | Each shard holds ~3M hot slugs; horizontal scale |
| Kafka click_events | 48 partitions, partition key = slug | Per-slug ordering; 48 consumers can process in parallel |
| ClickHouse | 3 shards, partition by month | Time-series queries partition-pruned by month |

Failure Domains #

| Failure | Scope | Impact |
|---|---|---|
| Single redirect service instance fails | Instance | None: load balancer removes from pool within 5s |
| Redis shard failure | 1/6 of hot slugs | Cache miss surge for affected slugs; DB absorbs for ~30s until Redis replica promotes |
| Postgres primary failure | All writes | Failover to replica in ~30s; writes blocked during failover; reads continue from replica |
| Kafka broker failure | 1/12 of partitions | Other brokers take over; click events in-flight may be lost (analytics tolerance) |
| CDN PoP outage | Geographic region | Traffic fails over to other PoPs; latency increase for affected region |
| ClickHouse node failure | Analytics only | Redirects unaffected; analytics queries degrade until replica promotes |

Step 12 — Consistency Model #

| Path | Consistency Model | Reason |
|---|---|---|
| Short URL creation (write) | Linearizable (strong) | Postgres serializable transaction on unique slug index. Two concurrent requests for the same slug: exactly one commits. |
| Redirect lookup (cache hit) | Bounded stale (ε = 60s Redis TTL + 300s CDN) | Acceptable: a newly created URL may not be visible at CDN for up to 5 minutes. A disabled URL may still redirect for up to 60s. |
| Redirect lookup (cache miss, DB read) | Linearizable read | Postgres synchronous read from primary (or replica with synchronous_commit = remote_apply). Returns current state. |
| Status update visibility | Bounded stale (ε = 5s post-invalidation) | Redis DEL + CDN purge on status change. CDN purge propagates in ~1–3s. Redis DEL is synchronous. |
| Click event recording | Eventual | Kafka at-least-once; ClickHouse async ingestion. Event appears in aggregates within ε = 60s. |
| Click aggregate staleness | Eventual (ε = 60s) | Materialized view refreshes every 60 seconds. Acceptable for analytics dashboards. |
| Dashboard (user link list) | Eventually consistent (replica lag ε < 1s) | Dashboard reads from Postgres replica. Replica lag < 100ms under normal conditions. |

Reasoning for key choices:

The redirect lookup being bounded-stale is correct. The invariant I3 requires ACTIVE-only redirects, but the ε is set by operational requirements (not a hard safety guarantee). A 60-second window where a disabled URL still redirects is acceptable; a 60-second window where a newly created URL fails to redirect is acceptable. This is a product decision, not an architectural flaw.

The click event pipeline being eventual is explicitly sanctioned by invariant I6 (ε = 60s). Analytics is not in the critical path of any user action.


Step 13 — Scaling Model #

Scale Type Classification #

| Component | Scale Type | Primary Bottleneck |
|---|---|---|
| Redirect path | Read-heavy + hotspot-heavy | Viral slug concentrates millions of req/sec on single cache key |
| URL creation | Write-heavy (but low absolute volume) | Sequence generation is single-threaded in Postgres |
| Click event pipeline | Write-heavy, fanout-heavy | 10B events/month → Kafka producer throughput |
| Analytics aggregation | Aggregation-heavy | ClickHouse query scan over billions of rows |

Hotspot Keys #

| Hotspot | Location | Mechanism |
|---|---|---|
| Viral slug (e.g., Super Bowl ad link) | Redis key + CDN URL | Multi-layer cache; CDN absorbs 99%; local process cache absorbs most of remainder |
| Postgres shortlinks_id_seq | Postgres | Sequence generation is fast (<1μs per nextval); batching nextval in groups of 1000 reduces contention |
| Kafka slug partition | Partition for viral slug | If one slug gets 100K/sec, its Kafka partition is a bottleneck for the consumer (analytics, not redirect). Redirect does not use Kafka on the hot path. |
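The batching mitigation for shortlinks_id_seq can live client-side as a block allocator; a sketch (Postgres can achieve the same effect with ALTER SEQUENCE ... CACHE 1000, so the names here are purely illustrative):

```python
import threading

class BlockIdAllocator:
    """Hands out ids from locally cached blocks; refills from a shared counter.
    In production the refill would be one nextval round-trip per block."""
    def __init__(self, fetch_block, block_size: int = 1000):
        self._fetch_block = fetch_block   # returns the first id of a fresh block
        self._block_size = block_size
        self._next = self._limit = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._limit:
                start = self._fetch_block(self._block_size)
                self._next, self._limit = start, start + self._block_size
            self._next += 1
            return self._next - 1

# Demo: a counter standing in for the database sequence.
counter = {"next": 0}
def fetch_next_block(size: int) -> int:
    start = counter["next"]
    counter["next"] += size
    return start

alloc = BlockIdAllocator(fetch_next_block, block_size=3)
ids = [alloc.next_id() for _ in range(5)]
```

Each instance pays one sequence round-trip per block instead of one per insert; gaps from discarded blocks are harmless because slugs are opaque.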

Scaling Strategies #

Redirect service (read-heavy):

  • Horizontal scale: stateless Go/Rust service; 50 instances behind L7 load balancer.
  • Process-local LRU absorbs viral slugs at CPU speed.
  • CDN absorbs geo-distributed traffic before it reaches the origin.
  • Redis Cluster scales read capacity with additional replicas per shard.
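The process-local layer amounts to an LRU map with per-entry TTL. A sketch of the semantics (production code would use go-cache or Caffeine as the Step 10 table says; defaults mirror the local_cache_* parameters in Step 16):

```python
import time
from collections import OrderedDict

class LocalLRU:
    """Process-local LRU with per-entry TTL: Layer 1 of the redirect path."""
    def __init__(self, capacity: int = 10_000, ttl: float = 1.0):
        self.capacity, self.ttl = capacity, ttl
        self._entries = OrderedDict()   # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]      # lazily expire on read
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._entries[key] = (value, time.monotonic() + self.ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used

cache = LocalLRU(capacity=2, ttl=60)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # touch "a"; "b" becomes least recently used
cache.set("c", 3)   # evicts "b"
```

A viral slug stays pinned at the head of this map, so millions of req/sec resolve without touching Redis at all.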

URL creation (low-volume write):

  • Single Postgres primary is sufficient for 115 writes/sec.
  • If write rate grows 100x: switch to per-region Postgres with UUID-based IDs (no cross-region sequence required). Custom slugs still require a global uniqueness check (cross-region coordination via a global shard or pessimistic reservation).

Click event ingestion (high-volume append):

  • Kafka scales horizontally; 48 partitions × 1M events/sec/partition = 48M events/sec capacity.
  • At 10B/month = ~3,800 events/sec, current capacity is 12,000x surplus.
  • ClickHouse ingestion scales with more shards.

Analytics query serving (aggregation-heavy):

  • Pre-aggregate with ClickHouse materialized views. Dashboard queries hit the aggregate, not raw events.
  • Cache dashboard results in Redis with 60-second TTL.
  • For enterprise customers with custom date ranges: ClickHouse ad-hoc query over raw event table.

Step 14 — Failure Model #

Failure Taxonomy #

| Failure Mode | Can It Happen? | Correct Behavior | Mechanism | Recovery |
|---|---|---|---|---|
| Duplicate slug creation (two concurrent requests for same custom slug) | Yes — two users simultaneously claim bit.ly/launch | One succeeds (HTTP 201), one fails (HTTP 409 SlugTaken) | Postgres unique index; ON CONFLICT DO NOTHING returns 0 rows | Client-side: show error to losing requester |
| Duplicate click event counted twice in analytics | Yes — Kafka at-least-once delivery; consumer crashes mid-batch | Exactly one count per physical click | ReplacingMergeTree + consumer dedup on event_id | Self-healing: ClickHouse dedup occurs at merge time |
| Cache miss storm on viral URL (CDN + Redis TTL expire simultaneously) | Yes — TTL expiry is a sharp boundary | DB absorbs spike; does not crash | Request coalescing via Redis SETNX lock; staggered TTLs (CDN cache TTL > Redis TTL) | Redis TTL = 60s; CDN TTL = 300s; they expire at different times, reducing simultaneity |
| Redirect service returns expired URL as ACTIVE | Yes — cache entry is stale during ε window | Client receives 302 to a URL that should have returned 410 | Cache TTL bounds the window; expiry scheduler flips status; Redis invalidation on status change | Self-healing within ε = 60s; explicit invalidation on EXPIRED transition |
| Postgres primary failure (crash) | Yes | Writes blocked; reads continue from replica | Postgres streaming replication + automatic failover (Patroni + PgBouncer) | Replica promotes in ~30s; write path unavailable during promotion |
| Kafka broker failure | Yes | Click events produced to that broker's partitions are buffered or dropped | Kafka replication factor 3; producer retries for ~5s | Surviving brokers take over within seconds; events in-flight during failure may be lost (analytics tolerance) |
| Analytics consumer crashes mid-batch | Yes — consumer commits Kafka offset after writing to ClickHouse | Events in uncommitted batch are re-read and re-processed | Kafka consumer group offset management; ClickHouse dedup on event_id | Auto-restart consumer; dedup prevents double-counting |
| GeoIP lookup failure | Yes — mmdb file becomes unavailable | Click event recorded with empty country field | Fail-open: emit event with country = ''; do not block redirect | GeoIP is optional metadata; blank is acceptable |
| Redis cluster split-brain during network partition | Yes | Reads may return stale data; writes to both sides of partition | Redis Cluster majority-quorum for writes; minority nodes reject writes | Post-partition: minority side cache invalidated; repopulates from Postgres |
| Idempotency key collision (different clients generate same key) | Astronomically unlikely with UUID4 | Second client's request is treated as a retry of the first | UUID4 has 2^122 space; birthday-bound collision probability for 1B keys ≈ 10^-19 | Not a practical concern |
| Short URL target is a malicious or phishing URL | Always possible | Redirect proceeds (Bit.ly is not a content filter by default) | Rate-limit creation per user; optional URL scanning integration (VirusTotal API async) | Flag URL; admin can DISABLE the ShortLink |
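The idempotency-key collision figure follows from the birthday bound over a UUID4's 122 random bits; working it for one billion keys puts the probability near 10^-19, which remains negligible:

```python
n = 1_000_000_000            # one billion idempotency keys
space = 2 ** 122             # random bits in a UUID4
# Birthday approximation: p ≈ n(n-1) / (2 * space)
p_collision = n * (n - 1) / 2 / space
```

Even at 1000x the reference creation rate, the bound grows only quadratically in n and stays far below any operational concern.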

Step 15 — SLOs #

Redirect Path (Hot Path) #

| Metric | Target | Measurement Method |
|---|---|---|
| P95 redirect latency (CDN hit) | < 5ms | CDN edge timing headers |
| P99 redirect latency (CDN hit) | < 15ms | CDN edge timing headers |
| P95 redirect latency (Redis hit, CDN miss) | < 10ms | Server-side histogram in redirect service |
| P99 redirect latency (Redis hit, CDN miss) | < 25ms | Server-side histogram |
| P95 redirect latency (DB hit, cache miss) | < 100ms | Server-side histogram |
| P99 redirect latency (DB hit, cache miss) | < 200ms | Server-side histogram |
| Redirect availability | > 99.99% (52 min/year downtime) | Synthetic probes every 10s from 5 regions |
| Redirect correctness (correct 302 target) | > 99.999% | Automated canary: create URL, follow redirect, verify target |

A cache miss is a latency failure, not a correctness failure. P99 of 25ms for Redis-hit cases is the operational target, not 200ms.

Write Path (Creation) #

| Metric | Target |
|---|---|
| P95 URL creation latency | < 200ms |
| P99 URL creation latency | < 500ms |
| Slug uniqueness correctness | 100% — no two shortlinks may share a slug |
| Idempotency correctness | 100% — same idempotency key returns same result |
| Creation availability | > 99.9% (8.7 hours/year downtime) |

Analytics Path #

| Metric | Target |
|---|---|
| Click count staleness | < 60 seconds for 99th percentile |
| Dashboard load latency (P95) | < 2 seconds |
| Click count accuracy | > 99.9% of actual clicks counted (0.1% loss acceptable for analytics) |
| Analytics availability | > 99.5% (43 hours/year downtime — analytics is non-critical) |

Throughput #

| Metric | Sustained | Peak (10x) |
|---|---|---|
| Redirects | 4,000 req/sec | 40,000 req/sec |
| URL creations | 120 req/sec | 1,200 req/sec |
| Click event ingestion | 4,000 events/sec | 40,000 events/sec |

Step 16 — Operational Parameters #

Every tunable lever with its range and effect.

| Parameter | Location | Default | Range | Effect if Increased | Effect if Decreased |
|---|---|---|---|---|---|
| redis_ttl_seconds | Redirect service config | 60 | 10–3600 | Fewer DB reads; more stale data served | More DB reads; fresher data |
| cdn_max_age_seconds | CDN cache rule | 300 | 30–3600 | Fewer origin hits; more stale data at edge | More origin hits; fresher edge data |
| local_cache_size_entries | Redirect service | 10,000 | 1K–100K | More memory per pod; fewer Redis hits | Less memory; more Redis hits |
| local_cache_ttl_seconds | Redirect service | 1 | 0.1–10 | Longer stale window for viral slugs; fewer Redis hits | More Redis hits; fresher data |
| kafka_producer_acks | Click event producer | 1 (leader ack) | 0, 1, all | all: durability, more latency | 0: fire-and-forget, lower latency, more data loss risk |
| kafka_consumer_batch_size | Analytics consumer | 10,000 | 1K–100K | Larger ClickHouse inserts (more efficient); higher lag | More frequent small inserts; lower lag |
| clickhouse_merge_interval | ClickHouse config | 600s | 60–3600s | Fewer dedup merges; more storage used; faster ingest | More frequent merges; more CPU; faster dedup |
| postgres_max_connections | Postgres | 200 | 50–1000 | More concurrent queries; more memory per connection | Connection starvation under load |
| idempotency_key_retention_days | Cron job | 7 | 1–30 | More storage; longer idempotency window | Less storage; shorter idempotency window |
| expiry_scheduler_interval_seconds | Expiry worker | 60 | 10–300 | Less frequent expiry; expired links may redirect briefly past deadline | More frequent; lower lag but more DB load |
| redis_coalesce_lock_ttl_ms | Redirect service | 1000 | 100–5000 | More waiting during coalesce; prevents more thundering herd | Faster fallback to DB; less coalescing benefit |
| redis_cluster_read_replicas_per_shard | Redis Cluster | 2 | 1–5 | More read capacity per shard; more memory | Less read capacity |

Step 17 — Runbooks #

Runbook R1: Viral URL Cache Miss Storm #

Trigger: redirect_db_hit_rate > 5% for more than 2 minutes. (Normally < 0.1%.)

Diagnosis:

  1. Check redis_keyspace_miss_rate metric — high indicates cache pressure.
  2. Check top_slugs_by_request_rate dashboard — identify the viral slug.
  3. Check Redis memory usage — if near 100%, eviction is happening.

Mitigations (in order):

  1. Immediate: Force Redis re-population for the viral slug:
    redis-cli SET shortlink:{slug} {value} EX 3600
    
    This sets a 1-hour TTL, giving time to address root cause.
  2. If Redis memory full: Increase Redis maxmemory by adding a read replica to the affected shard.
  3. If DB is overwhelmed: Enable read connection pooling on PgBouncer; scale out DB read replicas.
  4. Long-term: Increase redis_ttl_seconds for slugs with request_rate > 10,000/min.

Recovery signal: redirect_db_hit_rate returns below 0.5%.


Runbook R2: Postgres Primary Failover #

Trigger: PagerDuty alert postgres_primary_unreachable.

Immediate action:

  1. Do NOT manually intervene for 60 seconds — Patroni automatic failover is running.
  2. Verify Patroni status: patronictl -c /etc/patroni/config.yml list
  3. If automatic failover completes: verify replica is now primary, application connections have reconnected via PgBouncer.
  4. If failover has not completed after 90 seconds: manually promote the replica:
    patronictl failover bitly-postgres --master {old_primary} --candidate {replica}
    

During failover (30–90 seconds):

  • Redirect traffic: unaffected (Redis serving most requests).
  • URL creation: fails with 503. Client should retry with idempotency key.
  • Dashboard reads: may fail or return stale data.

Post-failover:

  1. Verify new primary accepts writes: INSERT INTO shortlinks ... ON CONFLICT DO NOTHING.
  2. Verify replication lag on new replica: SELECT now() - pg_last_xact_replay_timestamp().
  3. Monitor for Redis cache miss increase (new primary may be slower initially).
  4. File incident report and investigate old primary.

Runbook R3: Analytics Lag Spike #

Trigger: click_aggregate_lag_seconds > 120 for more than 5 minutes.

Diagnosis:

  1. Check Kafka consumer lag: kafka-consumer-groups.sh --describe --group analytics-consumer
  2. Check ClickHouse insert queue depth.
  3. Check analytics worker CPU and memory.

Mitigations:

  1. If Kafka lag growing: Scale out analytics consumer instances (add 2–3 more).
  2. If ClickHouse insert slow: Reduce kafka_consumer_batch_size (smaller batches insert faster under write pressure).
  3. If analytics worker OOM: Increase pod memory limit.
  4. If ClickHouse node down: Check ClickHouse replica status; failover to replica.

Recovery signal: click_aggregate_lag_seconds < 60 for 10 minutes.

User impact: Analytics dashboards show counts up to lag seconds stale. No impact on redirects.


Runbook R4: Slug Uniqueness Violation (Invariant Breach) #

Trigger: duplicate_slug_count_in_postgres > 0 (this should never fire; it indicates a bug).

Immediate actions:

  1. Freeze all write traffic to the creation service.
  2. Run: SELECT slug, count(*) FROM shortlinks GROUP BY slug HAVING count(*) > 1.
  3. For each duplicate slug: determine which is the legitimate record (lowest id = created first).
  4. Rename the duplicate to a new auto-generated slug.
  5. Notify affected user of the slug change.

Root cause investigation: This invariant cannot be violated if the unique index exists. Check: \d shortlinks to verify CONSTRAINT uq_slug UNIQUE (slug) is present. If missing, re-add immediately:

CREATE UNIQUE INDEX CONCURRENTLY uq_slug ON shortlinks(slug);

Step 18 — Observability #

Metrics #

| Metric | Component | Type | Alert Threshold |
|---|---|---|---|
| redirect_latency_p95_ms | Redirect service | Histogram | > 25ms for > 2 min |
| redirect_latency_p99_ms | Redirect service | Histogram | > 100ms for > 2 min |
| redirect_cache_hit_rate | Redirect service | Gauge | < 95% for > 5 min |
| redirect_db_hit_rate | Redirect service | Gauge | > 5% for > 2 min |
| redis_memory_used_bytes | Redis Cluster | Gauge | > 85% of maxmemory |
| redis_keyspace_misses_per_sec | Redis Cluster | Counter | > 1000/sec |
| postgres_replication_lag_seconds | Postgres replicas | Gauge | > 10s |
| postgres_connections_active | Postgres primary | Gauge | > 180 (of 200) |
| kafka_consumer_lag_records | Analytics consumer | Gauge | > 100,000 records |
| click_events_produced_per_sec | Redirect service | Counter | Alert on sharp drop: < 50% of 5-min avg |
| clickhouse_insert_errors_per_min | Analytics worker | Counter | > 10 |
| short_url_creation_rate_per_sec | Write service | Counter | Alert on >10x normal (abuse detection) |
| slug_collision_rate | Write service | Counter | > 0 for custom slugs (expected 0 on success path) |
| http_5xx_rate | All services | Counter | > 1% for > 1 min |
| cdn_origin_hit_rate | CDN | Gauge | > 10% of total CDN traffic |

Distributed Traces #

Every redirect request carries a trace_id header (OpenTelemetry W3C TraceContext). Spans emitted:

| Span | Component | Key Attributes |
|---|---|---|
| redirect.handle | Redirect service | slug, cache_layer (local/redis/db), status_code |
| redis.get | Redis client | key, hit (bool), latency_ms |
| postgres.select | DB client | table=shortlinks, rows_returned, latency_ms |
| kafka.produce | Kafka client | topic=click_events, partition, latency_ms |
| geoip.lookup | GeoIP module | country, latency_us |

Structured Logs (Sampled) #

{
  "ts": "2026-04-01T12:00:01.234Z",
  "level": "info",
  "event": "redirect",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "slug": "AbCd3F",
  "cache_layer": "redis",
  "status_code": 302,
  "latency_ms": 3.2,
  "country": "US",
  "referrer_domain": "twitter.com"
}

Log sampling: 1% of redirects for hot slugs; 100% for cache misses and errors.
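One way to implement that policy deterministically is to hash the slug against a cutoff, so a given slug is consistently in or out of the 1% sample while misses and errors bypass sampling entirely (the rule below is a sketch, not the only possible policy):

```python
import hashlib

def should_log(slug: str, cache_layer: str, status_code: int, rate: float = 0.01) -> bool:
    """Keep 100% of cache misses and errors; hash-sample the hot path by slug."""
    if cache_layer == "db" or status_code >= 400:
        return True
    digest = hashlib.sha256(slug.encode()).digest()
    # Map the first 8 bytes to [0, 1) and compare against the sample rate.
    return int.from_bytes(digest[:8], "big") / 2**64 < rate
```

Hashing by slug rather than rolling a die per request means a sampled-in slug produces a complete, correlatable trace of its traffic.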

Dashboards #

  1. Redirect health dashboard: P50/P95/P99 latency, cache hit rates by layer, error rate, top 10 slugs by request rate.
  2. Analytics pipeline dashboard: Kafka consumer lag, ClickHouse insert rate, aggregate staleness.
  3. Write path dashboard: URL creation rate, custom slug collision rate, Postgres write latency.
  4. Infrastructure dashboard: Redis memory, Postgres replication lag, Kafka broker health.

Step 19 — Cost Model #

Growth Drivers per Component #

| Component | Primary Cost Driver | Unit Cost | Projected Monthly Cost at Scale |
|---|---|---|---|
| Postgres (primary + 2 replicas) | Storage (150GB + growth) + compute for IOPS | ~$500/node | ~$1,500/month (3 nodes) |
| Redis Cluster (6 shards × 3 nodes) | Memory (20GB/shard for hot slug cache) | ~$300/node (r7g.large) | ~$5,400/month (18 nodes) |
| Redirect service (50 instances) | CPU (high req/sec, mostly in-memory work) | ~$100/instance (c6g.medium) | ~$5,000/month |
| Kafka (12 brokers + ZooKeeper) | Network throughput + storage (30-day retention) | ~$400/broker | ~$4,800/month |
| ClickHouse (3 shards × 2 nodes) | Storage (time-series event data) + compute for merges | ~$600/node | ~$3,600/month |
| Analytics workers (5 instances) | CPU (Kafka consume + ClickHouse insert) | ~$100/instance | ~$500/month |
| CDN (Cloudflare) | Bandwidth + requests | $0.01/GB + $1/M requests | ~$2,000/month at 10B redirects |
| Total | | | ~$22,800/month |

Cost Growth Model #

| Growth Event | Primary Cost Driver | Cost Impact |
|---|---|---|
| 10x redirect traffic | CDN bandwidth (scales linearly) | CDN cost grows 10x (~$20K) |
| 10x URL creation rate | Postgres write IOPS | May require sharding or larger instance (~$3K) |
| 10x click event volume | Kafka storage + ClickHouse storage | Storage-linear growth (~$10K) |
| 10x active slug count | Redis memory (200 bytes × 10x slugs) | More Redis nodes (~$15K) |

Dominant cost at current scale: Redis (24%) and redirect service compute (22%). CDN becomes dominant at 10x traffic.

Cost optimization levers:

  1. Increase CDN cache hit rate → reduces redirect service instance count.
  2. Use slug access frequency to evict cold slugs from Redis → reduce Redis tier cost.
  3. ClickHouse data tiering: move partitions older than 90 days to S3-backed cold storage.

Step 20 — Evolution #

Stage 1: MVP (0 → 1M short URLs, < 100 req/sec) #

Architecture: Single Postgres, single Redis, no Kafka.

Click analytics: Write directly to Postgres click_events table (synchronous write during redirect). Not viable at scale but acceptable at < 100 req/sec.

Upgrade signal: P99 redirect latency > 100ms (DB bottleneck), or Postgres write IOPS > 80% capacity.

Changes needed to advance:

  • Introduce Kafka for async click event pipeline (decouple analytics from redirect path).
  • Introduce Redis as a caching layer.

Stage 2: Growth (1M → 100M short URLs, 100 → 5,000 req/sec) #

Architecture: Postgres primary + replicas, Redis Cluster (2 shards), Kafka + ClickHouse analytics pipeline, CDN integration.

Click analytics: Async via Kafka → ClickHouse. Redirect path has zero analytics latency.

Upgrade signal: Redis memory > 80% capacity, Postgres replication lag > 1s under load, CDN origin hit rate > 20%.

Changes needed to advance:

  • Expand Redis cluster to 6 shards.
  • Introduce process-local cache in redirect service (reduces Redis load by 10x for viral slugs).
  • Consider geo-distributed deployment for sub-50ms latency in non-US regions.

Stage 3: Scale (100M → 1B short URLs, 5,000 → 50,000 req/sec) #

Architecture: Multi-region active-active, Postgres with read replicas per region (writes to primary region only), Redis per-region, Kafka cross-region replication.

Custom slug challenge: Custom slug uniqueness across regions requires a global uniqueness layer. Options:

  • Option A: Route all custom slug creation requests to a single “slug authority” region (single writer, globally consistent). Acceptable if slug creation is not latency-sensitive.
  • Option B: Use a global distributed key-value store (e.g., Google Spanner) as the slug uniqueness arbiter. Higher operational complexity.

Upgrade signal: Single-region creation latency > 500ms (DB round-trip from non-US users), or Postgres max connections reached.

Changes needed to advance:

  • Introduce global slug authority service.
  • Consider Postgres sharding for shortlinks table (horizontal partition by slug hash range).
  • Introduce auto-generated slug pre-allocation (batch fetch sequence ranges from DB, generate slugs locally in redirect service).
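The slug pre-allocation idea from the list above can be sketched as a range allocator plus a base62 encoder. The counter stands in for a Postgres sequence bumped by the range size in a single call; all names are illustrative:

```python
import itertools
import string

# Pre-allocate ID ranges so instances mint auto-generated slugs without
# a DB round-trip per creation. itertools.count stands in for a Postgres
# sequence incremented by RANGE_SIZE in one call.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # base62
RANGE_SIZE = 1000
_sequence = itertools.count(0, RANGE_SIZE)  # stand-in for the DB sequence

def encode_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

class SlugAllocator:
    """Hands out slugs from a locally held range; refills from the 'DB'."""
    def __init__(self):
        self.next_id = self.limit = 0

    def next_slug(self) -> str:
        if self.next_id >= self.limit:        # local range exhausted
            self.next_id = next(_sequence)    # one "DB call" per 1000 slugs
            self.limit = self.next_id + RANGE_SIZE
        slug = encode_base62(self.next_id)
        self.next_id += 1
        return slug

alloc = SlugAllocator()
print([alloc.next_slug() for _ in range(3)])  # ['0', '1', '2']
```

If an instance crashes, its unused range is simply abandoned; gaps in the slug space are harmless because slugs carry no meaning.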

Stage 4: Hyperscale (> 1B short URLs, > 100K req/sec) #

Architecture: Full multi-region active-active with CRDT-based slug reservation, ClickHouse sharded to 20+ nodes, Redis Federation, CDN custom logic (Cloudflare Workers) for edge-side slug validation.

Key architectural evolution:

  • Redirect path fully served at CDN edge with Cloudflare Workers reading from Cloudflare KV (edge KV store, not Postgres). Postgres is the source of truth but is not in the hot path at all.
  • Click events captured at the CDN edge and forwarded asynchronously to the origin analytics pipeline.

Upgrade signal: Cross-region DB roundtrip > 200ms for slug uniqueness check, or CDN origin cost exceeds $50K/month.


Summary: Upgrade Decision Matrix #

| Signal | Upgrade Action |
| --- | --- |
| redirect_db_hit_rate > 5% | Expand Redis capacity; increase CDN TTL |
| postgres_write_iops > 80% | Add read replicas; introduce Kafka analytics pipeline |
| redis_memory_used > 80% | Add Redis shards; evict cold slugs (access-frequency TTL) |
| cdn_origin_hit_rate > 20% | Increase CDN TTL; audit cache invalidation frequency |
| creation_latency_p99 > 500ms | Shard Postgres; pre-allocate slug ranges |
| analytics_lag > 5min | Scale out Kafka consumers; increase ClickHouse shards |
| Single-region creation latency > 1s for global users | Multi-region active-active + global slug authority |
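The matrix lends itself to direct mechanization: a threshold table evaluated against live metrics, emitting the actions whose signals are breached. This is a sketch; the metric names and units are illustrative, not the real monitoring schema:

```python
# Upgrade decision matrix as data: (metric name, threshold, action).
THRESHOLDS = [
    ("redirect_db_hit_rate", 0.05, "Expand Redis capacity; increase CDN TTL"),
    ("postgres_write_iops_pct", 0.80, "Add read replicas; introduce Kafka pipeline"),
    ("redis_memory_used_pct", 0.80, "Add Redis shards; evict cold slugs"),
    ("cdn_origin_hit_rate", 0.20, "Increase CDN TTL; audit cache invalidation"),
    ("creation_latency_p99_ms", 500, "Shard Postgres; pre-allocate slug ranges"),
    ("analytics_lag_s", 300, "Scale out Kafka consumers; add ClickHouse shards"),
]

def upgrade_actions(metrics: dict[str, float]) -> list[str]:
    """Return the upgrade actions whose signal thresholds are breached."""
    return [action for name, limit, action in THRESHOLDS
            if metrics.get(name, 0) > limit]

sample = {"redis_memory_used_pct": 0.85, "analytics_lag_s": 90}
print(upgrade_actions(sample))  # ['Add Redis shards; evict cold slugs']
```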

End of Bit.ly system design derivation. All 20 steps have produced explicit output artifacts. The design is derivable from the invariants; the invariants are derivable from the normalized requirements; the normalized requirements are derivable from the product requirements. No design choice is unmotivated.
