DNS Resolver / Authoritative DNS Analysis Note #

This note captures the full step-by-step analysis for a DNS system that includes both authoritative serving and recursive resolution/caching: zone truth, record updates, delegation, cache state, and bounded-stale resolver responses.

Step 1 — Normalize #

Assume the baseline prompt is:

  • design a DNS system supporting authoritative DNS and recursive resolution
  • zone owners update DNS records
  • authoritative servers answer from zone truth
  • recursive resolvers cache upstream answers subject to TTL
  • delegation and NXDOMAIN/negative caching matter
  • system scales globally
| Requirement | Actor | Operation | State touched | Priority |
| --- | --- | --- | --- | --- |
| Zone owner creates/updates DNS record | Client | overwrite state | S1: update target, ZoneRecord | C1 |
| Client queries authoritative DNS for name/type | Client | read source | S1: read source target, ZoneRecord | C1 |
| Client queries recursive resolver for name/type | Client | read projection | S1: read projection target, ResolverCacheEntry | C1 |
| Resolver fetches missing/expired answer from upstream/authoritative | System | async process | S1: hidden write target, ResolverCacheEntry | C1 |
| Zone owner updates delegation / NS records | Client | overwrite state | S1: update target, DelegationState | C1 |
| System publishes zone snapshot to authoritative servers | System | async process | S1: hidden write target, AuthoritativeSnapshot | C1 |
| Client reads DNS status/analytics | Client | read projection | S1: read projection target, DNSStatusView | R2 |

Notes on normalization:

  • authoritative record updates are overwrite-state config changes
  • authoritative answer path is read source
  • recursive answer path is read projection
    • resolver cache is a derived projection of authoritative/upstream truth
  • recursive refresh is async process
  • delegation state is explicit because it changes resolution path and authority
  • authoritative snapshot propagation is control-plane dissemination

This system is a composition of:

  • Origin Projection + Edge Delivery Plane
  • Control Plane + Data Plane

with authoritative DNS as source-of-truth serving and recursive DNS as cache/projection serving.

Step 2 — Critical Path Selection #

| Requirement | Priority class | Why |
| --- | --- | --- |
| Update DNS record | C1 | changes authoritative naming truth |
| Answer authoritative query | C1 | wrong answer breaks name resolution truth |
| Answer recursive query | C1 | resolver must obey TTL/bounded staleness rules |
| Refresh resolver cache from upstream | C1 | resolver correctness depends on valid refresh path |
| Update delegation state | C1 | changes authority chain and resolution correctness |
| Publish authoritative snapshot | C1 | stale auth servers can serve wrong zone data |
| Read DNS status/analytics | R2 | operational only |

Critical paths:

  • P1 update zone record
  • P2 answer authoritative query
  • P3 answer recursive query
  • P4 refresh resolver cache
  • P5 update delegation state
  • P6 publish authoritative snapshot

Step 3 — Primary State Extraction #

| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ZoneRecord | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | relation | zone_id + name + type |
| DelegationState | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | zone_id or delegation_scope |
| AuthoritativeSnapshot | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | instance | auth_server_id + zone_id |
| ResolverCacheEntry | hidden write target | Yes | keep as candidate | projection | Yes | service | overwrite | relation | resolver_id + qname + qtype |
| NegativeCacheEntry | hidden write target | No | split candidate | projection | Yes | service | overwrite | relation | resolver_id + qname + qtype |
| DNSStatusView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | zone_id or resolver_cluster |

Minimal primary set:

  • ZoneRecord
  • DelegationState
  • AuthoritativeSnapshot
  • ResolverCacheEntry

Important modeling choices:

ZoneRecord #

Primary because:

  • it is the authoritative naming truth for a zone

DelegationState #

Primary because:

  • NS/delegation records determine which authority is responsible

AuthoritativeSnapshot #

Worth modeling because:

  • authoritative servers typically serve from propagated zone snapshots, not from a central DB on every query
  • versioning/freshness matter

ResolverCacheEntry #

Primary enough to model explicitly because:

  • recursive serving correctness depends on current cached answer, TTL, and validation state

NegativeCacheEntry #

Can be folded into ResolverCacheEntry with a result type if desired.
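
To make the modeling choices above concrete, here is a minimal sketch of two of the primary state objects, with NegativeCacheEntry folded into ResolverCacheEntry through a result type. The field names beyond the scope keys in the table (`ResultType`, `rdata`, `expires_at`, `rtype`) are illustrative assumptions, not part of the original analysis.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ResultType(Enum):
    POSITIVE = "positive"   # a real answer (record data present)
    NXDOMAIN = "nxdomain"   # the name does not exist
    NODATA = "nodata"       # the name exists, but not for this qtype


@dataclass(frozen=True)
class ZoneRecord:
    zone_id: str
    name: str
    rtype: str
    value: str
    ttl: int        # seconds
    version: int    # monotonic within (zone_id, name, rtype)


@dataclass(frozen=True)
class ResolverCacheEntry:
    resolver_id: str
    qname: str
    qtype: str
    result: ResultType
    rdata: Tuple[str, ...]   # empty for negative results
    expires_at: float        # absolute time in seconds (assumed clock)

    def is_valid(self, now: float) -> bool:
        # An entry may be served only while its TTL has not run out.
        return now < self.expires_at

    @property
    def is_negative(self) -> bool:
        # Negative results share the same scope key as positive ones.
        return self.result is not ResultType.POSITIVE
```

Keeping positive and negative results in one type means the resolver has a single cache keyed by (resolver_id, qname, qtype), at the cost of an extra field to check on every read.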

Step 4 — Hard Invariants #

| Path | Tier | Type | Invariant statement |
| --- | --- | --- | --- |
| P1 update zone record | HARD | ordering | Zone-record revisions are ordered by monotonic version within (zone_id, name, type). |
| P2 answer authoritative query | HARD | eligibility | answer_authoritative_query is valid only if returned record/delegation data is derived from current authoritative zone snapshot for that zone scope. |
| P3 answer recursive query | HARD | freshness | Recursive answer reflects authoritative/upstream truth within TTL and cache-validation rules for (qname, qtype). |
| P4 refresh resolver cache | HARD | accounting | ResolverCacheEntry equals function of current upstream/authoritative answer and TTL semantics modulo bounded refresh delay. |
| P5 update delegation state | HARD | ordering | Delegation revisions are ordered by monotonic version within delegation scope. |
| P6 publish authoritative snapshot | HARD | freshness | AuthoritativeSnapshot(auth_server, zone) reflects authoritative zone truth within configured propagation bound. |

What matters most:

  • authoritative answers must come from correct zone version
  • recursive answers must obey TTL / negative caching / refresh rules
  • delegation changes must move monotonically
  • auth server snapshots must converge to current zone truth

Step 5 — Execution Context #

| Field | Value | Why |
| --- | --- | --- |
| Topology | single service, distributed | one logical DNS platform with authoritative and recursive serving fleets |
| Write coordination scope | per object scope | correctness is per zone record, delegation scope, auth snapshot, and resolver cache entry |
| Read consistency target | bounded stale allowed | recursive caches and authoritative snapshots are naturally bounded-stale projections |
| Holder model | none | no lease-like per-request ownership is central |
| Compensation acceptable? | No | wrong DNS answers are correctness failures that cannot be compensated after the fact |

Derived:

  • bounded_staleness_allowed = true
  • exclusive_claim_required = false
  • guarded_by_current_state = true

This implies:

  • authoritative zone truth plus propagated serving snapshots
  • bounded-stale recursive cache serving
  • versioned zone/snapshot propagation

Step 6 — Deterministic Mechanism Selection #

| Path | Write shape | Base mechanism | Required companions |
| --- | --- | --- | --- |
| P1 update zone record | overwrite current value | CAS on version | zone version |
| P4 refresh resolver cache | overwrite current value | single writer cache refresh or CAS on version | TTL / expiry metadata |
| P5 update delegation state | overwrite current value | CAS on version | delegation version |
| P6 publish authoritative snapshot | overwrite current value | single writer snapshot publication | snapshot version |

Hot paths:

  • P2 authoritative answer is a read path
  • P3 recursive answer is a read path

Why these fit:

  • zone and delegation data are authoritative current-state config
  • recursive cache entries are current-state projections with TTL
  • authoritative snapshots are versioned propagated projections
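
The "CAS on version" mechanism chosen for P1 and P5 can be sketched as a conditional overwrite. This is a minimal in-memory illustration: the `ZoneStore` class, the `VersionConflict` exception, and the convention that version 0 means "no record yet" are assumptions, and the lock stands in for whatever conditional-write primitive the real zone store would provide.

```python
import threading
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (zone_id, name, rtype)


class VersionConflict(Exception):
    """Raised when expected_version does not match the stored version."""


class ZoneStore:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        # key -> (version, value, ttl)
        self._records: Dict[Key, Tuple[int, str, int]] = {}

    def put(self, key: Key, value: str, ttl: int,
            expected_version: Optional[int] = None) -> int:
        """Overwrite the record; reject the write if the caller's view is stale."""
        with self._lock:
            current = self._records.get(key)
            current_version = current[0] if current else 0
            if expected_version is not None and expected_version != current_version:
                raise VersionConflict(
                    f"expected {expected_version}, found {current_version}")
            new_version = current_version + 1  # versions only move forward
            self._records[key] = (new_version, value, ttl)
            return new_version

    def get(self, key: Key) -> Optional[Tuple[int, str, int]]:
        with self._lock:
            return self._records.get(key)
```

A writer that read version 1 and submits `expected_version=1` after someone else has already written version 2 gets a `VersionConflict` and must re-read before retrying, which is how a stale update loses the race.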

Step 7 — Read Model / Source of Truth #

| Concept | Truth | Read path | Rebuild path |
| --- | --- | --- | --- |
| C1 zone record truth | ZoneRecord | read source directly | authoritative zone store |
| C2 delegation truth | DelegationState | read source directly | authoritative delegation store |
| C3 authoritative server local zone view | AuthoritativeSnapshot | materialized view | rebuild from latest zone/delegation truth |
| C4 recursive cache entry | ResolverCacheEntry | materialized view | refresh from upstream/authoritative answer |
| C5 DNS status/analytics | derived | materialized view | recompute from primary state |

Important point:

  • authoritative query path should read local authoritative snapshot
  • recursive query path should read local resolver cache entry if valid
  • neither hot path should synchronously depend on central control-plane state
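
As a rough illustration of the authoritative hot path reading only local state, the sketch below answers a query from an in-memory snapshot with no call to the control plane. The `Snapshot` layout and the response labels are assumptions for illustration; the sketch deliberately ignores wildcards, CNAME chasing, DS queries at a zone cut, and DNSSEC.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Snapshot:
    zone: str      # zone apex, e.g. "example.com."
    version: int
    records: Dict[Tuple[str, str], Tuple[str, ...]]   # (name, qtype) -> rdata
    delegations: Dict[str, Tuple[str, ...]]           # child apex -> NS names


def answer_authoritative(snap: Snapshot, qname: str, qtype: str):
    """Return (kind, data) using only the local snapshot."""
    qname = qname.lower()
    # Names outside this zone are not ours to answer.
    if not (qname == snap.zone or qname.endswith("." + snap.zone)):
        return ("REFUSED", ())
    # At or below a zone cut, hand the client to the child's name servers.
    for cut, ns in snap.delegations.items():
        if qname == cut or qname.endswith("." + cut):
            return ("REFERRAL", ns)
    rdata = snap.records.get((qname, qtype))
    if rdata:
        return ("ANSWER", rdata)
    # The name exists if it owns any record or has any descendant with records.
    name_exists = any(n == qname or n.endswith("." + qname)
                      for n, _ in snap.records)
    return ("NODATA", ()) if name_exists else ("NXDOMAIN", ())
```

Because the function takes the snapshot as an argument, swapping in a newer version is a single reference update on the node, and in-flight queries keep reading a consistent view.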

Step 8 — Failure Handling #

| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
| --- | --- | --- | --- | --- | --- |
| zone/delegation update | retry with version | stale update loses CAS | committed zone state survives crash if persisted | snapshot propagation may lag | n/a |
| authoritative snapshot publication | retry with snapshot version | older snapshot loses to newer version | auth server keeps last good snapshot until refresh | failed push retried or pulled | n/a |
| recursive cache refresh | retry with current TTL/validation metadata | latest valid refresh wins | cache refresh crash just delays refresh; next query/refresher retries | upstream fetch may fail and cache may serve stale/negative answer per policy | n/a |
| authoritative answer | client retries are network-level | multiple auth nodes can answer from same zone snapshot version | one auth node crash affects only local requests | n/a | stale snapshot bounded by propagation/version discipline |
| recursive answer | client retries are network-level | many resolvers can answer from their own valid cache entries | one resolver crash affects local cache only | upstream lookup retry/backoff | stale cache bounded by TTL/refresh policy |

What matters most:

  • authoritative snapshots move monotonically by zone version
  • recursive caches obey TTL and negative caching rules
  • stale answers are bounded by explicit freshness policy
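
The rule that authoritative snapshots move monotonically, with an older snapshot losing to a newer one and the node keeping its last good snapshot, can be sketched as a version-guarded apply. The `AuthNode` class and its method names are hypothetical, and the snapshot payload is left as an opaque dict.

```python
from typing import Dict, Optional, Tuple


class AuthNode:
    """Holds the last good snapshot per zone; versions only move forward."""

    def __init__(self) -> None:
        # zone_id -> (version, snapshot data)
        self._snapshots: Dict[str, Tuple[int, dict]] = {}

    def apply(self, zone_id: str, version: int, data: dict) -> bool:
        """Install a snapshot unless it is older than or equal to the current one."""
        current = self._snapshots.get(zone_id)
        if current is not None and version <= current[0]:
            return False  # stale or duplicate delivery: keep what we have
        self._snapshots[zone_id] = (version, data)
        return True

    def version(self, zone_id: str) -> Optional[int]:
        current = self._snapshots.get(zone_id)
        return None if current is None else current[0]
```

Since a retried or reordered push is simply rejected, the publisher can retry freely and a pull-based catch-up can race with pushes without regressing the node.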

Step 9 — Scale Adjustments #

| Hotspot | Type | First response |
| --- | --- | --- |
| massive global query volume | read hotspot | add more anycast edge/resolver capacity and keep hot paths local |
| hot zone record churn | fan-out hotspot | batch zone updates and publish incremental zone snapshots |
| cache-miss storms on popular names | contention hotspot | request coalescing and stale-while-revalidate patterns |
| negative-cache churn | read hotspot | compact negative cache representation and respect TTLs carefully |
| snapshot fanout to auth fleet | fan-out hotspot | incremental zone diff propagation and pull-on-version-miss |
| analytics/status reads | read hotspot | derived views only |

What scales well:

  • authoritative and recursive serving both scale horizontally with local snapshots/caches
  • central zone truth remains narrow and versioned

What fails first:

  • cache-miss storms
  • zone update fanout bursts
  • stale snapshot propagation under very high change rate
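
Since cache-miss storms are listed first, here is a minimal sketch of request coalescing on a resolver: concurrent misses for the same (qname, qtype) share one upstream fetch. It uses `asyncio` as an assumed concurrency model, and the `Coalescer` class is hypothetical. Stale-while-revalidate is not covered here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable


class Coalescer:
    """Collapse concurrent fetches for the same key into one upstream call."""

    def __init__(self) -> None:
        self._inflight: Dict[Hashable, "asyncio.Task[Any]"] = {}

    async def get(self, key: Hashable,
                  fetch: Callable[[], Awaitable[Any]]) -> Any:
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers join it.
            task = asyncio.create_task(fetch())
            self._inflight[key] = task
            # Forget the key once the fetch finishes, whether it succeeded or not.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # shield() keeps one cancelled caller from cancelling the shared fetch.
        return await asyncio.shield(task)
```

If the shared fetch raises, every waiting caller sees the same exception, and the next query after that starts a fresh fetch.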

Canonical design conclusion:

  • archetype composition:
    • Origin Projection + Edge Delivery Plane
    • Control Plane + Data Plane
  • primary truth:
    • ZoneRecord
    • DelegationState
    • AuthoritativeSnapshot
    • ResolverCacheEntry
  • hot paths:
    • authoritative answer from local zone snapshot
    • recursive answer from local resolver cache under TTL

Concrete Substrate #

  • authoritative zone truth in a strongly consistent zone-management store
  • zone publication control plane in Go/Java
  • authoritative DNS fleet serving from local versioned zone snapshots
  • recursive resolver fleet serving from local cache with TTL/negative-cache semantics
  • snapshot propagation via watch streams / pull-on-version-miss / signed zone transfer

Operation Layer #

  1. PutZoneRecord(zone_id, name, type, value, ttl, expected_version?)
     • entry point: zone-management API
     • authoritative decider: zone store owner
     • transition: overwrite ZoneRecord
  2. PutDelegation(zone_id, ns_records, expected_version?)
     • entry point: zone-management API
     • authoritative decider: delegation store owner
     • transition: overwrite DelegationState
  3. internal zone publication
     • rebuild latest AuthoritativeSnapshot(zone_id, version)
     • publish to authoritative servers
  4. AnswerAuthoritative(qname, qtype)
     • entry point: authoritative DNS node
     • authoritative decider: local AuthoritativeSnapshot
     • transition: none
     • response: DNS answer/delegation/NXDOMAIN from local zone snapshot
  5. ResolveRecursive(qname, qtype)
     • entry point: recursive resolver
     • authoritative decider: local ResolverCacheEntry if valid, otherwise upstream resolution path
     • transition: optional cache refresh/update
     • response: cached or freshly resolved answer
  6. internal cache refresh
     • validate TTL/expiry
     • overwrite ResolverCacheEntry
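
The ResolveRecursive and cache-refresh operations can be sketched together: serve from the local cache while the TTL holds, otherwise fetch upstream and overwrite the entry, caching negative answers the same way. The `Resolver` class, the `upstream` callable returning `(rcode, rdata, ttl)`, the injected `clock`, and the `max_stale` window for serving an expired entry when upstream fails are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


class UpstreamError(Exception):
    """Raised by the upstream callable when resolution fails."""


@dataclass(frozen=True)
class Entry:
    rcode: str                # e.g. "NOERROR" or "NXDOMAIN"
    rdata: Tuple[str, ...]    # empty for negative answers
    expires_at: float


class Resolver:
    def __init__(self,
                 upstream: Callable[[str, str], Tuple[str, Tuple[str, ...], int]],
                 clock: Callable[[], float],
                 max_stale: float = 0.0) -> None:
        self._upstream = upstream
        self._clock = clock
        self._max_stale = max_stale  # how long past expiry a stale answer is allowed
        self._cache: Dict[Tuple[str, str], Entry] = {}

    def resolve(self, qname: str, qtype: str):
        """Return (rcode, rdata, source) where source says how it was served."""
        key = (qname.lower(), qtype)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now < entry.expires_at:
            return entry.rcode, entry.rdata, "cache"
        try:
            rcode, rdata, ttl = self._upstream(*key)
        except UpstreamError:
            # Policy choice: serve a recently expired entry rather than fail.
            if entry is not None and now < entry.expires_at + self._max_stale:
                return entry.rcode, entry.rdata, "stale"
            raise
        # Positive and negative answers are both cached under the same key.
        self._cache[key] = Entry(rcode, rdata, now + ttl)
        return rcode, rdata, "upstream"
```

Injecting the clock keeps TTL behavior testable without sleeping, and leaving `max_stale` at 0.0 turns stale serving off entirely.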

Entry Point vs Decider vs Responder #

| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
| --- | --- | --- | --- | --- |
| authoritative answer | authoritative DNS node | local authoritative snapshot | authoritative node | DNS platform |
| recursive answer | resolver node | local resolver cache / upstream resolution path | resolver node | DNS platform |
| zone/delegation update | zone-management API | zone/delegation store owner | control-plane node | DNS platform |
| zone snapshot publication | auth server / control plane | snapshot publisher | control/data-plane | DNS platform |
| cache refresh | resolver node | local refresh worker + upstream answer | resolver node | DNS platform |

Concrete HLD #

Main components:

  • zone-management API
  • authoritative zone/delegation store
  • zone snapshot publication layer
  • authoritative DNS fleet
  • recursive resolver fleet
  • DNS status/analytics views

Short interview version #

“I’d design DNS as authoritative zone truth plus bounded-stale serving layers. Zone records and delegation state are authoritative, then published as versioned snapshots to the authoritative DNS fleet. Recursive resolvers serve from local cache entries under TTL and negative-caching rules, refreshing from upstream when entries expire or miss. The hot paths are fully local reads, while the control plane handles versioned zone updates and snapshot publication.”