Service Registry / Service Discovery
This note models a service registry / service discovery system where service instances register themselves, maintain heartbeats or leases, consumers resolve current healthy endpoints, and the system propagates membership changes safely at scale.
Step 1 - Normalize
Assume the baseline prompt is:
- design a service registry / service discovery system
- service instances register and deregister themselves
- instances heartbeat or renew leases
- clients discover healthy endpoints for a service
- membership changes should propagate to clients quickly
- system scales across many services and instances
Normalize into state-affecting paths.
| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| Service instance registers endpoint | Client | state transition | update ServiceInstanceState | C1 |
| Service instance renews lease / heartbeat | Client | state transition | update ServiceInstanceState | C1 |
| Service instance deregisters endpoint | Client | state transition | update ServiceInstanceState | C1 |
| System expires stale instance | System | async process | hidden write to ServiceInstanceState | C1 |
| Client resolves service endpoints | Client | read source | read ServiceMembershipState | R1 |
| System updates effective membership / health view | System | state transition | update ServiceMembershipState | C1 |
| Client registers watch on service | Client | append event | create DiscoveryWatchRegistration | R1 |
| System emits membership-change watch event | System | async process | hidden write to DiscoveryWatchEvent | R1 |
| System routes service/shard to current owner | System | read source | read PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | update PartitionOwnership | C1 |
Notes on normalization
Important choices:
- register/renew/deregister are lifecycle transitions
- stale expiry is explicit because instance crashes are central correctness cases
- client endpoint lookup is a read path over current membership truth
- effective membership state is distinct from raw individual instance lifecycle
- watch registration and watch events are separate from source truth
This system is fundamentally:
membership + lease + lookup
not:
- log replay
- queue delivery
Step 2 - Critical Path Selection
| Requirement | Priority class | Why |
|---|---|---|
| Register endpoint | C1 | wrong membership truth breaks all downstream routing |
| Renew heartbeat / lease | C1 | stale renewals affect endpoint validity |
| Deregister endpoint | C1 | removal correctness affects traffic safety |
| Expire stale instance | C1 | crash recovery depends on safe expiry |
| Resolve endpoints | R1 | core serving path |
| Update effective membership view | C1 | consumers need correct healthy endpoint set |
| Register / deliver watch | R1 | important for propagation, but downstream of membership truth |
| Route to shard owner | C1 | wrong routing can split membership truth |
| Reassign shard ownership | C1 | failover must preserve membership correctness |
Baseline critical paths
Main C1 paths:
- P1 register endpoint
- P2 renew lease
- P3 deregister endpoint
- P4 expire stale instance
- P5 update effective membership
- P6 route to shard owner
- P7 reassign shard ownership
Main R1 paths:
- P8 resolve endpoints
- P9 watch registration and delivery
This design is driven by:
- one authoritative current lifecycle per instance
- current healthy endpoint set per service
- lease expiry on crash
- fast propagation to consumers
Step 3 - Primary State Extraction
For a service registry, the minimal primary state is the individual instance lifecycle, effective membership view, client/session or lease validity, and routing/ownership state.
| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| ServiceInstanceState | direct noun | Yes | keep as candidate | process | Yes | service | state machine | instance | service_id + instance_id |
| ServiceMembershipState | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | instance | service_id |
| ClientSession | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | session_id |
| DiscoveryWatchRegistration | direct noun | Yes | keep as candidate | relationship | Yes | service | append-only | relation | client_id + service_id |
| DiscoveryWatchEvent | hidden write target | No | keep as candidate | event | No | derived | append-only | collection | service_id |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | service shards |
| RegistryStatusView | derived read model | No | reject as UI artifact | projection | No | derived | overwrite | collection | tenant or cluster |
Important modeling choices
ServiceInstanceState
This is the central instance-lifecycle object.
Likely fields:
- service_id
- instance_id
- endpoint
- zone/region
- health/serving status
- session_id
- expiry
- state
States:
- REGISTERED
- HEALTHY
- UNHEALTHY
- DEREGISTERED
- EXPIRED
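The field list and states above can be sketched as a small Python data model. This is a minimal sketch under stated assumptions: the transition table (ALLOWED) is one plausible reading of the five states, the epoch field is added because the later Heartbeat and DeregisterInstance APIs carry one, and health/serving status and lifecycle state are folded into a single status field.

```python
from dataclasses import dataclass
from enum import Enum


class InstanceStatus(Enum):
    REGISTERED = "REGISTERED"
    HEALTHY = "HEALTHY"
    UNHEALTHY = "UNHEALTHY"
    DEREGISTERED = "DEREGISTERED"
    EXPIRED = "EXPIRED"


# Assumed transition table: live states move among themselves or end;
# terminal states (DEREGISTERED, EXPIRED) are only left by a fresh registration.
ALLOWED = {
    InstanceStatus.REGISTERED: {InstanceStatus.HEALTHY, InstanceStatus.UNHEALTHY,
                                InstanceStatus.DEREGISTERED, InstanceStatus.EXPIRED},
    InstanceStatus.HEALTHY: {InstanceStatus.UNHEALTHY, InstanceStatus.DEREGISTERED,
                             InstanceStatus.EXPIRED},
    InstanceStatus.UNHEALTHY: {InstanceStatus.HEALTHY, InstanceStatus.DEREGISTERED,
                               InstanceStatus.EXPIRED},
    InstanceStatus.DEREGISTERED: {InstanceStatus.REGISTERED},
    InstanceStatus.EXPIRED: {InstanceStatus.REGISTERED},
}


@dataclass
class ServiceInstanceState:
    service_id: str
    instance_id: str
    endpoint: str
    zone: str
    session_id: str
    epoch: int      # assumed fencing epoch, bumped on each new registration
    expiry: float   # lease deadline, in seconds on the registry's clock
    status: InstanceStatus = InstanceStatus.REGISTERED

    def transition(self, new_status: InstanceStatus) -> None:
        """Apply a lifecycle change, rejecting moves the table does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.name} -> {new_status.name}")
        self.status = new_status
```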
ServiceMembershipState
Primary because:
- consumers usually read the effective healthy endpoint set per service
- derived but still authoritative current view for lookup
ClientSession
Primary because:
- registrations are often tied to a lease/session lifecycle
Minimal strict primary set
The strongest minimal set is:
- ServiceInstanceState
- ServiceMembershipState
- ClientSession
- PartitionOwnership
- PartitionMap
With:
- DiscoveryWatchRegistration as an optional explicit primary object
Step 4 - Hard Invariants
For a service registry / discovery system, the hard invariants are about one authoritative lifecycle per instance, valid renew/deregister only by current lease holder, and correct current healthy membership per service.
| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 register endpoint | HARD | uniqueness | Key (service_id, instance_id) maps to at most one current authoritative instance lifecycle within instance scope. |
| P1 register endpoint | HARD | eligibility | Action register_instance is valid only if the current session is active and the current instance state is registerable at decision time. |
| P2 renew lease | HARD | eligibility | Action renew_instance is valid only if the current ServiceInstanceState is owned by the same session and the lease/epoch matches at decision time. |
| P3 deregister endpoint | HARD | eligibility | Action deregister_instance is valid only if the current ServiceInstanceState is owned by the same session and the lease/epoch matches at decision time. |
| P4 expire stale instance | HARD | eligibility | Action expire_instance is valid only if the current instance is still registered, its expiry has passed, and its lease/epoch is unchanged at decision time. |
| P5 update effective membership | HARD | accounting | ServiceMembershipState(service_id) equals the current authoritative set of eligible healthy instances for that service scope. |
| P6 route to shard owner | HARD | uniqueness | Key shard_id maps to at most one current authoritative owner. |
| P7 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if the current owner has failed or relinquished ownership and the candidate owner is eligible and sufficiently caught up on shard_id at decision time. |
| P8 resolve endpoints | HARD | freshness | Lookup reflects authoritative membership and instance state within the configured consistency bound. |
| P9 watch delivery | SOFT | freshness | Watch stream reflects authoritative membership changes within the propagation bound. |
What matters most
1. One authoritative lifecycle per instance
This prevents split or stale endpoint truth.
2. Membership view must correspond to valid healthy instances
Consumers should not receive endpoints that are expired or deregistered.
3. Renew/deregister are fenced
Only the current registering session/epoch may continue to mutate the instance record.
4. Watch delivery is secondary to membership truth
Clients should treat watch streams as propagation help, not as sole truth.
Step 5 - Execution Context
For the strict baseline service registry:
| Field | Value | Why |
|---|---|---|
| Topology | single distributed service | one logical discovery service spread across many nodes |
| Write coordination scope | per object scope | correctness is per instance, service membership, and shard ownership scope |
| Read consistency target | strong only | stale endpoint reads can route traffic to dead instances |
| Holder model | client | service instances temporarily hold registrations through sessions/leases |
| Compensation acceptable? | No | wrong endpoint membership can send production traffic to dead or unauthorized instances |
Derived implications
- holder_may_crash = true: service instances can crash while registered
- cross_service_write = false: baseline keeps instance, membership, and ownership state in one logical service
- bounded_staleness_allowed = false: correctness-critical resolution should use authoritative or tightly controlled fresh state
- cross_service_atomicity_required = false: no multi-service transaction required in baseline
- exclusive_claim_required = true: shard ownership and per-instance lease ownership must be exclusive
- guarded_by_current_state = true: register, renew, deregister, and expiry all depend on current state
What this implies
This pushes us toward:
- one authoritative writer per service shard
- lease-backed instance records
- effective membership derived from authoritative instance state
- watch propagation derived from committed membership changes
Step 6 - Deterministic Mechanism Selection
| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P1 register endpoint | guarded state transition | CAS on (state, version) or single writer per shard | session/lease id, epoch |
| P2 renew lease | guarded state transition | CAS on (state, version) | session/lease id, epoch |
| P3 deregister endpoint | guarded state transition | CAS on (state, version) | session/lease id, epoch |
| P4 expire stale instance | guarded state transition | leader-applied guarded transition | epoch, timeout scan |
| P5 update effective membership | overwrite current value | single writer recompute | membership version |
| P6 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P7 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |
Why these fit
Register/renew/deregister
These all depend on current session/epoch ownership and current lifecycle state, so guarded transitions fit.
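A guarded transition can be sketched as compare-and-set on (state, version) combined with a session/epoch fence. The in-memory VersionedStore below is a hypothetical stand-in for the shard's replicated state; in the design described here the CAS would be applied by the shard leader through consensus, not by a local lock.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    state: str
    version: int
    session_id: str
    epoch: int


class VersionedStore:
    """Hypothetical in-memory store offering compare-and-set on (state, version)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put(self, key, rec):
        with self._lock:
            self._data[key] = rec

    def compare_and_set(self, key, expected_state, expected_version, new_rec):
        with self._lock:
            cur = self._data.get(key)
            if cur is None or cur.state != expected_state or cur.version != expected_version:
                return False  # another writer changed the record first
            self._data[key] = new_rec
            return True


def guarded_transition(store, key, session_id, epoch, allowed_from, to_state):
    """Move a record to to_state only from an allowed state and only for the current session/epoch."""
    cur = store.get(key)
    if cur is None or cur.state not in allowed_from:
        return False
    if cur.session_id != session_id or cur.epoch != epoch:
        return False  # fenced: stale session or epoch
    new_rec = replace(cur, state=to_state, version=cur.version + 1)
    return store.compare_and_set(key, cur.state, cur.version, new_rec)
```

A retry after the transition has committed returns False without changing anything, so the retry is harmless; a real API would probably report that case as an idempotent success.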
Effective membership
The current endpoint set per service is a current-value view derived from instance states, so overwrite fits.
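The overwrite-style recompute can be sketched as a pure function from instance records to a versioned membership value. Assumptions: membership counts only HEALTHY instances whose lease deadline has not passed, and the version advances only when the endpoint set actually changes, so an unchanged recompute produces no new watch event.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Instance:
    instance_id: str
    endpoint: str
    status: str
    expiry: float


@dataclass(frozen=True)
class ServiceMembershipState:
    service_id: str
    version: int
    endpoints: frozenset


def recompute_membership(service_id: str, instances: Iterable[Instance],
                         prev: Optional[ServiceMembershipState],
                         now: float) -> ServiceMembershipState:
    """Single-writer recompute: derive the healthy endpoint set and overwrite the view."""
    healthy = frozenset(
        inst.endpoint for inst in instances
        if inst.status == "HEALTHY" and inst.expiry > now
    )
    if prev is not None and prev.endpoints == healthy:
        return prev  # nothing changed: keep the version, emit no watch event
    next_version = 1 if prev is None else prev.version + 1
    return ServiceMembershipState(service_id, next_version, healthy)
```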
Routing
One current owner per shard is required for correctness, so exclusive claim fits.
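Exclusive shard ownership with a fencing token can be sketched as follows. LeaseStore and FencedShardState are hypothetical names: the first stands in for the coordination layer that grants one live lease per shard with a monotonically increasing token, the second for the shard's storage, which rejects writes carrying a token older than the newest it has seen.

```python
from dataclasses import dataclass


@dataclass
class ShardLease:
    owner: str
    token: int
    expires_at: float


class LeaseStore:
    """Hypothetical coordination layer: one live lease per shard, tokens only increase."""

    def __init__(self):
        self._leases = {}
        self._last_token = {}

    def acquire(self, shard_id, node, ttl, now):
        cur = self._leases.get(shard_id)
        if cur is not None and cur.expires_at > now:
            if cur.owner != node:
                return None  # another node holds a live lease
            cur.expires_at = now + ttl  # renewal keeps the same token
            return cur
        token = self._last_token.get(shard_id, 0) + 1
        self._last_token[shard_id] = token
        lease = ShardLease(node, token, now + ttl)
        self._leases[shard_id] = lease
        return lease


class FencedShardState:
    """Shard storage that refuses writes from an owner older than the newest seen."""

    def __init__(self):
        self.highest_token = 0
        self.applied = []

    def apply(self, token, op):
        if token < self.highest_token:
            return False  # stale owner: fenced out
        self.highest_token = token
        self.applied.append(op)
        return True
```

The token check lives in the storage rather than in the old owner, because a partitioned owner may not yet know its lease has lapsed.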
Canonical substrate implied
The baseline now points to:
- sharded registry service
- one authoritative owner per service shard
- lease-backed instance records
- current membership view per service
- watch propagation from committed membership changes
Step 7 - Read Model / Source of Truth
For a service registry, truth is direct source state for instance and membership data. Watches are derived.
| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 current instance lifecycle | ServiceInstanceState | read source directly | authoritative instance-state store |
| C2 current healthy endpoint set | ServiceMembershipState | read source directly | recompute from authoritative instance state |
| C3 current session / lease validity | ClientSession | read source directly | authoritative session store |
| C4 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C5 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
| C6 watch stream | committed membership changes | materialized view | rebuild from authoritative instance/membership transitions |
| C7 dashboards / status | derived from instance and membership state | materialized view | recompute from authoritative state |
Important point
For the core semantics:
- resolution reads authoritative ServiceMembershipState
- membership recomputes from authoritative instance lifecycle state
- watches are derived propagation
Step 8 - Failure Handling
| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| P1 register endpoint | safe to retry with the same session/epoch | stale re-register loses the guarded transition | committed registration survives a crash if persisted | watch delivery may lag | stale instance/session blocked by epoch |
| P2 renew lease | retry with current session/epoch | stale renew loses the guarded transition | committed renewal survives a crash if persisted | watch delivery may lag | old epoch rejected |
| P3 deregister endpoint | retry with current session/epoch | stale deregister loses the guarded transition | committed deregistration survives a crash if persisted | watch delivery may lag | old epoch rejected |
| P4 expire stale instance | timeout scan is safe to retry | only one expiry transition wins for a given expired state | scanner crash delays cleanup; the next scan retries | watch delivery may lag | prior holder blocked once epoch/version has advanced |
| P5 membership recompute | recompute is safe to retry from source inputs | a single recompute/version wins | recompute reruns after crash | watch/update propagation may lag | n/a |
| P6 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P7 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
| P8 resolve endpoints | safe to retry reads | many readers coexist | node crash drops query only | n/a | stale reads should be disallowed or tightly bounded |
What matters most
1. Lease/epoch fencing
This prevents crashed or partitioned instances from continuing to mutate or appear healthy after losing authority.
2. Membership truth comes from instance truth
The effective endpoint set must not outlive current instance validity.
3. Watch lag must not affect correctness
Consumers should reconcile against current membership state when needed.
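Reconciling a client view against watch events can be sketched like this. Assumptions: each event carries the membership version it produces plus added and removed endpoints, and resolve is a hypothetical authoritative read returning (version, endpoints). A version gap means events were missed, so the client discards its view and re-reads source truth instead of trusting the stream.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WatchEvent:
    version: int
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


@dataclass
class ClientView:
    version: int = 0
    endpoints: set = field(default_factory=set)


def apply_watch_event(view, event, resolve):
    """Apply an in-order event; fall back to an authoritative read on any gap."""
    if event.version <= view.version:
        return "duplicate"  # already reflected in the view
    if event.version != view.version + 1:
        # Missed at least one event: the stream alone cannot be trusted.
        view.version, endpoints = resolve()
        view.endpoints = set(endpoints)
        return "resynced"
    view.endpoints |= event.added
    view.endpoints -= event.removed
    view.version = event.version
    return "applied"
```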
Step 9 - Scale Adjustments
| Hotspot | Type | First response |
|---|---|---|
| hot services with many instances | contention hotspot | shard by service and isolate very large services |
| renewal traffic | write throughput hotspot | lengthen lease duration within acceptable failover bounds and batch renewals |
| membership watch fanout | fan-out hotspot | derive watch delivery from committed membership stream and decouple it from source truth |
| strong reads on popular services | read hotspot | cache only under strict freshness/versioning or colocate reads with authoritative owners |
| failover churn | contention hotspot | stabilize leadership and avoid aggressive reassignment |
| reconnect storms after outage | contention hotspot | stagger heartbeats and client watch/session restoration |
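The renewal-traffic and reconnect-storm rows reduce to simple arithmetic: with N instances each renewing k times per lease TTL of T seconds, the registry absorbs N * k / T heartbeats per second. The sketch below computes that rate and adds random jitter to each heartbeat delay so that instances restarting together do not renew in lockstep. The three-renewals-per-TTL default and the +/-20% jitter are assumptions, not values from this note.

```python
import random


def renewal_rate(instances: int, ttl_s: float, renewals_per_ttl: int = 3) -> float:
    """Heartbeats per second when every instance renews renewals_per_ttl times per TTL."""
    interval_s = ttl_s / renewals_per_ttl
    return instances / interval_s


def next_heartbeat_delay(ttl_s: float, renewals_per_ttl: int = 3,
                         jitter: float = 0.2, rng=None) -> float:
    """Base renewal interval spread by +/- jitter so restarts do not renew in lockstep."""
    rng = rng or random.Random()
    base_s = ttl_s / renewals_per_ttl
    return base_s * (1 + rng.uniform(-jitter, jitter))
```

For example, renewal_rate(100_000, 30) gives 10000.0 heartbeats per second; tripling the TTL to 90 s cuts that to about 3,333, at the cost of detecting a crashed instance up to three times later.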
What scales well
A registry scales well because the coordination data it manages is relatively small.
It scales by:
- sharding services and instances
- keeping instance records compact
- deriving current membership efficiently
- treating watch delivery as secondary
What fails first
Usually:
- one or a few very large services
- heartbeat storms
- watch fanout spikes
- clients depending on watches alone instead of source truth
Canonical design conclusion
The mechanical outcome is:
- primary state:
  - ServiceInstanceState
  - ServiceMembershipState
  - ClientSession
  - PartitionOwnership
  - PartitionMap
- critical invariants:
  - one authoritative lifecycle per instance
  - renew/deregister valid only for current session/epoch
  - current membership equals healthy eligible instances
  - exclusive shard ownership for membership truth
- mechanisms:
  - guarded register/renew/deregister/expiry transitions
  - overwrite current membership view
  - lease for ownership/session validity
  - fenced shard ownership
- reads:
  - direct authoritative reads for resolution and membership truth
  - watches as derived notifications
Polished interview answer
I’d design the service registry as a sharded strongly consistent membership service with one authoritative owner per service shard. Each instance registers an endpoint under a lease-backed ServiceInstanceState record, renews that lease with heartbeats, and is removed either explicitly or by expiry if it crashes. The registry maintains an authoritative current ServiceMembershipState per service, which is the set of healthy eligible endpoints derived from instance records. Clients resolve endpoints from that membership view and can subscribe to watch streams for change propagation, but correctness comes from authoritative membership state, not the watch stream. The main scaling levers are more shards, longer but bounded leases, efficient membership recompute, and decoupled watch fanout.
Concrete Substrate
I’ll choose a sharded strongly consistent registry service with lease-backed instance records and derived membership views as the concrete baseline, because it matches the mechanics we derived:
- guarded instance lifecycle transitions
- lease-backed validity
- current membership view
- one owner per shard
Concrete tech family:
- registry service in Go or Java
- authoritative state in a replicated metadata store or service-owned Raft state machine
- metadata/control:
  - built-in Raft consensus per shard or a small etcd-like control layer
Each shard leader stores:
- ServiceInstanceState(service_id, instance_id)
- ServiceMembershipState(service_id)
- ClientSession(session_id)
- watch registrations
- expiry index
This is effectively the same substrate family as Consul/etcd-backed service discovery, with the product surface centered on membership lookup.
Operation Layer
1. Register instance
API
RegisterInstance(service_id, instance_id, endpoint, metadata, session_id, ttl)
Initiator
- service instance / sidecar
Entry point
- gateway or any registry node
Authoritative decider
- current shard leader for service_id
Precondition
- session active
- instance state registerable
Transition
- create or update ServiceInstanceState
- recompute ServiceMembershipState(service_id)
Response
{registered: true, epoch, expiry}
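The RegisterInstance preconditions can be sketched as a handler running on the shard leader. Assumptions: plain dicts stand in for the session and instance stores, each fresh registration bumps the epoch that later Heartbeat and DeregisterInstance calls must present, a same-session retry is idempotent, and the metadata argument and membership recompute are omitted.

```python
from dataclasses import dataclass

LIVE = {"REGISTERED", "HEALTHY", "UNHEALTHY"}


@dataclass
class Session:
    session_id: str
    active: bool


@dataclass
class InstanceRecord:
    endpoint: str
    session_id: str
    epoch: int
    expiry: float
    state: str


def register_instance(instances, sessions, service_id, instance_id,
                      endpoint, session_id, ttl, now):
    """Hypothetical shard-leader handler for RegisterInstance."""
    session = sessions.get(session_id)
    if session is None or not session.active:
        return {"registered": False, "reason": "session inactive"}
    key = (service_id, instance_id)
    cur = instances.get(key)
    if cur is not None and cur.state in LIVE:
        if cur.session_id != session_id:
            return {"registered": False, "reason": "held by another session"}
        # Same-session retry: idempotent, keeps the existing epoch and lease.
        return {"registered": True, "epoch": cur.epoch, "expiry": cur.expiry}
    # Absent, DEREGISTERED, or EXPIRED: start a new lifecycle under a higher epoch.
    epoch = 1 if cur is None else cur.epoch + 1
    instances[key] = InstanceRecord(endpoint, session_id, epoch, now + ttl, "REGISTERED")
    return {"registered": True, "epoch": epoch, "expiry": now + ttl}
```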
2. Renew lease
API
Heartbeat(service_id, instance_id, session_id, epoch, ttl)
Initiator
- service instance / sidecar
Entry point
- gateway or any node
Authoritative decider
- shard leader
Precondition
- current instance state owned by session_id
- epoch matches current state
Transition
- extend expiry
Response
{renewed: true, expiry}
3. Deregister instance
API
DeregisterInstance(service_id, instance_id, session_id, epoch)
Initiator
- service instance / sidecar
Entry point
- gateway or any node
Authoritative decider
- shard leader
Precondition
- current instance state owned by session_id
- epoch matches current state
Transition
- REGISTERED/HEALTHY/UNHEALTHY -> DEREGISTERED
- recompute ServiceMembershipState(service_id)
Response
{deregistered: true}
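Heartbeat and DeregisterInstance share the same session/epoch fence, so both can be sketched together. Assumptions: a heartbeat that arrives at or after the lease deadline is rejected (the instance must re-register rather than resurrect an expired lease), and repeating a deregister that already committed for the same session/epoch reports success so retries stay safe.

```python
from dataclasses import dataclass

LIVE = {"REGISTERED", "HEALTHY", "UNHEALTHY"}


@dataclass
class InstanceRecord:
    session_id: str
    epoch: int
    expiry: float
    state: str


def _holds(rec, session_id, epoch):
    """True if the caller's session and epoch match the current record."""
    return rec is not None and rec.session_id == session_id and rec.epoch == epoch


def heartbeat(instances, key, session_id, epoch, ttl, now):
    rec = instances.get(key)
    if not _holds(rec, session_id, epoch) or rec.state not in LIVE or now >= rec.expiry:
        return {"renewed": False}
    rec.expiry = max(rec.expiry, now + ttl)  # never shorten an existing lease
    return {"renewed": True, "expiry": rec.expiry}


def deregister(instances, key, session_id, epoch):
    rec = instances.get(key)
    if not _holds(rec, session_id, epoch):
        return {"deregistered": False}  # stale or foreign caller is fenced out
    if rec.state == "DEREGISTERED":
        return {"deregistered": True}  # retry of an already committed deregister
    if rec.state not in LIVE:
        return {"deregistered": False}  # e.g. already EXPIRED
    rec.state = "DEREGISTERED"
    return {"deregistered": True}
```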
4. Resolve endpoints
API
Resolve(service_id)
Initiator
- client / downstream service
Entry point
- gateway, resolver, or any node
Authoritative decider
- shard leader or tightly controlled fresh read path
Precondition
- none
Transition
- none
Response
- current endpoint set
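One way to get the "tightly controlled fresh read path" is a read-index check, similar in spirit to Raft's ReadIndex: a replica may answer only if it has applied at least the commit index the leader reported when the read began. This sketch assumes that index was obtained from a leader that had just confirmed its leadership; Replica and the membership layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    applied_index: int
    # service_id -> (membership version, frozenset of endpoints)
    membership: dict = field(default_factory=dict)


def resolve(service_id, replicas, leader_commit_index):
    """Serve from the first replica that has applied at least leader_commit_index."""
    for replica in replicas:
        if replica.applied_index >= leader_commit_index:
            version, endpoints = replica.membership.get(service_id, (0, frozenset()))
            return {
                "service_id": service_id,
                "version": version,
                "endpoints": sorted(endpoints),
                "served_by": replica.name,
            }
    raise RuntimeError("no replica is fresh enough; retry on the shard leader")
```

Returning the membership version alongside the endpoints lets a client compare it with the versions it sees on the watch stream.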
5. Expire stale instance
API
- internal background process
Initiator
- system
Entry point
- shard leader
Authoritative decider
- shard leader
Precondition
- current time > expiry
- instance state and epoch unchanged
Transition
- mark instance expired
- recompute ServiceMembershipState(service_id)
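The expiry process can be sketched with a min-heap expiry index. Entries are (expiry, key, epoch) and are allowed to go stale: on pop, the scanner rechecks the precondition above (still live, same epoch, current time past expiry) and re-indexes instances that were renewed after they were scheduled. The heap and record layout are assumptions, not a specific product's implementation.

```python
import heapq
from dataclasses import dataclass

LIVE = {"REGISTERED", "HEALTHY", "UNHEALTHY"}


@dataclass
class InstanceRecord:
    epoch: int
    expiry: float
    state: str


class ExpiryIndex:
    """Min-heap of (expiry, key, epoch); entries may be stale and are rechecked on pop."""

    def __init__(self):
        self._heap = []

    def schedule(self, key, rec):
        heapq.heappush(self._heap, (rec.expiry, key, rec.epoch))

    def expire_due(self, instances, now):
        expired = []
        while self._heap and self._heap[0][0] < now:
            _, key, epoch = heapq.heappop(self._heap)
            rec = instances.get(key)
            # Guard: skip records that were deregistered, expired, or re-registered.
            if rec is None or rec.state not in LIVE or rec.epoch != epoch:
                continue
            if rec.expiry >= now:
                self.schedule(key, rec)  # renewed since scheduling: re-index
                continue
            rec.state = "EXPIRED"  # the caller would then recompute membership
            expired.append(key)
        return expired
```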
Entry Point vs Decider vs Responder
| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| RegisterInstance | gateway / any node | shard leader | leader or front node | service registry |
| Heartbeat | gateway / any node | shard leader | leader or front node | service registry |
| DeregisterInstance | gateway / any node | shard leader | leader or front node | service registry |
| Resolve | resolver / any node | shard leader or strong read path | resolver node | service registry |
| expiry | shard leader | shard leader | internal | service registry |
| watch | watch endpoint | committed membership stream / shard leader | watch-serving node | service registry |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | service registry |
Concrete HLD
Main components:
- client / instance gateway
- routes register, heartbeat, and resolve operations
- shard leaders
- authoritative owners of instance, session, and membership state
- maintain expiry index
- shard followers
- replicate committed state
- watch service
- emits membership-change notifications from committed state transitions
- metadata/control service
- tracks shard ownership and routing
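The gateway's routing step can be sketched as hashing service_id to a shard, looking the shard up in a cached PartitionMap, and refreshing the map when a node answers that it is no longer the owner. Hash partitioning with a fixed shard count, NotOwner, and the fetch/send callbacks are all hypothetical; the note only says services are sharded, and very large services may need explicit placement instead of a hash.

```python
import hashlib

NUM_SHARDS = 64  # hypothetical fixed shard count


def shard_for(service_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash of service_id, so every gateway maps a service to the same shard."""
    digest = hashlib.sha256(service_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class NotOwner(Exception):
    """Raised by a node that no longer owns the requested shard."""


class Gateway:
    def __init__(self, fetch_partition_map, send):
        self._fetch = fetch_partition_map  # hypothetical: returns {shard_id: node}
        self._send = send                  # hypothetical: send(node, shard_id, request)
        self._map = fetch_partition_map()

    def route(self, service_id, request, max_attempts=3):
        shard = shard_for(service_id)
        for _ in range(max_attempts):
            node = self._map[shard]
            try:
                return self._send(node, shard, request)
            except NotOwner:
                self._map = self._fetch()  # cached map was stale: refresh and retry
        raise NotOwner(f"no owner found for shard {shard} after {max_attempts} attempts")
```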