Unified Tagging System for Atlassian Products #

This note models a unified tagging system across Jira tickets, Confluence documents, and Bitbucket pull requests where users can add or remove tags on any supported content, click a tag to see associated items across products, and view a dashboard of the top K most popular tags.


Step 1 - Normalize #

Assume the baseline prompt is:

  • design a unified tagging system across multiple Atlassian products
  • users can add, remove, and update tags on content
  • clicking a tag shows all tagged items across products
  • users can see the top K most popular tags
  • tagging should work across Jira, Confluence, and Bitbucket

Normalize into state-affecting paths.

| Requirement | Actor | Operation | State touched | Priority |
| --- | --- | --- | --- | --- |
| User adds tag to content | User | state transition | S1, update target: TagAssignment | C1 |
| User removes tag from content | User | state transition | S1, update target: TagAssignment | C1 |
| User renames or updates tag metadata | User/Admin | overwrite state | S1, update target: TagDefinition | C1 |
| User views items for a tag across products | User | read source | S1, read source target: TagAssignment | R1 |
| User views top K popular tags | User | read projection | S1, read projection target: PopularTagsView | R1 |
| System updates tag popularity counts | System | state transition | S1, update target: TagStats | C1 |
| Product content metadata is resolved for display | System | read source | S1, read source target: ContentReference | R1 |
| System routes tenant/tag shard to current owner | System | read source | S1, read source target: PartitionMap | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1, update target: PartitionOwnership | C1 |

Notes on normalization #

Important choices:

  • add/remove tag is modeled as TagAssignment lifecycle change
  • tag rename/update is current-value tag metadata
  • cross-product tag search is a source read over tag-to-content edges
  • top K is a derived popularity view, not primary truth
  • popularity maintenance is explicit because it is user-visible but derived

This system is a hybrid of:

  • cross-object relationship state
  • shared metadata/catalog state
  • derived ranking view

Step 2 - Critical Path Selection #

| Requirement | Priority class | Why |
| --- | --- | --- |
| Add tag to content | C1 | primary product mutation |
| Remove tag from content | C1 | primary product mutation |
| Update tag metadata | C1 | current tag identity/display depends on it |
| View items for tag | R1 | core serving path |
| View top K tags | R1 | core user-visible read path, but derived |
| Update popularity counts | C1 | derived but required for top K correctness |
| Resolve content references for display | R1 | needed to render cross-product result list |
| Route to shard owner | C1 | wrong routing can split tag truth |
| Reassign shard ownership | C1 | failover must preserve tagging correctness |

Baseline critical paths #

Main C1 paths:

  • P1 add tag
  • P2 remove tag
  • P3 update tag metadata
  • P4 update tag popularity stats
  • P5 route to shard owner
  • P6 reassign shard ownership

Main R1 paths:

  • P7 list items by tag
  • P8 read top K tags
  • P9 resolve content references

This design is driven by:

  • one authoritative tag-assignment truth per content-tag edge
  • shared tag identity across products
  • derived top-K popularity

Step 3 - Primary State Extraction #

For a unified tagging system, the minimal primary state is tag metadata, content-tag assignment state, content references, popularity stats, and routing/ownership state.

| Candidate object label | Candidate source | Needed for C1/R1? | Decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TagDefinition | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | tag_id |
| TagAssignment | direct noun | Yes | keep as candidate | relationship | Yes | service | state machine | relation | tag_id + content_ref |
| ContentReference | hidden read target | Yes | keep as candidate | entity | Yes | service | overwrite | instance | content_ref |
| TagStats | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | instance | tag_id |
| PopularTagsView | derived read model | Yes | keep as candidate | projection | No | derived | overwrite | collection | tenant or workspace |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | tenant/shard map |

Important modeling choices #

TagDefinition #

Primary because:

  • shared tag identity and display metadata need one source of truth
  • fields may include:
    • tag_id
    • display_name
    • normalized_name
    • color
    • created_by
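
One way to derive normalized_name from display_name is sketched below. The function name and the exact rules (NFKC folding, case folding, hyphen-joined words) are assumptions for illustration; the note itself does not specify how names are normalized.

```python
import unicodedata


def normalize_tag_name(display_name: str) -> str:
    """Derive a hypothetical normalized_name used for tag lookup and dedup."""
    # NFKC folds compatibility forms (for example full-width letters) together.
    text = unicodedata.normalize("NFKC", display_name)
    # casefold() is a stronger lower() intended for caseless matching.
    text = text.casefold()
    # split() drops leading/trailing whitespace and collapses inner runs.
    return "-".join(text.split())
```

With a rule like this, "Release Notes" and "release   notes" resolve to the same TagDefinition, while display_name keeps whatever casing the creator chose.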

TagAssignment #

This is the core cross-product relationship object.

Fields likely include:

  • tag_id
  • content_type
  • content_id
  • product
  • state
  • added_by
  • added_at

Lifecycle:

  • ACTIVE
  • REMOVED
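
The fields and lifecycle above could be represented roughly as follows. This is a minimal sketch: the `version` field, the `edge_key` property, and the `removed()` helper are assumptions added for illustration and are not part of the field list above.

```python
from dataclasses import dataclass, replace
from enum import Enum


class AssignmentState(Enum):
    ACTIVE = "ACTIVE"
    REMOVED = "REMOVED"


@dataclass(frozen=True)
class TagAssignment:
    tag_id: str
    product: str          # e.g. "jira", "confluence", "bitbucket"
    content_type: str     # e.g. "issue", "page", "pull_request"
    content_id: str
    state: AssignmentState
    added_by: str
    added_at: float       # epoch seconds
    version: int = 0      # assumption: bumped on every state change

    @property
    def edge_key(self) -> tuple:
        # One logical edge per (tag, content) pair, regardless of state.
        return (self.tag_id, self.product, self.content_type, self.content_id)

    def removed(self) -> "TagAssignment":
        # Returns a new record; the edge is kept and only its state changes.
        return replace(self, state=AssignmentState.REMOVED, version=self.version + 1)
```

Keeping REMOVED as a state rather than deleting the row means a later re-add transitions the same edge instead of creating a second one.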

ContentReference #

Primary because:

  • cross-product results need a normalized representation of content for listing
  • may be hydrated from product systems or mirrored into this system

TagStats #

Primary because:

  • top-K needs a stable count source, even if eventually reflected into a projection

Minimal strict primary set #

The strongest minimal set is:

  • TagDefinition
  • TagAssignment
  • ContentReference
  • TagStats
  • PartitionOwnership
  • PartitionMap

Step 4 - Hard Invariants #

For a unified tagging system, the hard invariants are about one authoritative assignment state per tag-content edge, correct shared tag metadata, and accurate popularity accounting derived from active assignments.

| Path | Tier | Type | Invariant statement |
| --- | --- | --- | --- |
| P1 add tag | HARD | uniqueness | Key (tag_id, content_ref) maps to at most one current authoritative assignment state within assignment scope. |
| P1 add tag | HARD | eligibility | Action add_tag is valid only if current content exists and current assignment state is addable at decision time. |
| P2 remove tag | HARD | eligibility | Action remove_tag is valid only if current assignment state is active/removable at decision time. |
| P3 update tag metadata | HARD | ordering | Tag-definition revisions are ordered by monotonic version within tag scope. |
| P4 update popularity stats | HARD | accounting | TagStats(tag_id).active_assignment_count equals the number of currently active TagAssignments for that tag scope. |
| P7 list items by tag | HARD | freshness | Tag listing reflects authoritative active tag assignments within configured consistency bound. |
| P8 top K tags | SOFT | freshness | PopularTagsView reflects current TagStats within configured projection lag. |
| P9 resolve content references | HARD | freshness | Display metadata for a tagged item reflects current or approved cached ContentReference within configured consistency bound. |
| P5 route to shard owner | HARD | uniqueness | Key shard_id maps to at most one current authoritative owner. |
| P6 reassign shard ownership | HARD | eligibility | Action reassign_shard is valid only if current owner is failed or relinquished and candidate owner is eligible and sufficiently current on shard_id at decision time. |

What matters most #

1. One authoritative assignment per tag-content edge #

This prevents duplicate active edges and ambiguous outcomes when adds and removes on the same edge race or are retried.

2. Popularity must count active assignments only #

Removed tags should not continue inflating top-K.

3. Top K is derived #

Exact tag-assignment truth matters more than instant ranking freshness.

4. Cross-product content identity must normalize cleanly #

The system needs a stable content_ref abstraction across Jira, Confluence, and Bitbucket.
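
A minimal sketch of such a content_ref follows, assuming a colon-separated `tenant:product:type:id` encoding. The encoding, the `ContentRef` type, and `parse_content_ref` are hypothetical; the note only requires that the abstraction be stable across products.

```python
from typing import NamedTuple

SUPPORTED_PRODUCTS = {"jira", "confluence", "bitbucket"}


class ContentRef(NamedTuple):
    tenant_id: str
    product: str
    content_type: str
    content_id: str

    def encode(self) -> str:
        # Canonical string form, usable as a storage or index key.
        return ":".join(self)


def parse_content_ref(raw: str) -> ContentRef:
    # maxsplit=3 lets the product-native content_id itself contain colons.
    parts = raw.split(":", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"malformed content_ref: {raw!r}")
    tenant_id, product, content_type, content_id = parts
    product = product.lower()
    if product not in SUPPORTED_PRODUCTS:
        raise ValueError(f"unsupported product: {product!r}")
    return ContentRef(tenant_id, product, content_type.lower(), content_id)
```

For example, `parse_content_ref("t1:Jira:issue:PROJ-42").encode()` yields `"t1:jira:issue:PROJ-42"`, so differently cased inputs land on the same edge key.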


Step 5 - Execution Context #

For the baseline unified tagging system:

| Field | Value | Why |
| --- | --- | --- |
| Topology | single service, distributed | one logical tagging system serving multiple products |
| Write coordination scope | per object scope | correctness is per tag, assignment edge, content ref, and shard ownership scope |
| Read consistency target | bounded stale allowed | top-K and display hydration can tolerate small lag; core assignment reads should be fairly fresh |
| Holder model | none | no lease-like client ownership is central to tagging correctness |
| Compensation acceptable? | Mostly no | wrong tag assignment is user-visible and should not rely on later compensation |

Derived implications #

  • holder_may_crash = false
    • clients may fail, but no lock-style ownership is central here
  • cross_service_write = partly true (logical dependency only)
    • the tagging service may need to validate or hydrate content from product systems
    • but authoritative tag truth should remain inside one logical tagging service
  • bounded_staleness_allowed = true
    • projections like top-K and cached content display can be slightly stale
  • cross_service_atomicity_required = false
    • no cross-product distributed transaction is required in the baseline
  • exclusive_claim_required = true
    • shard ownership still needs one current owner
  • guarded_by_current_state = true
    • add/remove transitions depend on current assignment state

What this implies #

This pushes us toward:

  • one authoritative owner per tenant/tag shard
  • current tag and assignment state inside the shared tagging service
  • popularity as derived maintained state
  • content hydration through normalized references

Step 6 - Deterministic Mechanism Selection #

| Path | Write shape | Base mechanism | Required companions |
| --- | --- | --- | --- |
| P1 add tag | guarded state transition | CAS on (state, version) or single writer per shard | assignment version |
| P2 remove tag | guarded state transition | CAS on (state, version) | assignment version |
| P3 update tag metadata | overwrite current value | CAS on version | tag version |
| P4 update popularity stats | commutative merge or single writer recompute | counter update from assignment delta | idempotent delta application |
| P7 list items by tag | read source | direct source read or indexed lookup | tag-to-content index |
| P8 top K tags | materialized view update | incremental ranking projection | stats version |
| P9 resolve content references | read source or cached snapshot | direct source read / cached hydration | content version |
| P5 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P6 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |

Why these fit #

Add/remove tag #

These are current-state transitions on an assignment edge, so guarded transitions fit.
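
A guarded transition of this kind can be sketched as a compare-and-set on the edge version. The `AssignmentStore` class and its in-memory dict are stand-ins assumed for illustration; in a real store the compare and the write would need to be one atomic operation (for example a conditional update).

```python
from dataclasses import dataclass


@dataclass
class Edge:
    state: str    # "ACTIVE" or "REMOVED"
    version: int


class AssignmentStore:
    """Hypothetical in-memory store keyed by (tag_id, content_ref)."""

    def __init__(self):
        self._edges = {}

    def read(self, key):
        return self._edges.get(key)

    def compare_and_set(self, key, expected_version, new_state):
        # expected_version=None means "the edge must not exist yet".
        current = self._edges.get(key)
        current_version = current.version if current else None
        if current_version != expected_version:
            return False  # someone else changed the edge first
        next_version = 1 if current is None else current.version + 1
        self._edges[key] = Edge(new_state, next_version)
        return True


def add_tag(store, key):
    current = store.read(key)
    if current is not None and current.state == "ACTIVE":
        return "already_active"  # not addable: the edge is already active
    expected = current.version if current else None
    ok = store.compare_and_set(key, expected, "ACTIVE")
    return "added" if ok else "conflict"


def remove_tag(store, key):
    current = store.read(key)
    if current is None or current.state != "ACTIVE":
        return "not_active"  # not removable: nothing active to remove
    ok = store.compare_and_set(key, current.version, "REMOVED")
    return "removed" if ok else "conflict"
```

A caller that receives "conflict" would typically re-read the edge and decide again, since the precondition it checked no longer holds.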

Tag metadata #

Current display identity is current-value state, so overwrite fits.

Popularity stats #

Popularity can be maintained either:

  • by idempotent increment/decrement deltas
  • or by periodic recompute from assignments

In practice:

  • TagStats is current-value state
  • fed by assignment deltas
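
Both maintenance options can be sketched in a few lines. The `TagStatsStore` class, the `delta_id` parameter, and the unbounded applied-ID set are assumptions for illustration; a production version would need to bound or expire the dedup set.

```python
class TagStatsStore:
    """Hypothetical per-shard counter store with idempotent delta application."""

    def __init__(self):
        self.counts = {}       # tag_id -> active assignment count
        self._applied = set()  # delta_ids already applied

    def apply_delta(self, delta_id, tag_id, delta):
        # A redelivered delta is ignored, so retries cannot double count.
        if delta_id in self._applied:
            return self.counts.get(tag_id, 0)
        self._applied.add(delta_id)
        self.counts[tag_id] = self.counts.get(tag_id, 0) + delta
        return self.counts[tag_id]


def recompute_counts(assignments):
    """Rebuild counts from (tag_id, state) pairs, counting ACTIVE edges only."""
    counts = {}
    for tag_id, state in assignments:
        if state == "ACTIVE":
            counts[tag_id] = counts.get(tag_id, 0) + 1
    return counts
```

The recompute path doubles as a repair tool: if incremental counts ever drift, the result of `recompute_counts` over the assignment store can replace them.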

Top K #

Top K is clearly a derived projection.
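
As a rough illustration, the projection can be rebuilt from a snapshot of TagStats with a bounded heap selection. This sketch shows a full rebuild rather than the incremental maintenance discussed elsewhere in this note, and the tie-break on tag_id is an assumption made only to keep the output deterministic.

```python
import heapq


def top_k_tags(stats, k):
    """Return up to k (tag_id, count) pairs, highest count first.

    stats is a dict of tag_id -> active assignment count.
    """
    live = ((tag_id, count) for tag_id, count in stats.items() if count > 0)
    # Sort key: count descending, then tag_id ascending to break ties.
    return heapq.nsmallest(k, live, key=lambda item: (-item[1], item[0]))
```

`heapq.nsmallest` keeps only k candidates in memory at a time, which matters when a tenant has far more tags than the dashboard displays.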

Canonical substrate implied #

The baseline now points to:

  • sharded tagging service
  • one owner per tag or tenant shard
  • current tag definitions and assignment edges
  • derived popularity stats and top-K view

Step 7 - Read Model / Source of Truth #

For a unified tagging system, truth is mostly direct source state for tags and assignments. Top K is derived.

| Concept | Truth | Read path | Rebuild path |
| --- | --- | --- | --- |
| C1 tag metadata | TagDefinition | read source directly | authoritative tag store |
| C2 active tag-content relationships | TagAssignment | read source directly | authoritative assignment store |
| C3 normalized content display record | ContentReference | read source directly or cached | authoritative content-reference store or product hydration |
| C4 popularity count per tag | TagStats | read source directly | recompute from active assignments |
| C5 top K tags | derived from TagStats | materialized view | rebuild from current tag stats |
| C6 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C7 shard routing map | PartitionMap | read source directly | authoritative routing metadata |

Important point #

For the core semantics:

  • tag-to-content listing should read authoritative assignments
  • top-K can come from a derived projection
  • content display can come from a normalized cache or hydration layer

Step 8 - Failure Handling #

| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
| --- | --- | --- | --- | --- | --- |
| P1 add tag | retry with assignment version or idempotency key | concurrent add on same edge collapses to one active assignment | committed add survives crash if persisted | popularity/top-K projection may lag | stale shard owner blocked by fencing token |
| P2 remove tag | retry with assignment version | stale remove loses guarded transition | committed remove survives crash if persisted | popularity/top-K projection may lag | stale shard owner blocked by fencing token |
| P3 update tag metadata | retry with tag version | stale update loses CAS | committed metadata survives crash if persisted | UI caches may lag | stale shard owner blocked by fencing token |
| P4 update popularity stats | retry with idempotent delta or recompute | concurrent updates merge through idempotent counting or single shard owner | committed stats survive crash if persisted or can be rebuilt | top-K view may lag | n/a |
| P7 list items by tag | read retry safe | many readers coexist | node crash drops query only | content hydration may partially fail | stale reads bounded by configured consistency |
| P8 top K tags | read retry safe | many readers coexist | node crash drops query only | projection lag acceptable | stale top-K bounded by projection freshness |
| P9 resolve content references | retry safe | many readers coexist | hydration cache miss can refetch | product lookup may fail transiently | cached content freshness bounded |
| P5 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P6 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
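
The "stale holder" column relies on fencing tokens. A minimal sketch of the idea follows, assuming a hypothetical `LeaseService` that hands out strictly increasing tokens and a `FencedShardStore` that remembers the highest token it has accepted; neither name comes from this note.

```python
class LeaseService:
    """Hypothetical lease issuer: every grant returns a larger token."""

    def __init__(self):
        self._token = 0
        self.owner = None

    def grant(self, node_id):
        self._token += 1
        self.owner = node_id
        return self._token


class FencedShardStore:
    """Hypothetical shard storage that rejects writes from stale owners."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, fencing_token, key, value):
        # A token lower than one already seen belongs to a superseded owner.
        if fencing_token < self.highest_token:
            return False
        self.highest_token = fencing_token
        self.data[key] = value
        return True
```

The check lives in the storage layer on purpose: an old owner that is paused or partitioned may still believe it holds the lease, so it cannot be trusted to fence itself.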

What matters most #

1. Idempotent add/remove semantics #

Tagging UX often retries. The same add should not create duplicate active edges.
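
One way to get this behavior is to combine a set of active edges with a response cache keyed by request_id. The sketch below is an assumption-laden illustration: the `TaggingService` class, the `created` flag in the response, and the unbounded in-memory cache are all hypothetical.

```python
class TaggingService:
    """Hypothetical single-shard service showing retry-safe adds."""

    def __init__(self):
        self.active_edges = set()  # (tag_id, content_ref) pairs
        self._responses = {}       # request_id -> first response returned

    def add_tag(self, content_ref, tag_id, request_id=None):
        # A replayed request gets the original response back unchanged.
        if request_id is not None and request_id in self._responses:
            return self._responses[request_id]
        key = (tag_id, content_ref)
        created = key not in self.active_edges
        # Adding to a set is naturally idempotent: no duplicate edges.
        self.active_edges.add(key)
        response = {"tag_id": tag_id, "assignment_state": "ACTIVE", "created": created}
        if request_id is not None:
            self._responses[request_id] = response
        return response
```

Even without a request_id, a repeated add converges on one active edge; the request_id only ensures the retry sees the same response as the first attempt.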

2. Popularity can be rebuilt #

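TagStats is stored as primary state for serving, but both it and top-K can be recomputed from active assignments; assignment truth is the state that must not be lost.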

3. Cross-product display hydration is secondary #

If content titles lag slightly, the tagging system can still be correct.

4. Rename semantics need product choice #

If tag rename changes the display for all products, TagDefinition should be global/shared within the tenant scope.


Step 9 - Scale Adjustments #

| Hotspot | Type | First response |
| --- | --- | --- |
| very hot tags with many assignments | contention/read hotspot | shard assignment index by tag and paginate tag result sets |
| top-K recomputation load | read/write hotspot | maintain incremental stats and heap/ranked projection |
| cross-product content hydration | read hotspot | cache ContentReference snapshots and hydrate asynchronously |
| noisy assignment churn | write hotspot | batch popularity delta updates and isolate hot tenants |
| large tag result sets | read hotspot | use cursor pagination and product-filtered secondary indexes |
| dashboard traffic | read hotspot | serve top-K from projection, not from full assignment scans |

What scales well #

This system scales by:

  • sharding tag and assignment state by tenant/tag key
  • keeping tag assignments as compact edges
  • deriving top-K from maintained stats
  • caching normalized content references
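
Sharding by tenant/tag key can be sketched as a stable hash followed by a PartitionMap lookup. The function names, the SHA-256 choice, and the list-shaped partition map are assumptions for illustration only.

```python
import hashlib


def shard_for(tenant_id, tag_id, num_shards):
    """Map a tenant/tag key to a shard index with a process-independent hash."""
    key = f"{tenant_id}/{tag_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard.
    return int.from_bytes(digest[:8], "big") % num_shards


def route(partition_map, tenant_id, tag_id):
    """partition_map is a list where index = shard_id and value = owner node."""
    shard_id = shard_for(tenant_id, tag_id, len(partition_map))
    return shard_id, partition_map[shard_id]
```

Plain modulo hashing remaps most keys when the shard count changes, so a design that expects to resize often would likely prefer a fixed number of virtual shards or consistent hashing. Python's built-in `hash()` is avoided here because string hashes vary between processes.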

What fails first #

Usually:

  • a few very hot tags
  • expensive cross-product hydration
  • recomputing popularity from scratch too often
  • large fanout result sets for global tags

Canonical design conclusion #

The mechanical outcome is:

  • primary state:
    • TagDefinition
    • TagAssignment
    • ContentReference
    • TagStats
    • PartitionOwnership
    • PartitionMap
  • critical invariants:
    • one authoritative assignment state per tag-content edge
    • current tag metadata by version
    • popularity counts equal active assignments
    • exclusive shard ownership for tag truth
  • mechanisms:
    • guarded add/remove transitions
    • overwrite current tag metadata
    • derived stats and top-K projection
    • fenced shard ownership
  • reads:
    • authoritative tag listing from assignment index
    • top-K from derived projection
    • content display from normalized references or hydration cache

Polished interview answer #

I’d design the unified tagging system as a shared metadata service with one authoritative tag and assignment store across Jira, Confluence, and Bitbucket. The core truth is a TagAssignment edge between a normalized content_ref and a shared tag_id, plus shared TagDefinition metadata. Adding or removing a tag is a guarded transition on that edge, clicking a tag reads the tag-to-content index across all products, and top-K popular tags is a derived projection built from maintained TagStats rather than a live full scan. The main scaling levers are sharding by tenant and tag, caching normalized content references, and maintaining popularity incrementally rather than recomputing it from scratch.


Concrete Substrate #

I’ll choose a shared tagging service with authoritative tag/assignment storage plus derived popularity views as the concrete baseline, because it matches the mechanics we derived:

  • shared tag metadata
  • guarded tag-assignment lifecycle
  • derived stats and top-K
  • one owner per shard

Concrete tech family:

  • service in Go, Java, or Kotlin
  • authoritative state store:
    • replicated relational DB or RocksDB-backed service state
  • metadata/control:
    • internal shard routing or a small strongly consistent metadata layer
  • optional indexing/search layer for tag listing acceleration

Each shard owner stores:

  • TagDefinition
  • TagAssignment
  • TagStats
  • ContentReference cache or normalized reference table

Derived layer stores:

  • PopularTagsView

Operation Layer #

1. Add tag to content #

API

  • AddTag(content_ref, tag_input, actor, request_id?)

Initiator

  • user

Entry point

  • tagging API

Authoritative decider

  • shard owner for tenant/tag scope

Precondition

  • content exists or is valid
  • actor authorized to tag that content
  • current assignment edge is addable

Transition

  • create or resolve TagDefinition
  • set TagAssignment(tag_id, content_ref) -> ACTIVE
  • update TagStats

Response

  • {tag_id, assignment_state}

2. Remove tag from content #

API

  • RemoveTag(content_ref, tag_id, actor, expected_version?)

Initiator

  • user

Entry point

  • tagging API

Authoritative decider

  • shard owner for tenant/tag scope

Precondition

  • assignment edge currently active
  • actor authorized

Transition

  • set TagAssignment -> REMOVED
  • decrement or recompute TagStats

Response

  • {removed: true}

3. List items by tag #

API

  • ListItemsByTag(tag_id_or_name, filters, cursor, limit)

Initiator

  • user

Entry point

  • tagging query API

Authoritative decider

  • assignment index / shard owner

Precondition

  • tag exists

Transition

  • none

Response

  • paginated content refs plus display metadata
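
The cursor and limit parameters suggest keyset pagination over the tag-to-content index. A minimal sketch follows, assuming the index keeps active content_refs sorted per tag and that the cursor is an opaque encoding of the last ref returned; filters and display hydration are omitted.

```python
import base64
import bisect
import json


def encode_cursor(last_ref):
    # Opaque to clients: base64 of the JSON-encoded last key on the page.
    return base64.urlsafe_b64encode(json.dumps(last_ref).encode()).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))


def list_items_by_tag(index, tag_id, cursor=None, limit=2):
    """index: dict of tag_id -> sorted list of active content_ref strings."""
    refs = index.get(tag_id, [])
    start = 0
    if cursor is not None:
        # Resume strictly after the last ref the client has already seen.
        start = bisect.bisect_right(refs, decode_cursor(cursor))
    page = refs[start:start + limit]
    has_more = start + limit < len(refs)
    next_cursor = encode_cursor(page[-1]) if page and has_more else None
    return page, next_cursor
```

Because the cursor carries a key rather than an offset, refs removed between requests are not silently skipped over or returned twice, which fits the heavy add/remove churn described for hot tags.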

4. Read top K tags #

API

  • GetTopTags(k, filters?)

Initiator

  • user

Entry point

  • dashboard/query API

Authoritative decider

  • popularity projection

Precondition

  • none

Transition

  • none

Response

  • ranked tags with counts

5. Update tag metadata #

API

  • UpdateTag(tag_id, patch, expected_version?)

Initiator

  • user/admin

Entry point

  • tagging API

Authoritative decider

  • shard owner for tag scope

Precondition

  • tag exists
  • actor authorized

Transition

  • overwrite TagDefinition

Entry Point vs Decider vs Responder #

| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
| --- | --- | --- | --- | --- |
| add/remove/update tag | tagging API | tag shard owner | API node | tagging service |
| list items by tag | tagging query API | assignment index / shard owner | query node | tagging service |
| top K tags | dashboard/query API | popularity projection | query node | tagging service |
| content hydration | tagging query API | content-ref cache or product adapter | query node | tagging service |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | tagging service |

Concrete HLD #

Main components:

  • tagging write API
    • handles add/remove/update operations
  • tag shard owners
    • authoritative owners of tag definitions, assignments, and stats
  • tagging query API
    • handles list-by-tag and top-K reads
  • content-reference normalization layer
    • stores or hydrates display metadata for Jira/Confluence/Bitbucket objects
  • popularity projection
    • maintains ranked top-K views
  • metadata/control service
    • tracks shard ownership and routing
