Unified Tagging System for Atlassian Products #
This note models a unified tagging system across Jira tickets, Confluence documents, and Bitbucket pull requests where users can add or remove tags on any supported content, click a tag to see associated items across products, and view a dashboard of the top K most popular tags.
Step 1 - Normalize #
Assume the baseline prompt is:
- design a unified tagging system across multiple Atlassian products
- users can add, remove, and update tags on content
- clicking a tag shows all tagged items across products
- users can see the top K most popular tags
- tagging should work across Jira, Confluence, and Bitbucket
Normalize into state-affecting paths.
| Requirement | Actor | Operation | State touched | Priority |
|---|---|---|---|---|
| User adds tag to content | User | state transition | S1: update; target `TagAssignment` | C1 |
| User removes tag from content | User | state transition | S1: update; target `TagAssignment` | C1 |
| User renames or updates tag metadata | User/Admin | overwrite state | S1: update; target `TagDefinition` | C1 |
| User views items for a tag across products | User | read source | S1: read source; target `TagAssignment` | R1 |
| User views top K popular tags | User | read projection | S1: read projection; target `PopularTagsView` | R1 |
| System updates tag popularity counts | System | state transition | S1: update; target `TagStats` | C1 |
| Product content metadata is resolved for display | System | read source | S1: read source; target `ContentReference` | R1 |
| System routes tenant/tag shard to current owner | System | read source | S1: read source; target `PartitionMap` | C1 |
| System reassigns shard ownership after node failure | System | state transition | S1: update; target `PartitionOwnership` | C1 |
Notes on normalization #
Important choices:
- add/remove tag is modeled as a `TagAssignment` lifecycle change
- tag rename/update is current-value tag metadata
- cross-product tag search is a source read over tag-to-content edges
- top K is a derived popularity view, not primary truth
- popularity maintenance is explicit because it is user-visible but derived
This system is a hybrid of:
- cross-object relationship state
- shared metadata/catalog state
- derived ranking view
Step 2 - Critical Path Selection #
| Requirement | Priority class | Why |
|---|---|---|
| Add tag to content | C1 | primary product mutation |
| Remove tag from content | C1 | primary product mutation |
| Update tag metadata | C1 | current tag identity/display depends on it |
| View items for tag | R1 | core serving path |
| View top K tags | R1 | core user-visible read path, but derived |
| Update popularity counts | C1 | derived but required for top K correctness |
| Resolve content references for display | R1 | needed to render cross-product result list |
| Route to shard owner | C1 | wrong routing can split tag truth |
| Reassign shard ownership | C1 | failover must preserve tagging correctness |
Baseline critical paths #
Main C1 paths:
- `P1` add tag
- `P2` remove tag
- `P3` update tag metadata
- `P4` update tag popularity stats
- `P5` route to shard owner
- `P6` reassign shard ownership
Main R1 paths:
- `P7` list items by tag
- `P8` read top K tags
- `P9` resolve content references
This design is driven by:
- one authoritative tag-assignment truth per content-tag edge
- shared tag identity across products
- derived top-K popularity
Step 3 - Primary State Extraction #
For a unified tagging system, the minimal primary state is tag metadata, content-tag assignment state, content references, popularity stats, and routing/ownership state.
| Candidate object label | Candidate source | Candidate needed for C1/R1? | Candidate decomposition action | Class | Primary? | Owner | Evolution | Scope kind | Scope value |
|---|---|---|---|---|---|---|---|---|---|
| TagDefinition | direct noun | Yes | keep as candidate | entity | Yes | service | overwrite | instance | tag_id |
| TagAssignment | direct noun | Yes | keep as candidate | relationship | Yes | service | state machine | relation | tag_id + content_ref |
| ContentReference | hidden read target | Yes | keep as candidate | entity | Yes | service | overwrite | instance | content_ref |
| TagStats | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | instance | tag_id |
| PopularTagsView | derived read model | Yes | keep as candidate | projection | No | derived | overwrite | collection | tenant or workspace |
| PartitionOwnership | hidden write target | Yes | keep as candidate | process | Yes | service | state machine | instance | shard_id |
| PartitionMap | hidden write target | Yes | keep as candidate | entity | Yes | service | overwrite | collection | tenant/shard map |
Important modeling choices #
TagDefinition #
Primary because:
- shared tag identity and display metadata need one source of truth
- fields may include:
  - `tag_id`
  - `display_name`
  - `normalized_name`
  - `color`
  - `created_by`
TagAssignment #
This is the core cross-product relationship object.
Fields likely include:
- `tag_id`
- `content_type`
- `content_id`
- `product`
- `state`
- `added_by`
- `added_at`
Lifecycle:
- `ACTIVE`
- `REMOVED`
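The lifecycle above can be sketched as a small transition table. This is an illustrative Python sketch, not the note's implementation: the names `AssignmentState`, `ALLOWED`, and `can_transition` are hypothetical, and allowing a re-add after removal is an assumption, since the note lists only the two states.

```python
from dataclasses import dataclass
from enum import Enum


class AssignmentState(Enum):
    ACTIVE = "ACTIVE"
    REMOVED = "REMOVED"


# Allowed lifecycle moves; None stands for "no edge recorded yet".
# REMOVED -> ACTIVE (re-add) is an assumption, not stated in the note.
ALLOWED = {
    (None, AssignmentState.ACTIVE),                     # first add
    (AssignmentState.ACTIVE, AssignmentState.REMOVED),  # remove
    (AssignmentState.REMOVED, AssignmentState.ACTIVE),  # re-add
}


@dataclass(frozen=True)
class TagAssignment:
    """One tag-to-content edge, using the fields listed above."""
    tag_id: str
    content_type: str
    content_id: str
    product: str
    state: AssignmentState
    added_by: str
    added_at: float


def can_transition(current, target):
    """Return True if moving from `current` to `target` is a legal lifecycle step."""
    return (current, target) in ALLOWED
```

Keeping the removed edge as a `REMOVED` row, rather than deleting it, gives a later add or remove something to compare against.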
ContentReference #
Primary because:
- cross-product results need a normalized representation of content for listing
- may be hydrated from product systems or mirrored into this system
TagStats #
Primary because:
- top-K needs a stable count source, even if eventually reflected into a projection
Minimal strict primary set #
The strongest minimal set is:
- `TagDefinition`
- `TagAssignment`
- `ContentReference`
- `TagStats`
- `PartitionOwnership`
- `PartitionMap`
Step 4 - Hard Invariants #
For a unified tagging system, the hard invariants are about one authoritative assignment state per tag-content edge, correct shared tag metadata, and accurate popularity accounting derived from active assignments.
| Path | Tier | Type | Invariant statement |
|---|---|---|---|
| P1 add tag | HARD | uniqueness | Key `(tag_id, content_ref)` maps to at most one current authoritative assignment state within the assignment scope. |
| P1 add tag | HARD | eligibility | Action `add_tag` is valid only if the content exists and the current assignment state is addable at decision time. |
| P2 remove tag | HARD | eligibility | Action `remove_tag` is valid only if the current assignment state is active/removable at decision time. |
| P3 update tag metadata | HARD | ordering | Tag-definition revisions are ordered by a monotonic version within the tag scope. |
| P4 update popularity stats | HARD | accounting | `TagStats(tag_id).active_assignment_count` equals the number of currently active `TagAssignment` edges for that tag. |
| P7 list items by tag | HARD | freshness | Tag listing reflects authoritative active tag assignments within the configured consistency bound. |
| P8 top K tags | SOFT | freshness | `PopularTagsView` reflects current `TagStats` within the configured projection lag. |
| P9 resolve content references | HARD | freshness | Display metadata for a tagged item reflects the current or an approved cached `ContentReference` within the configured consistency bound. |
| P5 route to shard owner | HARD | uniqueness | Key `shard_id` maps to at most one current authoritative owner. |
| P6 reassign shard ownership | HARD | eligibility | Action `reassign_shard` is valid only if the current owner has failed or relinquished the shard and the candidate owner is eligible and sufficiently current on `shard_id` at decision time. |
What matters most #
1. One authoritative assignment per tag-content edge #
This prevents duplicate add/remove confusion.
2. Popularity must count active assignments only #
Removed tags should not continue inflating top-K.
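One way to check this accounting rule is to recompute counts from the assignment edges and compare them with the stored stats. The sketch below is a minimal illustration. It assumes assignments are available as `(tag_id, content_ref, state)` tuples and stats as a plain dict, and the function names are hypothetical.

```python
from collections import Counter


def recompute_counts(assignments):
    """Count only ACTIVE edges per tag; REMOVED edges contribute nothing."""
    return Counter(
        tag_id for tag_id, _content_ref, state in assignments if state == "ACTIVE"
    )


def find_drift(stats, assignments):
    """Return {tag_id: (stored, expected)} for every tag whose count is wrong."""
    expected = recompute_counts(assignments)
    drift = {}
    for tag_id in set(stats) | set(expected):
        stored = stats.get(tag_id, 0)
        if stored != expected.get(tag_id, 0):
            drift[tag_id] = (stored, expected.get(tag_id, 0))
    return drift
```

A non-empty result from `find_drift` means the stored stats have diverged from assignment truth and should be rebuilt.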
3. Top K is derived #
Exact tag-assignment truth matters more than instant ranking freshness.
4. Cross-product content identity must normalize cleanly #
The system needs a stable content_ref abstraction across Jira, Confluence, and Bitbucket.
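A minimal sketch of such a reference follows, assuming a four-part key of tenant, product, content type, and content id. The `ContentRef` class, its field names, the `:` separator, and the lowercase product names are all illustrative assumptions; the note does not define the format.

```python
from dataclasses import dataclass

# Assumed product identifiers; the note names the products but not their codes.
SUPPORTED_PRODUCTS = {"jira", "confluence", "bitbucket"}


@dataclass(frozen=True)
class ContentRef:
    tenant_id: str
    product: str
    content_type: str
    content_id: str

    def __post_init__(self):
        if self.product not in SUPPORTED_PRODUCTS:
            raise ValueError(f"unsupported product: {self.product!r}")
        for part in (self.tenant_id, self.content_type, self.content_id):
            # ':' is reserved as the key separator in this sketch.
            if not part or ":" in part:
                raise ValueError(f"invalid content_ref component: {part!r}")

    def key(self):
        """Stable string form, usable as a storage or index key."""
        return ":".join(
            (self.tenant_id, self.product, self.content_type, self.content_id)
        )

    @classmethod
    def parse(cls, key):
        parts = key.split(":")
        if len(parts) != 4:
            raise ValueError(f"malformed content_ref key: {key!r}")
        return cls(*parts)
```

For example, `ContentRef("acme", "jira", "issue", "PROJ-42").key()` yields `acme:jira:issue:PROJ-42`, and `ContentRef.parse` turns that string back into an equal object.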
Step 5 - Execution Context #
For the baseline unified tagging system:
| Field | Value | Why |
|---|---|---|
| Topology | single service distributed | one logical tagging system serving multiple products |
| Write coordination scope | per object scope | correctness is per tag, assignment edge, content ref, and shard ownership scope |
| Read consistency target | bounded stale allowed | top-K and display hydration can tolerate small lag; core assignment reads should be fairly fresh |
| Holder model | none | no lease-like client ownership is central to tagging correctness |
| Compensation acceptable? | Mostly no | wrong tag assignment is user-visible and should not rely on later compensation |
Derived implications #
- `holder_may_crash = false`
  - clients may fail, but no lock-style ownership is central here
- `cross_service_write = true`-ish logically
  - the tagging service may need to validate or hydrate content from product systems
  - but authoritative tag truth should remain inside one logical tagging service
- `bounded_staleness_allowed = true`
  - projections like top-K and cached content display can be slightly stale
- `cross_service_atomicity_required = false`
  - no cross-product distributed transaction is required in the baseline
- `exclusive_claim_required = true`
  - shard ownership still needs one current owner
- `guarded_by_current_state = true`
  - add/remove transitions depend on current assignment state
What this implies #
This pushes us toward:
- one authoritative owner per tenant/tag shard
- current tag and assignment state inside the shared tagging service
- popularity as derived maintained state
- content hydration through normalized references
Step 6 - Deterministic Mechanism Selection #
| Path | Write shape | Base mechanism | Required companions |
|---|---|---|---|
| P1 add tag | guarded state transition | CAS on (state, version) or single writer per shard | assignment version |
| P2 remove tag | guarded state transition | CAS on (state, version) | assignment version |
| P3 update tag metadata | overwrite current value | CAS on version | tag version |
| P4 update popularity stats | commutative merge or single writer recompute | counter update from assignment delta | idempotent delta application |
| P7 list items by tag | read source | direct source read or indexed lookup | tag-to-content index |
| P8 top K tags | materialized view update | incremental ranking projection | stats version |
| P9 resolve content references | read source or cached snapshot | direct source read / cached hydration | content version |
| P5 route to shard owner | exclusive claim | lease | fencing token, heartbeat |
| P6 reassign shard ownership | guarded state transition | CAS on (state, version) | fencing token, shard catch-up check |
Why these fit #
Add/remove tag #
These are current-state transitions on an assignment edge, so guarded transitions fit.
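A guarded transition of this kind can be sketched as a compare-and-set on `(state, version)`. The in-memory `AssignmentStore` below is a hypothetical stand-in for the authoritative store, and the method names are assumptions; a real store would enforce the same check with a conditional write.

```python
import threading


class VersionConflict(Exception):
    """Raised when the caller's (state, version) no longer matches the store."""


class AssignmentStore:
    """Hypothetical in-memory stand-in for the authoritative assignment store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # (tag_id, content_ref) -> (state, version)

    def read(self, edge):
        with self._lock:
            return self._rows.get(edge, (None, 0))

    def compare_and_set(self, edge, expected, new_state):
        """Move the edge to new_state only if its (state, version) == expected."""
        with self._lock:
            current = self._rows.get(edge, (None, 0))
            if current != expected:
                raise VersionConflict(f"expected {expected}, found {current}")
            new_version = current[1] + 1
            self._rows[edge] = (new_state, new_version)
            return new_version


def remove_tag(store, edge):
    """Guarded remove: only an ACTIVE edge can move to REMOVED."""
    state, version = store.read(edge)
    if state != "ACTIVE":
        return False  # precondition not met, nothing to remove
    store.compare_and_set(edge, (state, version), "REMOVED")
    return True
```

A writer holding a stale `(state, version)` pair gets a `VersionConflict` instead of silently overwriting a newer decision.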
Tag metadata #
Current display identity is current-value state, so overwrite fits.
Popularity stats #
Popularity can be maintained either:
- by idempotent increment/decrement deltas
- or by periodic recompute from assignments
In practice:
- `TagStats` is current-value state
- fed by assignment deltas
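Idempotent delta application can be sketched by giving each assignment change a unique delta id and ignoring ids that were already applied. The `TagStatsStore` class and the `delta_id` parameter are illustrative assumptions. The applied-id set here grows without bound, so a real system would need to expire or compact it.

```python
class TagStatsStore:
    """Hypothetical per-shard counter store fed by assignment deltas."""

    def __init__(self):
        self.active_assignment_count = {}  # tag_id -> count
        self._applied = set()              # delta ids already folded in

    def apply_delta(self, delta_id, tag_id, delta):
        """Apply +1 (add) or -1 (remove) once per delta_id; replays are no-ops."""
        if delta not in (1, -1):
            raise ValueError("delta must be +1 or -1")
        if delta_id in self._applied:
            return False  # duplicate delivery, count unchanged
        self._applied.add(delta_id)
        current = self.active_assignment_count.get(tag_id, 0)
        self.active_assignment_count[tag_id] = current + delta
        return True
```

Because replays are dropped, a retried publish of the same delta does not inflate or deflate the count.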
Top K #
Top K is clearly a derived projection.
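As a rough sketch of the rebuild path, the projection can be recomputed from the current per-tag counts with a bounded heap selection. The function name `top_k_tags` and the tie-break on `tag_id` are assumptions added for determinism; an incrementally maintained projection would avoid rescanning all counts.

```python
import heapq


def top_k_tags(counts, k):
    """Return the k (tag_id, count) pairs with the highest counts.

    Ties are broken by tag_id in ascending order so the result is stable.
    """
    if k <= 0:
        return []
    return heapq.nsmallest(
        k, counts.items(), key=lambda item: (-item[1], item[0])
    )
```

With counts `{"bug": 5, "docs": 2, "infra": 5, "ui": 1}` and `k = 2`, this returns `[("bug", 5), ("infra", 5)]`.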
Canonical substrate implied #
The baseline now points to:
- sharded tagging service
- one owner per tag or tenant shard
- current tag definitions and assignment edges
- derived popularity stats and top-K view
Step 7 - Read Model / Source of Truth #
For a unified tagging system, truth is mostly direct source state for tags and assignments. Top K is derived.
| Concept | Truth | Read path | Rebuild path |
|---|---|---|---|
| C1 tag metadata | TagDefinition | read source directly | authoritative tag store |
| C2 active tag-content relationships | TagAssignment | read source directly | authoritative assignment store |
| C3 normalized content display record | ContentReference | read source directly or cached | authoritative content-reference store or product hydration |
| C4 popularity count per tag | TagStats | read source directly | recompute from active assignments |
| C5 top K tags | derived from TagStats | materialized view | rebuild from current tag stats |
| C6 shard ownership | PartitionOwnership | read source directly | authoritative ownership store |
| C7 shard routing map | PartitionMap | read source directly | authoritative routing metadata |
Important point #
For the core semantics:
- tag-to-content listing should read authoritative assignments
- top-K can come from a derived projection
- content display can come from a normalized cache or hydration layer
Step 8 - Failure Handling #
| Path | Retry | Competing writers | Crash after commit | Publish failure | Stale holder |
|---|---|---|---|---|---|
| P1 add tag | retry with assignment version or idempotency key | concurrent add on same edge collapses to one active assignment | committed add survives crash if persisted | popularity/top-K projection may lag | stale shard owner blocked by fencing token |
| P2 remove tag | retry with assignment version | stale remove loses guarded transition | committed remove survives crash if persisted | popularity/top-K projection may lag | stale shard owner blocked by fencing token |
| P3 update tag metadata | retry with tag version | stale update loses CAS | committed metadata survives crash if persisted | UI caches may lag | stale shard owner blocked by fencing token |
| P4 update popularity stats | retry with idempotent delta or recompute | concurrent updates merge through idempotent counting or single shard owner | committed stats survive crash if persisted or can be rebuilt | top-K view may lag | n/a |
| P7 list items by tag | read retry safe | many readers coexist | node crash drops query only | content hydration may partially fail | stale reads bounded by configured consistency |
| P8 top K tags | read retry safe | many readers coexist | node crash drops query only | projection lag acceptable | stale top-K bounded by projection freshness |
| P9 resolve content references | retry safe | many readers coexist | hydration cache miss can refetch | product lookup may fail transiently | cached content freshness bounded |
| P5 route to shard owner | retry after refreshing shard map | only one valid owner should exist | if owner changed, refreshed map points to new owner | n/a | stale owner rejected by fencing token |
| P6 reassign shard ownership | retry failover transition safely | only one reassignment wins current ownership state | promoted owner crash triggers later reassignment | n/a | old owner fenced and must not continue serving |
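The "stale holder" column relies on fencing tokens. A minimal sketch of the check follows, assuming each ownership grant carries a monotonically increasing integer token and that the storage layer remembers the highest token it has seen. The `FencedShardStore` class is hypothetical.

```python
class StaleOwnerError(Exception):
    """Raised when a write carries a token older than the newest one seen."""


class FencedShardStore:
    """Hypothetical storage layer that rejects writes from fenced-off owners."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            # A newer owner has already written; the caller lost ownership.
            raise StaleOwnerError(
                f"token {token} is older than {self.highest_token}"
            )
        self.highest_token = token
        self.data[key] = value
```

Once the new owner writes with token 2, a delayed write from the old owner with token 1 is rejected and cannot overwrite newer tag state.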
What matters most #
1. Idempotent add/remove semantics #
Tagging UX often retries. The same add should not create duplicate active edges.
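A retried add can be made safe in two layers: replaying a known `request_id` returns the stored response, and an add on an already active edge collapses to the same state. The sketch below assumes an in-memory shard. The `TaggingShard` class and the extra `changed` field in the response are illustrative additions, not part of the note's API.

```python
class TaggingShard:
    """Hypothetical single-shard state: assignment edges plus a response log."""

    def __init__(self):
        self.edges = {}      # (tag_id, content_ref) -> "ACTIVE" | "REMOVED"
        self.responses = {}  # request_id -> response already returned

    def add_tag(self, tag_id, content_ref, request_id=None):
        # Layer 1: a replayed request gets the original response back.
        if request_id is not None and request_id in self.responses:
            return self.responses[request_id]
        # Layer 2: the edge itself is keyed, so a second add cannot duplicate it.
        edge = (tag_id, content_ref)
        changed = self.edges.get(edge) != "ACTIVE"
        self.edges[edge] = "ACTIVE"
        response = {
            "tag_id": tag_id,
            "assignment_state": "ACTIVE",
            "changed": changed,  # assumption: tells stats whether to count this add
        }
        if request_id is not None:
            self.responses[request_id] = response
        return response
```

Only a response with `changed` set to true should produce a popularity delta, which keeps retries from inflating `TagStats`.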
2. Popularity can be rebuilt #
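`TagStats` is stored as primary state but can be recomputed from active assignments, and top-K is a projection over it; assignment truth is the state that must not be lost.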
3. Cross-product display hydration is secondary #
If content titles lag slightly, the tagging system can still be correct.
4. Rename semantics need product choice #
If tag rename changes the display for all products, TagDefinition should be global/shared within the tenant scope.
Step 9 - Scale Adjustments #
| Hotspot | Type | First response |
|---|---|---|
| very hot tags with many assignments | contention/read hotspot | shard assignment index by tag and paginate tag result sets |
| top-K recomputation load | read/write hotspot | maintain incremental stats and heap/ranked projection |
| cross-product content hydration | read hotspot | cache ContentReference snapshots and hydrate asynchronously |
| noisy assignment churn | write hotspot | batch popularity delta updates and isolate hot tenants |
| large tag result sets | read hotspot | use cursor pagination and product-filtered secondary indexes |
| dashboard traffic | read hotspot | serve top-K from projection, not from full assignment scans |
What scales well #
This system scales by:
- sharding tag and assignment state by tenant/tag key
- keeping tag assignments as compact edges
- deriving top-K from maintained stats
- caching normalized content references
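Sharding by tenant/tag key can be sketched as a stable hash of the key followed by a lookup in the partition map. The function `shard_for`, the `PartitionMap` class shape, and the modulo scheme are assumptions for illustration. Plain modulo hashing remaps most keys when the shard count changes, so a real deployment might prefer consistent hashing or an explicit range map.

```python
import hashlib


def shard_for(tenant_id, tag_id, num_shards):
    """Map a tenant/tag key to a shard id with a process-independent hash."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(f"{tenant_id}/{tag_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class PartitionMap:
    """Hypothetical routing table: list index is shard_id, value is the owner."""

    def __init__(self, owners):
        self.owners = list(owners)

    def owner_for(self, tenant_id, tag_id):
        shard_id = shard_for(tenant_id, tag_id, len(self.owners))
        return shard_id, self.owners[shard_id]
```

`hashlib` is used rather than the built-in `hash()` because string hashing in Python is randomized per process, which would route the same key differently on different nodes.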
What fails first #
Usually:
- a few very hot tags
- expensive cross-product hydration
- recomputing popularity from scratch too often
- large fanout result sets for global tags
Canonical design conclusion #
The mechanical outcome is:
- primary state:
  - `TagDefinition`
  - `TagAssignment`
  - `ContentReference`
  - `TagStats`
  - `PartitionOwnership`
  - `PartitionMap`
- critical invariants:
  - one authoritative assignment state per tag-content edge
  - current tag metadata by version
  - popularity counts equal active assignments
  - exclusive shard ownership for tag truth
- mechanisms:
  - guarded add/remove transitions
  - overwrite current tag metadata
  - derived stats and top-K projection
  - fenced shard ownership
- reads:
  - authoritative tag listing from assignment index
  - top-K from derived projection
  - content display from normalized references or hydration cache
Polished interview answer #
I’d design the unified tagging system as a shared metadata service with one authoritative tag and assignment store across Jira, Confluence, and Bitbucket. The core truth is a `TagAssignment` edge between a normalized `content_ref` and a shared `tag_id`, plus shared `TagDefinition` metadata. Adding or removing a tag is a guarded transition on that edge, clicking a tag reads the tag-to-content index across all products, and top-K popular tags is a derived projection built from maintained `TagStats` rather than a live full scan. The main scaling levers are sharding by tenant and tag, caching normalized content references, and maintaining popularity incrementally rather than recomputing it from scratch.
Concrete Substrate #
I’ll choose a shared tagging service with authoritative tag/assignment storage plus derived popularity views as the concrete baseline, because it matches the mechanics we derived:
- shared tag metadata
- guarded tag-assignment lifecycle
- derived stats and top-K
- one owner per shard
Concrete tech family:
- service in `Go`, `Java`, or `Kotlin`
- authoritative state store:
  - replicated relational DB or `RocksDB`-backed service state
- metadata/control:
  - internal shard routing or a small strongly consistent metadata layer
- optional indexing/search layer for tag listing acceleration
Each shard owner stores:
- `TagDefinition`
- `TagAssignment`
- `TagStats`
- `ContentReference` cache or normalized reference table
Derived layer stores:
- `PopularTagsView`
Operation Layer #
1. Add tag to content #
API
AddTag(content_ref, tag_input, actor, request_id?)
Initiator
- user
Entry point
- tagging API
Authoritative decider
- shard owner for tenant/tag scope
Precondition
- content exists or is valid
- actor authorized to tag that content
- current assignment edge is addable
Transition
- create or resolve `TagDefinition`
- set `TagAssignment(tag_id, content_ref) -> ACTIVE`
- update `TagStats`
Response
{tag_id, assignment_state}
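The "create or resolve" step in the add path can be sketched as a lookup by normalized name. The normalization rule used here (collapse whitespace, then case-fold) is an assumption, since the note names a `normalized_name` field without defining it. `TagCatalog` and the generated UUID ids are likewise hypothetical.

```python
import uuid


def normalize_tag_name(display_name):
    """Assumed rule: collapse runs of whitespace, trim, and case-fold."""
    return " ".join(display_name.split()).casefold()


class TagCatalog:
    """Hypothetical tenant-scoped catalog of tag definitions."""

    def __init__(self):
        self._by_normalized = {}  # normalized_name -> tag definition dict

    def create_or_resolve(self, tag_input, created_by):
        normalized = normalize_tag_name(tag_input)
        if not normalized:
            raise ValueError("tag name must not be empty")
        tag = self._by_normalized.get(normalized)
        if tag is None:
            tag = {
                "tag_id": str(uuid.uuid4()),
                "display_name": " ".join(tag_input.split()),
                "normalized_name": normalized,
                "created_by": created_by,
                "version": 1,
            }
            self._by_normalized[normalized] = tag
        return tag
```

Under this rule, `"  Tech Debt "` and `"tech   debt"` resolve to the same `tag_id`, so the same tag is shared across products instead of being created twice.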
2. Remove tag from content #
API
RemoveTag(content_ref, tag_id, actor, expected_version?)
Initiator
- user
Entry point
- tagging API
Authoritative decider
- shard owner for tenant/tag scope
Precondition
- assignment edge currently active
- actor authorized
Transition
- set `TagAssignment -> REMOVED`
- decrement or recompute `TagStats`
Response
{removed: true}
3. List items by tag #
API
ListItemsByTag(tag_id_or_name, filters, cursor, limit)
Initiator
- user
Entry point
- tagging query API
Authoritative decider
- assignment index / shard owner
Precondition
- tag exists
Transition
- none
Response
- paginated content refs plus display metadata
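Cursor pagination for this read can be sketched over a sorted per-tag index. The sketch assumes the index is a dict from `tag_id` to a sorted list of content-ref keys for active edges, and that the cursor is an opaque encoding of the last key returned. The function names and the base64/JSON cursor format are illustrative.

```python
import base64
import bisect
import json


def encode_cursor(last_key):
    raw = json.dumps(last_key).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))


def list_items_by_tag(index, tag_id, cursor=None, limit=50):
    """Return (page, next_cursor); next_cursor is None on the last page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    refs = index.get(tag_id, [])
    start = 0
    if cursor is not None:
        # Resume strictly after the last key the caller has already seen.
        start = bisect.bisect_right(refs, decode_cursor(cursor))
    page = refs[start:start + limit]
    has_more = start + limit < len(refs)
    next_cursor = encode_cursor(page[-1]) if has_more else None
    return page, next_cursor
```

Resuming by key rather than by offset means an edge added or removed between calls shifts neither the start position nor the remaining pages.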
4. Get top K popular tags #
API
GetTopTags(k, filters?)
Initiator
- user
Entry point
- dashboard/query API
Authoritative decider
- popularity projection
Precondition
- none
Transition
- none
Response
- ranked tags with counts
5. Update tag metadata #
API
UpdateTag(tag_id, patch, expected_version?)
Initiator
- user/admin
Entry point
- tagging API
Authoritative decider
- shard owner for tag scope
Precondition
- tag exists
- actor authorized
Transition
- overwrite `TagDefinition`
Entry Point vs Decider vs Responder #
| Path | Entry point | Authoritative decider | Physical responder | Logical responder |
|---|---|---|---|---|
| add/remove/update tag | tagging API | tag shard owner | API node | tagging service |
| list items by tag | tagging query API | assignment index / shard owner | query node | tagging service |
| top K tags | dashboard/query API | popularity projection | query node | tagging service |
| content hydration | tagging query API | content-ref cache or product adapter | query node | tagging service |
| shard failover | follower / coordination layer | shard quorum / lease store | new leader / control plane | tagging service |
Concrete HLD #
Main components:
- tagging write API
- handles add/remove/update operations
- tag shard owners
- authoritative owners of tag definitions, assignments, and stats
- tagging query API
- handles list-by-tag and top-K reads
- content-reference normalization layer
- stores or hydrates display metadata for Jira/Confluence/Bitbucket objects
- popularity projection
- maintains ranked top-K views
- metadata/control service
- tracks shard ownership and routing
Short Interview Version #
I’d design the unified tagging system as a shared metadata service with one authoritative tag and assignment store across Jira, Confluence, and Bitbucket. The core truth is a `TagAssignment` edge between a normalized `content_ref` and a shared `tag_id`, plus shared `TagDefinition` metadata. Adding or removing a tag is a guarded transition on that edge, clicking a tag reads the tag-to-content index across all products, and top-K popular tags is a derived projection built from maintained `TagStats` rather than a live full scan. The main scaling levers are sharding by tenant and tag, caching normalized content references, and maintaining popularity incrementally rather than recomputing it from scratch.