Unified Tagging System for Atlassian Products #

This note models a unified tagging system across Jira tickets, Confluence documents, and Bitbucket pull requests where users can add or remove tags on any supported content, click a tag to see associated items across products, and view a dashboard of the top K most popular tags.

Step 1 - Normalize #

Assume the baseline prompt is:

design a unified tagging system across multiple Atlassian products
users can add, remove, and update tags on content
clicking a tag shows all tagged items across products
users can see the top K most popular tags
tagging should work across Jira, Confluence, and Bitbucket

Normalize into state-affecting paths.

Requirement	Actor	Operation	State touched	Priority
User adds tag to content	User	state transition	`S1` `update target` `TagAssignment`	C1
User removes tag from content	User	state transition	`S1` `update target` `TagAssignment`	C1
User renames or updates tag metadata	User/Admin	overwrite state	`S1` `update target` `TagDefinition`	C1
User views items for a tag across products	User	read source	`S1` `read source target` `TagAssignment`	R1
User views top K popular tags	User	read projection	`S1` `read projection target` `PopularTagsView`	R1
System updates tag popularity counts	System	state transition	`S1` `update target` `TagStats`	C1
Product content metadata is resolved for display	System	read source	`S1` `read source target` `ContentReference`	R1
System routes tenant/tag shard to current owner	System	read source	`S1` `read source target` `PartitionMap`	C1
System reassigns shard ownership after node failure	System	state transition	`S1` `update target` `PartitionOwnership`	C1

Notes on normalization #

Important choices:

add/remove tag is modeled as a TagAssignment lifecycle change
tag rename/update is modeled as a current-value overwrite of tag metadata
cross-product tag search is a source read over tag-to-content edges
top K is a derived popularity view, not primary truth
popularity maintenance is explicit because it is user-visible but derived

This system is a hybrid of:

cross-object relationship state
shared metadata/catalog state
derived ranking view

Step 2 - Critical Path Selection #

Requirement	Priority class	Why
Add tag to content	C1	primary product mutation
Remove tag from content	C1	primary product mutation
Update tag metadata	C1	current tag identity/display depends on it
View items for tag	R1	core serving path
View top K tags	R1	core user-visible read path, but derived
Update popularity counts	C1	derived but required for top K correctness
Resolve content references for display	R1	needed to render cross-product result list
Route to shard owner	C1	wrong routing can split tag truth
Reassign shard ownership	C1	failover must preserve tagging correctness

Baseline critical paths #

Main C1 paths:

P1 add tag
P2 remove tag
P3 update tag metadata
P4 update tag popularity stats
P5 route to shard owner
P6 reassign shard ownership

Main R1 paths:

P7 list items by tag
P8 read top K tags
P9 resolve content references

This design is driven by:

one authoritative tag-assignment truth per content-tag edge
shared tag identity across products
derived top-K popularity

Step 3 - Primary State Extraction #

For a unified tagging system, the minimal primary state is tag metadata, content-tag assignment state, content references, popularity stats, and routing/ownership state.

Object	Source	Needed for C1/R1?	Decomposition action	Class	Primary?	Owner	Evolution	Scope kind	Scope value
TagDefinition	direct noun	Yes	keep as candidate	entity	Yes	service	overwrite	instance	tag_id
TagAssignment	direct noun	Yes	keep as candidate	relationship	Yes	service	state machine	relation	tag_id + content_ref
ContentReference	hidden read target	Yes	keep as candidate	entity	Yes	service	overwrite	instance	content_ref
TagStats	hidden write target	Yes	keep as candidate	entity	Yes	service	overwrite	instance	tag_id
PopularTagsView	derived read model	Yes	keep as candidate	projection	No	derived	overwrite	collection	tenant or workspace
PartitionOwnership	hidden write target	Yes	keep as candidate	process	Yes	service	state machine	instance	shard_id
PartitionMap	hidden read target	Yes	keep as candidate	entity	Yes	service	overwrite	collection	tenant/shard map

Important modeling choices #

`TagDefinition` #

Primary because:

shared tag identity and display metadata need one source of truth
fields may include:
- tag_id
- display_name
- normalized_name
- color
- created_by

`TagAssignment` #

This is the core cross-product relationship object.

Fields likely include:

tag_id
content_type
content_id
product
state
added_by
added_at

Lifecycle:

ACTIVE
REMOVED
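
As a rough sketch of the edge record described above, the fields and lifecycle could be expressed as a Python dataclass. The `version` field and the example values for `product` and `content_type` are assumptions added for illustration; the field list above does not name them, although the guarded transitions later in this note rely on some kind of version.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class AssignmentState(Enum):
    ACTIVE = "ACTIVE"
    REMOVED = "REMOVED"


@dataclass(frozen=True)
class TagAssignment:
    tag_id: str
    product: str        # e.g. "jira", "confluence", "bitbucket"
    content_type: str   # illustrative values: "ticket", "page", "pull_request"
    content_id: str
    state: AssignmentState
    added_by: str
    added_at: datetime
    version: int = 1    # assumption: not part of the field list above

    def edge_key(self) -> tuple:
        """Identity of the tag-content edge; at most one record per key."""
        return (self.tag_id, self.product, self.content_type, self.content_id)


example = TagAssignment(
    tag_id="t-1",
    product="jira",
    content_type="ticket",
    content_id="PROJ-42",
    state=AssignmentState.ACTIVE,
    added_by="alice",
    added_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

Keeping removed edges as REMOVED records rather than deleting them is one way to preserve the version history that guarded transitions compare against.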

`ContentReference` #

Primary because:

cross-product results need a normalized representation of content for listing
may be hydrated from product systems or mirrored into this system

`TagStats` #

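Treated as primary for serving because:

top-K needs a stable count source, even if it is eventually reflected into a projection
the counts themselves stay rebuildable from active TagAssignment records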

Minimal strict primary set #

The strongest minimal set is:

TagDefinition
TagAssignment
ContentReference
TagStats
PartitionOwnership
PartitionMap

Step 4 - Hard Invariants #

For a unified tagging system, the hard invariants are about one authoritative assignment state per tag-content edge, correct shared tag metadata, and accurate popularity accounting derived from active assignments.

Path	Tier	Type	Invariant statement
`P1` add tag	HARD	uniqueness	Key `(tag_id, content_ref)` maps to at most one logical outcome `current authoritative assignment state` within assignment scope.
`P1` add tag	HARD	eligibility	Action `add_tag` is valid only if current content exists and current assignment state is addable at decision time.
`P2` remove tag	HARD	eligibility	Action `remove_tag` is valid only if current assignment state is active/removable at decision time.
`P3` update tag metadata	HARD	ordering	Tag-definition revisions are ordered by monotonic version within tag scope.
`P4` update popularity stats	HARD	accounting	`TagStats(tag_id).active_assignment_count` equals the number of currently active `TagAssignment`s for that tag scope.
`P7` list items by tag	HARD	freshness	Tag listing reflects authoritative active tag assignments within configured consistency bound.
`P8` top K tags	SOFT	freshness	`PopularTagsView` reflects current `TagStats` within configured projection lag.
`P9` resolve content references	HARD	freshness	Display metadata for a tagged item reflects current or approved cached `ContentReference` within configured consistency bound.
`P5` route to shard owner	HARD	uniqueness	Key `shard_id` maps to at most one logical outcome `current authoritative owner` within `shard_id`.
`P6` reassign shard ownership	HARD	eligibility	Action `reassign_shard` is valid only if `current owner is failed or relinquished and candidate owner is eligible and sufficiently current` on `shard_id` at decision time.

What matters most #

1. One authoritative assignment per tag-content edge #

This prevents duplicate add/remove confusion.

2. Popularity must count active assignments only #

Removed tags should not continue inflating top-K.
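
One way to make this accounting invariant checkable is a reconciliation pass that recomputes counts from assignments and reports any drift. The sketch below assumes assignments are available as simple `(tag_id, content_ref, state)` tuples and stats as a plain dict; both shapes and the function names are hypothetical.

```python
from collections import Counter


def recompute_active_counts(assignments):
    """Count ACTIVE edges per tag from (tag_id, content_ref, state) tuples."""
    return Counter(tag_id for tag_id, _ref, state in assignments if state == "ACTIVE")


def find_stat_drift(tag_stats, assignments):
    """Return {tag_id: (stored, expected)} for every tag whose count is off."""
    expected = recompute_active_counts(assignments)
    drift = {}
    for tag_id in set(tag_stats) | set(expected):
        stored = tag_stats.get(tag_id, 0)
        wanted = expected.get(tag_id, 0)
        if stored != wanted:
            drift[tag_id] = (stored, wanted)
    return drift
```

An empty result means the stored counts match the active assignments; any entry points at a tag whose stats should be repaired from assignment truth.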

3. Top K is derived #

Exact tag-assignment truth matters more than instant ranking freshness.

4. Cross-product content identity must normalize cleanly #

The system needs a stable content_ref abstraction across Jira, Confluence, and Bitbucket.
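
A minimal sketch of such a content_ref, assuming it is composed of product, content type, and a product-local id. The `product:content_type:content_id` string form and the `ContentRef` class are illustrative choices, not something this note prescribes.

```python
from dataclasses import dataclass

SUPPORTED_PRODUCTS = {"jira", "confluence", "bitbucket"}


@dataclass(frozen=True)
class ContentRef:
    product: str
    content_type: str
    content_id: str

    def __post_init__(self) -> None:
        if self.product not in SUPPORTED_PRODUCTS:
            raise ValueError(f"unsupported product: {self.product!r}")

    def key(self) -> str:
        # Hypothetical canonical form used as the edge and index key.
        return f"{self.product}:{self.content_type}:{self.content_id}"

    @classmethod
    def parse(cls, key: str) -> "ContentRef":
        # content_id may itself contain ':' so split at most twice.
        product, content_type, content_id = key.split(":", 2)
        return cls(product, content_type, content_id)
```

The sketch assumes product and content type names never contain a colon; only the trailing id is allowed to.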

Step 5 - Execution Context #

For the baseline unified tagging system:

Field	Value	Why
Topology	single distributed service	one logical tagging system serving multiple products
Write coordination scope	per object scope	correctness is per tag, assignment edge, content ref, and shard ownership scope
Read consistency target	bounded stale allowed	top-K and display hydration can tolerate small lag; core assignment reads must stay within the configured consistency bound
Holder model	none	no lease-like client ownership is central to tagging correctness
Compensation acceptable?	Mostly no	wrong tag assignment is user-visible and should not rely on later compensation

Derived implications #

holder_may_crash = false
- clients may fail, but no lock-style ownership is central here
cross_service_write = partially true (logical dependency only)
- the tagging service may need to validate or hydrate content from product systems
- but authoritative tag truth should remain inside one logical tagging service
bounded_staleness_allowed = true
- projections like top-K and cached content display can be slightly stale
cross_service_atomicity_required = false
- no cross-product distributed transaction is required in the baseline
exclusive_claim_required = true
- shard ownership still needs one current owner
guarded_by_current_state = true
- add/remove transitions depend on current assignment state

What this implies #

This pushes us toward:

one authoritative owner per tenant/tag shard
current tag and assignment state inside the shared tagging service
popularity as derived maintained state
content hydration through normalized references

Step 6 - Deterministic Mechanism Selection #

Path	Write shape	Base mechanism	Required companions
`P1` add tag	guarded state transition	CAS on `(state, version)` or single writer per shard	assignment version
`P2` remove tag	guarded state transition	CAS on `(state, version)`	assignment version
`P3` update tag metadata	overwrite current value	CAS on version	tag version
`P4` update popularity stats	commutative merge or single writer recompute	counter update from assignment delta	idempotent delta application
`P7` list items by tag	read source	direct source read or indexed lookup	tag-to-content index
`P8` top K tags	materialized view update	incremental ranking projection	stats version
`P9` resolve content references	read source or cached snapshot	direct source read / cached hydration	content version
`P5` route to shard owner	exclusive claim	lease	fencing token, heartbeat
`P6` reassign shard ownership	guarded state transition	CAS on `(state, version)`	fencing token, shard catch-up check

Why these fit #

Add/remove tag #

These are current-state transitions on an assignment edge, so guarded transitions fit.
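
To illustrate what a guarded transition on `(state, version)` might look like, here is a small in-memory sketch. `AssignmentStore` and `compare_and_set` are hypothetical names. A real shard owner would need the compare and the write to be atomic, for example through a single writer or a conditional update in the backing store.

```python
from dataclasses import dataclass

ACTIVE, REMOVED = "ACTIVE", "REMOVED"


@dataclass
class Edge:
    state: str
    version: int


class AssignmentStore:
    """In-memory stand-in for the shard owner's assignment table."""

    def __init__(self):
        self._edges = {}  # (tag_id, content_ref) -> Edge

    def get(self, key):
        return self._edges.get(key)

    def compare_and_set(self, key, expected, new_state):
        """Apply the transition only if the stored (state, version) matches.

        `expected` is None for "no edge yet", else a (state, version) tuple.
        Returns True if the transition was applied, False if it lost the race.
        """
        current = self._edges.get(key)
        observed = None if current is None else (current.state, current.version)
        if observed != expected:
            return False  # a competing writer changed the edge first
        next_version = 1 if current is None else current.version + 1
        self._edges[key] = Edge(new_state, next_version)
        return True
```

A caller that gets False back would re-read the edge and decide whether its intent still applies, which is how a stale remove ends up losing to a newer transition.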

Tag metadata #

Current display identity is current-value state, so overwrite fits.

Popularity stats #

Popularity can be maintained either:

by idempotent increment/decrement deltas
or by periodic recompute from assignments

In practice:

TagStats is current-value state
fed by assignment deltas
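
The delta-based option can be sketched as a counter that remembers which deltas it has already applied. The sketch assumes each delta carries a unique `delta_id`, for instance derived from the edge key and assignment version; that identifier and the `TagStatsCounter` class are hypothetical.

```python
class TagStatsCounter:
    """Per-tag active-assignment counts fed by idempotent deltas."""

    def __init__(self):
        self.active_assignment_count = {}  # tag_id -> int
        self._applied = set()              # delta ids already applied

    def apply_delta(self, delta_id, tag_id, delta):
        """Apply a +1/-1 delta once per delta_id; replays are no-ops."""
        if delta_id in self._applied:
            return False
        self._applied.add(delta_id)
        current = self.active_assignment_count.get(tag_id, 0)
        self.active_assignment_count[tag_id] = current + delta
        return True
```

The applied-id set grows without bound here; a longer-lived version would need to prune it or fall back to the periodic recompute mentioned above.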

Top K #

Top K is clearly a derived projection.
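
As a sketch of the rebuild path, the projection can be recomputed from a snapshot of per-tag counts with a bounded heap selection. Note that this is a full recompute from current stats, not the incremental maintenance a production projection would likely use; the function name and the tie-breaking rule are assumptions.

```python
import heapq


def top_k_tags(tag_counts, k):
    """Return up to k (tag_id, count) pairs, highest count first.

    Ties are broken by tag_id so the ranking is deterministic.
    """
    return heapq.nsmallest(k, tag_counts.items(), key=lambda item: (-item[1], item[0]))
```

For example, `top_k_tags({"a": 3, "b": 5, "c": 3, "d": 1}, 3)` yields `[("b", 5), ("a", 3), ("c", 3)]`.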

Canonical substrate implied #

The baseline now points to:

sharded tagging service
one owner per tag or tenant shard
current tag definitions and assignment edges
derived popularity stats and top-K view

Step 7 - Read Model / Source of Truth #

For a unified tagging system, truth is mostly direct source state for tags and assignments. Top K is derived.

Concept	Truth	Read path	Rebuild path
`C1` tag metadata	`TagDefinition`	read source directly	authoritative tag store
`C2` active tag-content relationships	`TagAssignment`	read source directly	authoritative assignment store
`C3` normalized content display record	`ContentReference`	read source directly or cached	authoritative content-reference store or product hydration
`C4` popularity count per tag	`TagStats`	read source directly	recompute from active assignments
`C5` top K tags	derived from `TagStats`	materialized view	rebuild from current tag stats
`C6` shard ownership	`PartitionOwnership`	read source directly	authoritative ownership store
`C7` shard routing map	`PartitionMap`	read source directly	authoritative routing metadata

Important point #

For the core semantics:

tag-to-content listing should read authoritative assignments
top-K can come from a derived projection
content display can come from a normalized cache or hydration layer

Step 8 - Failure Handling #

Path	Retry	Competing writers	Crash after commit	Publish failure	Stale holder
`P1` add tag	retry with assignment version or idempotency key	concurrent add on same edge collapses to one active assignment	committed add survives crash if persisted	popularity/top-K projection may lag	stale shard owner blocked by fencing token
`P2` remove tag	retry with assignment version	stale remove loses guarded transition	committed remove survives crash if persisted	popularity/top-K projection may lag	stale shard owner blocked by fencing token
`P3` update tag metadata	retry with tag version	stale update loses CAS	committed metadata survives crash if persisted	UI caches may lag	stale shard owner blocked by fencing token
`P4` update popularity stats	retry with idempotent delta or recompute	concurrent updates merge through idempotent counting or single shard owner	committed stats survive crash if persisted or can be rebuilt	top-K view may lag	n/a
`P7` list items by tag	read retry safe	many readers coexist	node crash drops query only	content hydration may partially fail	stale reads bounded by configured consistency
`P8` top K tags	read retry safe	many readers coexist	node crash drops query only	projection lag acceptable	stale top-K bounded by projection freshness
`P9` resolve content references	retry safe	many readers coexist	hydration cache miss can refetch	product lookup may fail transiently	cached content freshness bounded
`P5` route to shard owner	retry after refreshing shard map	only one valid owner should exist	if owner changed, refreshed map points to new owner	n/a	stale owner rejected by fencing token
`P6` reassign shard ownership	retry failover transition safely	only one reassignment wins current ownership state	promoted owner crash triggers later reassignment	n/a	old owner fenced and must not continue serving
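
The "stale owner rejected by fencing token" entries can be illustrated with a storage-side check. This sketch assumes the lease or coordination layer hands each new owner a strictly larger integer token; `FencedShardStore` is a hypothetical name.

```python
class FencedShardStore:
    """Storage side of a shard: rejects writes from superseded owners."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, fencing_token, key, value):
        # An owner holding an older token has been replaced and must not write.
        if fencing_token < self.highest_token:
            raise PermissionError(
                f"stale fencing token {fencing_token} < {self.highest_token}"
            )
        self.highest_token = fencing_token
        self.data[key] = value
```

The important property is that the check happens where the data lives, so an old owner that has not yet noticed it lost the lease still cannot change tag truth.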

What matters most #

1. Idempotent add/remove semantics #

Tagging UX often retries. The same add should not create duplicate active edges.
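
A minimal sketch of retry-safe adds, assuming the client sends a `request_id` and the service caches the response per id. The `created` field in the response and the `IdempotentAdds` class are illustrative additions.

```python
class IdempotentAdds:
    """Deduplicates add requests by request_id and by edge."""

    def __init__(self):
        self.active_edges = set()  # (tag_id, content_ref)
        self._responses = {}       # request_id -> cached response

    def add_tag(self, request_id, tag_id, content_ref):
        if request_id in self._responses:
            # Retry of a request that already finished: replay the answer.
            return self._responses[request_id]
        edge = (tag_id, content_ref)
        created = edge not in self.active_edges
        self.active_edges.add(edge)
        response = {"tag_id": tag_id, "assignment_state": "ACTIVE", "created": created}
        self._responses[request_id] = response
        return response
```

Two layers protect the edge here: a repeated request_id replays the stored response, and a different request for the same edge still collapses into the single active assignment.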

2. Popularity can be rebuilt #

TagStats and top-K can both be recomputed from active assignments; assignment truth is the state that must not be lost.

3. Cross-product display hydration is secondary #

If content titles lag slightly, the tagging system can still be correct.

4. Rename semantics need product choice #

If tag rename changes the display for all products, TagDefinition should be global/shared within the tenant scope.

Step 9 - Scale Adjustments #

Hotspot	Type	First response
very hot tags with many assignments	contention/read hotspot	shard assignment index by tag and paginate tag result sets
top-K recomputation load	read/write hotspot	maintain incremental stats and heap/ranked projection
cross-product content hydration	read hotspot	cache `ContentReference` snapshots and hydrate asynchronously
noisy assignment churn	write hotspot	batch popularity delta updates and isolate hot tenants
large tag result sets	read hotspot	use cursor pagination and product-filtered secondary indexes
dashboard traffic	read hotspot	serve top-K from projection, not from full assignment scans
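
Cursor pagination over a tag's result set might look like the following keyset-style sketch. It assumes the tag-to-content index can be read as a sorted list of content_ref keys and that the cursor is just the last key seen, base64-encoded so clients treat it as opaque; both are assumptions for illustration.

```python
import base64
import bisect


def list_items_page(sorted_refs, cursor=None, limit=2):
    """Return (page, next_cursor) over a sorted list of content_ref keys."""
    start = 0
    if cursor is not None:
        last_seen = base64.urlsafe_b64decode(cursor.encode()).decode()
        # Resume strictly after the last key the client saw.
        start = bisect.bisect_right(sorted_refs, last_seen)
    page = sorted_refs[start:start + limit]
    has_more = start + limit < len(sorted_refs)
    next_cursor = None
    if page and has_more:
        next_cursor = base64.urlsafe_b64encode(page[-1].encode()).decode()
    return page, next_cursor
```

Because the cursor encodes a key rather than an offset, tags added or removed between requests do not shift the position of later pages.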

What scales well #

This system scales by:

sharding tag and assignment state by tenant/tag key
keeping tag assignments as compact edges
deriving top-K from maintained stats
caching normalized content references

What fails first #

Usually:

a few very hot tags
expensive cross-product hydration
recomputing popularity from scratch too often
large fanout result sets for global tags

Canonical design conclusion #

The mechanical outcome is:

primary state:
- TagDefinition
- TagAssignment
- ContentReference
- TagStats
- PartitionOwnership
- PartitionMap
critical invariants:
- one authoritative assignment state per tag-content edge
- current tag metadata by version
- popularity counts equal active assignments
- exclusive shard ownership for tag truth
mechanisms:
- guarded add/remove transitions
- overwrite current tag metadata
- derived stats and top-K projection
- fenced shard ownership
reads:
- authoritative tag listing from assignment index
- top-K from derived projection
- content display from normalized references or hydration cache

Polished interview answer #

I’d design the unified tagging system as a shared metadata service with one authoritative tag and assignment store across Jira, Confluence, and Bitbucket. The core truth is a TagAssignment edge between a normalized content_ref and a shared tag_id, plus shared TagDefinition metadata. Adding or removing a tag is a guarded transition on that edge, clicking a tag reads the tag-to-content index across all products, and top-K popular tags is a derived projection built from maintained TagStats rather than a live full scan. The main scaling levers are sharding by tenant and tag, caching normalized content references, and maintaining popularity incrementally rather than recomputing it from scratch.

Concrete Substrate #

I’ll choose a shared tagging service with authoritative tag/assignment storage plus derived popularity views as the concrete baseline, because it matches the mechanics we derived:

shared tag metadata
guarded tag-assignment lifecycle
derived stats and top-K
one owner per shard

Concrete tech family:

service in Go, Java, or Kotlin
authoritative state store:
- replicated relational DB or RocksDB-backed service state
metadata/control:
- internal shard routing or a small strongly consistent metadata layer
optional indexing/search layer for tag listing acceleration

Each shard owner stores:

TagDefinition
TagAssignment
TagStats
ContentReference cache or normalized reference table

Derived layer stores:

PopularTagsView

Operation Layer #

1. Add tag to content #

API

AddTag(content_ref, tag_input, actor, request_id?)

Initiator

user

Entry point

tagging API

Authoritative decider

shard owner for tenant/tag scope

Precondition

content exists or is valid
actor authorized to tag that content
current assignment edge is addable

Transition

create or resolve TagDefinition
set TagAssignment(tag_id, content_ref) -> ACTIVE
update TagStats

Response

{tag_id, assignment_state}
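
The three transition steps above can be sketched end to end on a single in-memory shard. Authorization and the content-existence check are deliberately omitted. The `TaggingShard` class and the normalization rule (case-insensitive, whitespace-collapsed) are assumptions made for illustration.

```python
import uuid


class TaggingShard:
    """Single-shard, in-memory sketch of the AddTag transition."""

    def __init__(self):
        self.tags_by_name = {}  # normalized_name -> tag_id
        self.tag_defs = {}      # tag_id -> {"display_name", "normalized_name"}
        self.assignments = {}   # (tag_id, content_ref) -> "ACTIVE" | "REMOVED"
        self.stats = {}         # tag_id -> active assignment count

    @staticmethod
    def normalize(name):
        # Assumption: tags match case-insensitively, ignoring extra whitespace.
        return " ".join(name.split()).casefold()

    def add_tag(self, content_ref, tag_input):
        # Step 1: create or resolve the TagDefinition.
        normalized = self.normalize(tag_input)
        tag_id = self.tags_by_name.get(normalized)
        if tag_id is None:
            tag_id = uuid.uuid4().hex
            self.tags_by_name[normalized] = tag_id
            self.tag_defs[tag_id] = {
                "display_name": tag_input.strip(),
                "normalized_name": normalized,
            }
        # Steps 2 and 3: activate the edge and count it only on a real change.
        edge = (tag_id, content_ref)
        if self.assignments.get(edge) != "ACTIVE":
            self.assignments[edge] = "ACTIVE"
            self.stats[tag_id] = self.stats.get(tag_id, 0) + 1
        return {"tag_id": tag_id, "assignment_state": "ACTIVE"}
```

Updating the count only when the edge actually changes state is what keeps a repeated add from inflating TagStats.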

2. Remove tag from content #

API

RemoveTag(content_ref, tag_id, actor, expected_version?)

Initiator

user

Entry point

tagging API

Authoritative decider

shard owner for tenant/tag scope

Precondition

assignment edge currently active
actor authorized

Transition

set TagAssignment -> REMOVED
decrement or recompute TagStats

Response

{removed: true}

3. List items by tag #

API

ListItemsByTag(tag_id_or_name, filters, cursor, limit)

Initiator

user

Entry point

tagging query API

Authoritative decider

assignment index / shard owner

Precondition

tag exists

Transition

none

Response

paginated content refs plus display metadata

4. Get top K popular tags #

API

GetTopTags(k, filters?)

Initiator

user

Entry point

dashboard/query API

Authoritative decider

popularity projection

Precondition

none

Transition

none

Response

ranked tags with counts

5. Update tag metadata #

API

UpdateTag(tag_id, patch, expected_version?)

Initiator

user/admin

Entry point

tagging API

Authoritative decider

shard owner for tag scope

Precondition

tag exists
actor authorized

Transition

overwrite TagDefinition and bump its version

Entry Point vs Decider vs Responder #

Path	Entry point	Authoritative decider	Physical responder	Logical responder
add/remove/update tag	tagging API	tag shard owner	API node	tagging service
list items by tag	tagging query API	assignment index / shard owner	query node	tagging service
top K tags	dashboard/query API	popularity projection	query node	tagging service
content hydration	tagging query API	content-ref cache or product adapter	query node	tagging service
shard failover	follower / coordination layer	shard quorum / lease store	new leader / control plane	tagging service

Concrete HLD #

Main components:

tagging write API
- handles add/remove/update operations
tag shard owners
- authoritative owners of tag definitions, assignments, and stats
tagging query API
- handles list-by-tag and top-K reads
content-reference normalization layer
- stores or hydrates display metadata for Jira/Confluence/Bitbucket objects
popularity projection
- maintains ranked top-K views
metadata/control service
- tracks shard ownership and routing

