
Quickwit Internals: A Substrate Decomposition


Quickwit is a cloud-native search engine built for log and trace analytics. It indexes documents into immutable splits stored on object storage (S3, GCS, Azure Blob) and answers queries by scatter-gathering across nodes. Its architecture is unusual in three ways: apart from the ingest write-ahead log, no persistent local disk is required for correctness; splits are never mutated after upload; and cluster membership runs on a custom gossip protocol (Chitchat) rather than ZooKeeper or etcd.
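The scatter-gather model can be illustrated with a toy sketch. It is not Quickwit's code. Threads stand in for searcher nodes, each split is modeled as a pre-scored list of `(doc_id, score)` pairs, and the names `Hit`, `Split`, `leaf_search`, and `root_search` are hypothetical. Each leaf returns only its local top-k, and the root merges those partial lists into a global top-k.

```rust
use std::thread;

/// Hypothetical hit type (not Quickwit's actual hit representation).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Hit {
    pub score: u64,
    pub split_id: String,
    pub doc_id: u32,
}

/// A split modeled as its id plus pre-scored `(doc_id, score)` pairs.
pub type Split = (String, Vec<(u32, u64)>);

/// Highest score first; ties broken by split id, then doc id, so merges are deterministic.
fn sort_hits(hits: &mut [Hit]) {
    hits.sort_by(|a, b| {
        b.score
            .cmp(&a.score)
            .then_with(|| a.split_id.cmp(&b.split_id))
            .then_with(|| a.doc_id.cmp(&b.doc_id))
    });
}

/// Leaf phase: one node scores the splits assigned to it and keeps only its local top-k.
pub fn leaf_search(splits: &[Split], k: usize) -> Vec<Hit> {
    let mut hits: Vec<Hit> = splits
        .iter()
        .flat_map(|(split_id, docs)| {
            docs.iter().map(move |&(doc_id, score)| Hit {
                score,
                split_id: split_id.clone(),
                doc_id,
            })
        })
        .collect();
    sort_hits(&mut hits);
    hits.truncate(k);
    hits
}

/// Root phase: scatter one leaf request per node (a thread here, a network call in a
/// real cluster), gather the partial results, and merge them into a global top-k.
pub fn root_search(nodes: Vec<Vec<Split>>, k: usize) -> Vec<Hit> {
    let handles: Vec<_> = nodes
        .into_iter()
        .map(|splits| thread::spawn(move || leaf_search(&splits, k)))
        .collect();
    let mut merged: Vec<Hit> = handles
        .into_iter()
        .flat_map(|handle| handle.join().expect("leaf thread panicked"))
        .collect();
    sort_hits(&mut merged);
    merged.truncate(k);
    merged
}

fn main() {
    let node_a = vec![("split-1".to_string(), vec![(0, 7), (1, 3)])];
    let node_b = vec![
        ("split-2".to_string(), vec![(0, 9)]),
        ("split-3".to_string(), vec![(4, 5)]),
    ];
    for hit in root_search(vec![node_a, node_b], 2) {
        println!("{} doc {} score {}", hit.split_id, hit.doc_id, hit.score);
    }
}
```

Returning only k hits per leaf is safe. Any document in the global top-k must also be in the top-k of the node that holds it, so the root never needs more than k hits from a single leaf.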

This series decomposes Quickwit into its constituent substrates, grounded in the source at quickwit/quickwit/.

Substrate Map

| Substrate | Primary crate / module | Role |
|---|---|---|
| Actor Framework | quickwit-actors | Supervised async actors: mailboxes, backpressure, health monitoring |
| Ingest | quickwit-ingest / Ingester | Write path: WAL (mrecordlog), shard management, replication |
| Indexing Pipeline | quickwit-indexing / actors | Actor chain: SourceActor → DocProcessor → Indexer → IndexSerializer → Packager → Uploader → Sequencer → Publisher |
| Split Storage | quickwit-storage / BundleStorage | Immutable split bundles on object storage, with a hotcache |
| Search Root | quickwit-search / root.rs | Scatter-gather: job placement via Rendezvous hashing, merging of collected results |
| Search Leaf | quickwit-search / leaf.rs | Per-split search: footer cache, Tantivy searcher, warmup |
| Cluster/Membership | quickwit-cluster / Chitchat | Gossip-based membership, failure detection, service discovery |
| Metastore | quickwit-metastore | Split lifecycle (Staged → Published → MarkedForDeletion), checkpoints |
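The Search Root row mentions job placement via Rendezvous hashing. Below is a minimal sketch of the technique (also called highest-random-weight hashing), assuming nodes and splits are identified by plain strings. Each node scores each split with a hash of the pair, and the split goes to the highest-scoring node. The functions `affinity` and `rank_nodes` are hypothetical, std's `DefaultHasher` stands in for whatever hash Quickwit actually uses, and the load balancing that chapter 5 attributes to assign_jobs is omitted.

```rust
use std::cmp::Reverse;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical affinity score for a (node, split) pair. `DefaultHasher::new()` is
/// deterministic within one build of the program, which is enough for this sketch;
/// a real cluster needs a hash that every node computes identically across versions.
fn affinity(node: &str, split_id: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    split_id.hash(&mut hasher);
    hasher.finish()
}

/// Rendezvous ranking: nodes ordered from most to least preferred for `split_id`.
/// The first entry is the split's owner; the rest are fallbacks if the owner is
/// unavailable or overloaded.
pub fn rank_nodes<'a>(nodes: &[&'a str], split_id: &str) -> Vec<&'a str> {
    let mut ranked: Vec<&'a str> = nodes.to_vec();
    // Highest score first. `sort_by_key` is stable, so equal scores keep input order.
    ranked.sort_by_key(|node| Reverse(affinity(node, split_id)));
    ranked
}

fn main() {
    let nodes = ["searcher-1", "searcher-2", "searcher-3"];
    for split_id in ["split-a", "split-b", "split-c"] {
        println!("{split_id} -> {:?}", rank_nodes(&nodes, split_id));
    }
}
```

Because a split's ranking depends only on the split and the current node set, the same split keeps landing on the same searcher from one query to the next, which is the usual reason to prefer this scheme: that searcher's caches (for example, the split footer cache) are likely to be warm. Removing a node reassigns only the splits that node owned.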

Chapter List

  1. The Actor Framework Substrate: the Actor and Handler traits of quickwit-actors, Mailbox priority channels, ActorContext, KillSwitch, supervision and health monitoring.
  2. Ingest Substrate: Ingester, the mrecordlog WAL, shard lifecycle, IngesterState, replication factor, persist request flow.
  3. Indexing Pipeline Substrate: the eight-actor chain, the IndexingPipeline supervision loop, CommitTrigger, PublishLock, the Sequencer ordering guarantee.
  4. Split Storage Substrate: SplitPayloadBuilder, BundleStorageFileOffsets, the split bundle format, hotcache layout, the upload semaphore.
  5. Search Root Substrate: SearchJob, Rendezvous hashing in SearchJobPlacer, the LPT (longest processing time first) algorithm in assign_jobs, make_merge_collector, scatter-gather fan-out.
  6. Search Leaf Substrate: open_split_bundle, get_split_footer_from_cache_or_fetch, MemorySizedCache, HotDirectory, the Tantivy warmup phase.
  7. Cluster/Membership Substrate: the Chitchat gossip protocol, ChitchatConfig, Phi accrual failure detection (FailureDetectorConfig), ClusterMember key-value node state, gRPC catchup.
  8. Metastore Substrate: the MetastoreService trait, the split lifecycle state machine, IndexCheckpointDelta, the Janitor service, PostgreSQL and file backends.
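The split lifecycle covered in chapter 8 can be sketched as a small state machine. The `SplitState` enum and its `transition` method below are a hypothetical model, not the metastore's code. The variant names and the set of allowed transitions are assumptions, including the edge that lets a staged split that was never published (for example, after an indexer crash) be marked for deletion.

```rust
/// Hypothetical model of a split's lifecycle in the metastore (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SplitState {
    /// Files may still be uploading; invisible to search.
    Staged,
    /// Committed by the publisher; the only searchable state in this model.
    Published,
    /// Retired; waiting for the janitor to delete its files.
    MarkedForDeletion,
}

/// Error returned for a transition the lifecycle does not allow.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: SplitState,
    pub to: SplitState,
}

impl SplitState {
    /// Returns the new state if `self -> to` is allowed, and an error otherwise.
    pub fn transition(self, to: SplitState) -> Result<SplitState, InvalidTransition> {
        use SplitState::*;
        match (self, to) {
            // Publishing makes a staged split visible to search.
            (Staged, Published)
            // Assumed: a staged split that is never published gets garbage-collected.
            | (Staged, MarkedForDeletion)
            // A published split is retired, e.g. when a merge replaces it.
            | (Published, MarkedForDeletion) => Ok(to),
            _ => Err(InvalidTransition { from: self, to }),
        }
    }

    /// Only published splits are searched in this model.
    pub fn is_searchable(self) -> bool {
        self == SplitState::Published
    }

    /// Only splits already marked for deletion may have their files removed.
    pub fn is_deletable(self) -> bool {
        self == SplitState::MarkedForDeletion
    }
}

fn main() {
    let state = SplitState::Staged.transition(SplitState::Published).unwrap();
    println!("{state:?}, searchable: {}", state.is_searchable());
    // Un-publishing is rejected with an InvalidTransition error.
    println!("{:?}", state.transition(SplitState::Staged));
}
```

Encoding every allowed move in one `match` means an illegal one, such as un-publishing a split or reviving a split marked for deletion, returns an error instead of silently corrupting the lifecycle.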