Search Indices: From Protocol to Production
Table of Contents
A distributed search index is a set of protocol commitments — about where documents live, how writes propagate, how queries fan out and merge, and what “visible” means. This series builds the mental model from the ground up, treating OpenSearch as the reference implementation of a generic distributed search protocol.
Chapters
- The Index as a Distributed Object — shards, replicas, routing, primary/replica protocol, cluster manager role
- The Write Path — translog, refresh, flush, primary-replica replication, sequence numbers, NRT visibility
- The Read Path — scatter-gather, distributed scoring, relevance aggregation across shards
- Mapping as a Contract — field types, dynamic mapping hazards, why mappings are immutable
- Query DSL as a Semantic Protocol — term vs full-text vs compound queries, scoring model, bool algebra
- Aggregations as Distributed Computation — bucket, metric, pipeline aggregations, approximate vs exact tradeoffs
- Index Lifecycle: Visibility, Retention, and Aliases — refresh semantics, segment merge visibility, aliases as indirection