Mapping as a Contract

A mapping in OpenSearch is not documentation — it is a contract between the application and the index. The mapping determines how a field is stored, whether it is analyzed, how it participates in scoring, and whether it can be sorted or aggregated. Violating the contract — sending an integer where a keyword was expected, searching for tokens in a field that was never analyzed — produces silent failures: empty results, unexpected sort orders, or incomplete aggregations.

The contract metaphor is deliberate. Once a field is mapped, its definition is effectively immutable: new fields can be added, but changing an existing field's type or analysis requires renegotiating the contract through a full reindex.

Field Types: What the Mapping Controls

Every field in a document has a type. The type determines three things:

How the value is indexed: a text field is tokenized and each resulting term is stored in the inverted index; a keyword field is stored as a single, unmodified term; numeric and date fields are indexed as points for fast range queries, with a date stored as milliseconds since the Unix epoch.
What queries can operate on it: full-text queries (match, match_phrase) work on analyzed text fields; exact-match queries (term, terms) work on keyword and numeric fields; range queries work on numeric and date fields.
Whether it supports aggregations and sorting: text fields do not support sorting or aggregations by default, because they have no doc values and a tokenized value has no single term to sort or group by. Enabling fielddata works around this at the cost of heap memory. By contrast, keyword, numeric, and date fields support both through doc values.
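The storage and sorting differences can be illustrated without a cluster. The following Python sketch uses only the standard library and models behavior rather than calling any OpenSearch API; the helper names are hypothetical. It converts an ISO-8601 timestamp to the epoch-millisecond value a date field holds, and compares how the same numeric strings sort when mapped as keyword (lexicographic) versus long (numeric).

```python
from datetime import datetime


def date_to_epoch_millis(iso_value: str) -> int:
    """Approximate what a `date` field stores: milliseconds since the Unix epoch (UTC)."""
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    parsed = datetime.fromisoformat(iso_value.replace("Z", "+00:00"))
    return int(parsed.timestamp() * 1000)


def sort_as_keyword(values: list) -> list:
    # keyword terms compare character by character, i.e. lexicographically
    return sorted(values)


def sort_as_long(values: list) -> list:
    # numeric fields compare by numeric value
    return sorted(values, key=int)


print(date_to_epoch_millis("2024-01-15T10:30:00Z"))  # 1705314600000
print(sort_as_keyword(["9", "10", "100"]))           # ['10', '100', '9']
print(sort_as_long(["9", "10", "100"]))              # ['9', '10', '100']
```

The lexicographic order `['10', '100', '9']` is exactly the kind of unexpected sort order that appears when numbers are mapped as keyword.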

The types you will use most frequently:

Type	Analyzed	Sortable/Aggregatable	Use case
`text`	Yes — via analyzer	No (unless fielddata enabled)	Full-text search
`keyword`	No — exact	Yes	IDs, status codes, tags, enum values
`integer`, `long`, `float`	No	Yes	Numeric values
`date`	No	Yes	Timestamps
`boolean`	No	Yes	Flags
`nested`	Per sub-field	Per sub-field	Arrays of objects with correlated fields
`object`	Per sub-field	Per sub-field	Embedded documents (flattened)

Text vs Keyword: The Most Common Mapping Mistake

The text vs keyword distinction is the source of more operational confusion than any other mapping decision.

A text field goes through an analysis chain: the raw string passes through zero or more character filters, exactly one tokenizer, and zero or more token filters. The result is a list of normalized tokens stored in the inverted index. A keyword field skips this chain (unless a normalizer is configured) and stores the whole string as one term.

PUT /orders/_mapping
{
  "properties": {
    "description": { "type": "text" },
    "status":      { "type": "keyword" }
  }
}

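A term query for status: "PENDING" on the keyword field matches documents whose stored value is exactly PENDING. The same term query against a text field finds nothing: at index time the standard analyzer's lowercase token filter stored the token pending, and term queries are not analyzed, so the lookup is for the literal PENDING, which is not in the inverted index. (A match query on the text field would succeed, because it runs the query string through the same analyzer.)

Conversely, a keyword field offers no full-text behavior. A match query against it is accepted, but the query string is not tokenized: the field's keyword analyzer (or its normalizer, if one is configured) leaves it intact, so match behaves like an exact term query. A match for "pending" will not find a stored PENDING, and a match for "pending payment" will not find a stored "pending".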
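The interaction between analysis and query type can be modeled in a few lines of Python. This is a simplified sketch, not OpenSearch code: the regular-expression tokenizer stands in for the standard analyzer (the real standard tokenizer follows Unicode word-boundary rules), the keyword field is assumed to have no normalizer, and all names are hypothetical.

```python
import re


def standard_analyze(value):
    """Simplified standard analyzer: split into word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def keyword_analyze(value):
    """keyword field without a normalizer: the whole value is one unmodified term."""
    return [value]


ANALYZERS = {"text": standard_analyze, "keyword": keyword_analyze}

# Terms stored in the inverted index for one document whose status is "PENDING".
INDEXED_TERMS = {ftype: set(analyze("PENDING")) for ftype, analyze in ANALYZERS.items()}


def term_query(field_type, value):
    """term: the value is NOT analyzed; the literal string is looked up."""
    return value in INDEXED_TERMS[field_type]


def match_query(field_type, value):
    """match: the value is analyzed with the field's analyzer; any matching token is a hit."""
    return any(token in INDEXED_TERMS[field_type] for token in ANALYZERS[field_type](value))


for query, field_type, value in [
    (term_query, "keyword", "PENDING"),   # True: the exact term is present
    (term_query, "text", "PENDING"),      # False: the index holds 'pending'
    (match_query, "text", "PENDING"),     # True: the query is analyzed to 'pending'
    (match_query, "keyword", "pending"),  # False: nothing lowercases the keyword value
]:
    print(query.__name__, field_type, value, query(field_type, value))
```

Both False results are silent failures: the query is valid and simply returns zero hits.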

Multi-Fields: Indexing One Value Two Ways

The resolution is multi-fields: index the same content under two field names with different types.

PUT /orders/_mapping
{
  "properties": {
    "status": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword" }
      }
    }
  }
}

status — analyzed, supports match queries.
status.keyword — exact, supports term, sorting, and aggregations.

Multi-fields cost storage (the value is indexed twice) but eliminate the need to choose between full-text search and exact-match behavior.
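To see why the .keyword sub-field matters for aggregations, the Python model below indexes each value both ways and counts documents per term, roughly what a terms aggregation does. It is a sketch only: the status values are hypothetical, the tokenizer is simplified, and none of this is OpenSearch code.

```python
import re
from collections import Counter


def standard_analyze(value):
    # Simplified stand-in for the standard analyzer: word characters, lowercased.
    return [token.lower() for token in re.findall(r"\w+", value)]


def index_multi_field(value):
    """Index one value the way the status / status.keyword multi-field mapping does."""
    return {
        "status": standard_analyze(value),  # analyzed tokens, used by match queries
        "status.keyword": [value],          # one exact term, used by term, sort, aggs
    }


# Hypothetical status values, one per document.
statuses = ["In Transit", "In Transit", "Delivered", "Returned In Store"]
indexed_docs = [index_multi_field(value) for value in statuses]


def terms_agg(field):
    """Count documents per distinct term in `field`; each document counts once per term."""
    counts = Counter()
    for doc in indexed_docs:
        counts.update(set(doc[field]))
    return dict(counts)


print(terms_agg("status.keyword"))
print(terms_agg("status"))
```

Grouping on status.keyword yields one bucket per status value. Grouping on the analyzed tokens, which is what enabling fielddata on status would produce, splits "In Transit" into in and transit and merges in with the in from an unrelated status.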

Dynamic Mapping: Convenient but Hazardous

By default, OpenSearch infers field types from the first document that contains each field. This is dynamic mapping.

PUT /orders/_doc/1
{ "order_id": "A123", "amount": 99.99, "status": "pending" }

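OpenSearch infers: order_id: text with a .keyword sub-field, amount: float, status: text with a .keyword sub-field. Every JSON string that is not detected as a date gets this text-plus-keyword treatment (the sub-field uses ignore_above: 256), so an identifier such as order_id is not mapped as a plain keyword.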
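The default inference rules, and the "first document wins" behavior described in the hazards, can be approximated in Python. This is a simplified model under stated assumptions: it handles only scalar JSON values, its date detection is a rough ISO-8601 prefix check rather than OpenSearch's configured date formats, it treats numeric strings sent to numeric fields as coercible (the default), and `infer_type`, `ingest`, and `MappingConflict` are hypothetical names.

```python
import re

# Rough stand-in for date detection; OpenSearch's default date formats are broader.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?")


class MappingConflict(Exception):
    """Hypothetical stand-in for the indexing error a type conflict produces."""


def infer_type(value):
    """Approximate the default dynamic-mapping rule for one scalar JSON value."""
    if isinstance(value, bool):  # must precede int: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if ISO_DATE.match(value):
            return {"type": "date"}
        # Every other string: text plus a keyword sub-field.
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"value not modeled here: {value!r}")


def ingest(mapping, doc):
    """First document wins: map unseen fields, validate values against mapped ones."""
    for field, value in doc.items():
        if field not in mapping:
            mapping[field] = infer_type(value)  # each new key grows the mapping
            continue
        field_type = mapping[field]["type"]
        if field_type in ("long", "float") and isinstance(value, str):
            try:
                float(value)  # numeric strings are coerced by default
            except ValueError:
                raise MappingConflict(f"{field!r} is mapped as {field_type}; cannot index {value!r}")
    return mapping


print(ingest({}, {"order_id": "A123", "amount": 99.99, "status": "pending"}))
```

Indexing the example order reproduces the inference above. Once user_id has been mapped as long by a first document `{"user_id": 12345}`, a later `"67890"` is coerced, while `"user_abc_12345"` raises `MappingConflict`. Every previously unseen key also adds an entry to the mapping, which is how payloads with variable keys grow the mapping without bound.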

The hazards:

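Type inference can be wrong. The first document wins. A user_id sent as the JSON number 12345 is inferred as long; the same value sent as the string "12345" becomes text with a .keyword sub-field, because numeric detection in strings (numeric_detection) is off by default. If the first document sent the number and a later document sends user_id: "user_abc_12345", indexing that document fails with a mapper_parsing_exception because the value cannot be parsed as a long.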

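Mapping explosion. In schemas with many distinct keys (event logs, JSON payloads with user-defined keys), dynamic mapping creates a new field entry for each unique key encountered. Every field lives in the cluster state, so a mapping with tens of thousands of fields consumes significant memory and slows every operation that reads or updates mapping metadata. OpenSearch caps the field count per index with index.mapping.total_fields.limit (default 1000); documents that would push the mapping past the limit are rejected.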

Arrays of objects are flattened. Dynamic mapping maps every JSON object as object, never nested, and the object type does not preserve the boundaries between the objects in an array: an array of addresses is indexed as two independent multi-valued fields, address.city and address.zip. Querying address.city == "Seattle" AND address.zip == "98101" on the object type therefore matches any document where some address has city Seattle and some, possibly different, address has zip 98101. For queries that must match within a single object, map the field as nested.
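The difference between object and nested matching can be simulated in Python. This sketch uses hypothetical data and function names and models query semantics only: `object_query` checks each criterion independently against the flattened value sets, while `nested_query` requires all criteria to hold within one array element, which is what indexing each nested object as its own hidden document achieves.

```python
# Hypothetical document: one customer with two addresses.
DOC = {"address": [
    {"city": "Seattle", "zip": "98109"},
    {"city": "Portland", "zip": "98101"},
]}


def flatten(doc, path):
    """object mapping: every sub-field becomes one multi-valued field, e.g. address.city."""
    flat = {}
    for obj in doc[path]:
        for key, value in obj.items():
            flat.setdefault(f"{path}.{key}", set()).add(value)
    return flat


def object_query(doc, path, criteria):
    """Each criterion is checked on its own against the flattened values."""
    flat = flatten(doc, path)
    return all(value in flat.get(f"{path}.{key}", set()) for key, value in criteria.items())


def nested_query(doc, path, criteria):
    """nested mapping: all criteria must hold within the same array element."""
    return any(all(obj.get(key) == value for key, value in criteria.items())
               for obj in doc[path])


criteria = {"city": "Seattle", "zip": "98101"}
print(object_query(DOC, "address", criteria))  # True
print(nested_query(DOC, "address", criteria))  # False
```

No single address has both Seattle and 98101, yet the object semantics report a match; the nested semantics do not.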

Dynamic Mapping Modes

Control dynamic mapping behavior at the mapping root or on individual object fields:

PUT /orders/_mapping
{
  "dynamic": "strict",
  "properties": {
    "order_id": { "type": "keyword" },
    "amount":   { "type": "float" }
  }
}

`dynamic` value	Behavior
`true` (default)	Auto-create fields for new keys
`false`	Ignore unknown fields — not indexed, not searchable, stored in `_source`
`strict`	Reject documents with unknown fields — indexing returns an error
`strict_allow_templates`	Like `strict`, but accept new fields that match a dynamic template (recent OpenSearch versions)

strict is the correct choice for production schemas with known structure. It surfaces data-contract violations immediately rather than silently creating unmaintainable mappings.
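The three classic modes can be summarized as a small Python model. It is illustrative only: `apply_dynamic` and `StrictMappingError` are hypothetical names, newly added fields get a placeholder type rather than real inference, and modes other than true, false, and strict are not modeled.

```python
class StrictMappingError(Exception):
    """Hypothetical stand-in for the rejection that dynamic: strict produces."""


def apply_dynamic(mode, mapping, doc):
    """Return (new_mapping, indexed_fields, source) for one document under `mode`."""
    if mode not in ("true", "false", "strict"):
        raise ValueError(f"mode not modeled here: {mode}")
    unknown = [field for field in doc if field not in mapping]
    if unknown and mode == "strict":
        raise StrictMappingError(f"unknown fields {unknown}; document rejected")
    new_mapping = dict(mapping)  # never mutate the caller's mapping
    if mode == "true":
        for field in unknown:
            new_mapping[field] = {"type": "<inferred>"}  # placeholder for real inference
    indexed = sorted(field for field in doc if field in new_mapping)
    return new_mapping, indexed, dict(doc)  # _source keeps the full original document


base = {"order_id": {"type": "keyword"}, "amount": {"type": "float"}}
doc = {"order_id": "A123", "amount": 99.99, "coupon": "SPRING"}  # coupon is unknown
for mode in ("true", "false"):
    print(mode, apply_dynamic(mode, base, doc)[1])
```

With false, the unknown coupon field survives in `_source` but is absent from the mapping and from the indexed fields, so queries on it return nothing; with strict, the same document is rejected outright.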

Mapping Immutability: Why You Cannot Change a Field Type

Once a field is mapped as keyword, it cannot be remapped as text in place. The inverted index structure for a keyword field is fundamentally different from that of a text field — changing the type would require rebuilding the entire inverted index for that field across all segments.

OpenSearch enforces this: attempting to change an existing field’s type returns an error.

illegal_argument_exception: mapper [status] of different type, current_mapper [KeywordFieldMapper],
new_mapper [TextFieldMapper]

What you can do:

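Add new fields: mappings are append-only for new fields.
Add a new sub-field: add status.analyzed: text to an existing keyword field. Documents indexed before the change are not indexed into the new sub-field until they are reindexed in place, for example with _update_by_query.
Reindex: create a new index with the corrected mapping and copy the data.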
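The rules for updating an existing mapping can be sketched as a merge function. This Python model is a simplification with hypothetical names: it compares only the type of each field and sub-field, whereas OpenSearch also rejects changes to many other parameters of an existing field, such as its analyzer.

```python
from copy import deepcopy


class IllegalMappingChange(Exception):
    """Hypothetical stand-in for the illegal_argument_exception shown above."""


def merge_mapping(current, update):
    """Accept new fields and new sub-fields; reject type changes on existing ones."""
    merged = deepcopy(current)
    for name, spec in update.items():
        if name not in merged:
            merged[name] = deepcopy(spec)  # new field: allowed
            continue
        existing = merged[name]
        if existing["type"] != spec["type"]:
            raise IllegalMappingChange(
                f"cannot change [{name}] from [{existing['type']}] to [{spec['type']}]")
        for sub_name, sub_spec in spec.get("fields", {}).items():
            subfields = existing.setdefault("fields", {})
            if sub_name in subfields and subfields[sub_name]["type"] != sub_spec["type"]:
                raise IllegalMappingChange(f"cannot change [{name}.{sub_name}]")
            subfields[sub_name] = deepcopy(sub_spec)  # new sub-field: allowed
    return merged


current = {"status": {"type": "keyword"}}
print(merge_mapping(current, {"status": {"type": "keyword",
                                         "fields": {"analyzed": {"type": "text"}}}}))
```

Adding status.analyzed succeeds and leaves the input mapping untouched; passing `{"status": {"type": "text"}}` instead raises `IllegalMappingChange`.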

Zero-Downtime Reindex via Alias Swap

The standard pattern for correcting a mapping error in production:

// 1. Create new index with corrected mapping
PUT /orders-v2
{
  "mappings": {
    "properties": {
      "status": { "type": "keyword" }
    }
  }
}

// 2. Reindex data from old index to new
POST /_reindex
{
  "source": { "index": "orders-v1" },
  "dest":   { "index": "orders-v2" }
}

// 3. Atomically swap the alias
POST /_aliases
{
  "actions": [
    { "remove": { "index": "orders-v1", "alias": "orders" } },
    { "add":    { "index": "orders-v2", "alias": "orders" } }
  ]
}

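During the reindex, applications continue writing to orders-v1 through the orders alias (this assumes clients always address the alias, never the concrete index name). The alias swap is atomic: no request sees a gap. Documents created or updated in orders-v1 after the reindex started must be copied separately, for example with a second _reindex restricted by a query on a last-modified timestamp field. Deletions made during that window are not propagated by either pass.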
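The catch-up pass and the alias swap are both plain request bodies, so they are easy to generate. The Python sketch below builds them as dictionaries following the _reindex and _aliases request shapes used above. The updated_at field is an assumption (the example mapping has no last-modified field, so the application would have to maintain one), and the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone


def catch_up_reindex_body(source, dest, timestamp_field, since):
    """Body for a second POST /_reindex that copies only documents changed since `since`."""
    return {
        "source": {
            "index": source,
            "query": {"range": {timestamp_field: {"gte": since.isoformat()}}},
        },
        "dest": {"index": dest},
    }


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases; OpenSearch applies the listed actions atomically."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


# Hypothetical moment at which the first reindex started.
started = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
print(json.dumps(catch_up_reindex_body("orders-v1", "orders-v2", "updated_at", started), indent=2))
print(json.dumps(alias_swap_body("orders", "orders-v1", "orders-v2"), indent=2))
```

Writes that land between the catch-up pass and the alias swap are still missed, so in practice the catch-up is run as close to the swap as possible, repeated, or preceded by a brief write pause.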

Index Templates: Enforcing Mapping at Index Creation

For time-series indices (logs, metrics) where a new index is created on each rollover, index templates apply consistent mappings automatically:

PUT /_index_template/orders-template
{
  "index_patterns": ["orders-*"],
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "order_id": { "type": "keyword" },
        "amount":   { "type": "float" },
        "status":   { "type": "keyword" },
        "created":  { "type": "date" }
      }
    },
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "refresh_interval": "5s"
    }
  }
}

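Any index whose name matches orders-* when it is created inherits these mappings and settings. Templates are applied only at creation time, so editing a template does not change existing indices. Note that the pattern also matches versioned indices such as orders-v1 and orders-v2 from the reindex example. Component templates, listed in an index template's composed_of array, allow factoring shared mapping blocks (standard timestamp fields, common metadata) across multiple index templates.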
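How a new index picks up its mappings can be sketched in Python. This is a simplified model with hypothetical names and data: it chooses the matching index template with the highest priority (assuming distinct priorities, and treating a missing priority as 0), merges the properties of each component template listed in composed_of in order, and then applies the index template's own properties, with later entries overriding earlier ones by field name. `fnmatchcase` approximates OpenSearch's wildcard matching, but it also interprets `?` and `[]` specially, which OpenSearch patterns do not.

```python
from copy import deepcopy
from fnmatch import fnmatchcase


def _properties(body):
    """Extract template.mappings.properties from a template body, or {} if absent."""
    return body.get("template", {}).get("mappings", {}).get("properties", {})


def resolve_template(index_name, index_templates, component_templates):
    matches = [
        body for body in index_templates.values()
        if any(fnmatchcase(index_name, pattern) for pattern in body["index_patterns"])
    ]
    if not matches:
        return None  # no template applies
    chosen = max(matches, key=lambda body: body.get("priority", 0))
    properties = {}
    for name in chosen.get("composed_of", []):  # component templates, in listed order
        properties.update(deepcopy(_properties(component_templates[name])))
    properties.update(deepcopy(_properties(chosen)))  # the template's own mappings win
    return properties


# Hypothetical templates, shaped like _component_template and _index_template bodies.
COMPONENTS = {"timestamps": {"template": {"mappings": {"properties": {"created": {"type": "date"}}}}}}
TEMPLATES = {
    "orders-template": {"index_patterns": ["orders-*"], "priority": 1,
                        "composed_of": ["timestamps"],
                        "template": {"mappings": {"properties": {"order_id": {"type": "keyword"}}}}},
    "archive-template": {"index_patterns": ["orders-archive-*"], "priority": 5,
                         "template": {"mappings": {"properties": {"archived": {"type": "boolean"}}}}},
}
print(resolve_template("orders-2024.01", TEMPLATES, COMPONENTS))
print(resolve_template("orders-archive-2023", TEMPLATES, COMPONENTS))
```

For orders-2024.01, the resolved properties combine the component template's created field with the template's own order_id. orders-archive-2023 matches both patterns, and the higher-priority archive template wins. An index name that matches no pattern resolves to None.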