@kosabogi (Contributor) commented Nov 25, 2025

⚠️ This is a WIP PR. I didn’t mark it as draft so it remains reviewable.

📷 Preview

Semantic text field type

Summary

This PR restructures and refines the semantic_text field type page.

Main changes

1. Content and wording refinements

  • Renamed sections to be more concise
  • Standardized terminology
  • Added relevant links
  • Simplified language where applicable
  • Added stepper syntax to how-to procedures that have more than one step

2. Content restructuring

Restructured the semantic_text documentation into three focused pages:

  • Main page (semantic-text.md): Converted to an overview page with an introduction explaining what semantic_text is and an overview section linking to the reference and how-to pages.

  • Reference page (semantic-text-reference.md): New dedicated reference page containing parameter descriptions, inference endpoint configurations, chunking behavior, update operations, querying options, and limitations.

  • How-to guides page (how-to-semantic-text.md): New dedicated how-to page containing procedure descriptions and examples for common tasks, including configuring inference endpoints, pre-chunking content, retrieving embeddings, highlighting fragments, and cross-cluster search.

Our main reasons for splitting the docs this way:

  • Separating reference from how-to content follows documentation best practices
  • It scales better as we add more guides without making the main page too long
  • It improves readability: people looking up parameters don't have to skip past procedures, and people following guides don't have to read through parameter details

Feedback and suggestions are welcome!

  • What do you think about this new structure? Does splitting the page by content type improve readability for users?
  • Do you have suggestions for moving any content to different pages?
  • Any comments or suggestions on naming, titles, or organization?

Related issue: elastic/docs-content#3836

@kosabogi added the >docs (General docs changes) and Team:Docs (Meta label for docs team) labels Nov 25, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/core-docs (Team:Docs)

@github-actions
Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Contributor

@leemthompo leemthompo left a comment

This is looking really nice already. I left some comments and ideas, but the how-to ideas in particular might be best left until SMEs have had time to digest the initial proposal of breaking these pages up, which will have to wait until next week.

I think the landing page feedback is probably actionable right now though if you want to rework that a bit :)

serverless: ga
---

# How-to guides for `semantic_text`
Contributor

@leemthompo leemthompo Nov 27, 2025

At a high level, I think we can organize the how-tos into 3 clear categories:

  • Setup and configuration
  • Ingestion
  • Search and retrieval

Maybe they could be subpages, but please don't feel obliged to jump on this immediately and also of course feel free to push back if that sounds like overkill. Here's the overview of what that might look like, and note that it also implies moving a couple of things out of the reference section which might belong together more naturally under how-to. But this might require additional refactoring that isn't worth the ROI immediately. So just take this as food for thought :)

Expand to see what a potential how-to subpages restructuring would look like
# How-to guides for `semantic_text`

## Setup & configuration

### Configure inference endpoints
- Use default and preconfigured endpoints
- Use ELSER on EIS  
- Use a custom inference endpoint
- Use dedicated endpoints for ingestion and search
- **[MOVED FROM REF]** Inference endpoint validation

## Ingestion

### Index pre-chunked content
- Disable automatic chunking
- Index documents

### Use copy_to and multi-fields with semantic_text
- Use copy_to
- Use multi-fields

### **[MOVED FROM REF]** Update documents with semantic_text fields
- Full document updates (examples)
- Partial updates using Bulk API (examples)
- Partial updates using Update API (examples)
- Scripted updates

## Search & retrieval

### **[MOVED FROM REF]** Query semantic_text fields
- Using match queries
- Using kNN queries  
- Using sparse vector queries
- Using semantic queries (legacy)

### Retrieve indexed chunks

### Return semantic_text field embeddings
- Return semantic field embeddings in _source
- Return semantic field embeddings using fields

### Highlight the most relevant fragments
- Highlight semantic_text fields
- Enforce semantic highlighter
- Retrieve fragments in original order

### Perform cross-cluster search (CCS) for semantic_text

Contributor Author

I like this organization idea! I'll wait for SME feedback before implementing. If they agree, I'll reorganize the how-to content accordingly.

Contributor

@leemthompo leemthompo left a comment

Overview page is looking good! I have a few language suggestions, and a few optional nits at this stage :)

1. In this example, there is no `inference_id` is specified, so `semantic_text` uses a [default inference endpoint](./semantic-text-how-tos.md#default-and-preconfigured-endpoints).

:::
For a complete walkthrough on how to perform semantic search with `semantic_text` fields, refer to the [Semantic search with semantic_text](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md) tutorial.
Contributor

nit: I would make this a TIP for purely visual appeal

the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put)
to create the endpoint. If not specified, the {{infer}} endpoint defined by
`inference_id` will be used at both index and query time.
The `semantic_text` field type documentation is organized into reference content and how-to guides.
Contributor

I'd use subheadings here to demarcate the Reference and How-to lists in the overview, which also makes it easier to scan the page via the OTP :)

: (Optional, object) Specifies the index options to override default values
for the field. Currently, `dense_vector` and `sparse_vector` index options are supported.
For text embeddings, `index_options` may match any allowed.
The [Reference](./semantic-text-reference.md) page provides parameter descriptions, inference endpoint configurations, chunking behavior, querying options, limitations, and other technical specifications, for instance:
Contributor

This lead-in sentence is potentially overly detailed, considering the list that follows repeats much of the same info. I'd just make it a shorter and more general lead-in.

documents are ingested into the index. When the first document is indexed, the
`inference_id` will be used to generate underlying indexing structures for the
field.
The [How-to guides](./semantic-text-how-tos.md) page contains procedure descriptions and examples for configuring inference endpoints, pre-chunking content, retrieving embeddings, highlighting fragments, cross-cluster search, and other common tasks, for instance:
Contributor

Same comment about the lead-in sentence

::::{warning}
When updating an `inference_id` it is important to ensure the new {{infer}} endpoint produces embeddings compatible with those already indexed. This typically means using the same underlying model.
::::
1. In this example, there is no `inference_id` is specified, so `semantic_text` uses a [default inference endpoint](./semantic-text-how-tos.md#default-and-preconfigured-endpoints).
Contributor

Suggested change
- 1. In this example, there is no `inference_id` is specified, so `semantic_text` uses a [default inference endpoint](./semantic-text-how-tos.md#default-and-preconfigured-endpoints).
+ 1. In this example, the `semantic_text` field uses a [default inference endpoint](./semantic-text-how-tos.md#default-and-preconfigured-endpoints) because the `inference_id` parameter isn't specified.

```{applies_to}
serverless: ga
```
The following example creates an index mapping with a `semantic_text` field:
Contributor

Suggested change
- The following example creates an index mapping with a `semantic_text` field:
+ ## Basic `semantic_text` mapping example
+ The following example creates an index mapping with a `semantic_text` field:

totally optional, very in the weeds here 😄, but maybe it would be nice to set the example off under its own subheading
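
For anyone following this thread without the preview open, the kind of mapping being discussed can be sketched roughly as below. This is a minimal illustration rather than the snippet from the page itself: the helper `semantic_text_mapping`, the field name `inference_field`, and the index name `my-index` are hypothetical, and the sketch only builds the request body without sending it to a cluster.

```python
import json
from typing import Optional


def semantic_text_mapping(field_name: str, inference_id: Optional[str] = None) -> dict:
    """Build the body of a create-index request with one semantic_text field."""
    field: dict = {"type": "semantic_text"}
    # Only set inference_id when one is given. Leaving it out corresponds to
    # the case in the callout: the field falls back to a default endpoint.
    if inference_id is not None:
        field["inference_id"] = inference_id
    return {"mappings": {"properties": {field_name: field}}}


# Body for a request such as `PUT my-index` (hypothetical index name).
body = semantic_text_mapping("inference_field")
print(json.dumps(body, indent=2))
```

Passing a second argument, for example `semantic_text_mapping("inference_field", "my-endpoint")`, would add an explicit `inference_id` to the field, which is the case the callout text contrasts against.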

1. In this example, there is no `inference_id` is specified, so `semantic_text` uses a [default inference endpoint](./semantic-text-how-tos.md#default-and-preconfigured-endpoints).

:::
For a complete walkthrough on how to perform semantic search with `semantic_text` fields, refer to the [Semantic search with semantic_text](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md) tutorial.
Contributor

Suggested change
- For a complete walkthrough on how to perform semantic search with `semantic_text` fields, refer to the [Semantic search with semantic_text](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md) tutorial.
+ For a complete example, refer to the [Semantic search with `semantic_text`](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md) tutorial.

keeps the sentence concise, because the main info is already embedded in the link text

content using an inference endpoint. Long passages
are [automatically chunked](#auto-text-chunking) to smaller sections to enable
the processing of larger corpuses of text.
The `semantic_text` field type simplifies [semantic search](docs-content://solutions/search/semantic-search.md) by providing defaults for the required infrastructure. Using `semantic_text`, you don't have to manually configure mappings, set up ingestion pipelines, or handle chunking. The field type automatically:
Contributor

Suggested change
- The `semantic_text` field type simplifies [semantic search](docs-content://solutions/search/semantic-search.md) by providing defaults for the required infrastructure. Using `semantic_text`, you don't have to manually configure mappings, set up ingestion pipelines, or handle chunking. The field type automatically:
+ The `semantic_text` field type simplifies [semantic search](docs-content://solutions/search/semantic-search.md) by providing sensible defaults that automate most of the manual work typically required for vector search. Using `semantic_text`, you don't have to manually configure mappings, set up ingestion pipelines, or handle chunking. The field type automatically:

Contributor

This isn't perfect, but the original doesn't read exactly right to me. Feel free to tweak, of course.


Labels

>docs (General docs changes), Team:Docs (Meta label for docs team), v9.3.0


3 participants