Adding support to exclude semantic_text subfields #127664

Draft
wants to merge 4 commits into
base: main
5 changes: 5 additions & 0 deletions docs/changelog/127664.yaml
@@ -0,0 +1,5 @@
+pr: 127664
+summary: Exclude `semantic_text` subfields from field capabilities API
+area: "Mapping"
+type: enhancement
+issues: []
@@ -14,8 +14,11 @@
 import org.elasticsearch.core.Nullable;
 import org.elasticsearch.index.IndexService;
 import org.elasticsearch.index.engine.Engine;
+import org.elasticsearch.index.mapper.KeywordFieldMapper;
 import org.elasticsearch.index.mapper.MappedFieldType;
 import org.elasticsearch.index.mapper.RuntimeField;
+import org.elasticsearch.index.mapper.vectors.DenseVectorFieldMapper;
+import org.elasticsearch.index.mapper.vectors.SparseVectorFieldMapper;
 import org.elasticsearch.index.query.MatchAllQueryBuilder;
 import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.index.query.SearchExecutionContext;
@@ -149,6 +152,18 @@ private FieldCapabilitiesIndexResponse doFetch(
         return new FieldCapabilitiesIndexResponse(shardId.getIndexName(), indexMappingHash, responseMap, true, indexMode);
     }
 
+    /**
+     * Returns true if the field should be excluded from the field capabilities response.
+     * This is used to exclude fields that are not useful for the user, such as
+     * offset_source and inference chunk embeddings.
+     */
+    private static boolean shouldExcludeField(MappedFieldType ft) {
+        return ft.typeName().equals("offset_source")
+            || ((ft instanceof SparseVectorFieldMapper.SparseVectorFieldType
+                || ft instanceof DenseVectorFieldMapper.DenseVectorFieldType
+                || ft instanceof KeywordFieldMapper.KeywordFieldType) && ft.name().contains(".inference.chunks"));
+    }
Comment on lines +155 to +165
Contributor

Reiterating my message offline, this is a brittle solution. We shouldn't be hard-coding field names to exclude from field caps. Instead, I recommend investigating a solution where we add a flag to MappedFieldType to control if a field is excluded from field caps.
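
One way to picture the suggestion above, as a minimal sketch rather than the actual Elasticsearch change: the classes below are hypothetical stand-ins (`FieldType` is not `MappedFieldType`, and the `excludeFromFieldCaps` flag does not exist in the codebase as far as this PR shows). The sample field `audit.inference.chunks_seen` is an invented user-defined keyword, included only to show how a name match on `.inference.chunks` could hide a field that has nothing to do with `semantic_text`, while a per-type flag would not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are NOT the Elasticsearch classes.
class FieldCapsFlagSketch {

    /** Simplified stand-in for MappedFieldType carrying an assumed opt-out flag. */
    static final class FieldType {
        final String name;
        final String typeName;
        final boolean excludeFromFieldCaps; // hypothetical flag, set by the owning mapper

        FieldType(String name, String typeName, boolean excludeFromFieldCaps) {
            this.name = name;
            this.typeName = typeName;
            this.excludeFromFieldCaps = excludeFromFieldCaps;
        }
    }

    // Type names standing in for the instanceof checks used in the diff.
    private static final Set<String> CHUNK_TYPES = Set.of("sparse_vector", "dense_vector", "keyword");

    /** Name-based check with the same shape as shouldExcludeField in the diff. */
    static boolean excludedByName(FieldType ft) {
        return ft.typeName.equals("offset_source")
            || (CHUNK_TYPES.contains(ft.typeName) && ft.name.contains(".inference.chunks"));
    }

    /** Returns the field names that would be reported under either strategy. */
    static List<String> visibleFields(List<FieldType> fields, boolean useFlag) {
        List<String> visible = new ArrayList<>();
        for (FieldType ft : fields) {
            boolean excluded = useFlag ? ft.excludeFromFieldCaps : excludedByName(ft);
            if (excluded == false) {
                visible.add(ft.name);
            }
        }
        return visible;
    }

    /** Sample mapping: one semantic_text field, two of its subfields, one unrelated user field. */
    static List<FieldType> sampleFields() {
        return List.of(
            new FieldType("body", "semantic_text", false),
            new FieldType("body.inference.chunks.embeddings", "sparse_vector", true),
            new FieldType("body.inference.chunks.offset", "offset_source", true),
            // Invented user-defined keyword whose path happens to contain ".inference.chunks".
            new FieldType("audit.inference.chunks_seen", "keyword", false)
        );
    }

    public static void main(String[] args) {
        List<FieldType> fields = sampleFields();
        System.out.println("name-based: " + visibleFields(fields, false));
        System.out.println("flag-based: " + visibleFields(fields, true));
    }
}
```

In this sketch the name-based filter reports only `body`, whereas the flag-based filter also keeps the unrelated keyword field, because the decision is made by whichever mapper creates the subfield instead of by a substring of its path.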


     static Map<String, IndexFieldCapabilities> retrieveFieldCaps(
         SearchExecutionContext context,
         Predicate<String> fieldNameFilter,
@@ -173,7 +188,8 @@ static Map<String, IndexFieldCapabilities> retrieveFieldCaps(
             MappedFieldType ft = entry.getValue();
             if ((includeEmptyFields || ft.fieldHasValue(fieldInfos))
                 && (fieldPredicate.test(ft.name()) || context.isMetadataField(ft.name()))
-                && (filter == null || filter.test(ft))) {
+                && (filter == null || filter.test(ft))
+                && shouldExcludeField(ft) == false) {
                 IndexFieldCapabilities fieldCap = new IndexFieldCapabilities(
                     field,
                     ft.familyTypeName(),
@@ -359,3 +359,23 @@ setup:
         index: test-always-include-inference-id-index
 
   - exists: test-always-include-inference-id-index.mappings.properties.semantic_field.inference_id
+
+---
+"Field caps exclude chunks and embedding fields":
+  - requires:
+      cluster_features: "gte_v8.16.0"
+      reason: field_caps support for semantic_text added in 8.16.0
Comment on lines +365 to +367
Contributor Author

Do we need to define a new cluster feature? As I understand it, these fields were never expected in the field_caps API response, so excluding them should not affect the API or Discover. Backward compatibility is also covered by the other YAML file.

Member

I think it would be good to create a test feature for these tests.


+  - do:
+      field_caps:
+        include_empty_fields: true
+        index: test-index
+        fields: "*"
+
+  - match: { indices: [ "test-index" ] }
+  - exists: fields.sparse_field
+  - exists: fields.dense_field
+  - not_exists: fields.sparse_field.chunks.embeddings
+  - not_exists: fields.sparse_field.chunks.offset
+  - not_exists: fields.dense_field.chunks.embeddings
+  - not_exists: fields.dense_field.chunks.offset
@@ -307,3 +307,22 @@ setup:
             another_field:
               type: keyword
 
+---
+"Field caps exclude chunks embedding and text fields":
+  - requires:
+      cluster_features: "gte_v8.16.0"
+      reason: field_caps support for semantic_text added in 8.16.0
+
+  - do:
+      field_caps:
+        include_empty_fields: true
+        index: test-index
+        fields: "*"
+
+  - match: { indices: [ "test-index" ] }
+  - exists: fields.sparse_field
+  - exists: fields.dense_field
+  - not_exists: fields.sparse_field.inference.chunks.embeddings
+  - not_exists: fields.sparse_field.inference.chunks.text
+  - not_exists: fields.dense_field.inference.chunks.embeddings
+  - not_exists: fields.dense_field.inference.chunks.text