Increase OpenSearch mapping limit dynamically during indexing of csv/jsonl data #3257
…jsonl
Please add one or both of:
- unit tests
- e2e tests
Otherwise this looks good imho; I have already fixed a small typo.
LGTM - let's make sure we monitor this for unforeseen cluster consequences.
…ketch into index_mapping_limit_handling
Added Prometheus metric collection to the
- Enforce upper limit
- Add e2e test
Test looks good
This change dynamically increases the OpenSearch mapping limit during the indexing process to ensure successful data ingestion.
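As a rough illustration of the idea (not the code in this PR), the sketch below shows one way a buffered, capped limit could be computed. The function name `propose_mapping_limit` and the use of a field count as the basis for the calculation are assumptions made for this example; the default values mirror the `OPENSEARCH_MAPPING_BUFFER` and `OPENSEARCH_MAPPING_UPPER_LIMIT` options described under Configuration.

```python
from typing import Optional


def propose_mapping_limit(
    field_count: int,
    current_limit: int,
    buffer: float = 0.2,      # mirrors OPENSEARCH_MAPPING_BUFFER
    upper_limit: int = 2000,  # mirrors OPENSEARCH_MAPPING_UPPER_LIMIT
) -> Optional[int]:
    """Hypothetical helper: return a new mapping limit, or None if no change is needed.

    Assumption: the limit is derived from the number of fields plus a
    fractional buffer, then capped at the configured upper limit.
    """
    # Add the buffer on top of the observed number of fields.
    wanted = round(field_count * (1 + buffer))
    # Never go above the configured ceiling.
    capped = min(wanted, upper_limit)
    # Only raise the limit; leave it alone if it is already high enough.
    if capped <= current_limit:
        return None
    return capped


if __name__ == "__main__":
    new_limit = propose_mapping_limit(field_count=1200, current_limit=1000)
    if new_limit is not None:
        # This settings body would then be sent to the index's settings API.
        print({"index.mapping.total_fields.limit": new_limit})
```

Note that when the field count itself exceeds the upper limit, a cap like this means indexing can still fail. That is the trade-off the upper limit makes to protect the cluster.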
Problem:
Timesketch, when indexing timelines with a large number of unique fields, can encounter OpenSearch's default mapping limit (typically 1000 fields). This results in indexing failures and data loss.
Solution:
This PR introduces a mechanism to:
- Update the `index.mapping.total_fields.limit` setting in OpenSearch to the newly calculated limit if it exceeds the current limit.

Configuration:
Two new configuration options are added to `timesketch.conf`:
- `OPENSEARCH_MAPPING_BUFFER`: A float representing the percentage buffer to add to the calculated mapping limit (default: 0.2 = 20%).
- `OPENSEARCH_MAPPING_UPPER_LIMIT`: An integer representing the maximum allowed mapping limit (default: 2000).

Benefits:
Note:
Increasing the mapping limit can impact OpenSearch cluster performance and storage requirements. Users should carefully consider the `OPENSEARCH_MAPPING_UPPER_LIMIT` setting and monitor their cluster's resource usage.

Alternatives considered