
Search query limited to 10000 records #357

Open
tanganellilore opened this issue Mar 7, 2024 · 4 comments

@tanganellilore

Hi team,

I've noticed a bug with search queries; it is probably connected to this issue and to this old, not-yet-migrated one: https://issues.sonatype.org/browse/NEXUS-16917.

If I perform a search query on a large repository (more than 10000 elements), even with pagination, I receive an error from the API like this:

RemoteTransportException[[159FCCBA-DE3F55B4-695C3AB7-3D759962-AA738D59][local[1]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10050]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter.]; }

The log above is just one example; the real output is very long and repetitive.
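
For reference, here is roughly what my paging loop looks like (a minimal Python sketch; the instance URL and repository name are placeholders, and I'm assuming the standard /service/rest/v1/search/assets endpoint):

```python
import requests

# Page through /service/rest/v1/search/assets using the continuationToken.
# Nexus paginates with an opaque token rather than from/size, but the
# embedded Elasticsearch still enforces index.max_result_window (10000),
# so the loop fails once enough pages have been fetched.
NEXUS_URL = "http://localhost:8081"   # placeholder instance URL
REPO = "my-docker-repo"               # placeholder repository name

params = {"repository": REPO}
seen = 0
while True:
    resp = requests.get(f"{NEXUS_URL}/service/rest/v1/search/assets", params=params)
    resp.raise_for_status()           # raises once the server returns an error
                                      # like the one quoted above
    data = resp.json()
    seen += len(data["items"])
    token = data.get("continuationToken")
    if not token:
        break
    params["continuationToken"] = token

print(f"retrieved {seen} assets")
```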

Any suggestions on how to solve it?

Thanks

P.S. The repo has multiple folders with a lot of Docker images, and I need to extract all of them.

@nblair (Contributor) commented Mar 12, 2024

Thanks for opening an issue @tanganellilore. The limit applied to search responses is intentional - such large datasets don't scale well for a system with an embedded database and embedded search engine. Without that in place, it's a recipe for OOM, which can cause the application to fail unexpectedly and result in database/index corruption and/or data loss.

What is your use case for queries that have such large result sets?

@elmbrain

We have the same problem. The repository contains many artifacts and we need to search across all of them. Users should be able to decide how to limit the output. Previously, setting the index.max_result_window parameter in the Elasticsearch configuration file worked, and it was a revelation to us that it is now broken. It's unclear why the parameter was hardcoded directly in the code. Please expose it at the configuration-file level so that it can be changed.
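
For reference, this is the kind of override that previously worked (a sketch; the exact file location, and whether the embedded Elasticsearch still reads it, depend on the Nexus version):

```yaml
# elasticsearch.yml -- index-level default for the result window.
# The Elasticsearch default is 10000; per this thread, newer Nexus
# versions set the value in code, so this override no longer takes effect.
index.max_result_window: 50000
```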

@tanganellilore (Author)

Hi @nblair,
I only noticed your answer now, sorry for the delay.
I simply need to export the "metadata" of all assets in all repos and subfolders (checksum, last download date, etc.) and save it in an external DB, to track changes and deletions for an internal process (without using the audit webhook).

In my case I have a big repo with a lot of subfolders, each holding around 30-100 assets, so the repo as a whole has more than 10k elements.
Via the API we can't simply get a list of subpaths in the repo to iterate over (which would reduce the number of assets per query), and that is why I hit this error on the call.

I understand that the limit is there to avoid OOM, but via the API there is no way to work around it.

For my use case I ended up using a Groovy script that can be called and returns this type of object per repository, but I notice that we get this warning there as well.
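
In case it helps others, here is a rough Python sketch of the chunk-by-folder idea (the instance URL, repository name, and folder prefixes are placeholders; whether the q parameter honours a trailing wildcard should be checked against your Nexus version's search API docs):

```python
import requests

NEXUS_URL = "http://localhost:8081"        # placeholder instance URL
REPO = "my-docker-repo"                    # placeholder repository name
FOLDERS = ["team-a", "team-b", "team-c"]   # placeholder path prefixes; Nexus has
                                           # no REST call to enumerate subpaths,
                                           # so the list must come from elsewhere

def assets_for_prefix(prefix):
    """Yield every asset matching the prefix, one continuation page at a time."""
    params = {"repository": REPO, "q": f"{prefix}*"}
    while True:
        resp = requests.get(f"{NEXUS_URL}/service/rest/v1/search/assets", params=params)
        resp.raise_for_status()
        data = resp.json()
        yield from data["items"]
        token = data.get("continuationToken")
        if not token:
            return
        params["continuationToken"] = token

for folder in FOLDERS:
    for asset in assets_for_prefix(folder):
        # each item carries fields such as path, downloadUrl, and checksum
        print(asset["path"], asset.get("checksum", {}).get("sha1"))
```

As long as each prefix matches fewer than 10000 assets, every individual query stays inside the result window.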
