Handling of context overflow in Local LLMs #33295

Merged
glebashnik merged 13 commits into master from glebashnik/local-llm-context-size-management on Feb 17, 2025

Conversation

glebashnik (Contributor) commented on Feb 11, 2025

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

This is part of the work on the generate indexing expression for document augmentation.
It improves the local LLM component, a wrapper over llama.cpp for running LLMs locally:

  • Adds a maxPromptToken config parameter that truncates larger prompts, giving better control over context size
  • Adds handling of prompts that exceed the configured LLM context size, controlled by contextOverflowPolicy (a sketch follows this list)
  • Improves error handling when transitioning from the async to the sync completion method
  • Adds a blocking wait with timeout when adding requests to the executor queue, preventing an immediate error when the queue is full
  • Adds tests that verify the content of LLM output, checking for hallucinations caused by context overflow and other LLM runtime issues
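As a rough illustration of how maxPromptToken and contextOverflowPolicy could interact, here is a minimal sketch; the class, method, and enum names are assumptions made for illustration, not the component's actual API:

```java
import java.util.List;

// Hypothetical sketch of prompt truncation and context-overflow handling.
// ContextOverflowPolicy, applyLimits and the policy values are illustrative
// names, not the actual Vespa API.
public class ContextOverflowSketch {

    enum ContextOverflowPolicy { NONE, DISCARD, FAIL }

    static List<Integer> applyLimits(List<Integer> promptTokens,
                                     int maxPromptTokens,
                                     int contextSize,
                                     ContextOverflowPolicy policy) {
        // Enforce the configured prompt budget first, if one is set.
        if (maxPromptTokens > 0 && promptTokens.size() > maxPromptTokens)
            promptTokens = promptTokens.subList(0, maxPromptTokens);

        // Then decide what to do if the prompt still exceeds the context window.
        if (promptTokens.size() > contextSize) {
            switch (policy) {
                case NONE:    return promptTokens;                         // pass through; the runtime will fail
                case DISCARD: return promptTokens.subList(0, contextSize); // truncate to fit the context
                case FAIL:    throw new IllegalArgumentException(
                        "Prompt of " + promptTokens.size()
                        + " tokens exceeds context size of " + contextSize);
            }
        }
        return promptTokens;
    }
}
```

The blocking wait mentioned above maps naturally onto java.util.concurrent.BlockingQueue.offer(e, timeout, unit), which waits up to the given timeout for queue capacity instead of failing immediately when the queue is full.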

bjorncs
bjorncs previously approved these changes Feb 11, 2025

```java
@Test
public void testContextOverflowPolicySkip() {
    downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
```
bjorncs (Member)

Consider using @BeforeClass for shared initialization.
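For reference, the suggestion would look roughly like this with JUnit 4, reusing the test class's existing downloadFileIfMissing helper (a sketch, not the actual change):

```java
import org.junit.BeforeClass;

// Sketch of the suggestion: download the model once for the whole test class
// instead of in each test method. Assumes the downloadFileIfMissing helper
// and the URL/path constants already present in the test class.
@BeforeClass
public static void downloadModelOnce() {
    downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
}
```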

glebashnik (Contributor, Author)

It is not shared by all test methods.

glebashnik removed the request for review from arnej27959 on February 11, 2025 14:38
bjorncs
bjorncs previously approved these changes Feb 12, 2025
```java
    // Todo: handle prompt context size - such as give a warning when prompt exceeds context size
    contextSize = config.contextSize();
}
logger.fine(() -> String.format("Loaded model %s in %.2f sec", modelFile, (loadTime * 1.0 / 1000000000)));
```
bjorncs (Member)

Consider using Duration for time calculations in the future. This has previously been a typical source of bugs (although it is not that important here, as this is just for logging).
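A minimal sketch of that suggestion, assuming a startNanos timestamp taken before loading (variable names are illustrative):

```java
import java.time.Duration;
import java.util.logging.Logger;

// Sketch: compute the load time as a java.time.Duration instead of
// hand-rolled nanosecond division. startNanos and modelFile are assumed names.
class ModelLoadTiming {
    private static final Logger logger = Logger.getLogger(ModelLoadTiming.class.getName());

    static void logLoadTime(String modelFile, long startNanos) {
        Duration loadTime = Duration.ofNanos(System.nanoTime() - startNanos);
        logger.fine(() -> String.format("Loaded model %s in %.2f sec",
                modelFile, loadTime.toMillis() / 1000.0));
    }
}
```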

glebashnik merged commit 432e2f9 into master on Feb 17, 2025
3 checks passed
glebashnik deleted the glebashnik/local-llm-context-size-management branch on February 17, 2025 14:24