Handling of context overflow in Local LLMs #33295
Conversation
…naming for config params
… LLMs. Added tests that verify outputs of a small LLM.
model-integration/src/main/resources/configdefinitions/llm-local-client.def (two resolved review comments, now outdated)
```java
@Test
public void testContextOverflowPolicySkip() {
    downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
```
Consider using @BeforeClass for shared initialization.
It is not shared by all test methods.
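For context, a minimal sketch of the trade-off being discussed, assuming JUnit 4; `downloadFileIfMissing`, `SMALL_LLM_URL`, and `SMALL_LLM_PATH` are taken from the diff above, while the class name and the method bodies are illustrative only:

```java
import org.junit.BeforeClass;
import org.junit.Test;

public class LocalLLMTest {

    // Shared initialization with @BeforeClass runs once per class, which only
    // pays off when every test method needs the downloaded model.
    @BeforeClass
    public static void downloadModelOnce() {
        // downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
    }

    // Alternatively (as in this PR), keep the download inside only the tests
    // that need the model, so the remaining tests stay fast and self-contained.
    @Test
    public void testContextOverflowPolicySkip() {
        // downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
        // ... exercise the skip policy against the small model ...
    }
}
```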
model-integration/src/main/java/ai/vespa/llm/clients/LocalLLM.java (resolved review comment, now outdated)
…attern and expected exceptions.
model-integration/src/main/java/ai/vespa/llm/clients/LocalLLM.java (resolved review comment, now outdated)
```java
    // Todo: handle prompt context size - such as give a warning when prompt exceeds context size
    contextSize = config.contextSize();
}
logger.fine(() -> String.format("Loaded model %s in %.2f sec", modelFile, (loadTime * 1.0 / 1000000000)));
```
Consider using Duration for time calculations in the future. This has previously been a typical source of bugs (although not that important here, as this is just for logging).
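A minimal sketch of what that suggestion could look like using java.time.Duration instead of manual nanosecond arithmetic; the variable names are assumptions based on the snippet above:

```java
import java.time.Duration;
import java.util.logging.Logger;

class LoadTimeLoggingSketch {

    private static final Logger logger = Logger.getLogger(LoadTimeLoggingSketch.class.getName());

    void logLoadTime(String modelFile, long loadTimeNanos) {
        // Duration owns the unit conversion, avoiding hand-written factors like 1.0/1000000000.
        Duration loadTime = Duration.ofNanos(loadTimeNanos);
        logger.fine(() -> String.format("Loaded model %s in %.2f sec", modelFile, loadTime.toMillis() / 1000.0));
    }
}
```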
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
This is part of the work on the `generate` indexing expression for document augmentation. It improves the local LLM component, which is a wrapper over llama.cpp for running LLMs locally:

- `maxPromptToken` config parameter to truncate larger prompts for better control over context size
- `contextOverflowPolicy` config parameter
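The excerpt does not show the actual truncation or policy logic, so the following is only a hedged sketch of how a prompt cap and an overflow policy could interact; the enum values, field values, and method names are all assumptions for illustration:

```java
import java.util.List;

class ContextOverflowSketch {

    // Illustrative policy values; the names used in the PR may differ.
    enum ContextOverflowPolicy { FAIL, SKIP, TRUNCATE }

    private final int contextSize = 4096;      // model context window (assumed)
    private final int maxPromptTokens = 2048;  // configured prompt cap (assumed)
    private final ContextOverflowPolicy policy = ContextOverflowPolicy.SKIP;

    /** Returns the tokens to send, an empty list to skip the prompt, or throws on overflow. */
    List<Integer> applyPolicy(List<Integer> promptTokens) {
        int limit = Math.min(maxPromptTokens, contextSize);
        if (promptTokens.size() <= limit) {
            return promptTokens;
        }
        switch (policy) {
            case TRUNCATE:
                return promptTokens.subList(0, limit);  // keep only the first 'limit' tokens
            case SKIP:
                return List.of();                       // caller skips generation for this prompt
            case FAIL:
            default:
                throw new IllegalArgumentException("Prompt exceeds the configured context size");
        }
    }
}
```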