Handling of context overflow in Local LLMs #33295
Conversation
…naming for config params
… LLMs. Added tests that verify outputs of a small LLM.
model-integration/src/main/resources/configdefinitions/llm-local-client.def (two resolved review comments, now outdated)
```java
@Test
public void testContextOverflowPolicySkip() {
    downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
```
Consider using @BeforeClass for shared initialization.
It is not shared by all test methods.
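For context, a minimal sketch of the trade-off being discussed, assuming JUnit 4; `downloadFileIfMissing`, `SMALL_LLM_URL`, and `SMALL_LLM_PATH` are taken from the diff above, while the class name and the method bodies are illustrative only:

```java
import org.junit.BeforeClass;
import org.junit.Test;

public class LocalLLMTest {

    // Shared initialization with @BeforeClass runs once per class, which only
    // pays off when every test method needs the downloaded model.
    @BeforeClass
    public static void downloadModelOnce() {
        // downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
    }

    // Alternatively (as in this PR), keep the download inside only the tests
    // that need the model, so the remaining tests stay fast and self-contained.
    @Test
    public void testContextOverflowPolicySkip() {
        // downloadFileIfMissing(SMALL_LLM_URL, SMALL_LLM_PATH);
        // ... exercise the skip policy against the small model ...
    }
}
```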
model-integration/src/main/java/ai/vespa/llm/clients/LocalLLM.java (resolved review comment, now outdated)
…attern and expected exceptions.
model-integration/src/main/java/ai/vespa/llm/clients/LocalLLM.java (resolved review comment, now outdated)
```java
    // Todo: handle prompt context size - such as give a warning when prompt exceeds context size
    contextSize = config.contextSize();
}
logger.fine(() -> String.format("Loaded model %s in %.2f sec", modelFile, (loadTime * 1.0 / 1000000000)));
```
Consider using Duration for time calculations in the future. This has previously been a typical source of bugs (although not that important here, as this is just for logging).
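A minimal sketch of what that suggestion could look like using java.time.Duration instead of manual nanosecond arithmetic; the variable names are assumptions based on the snippet above:

```java
import java.time.Duration;
import java.util.logging.Logger;

class LoadTimeLoggingSketch {

    private static final Logger logger = Logger.getLogger(LoadTimeLoggingSketch.class.getName());

    void logLoadTime(String modelFile, long loadTimeNanos) {
        // Duration owns the unit conversion, avoiding hand-written factors like 1.0/1000000000.
        Duration loadTime = Duration.ofNanos(loadTimeNanos);
        logger.fine(() -> String.format("Loaded model %s in %.2f sec", modelFile, loadTime.toMillis() / 1000.0));
    }
}
```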
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
This is part of the work on the `generate` indexing expression for document augmentation. It improves the local LLM component, which is a wrapper over llama.cpp for running LLMs locally:

- `maxPromptToken` config parameter to truncate larger prompts for better control over context size
- `contextOverflowPolicy` config parameter
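The excerpt does not show the actual truncation or policy logic, so the following is only a hedged sketch of how a prompt cap and an overflow policy could interact; the enum values, field values, and method names are all assumptions for illustration:

```java
import java.util.List;

class ContextOverflowSketch {

    // Illustrative policy values; the names used in the PR may differ.
    enum ContextOverflowPolicy { FAIL, SKIP, TRUNCATE }

    private final int contextSize = 4096;      // model context window (assumed)
    private final int maxPromptTokens = 2048;  // configured prompt cap (assumed)
    private final ContextOverflowPolicy policy = ContextOverflowPolicy.SKIP;

    /** Returns the tokens to send, an empty list to skip the prompt, or throws on overflow. */
    List<Integer> applyPolicy(List<Integer> promptTokens) {
        int limit = Math.min(maxPromptTokens, contextSize);
        if (promptTokens.size() <= limit) {
            return promptTokens;
        }
        switch (policy) {
            case TRUNCATE:
                return promptTokens.subList(0, limit);  // keep only the first 'limit' tokens
            case SKIP:
                return List.of();                       // caller skips generation for this prompt
            case FAIL:
            default:
                throw new IllegalArgumentException("Prompt exceeds the configured context size");
        }
    }
}
```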