Fix guided decoding crashes #811
Conversation
vllm/model_executor/guided_decoding/outlines_logits_processors.py
@@ -36,12 +38,40 @@
def _cached(fn):
    cache: Dict[Any, Any] = {}

def is_hashable(obj):
Hmm, I'm wondering whether we should take a slightly different approach. Let me give you an example:

foo = [1, CFGState]
is_hashable(foo) == False => key(foo) = id(foo)

What if we change the logic so that, instead of checking whether everything is hashable, we hash everything we can and use id where hashing isn't possible? I.e.

semi_hash(obj) =
    case Iterable -> hash(semi_hash(sub) for sub in obj)
    case CFGState -> id(obj)
    case Hashable -> hash(obj)

This way we could hash foo like this:

semi_hash(foo) = hash(hash(int), id(CFGState))
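To make the proposal concrete, here is a minimal runnable sketch of semi_hash. The CFGState below is a stand-in class (the real one lives in outlines), and the str/bytes guard is an addition to avoid infinite recursion on strings:

```python
from collections.abc import Hashable, Iterable

class CFGState:
    """Stand-in for outlines' CFG state; deliberately unhashable."""
    __hash__ = None

def semi_hash(obj):
    if isinstance(obj, CFGState):        # known-unhashable state -> identity
        return id(obj)
    if isinstance(obj, (str, bytes)):    # iterable but atomic; don't recurse
        return hash(obj)
    if isinstance(obj, Iterable):        # recurse into containers
        return hash(tuple(semi_hash(sub) for sub in obj))
    if isinstance(obj, Hashable):
        return hash(obj)
    return id(obj)                       # anything else: identity fallback
```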
I adopted this approach in the new hash_args function, but I decided not to make a special case for CFGState, as it's an iterable (a namedtuple) containing some non-hashable fields. So now it's something like:

hash_args(obj) =
    case Iterable -> hash(tuple(hash_args(sub) for sub in obj))
    case Hashable -> hash(obj)
    case _ -> hash(id(obj))

With foo = [1, CFGState] the hash would look like this:

hash_args(foo) = hash(hash(int), hash(tuple(id(PartialParserState), hash(tuple(int, int, ...)))))

The hasher now has one drawback that is currently a non-issue: if CFGState objects share the same PartialParserState and that state is modified between calls, the id (and thus the hash) stays the same even though the contents don't. Fortunately, it seems like the second field of CFGState will give that away; if we run into bugs here, it would probably be better to use hashes of random UUIDs rather than the ids of unhashable objects.
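A minimal sketch of the hash_args idea described above (not the actual vLLM implementation; the str/bytes guard is an addition to avoid infinite recursion on strings):

```python
from collections.abc import Hashable, Iterable

def hash_args(obj):
    """Best-effort hash: recurse into iterables, fall back to id()."""
    # Strings are iterable but atomic; recursing into them would never end.
    if isinstance(obj, (str, bytes)):
        return hash(obj)
    if isinstance(obj, Iterable):
        # A container (even a hashable namedtuple) may hold unhashable
        # elements, so recurse instead of trusting hash() on the whole thing.
        return hash(tuple(hash_args(sub) for sub in obj))
    if isinstance(obj, Hashable):
        return hash(obj)
    # Unhashable, non-iterable object: key it by identity.
    return hash(id(obj))
```

The identity fallback means two equal-but-distinct unhashable objects hash differently, trading cache misses for crash-freedom.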
LGTM
This PR mostly ports vllm-project#11389 to the design introduced by #358 and makes the custom caching code a little more robust.

Currently there are two problems with guided decoding:

- `mask[list(allowed_tokens)] = 0` crashes because `allowed_tokens` contains tensors. Pretty easy fix.
- `self._fsm_state` was changed from `int` to a union of `int` and `outlines.state.CFGState`, which may cause `self._cached_get_mask_tensor(state_id, scores.size(-1), scores.device)` to crash, as `outlines.state.CFGState` is not hashable. This PR changes the caching mechanism so that if function arguments are not hashable, their id is used as the key. This might cause some cache misses, but that's better than crashing, as it does right now.

None of the above is a problem upstream, as it stems from code introduced in HPU: offload logits processing to CPU #358.

I've also added guided decoding tests to the CI suite.