Reasoning parser #3859
Conversation
```python
self.think_start_token = "<think>"
self.think_end_token = "</think>"
```
Can we extend this to all reasoning models? Not just dpsk R1. There might be different thinking tokens.
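One way to generalize beyond a single model would be a registry keyed by the `--reasoning-parser` name. A minimal sketch (the registry name and helper function are hypothetical, not sglang's actual API; only the `deepseek-r1` token pair comes from this PR):

```python
# Hypothetical registry mapping a --reasoning-parser name to its token
# pair; only the deepseek-r1 entry reflects this PR.
REASONING_TOKENS = {
    "deepseek-r1": ("<think>", "</think>"),
    # Other reasoning models could register their own pair here.
}

def get_think_tokens(parser_name: str) -> tuple[str, str]:
    """Look up the start/end thinking tokens for a parser name."""
    try:
        return REASONING_TOKENS[parser_name]
    except KeyError:
        raise ValueError(f"unknown reasoning parser: {parser_name!r}")
```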
I think different reasoning models need different parsers, and I added docs for it.
However, I cannot pass my tests with
Possible related issue: #3730 (comment)
```python
self.think_start_token = think_start_token
self.think_end_token = think_end_token
self.pattern = re.compile(
    rf"{self.think_start_token}(.*?){self.think_end_token}", re.DOTALL
)
```
The most recent tokenizer hardcodes the opening `<think>` tag: https://huggingface.co/deepseek-ai/DeepSeek-R1/commit/8a58a132790c9935686eb97f042afa8013451c9f

This means the text coming back from inference won't include `<think>`. That is why I updated #3202 to assume the model is reasoning until `</think>` is seen; it also strips out `<think>` to handle the old chat template.
The PR adds the start token if it is missing:

```python
# Add the start token to the beginning of the text.
text = self.think_start_token + text
```

You can see it in `detect_and_parse`.
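For readers following along, here is a minimal self-contained sketch of that behavior. The class name is made up and the exact sglang implementation may differ; the attribute names and the prepend-if-missing step mirror the snippets quoted above:

```python
import re

class DeepSeekR1Parser:
    """Sketch of a <think>…</think> parser; not sglang's actual class."""

    def __init__(self, think_start_token="<think>", think_end_token="</think>"):
        self.think_start_token = think_start_token
        self.think_end_token = think_end_token
        self.pattern = re.compile(
            rf"{re.escape(think_start_token)}(.*?){re.escape(think_end_token)}",
            re.DOTALL,
        )

    def detect_and_parse(self, text):
        """Return (reasoning_content, content); reasoning is None if no block."""
        if not text.startswith(self.think_start_token):
            # The new chat template hardcodes <think>, so the generated
            # text may not contain it; add it back before matching.
            text = self.think_start_token + text
        match = self.pattern.search(text)
        if match is None:
            return None, text
        return match.group(1).strip(), text[match.end():].strip()
```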
```bash
python -m sglang.launch_server --host 0.0.0.0 \
    --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
    --enable-reasoning --reasoning-parser deepseek-r1
```
Appreciate the docs I was too lazy to add!

Would you consider also supporting the `separate_reasoning` contract? For my use case we want inference users to be able to control whether `reasoning_content` is separated, rather than setting it as default behavior at sglang launch, which I understand some sglang users will want to do.
You mean adding a `separate_reasoning` parameter when sending requests?
Separating reasoning and non-reasoning outputs is super useful, and would love for that to be a toggle rather than always on or always off.
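Whatever the toggle ends up looking like, the separation itself has a subtle streaming case: the `</think>` token can straddle two chunks. A sketch of a stateful splitter that handles this (class and method names are hypothetical, not from this PR):

```python
class StreamingReasoningSplitter:
    """Route streamed text deltas to reasoning vs. final content.

    Assumes the model is reasoning from the first token until the end
    token appears, matching the new DeepSeek-R1 chat template that
    hardcodes <think> so it never shows up in generated text.
    """

    def __init__(self, think_end_token="</think>"):
        self.think_end_token = think_end_token
        self.in_reasoning = True
        self.buffer = ""

    def feed(self, delta):
        """Return (reasoning_delta, content_delta) for one streamed chunk."""
        if not self.in_reasoning:
            return "", delta
        self.buffer += delta
        idx = self.buffer.find(self.think_end_token)
        if idx == -1:
            # Hold back enough trailing characters that a partial end
            # token straddling two chunks is not emitted as reasoning.
            safe = max(len(self.buffer) - len(self.think_end_token) + 1, 0)
            out, self.buffer = self.buffer[:safe], self.buffer[safe:]
            return out, ""
        reasoning = self.buffer[:idx]
        content = self.buffer[idx + len(self.think_end_token):]
        self.in_reasoning = False
        self.buffer = ""
        return reasoning, content
```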
Happy to merge the great changes from this PR into #3202 to try and get the best of both worlds? Or vice versa, @ShaoZhang0115?
Updated #3202 to combine functionality from this PR, and added some unit tests.
How does this work with grammars? Does the grammar kick in only after the reasoning parser?
I have a similar question, though I don't think it's specific to this PR or #3202: the reasoning parsers (and tool parsers) operate on the text coming out of the underlying engine into the API layer.
Ideally we would be able to pass a grammar for reasoning and a grammar for content, but I believe the default grammar behavior should apply only to the content.
```python
for chunk in response:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
    elif chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content
```
Is this functioning correctly now? When I tested this feature with vLLM, it triggered an error from the OpenAI Python client.
Please note that it is not compatible with the OpenAI Python client library. You can use the requests library to make streaming requests.
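A sketch of the `requests`-based route: stream the raw SSE lines (e.g. from `response.iter_lines(decode_unicode=True)`) and accumulate the two fields yourself. The helper below only parses lines, so the field names are the main assumption (they follow the DeepSeek reasoning API):

```python
import json

def accumulate_sse(lines):
    """Collect (reasoning_content, content) from OpenAI-style SSE lines."""
    reasoning, content = "", ""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        reasoning += delta.get("reasoning_content") or ""
        content += delta.get("content") or ""
    return reasoning, content
```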
Motivation
Rewrite #3202
Modifications
- `--enable-reasoning` and `--reasoning-parser` options for DeepSeek R1 series models.
- `reasoning_content` as in the official API (ref: https://api-docs.deepseek.com/zh-cn/guides/reasoning_model), in both streaming and non-streaming chat completions.

Example:
Get response:
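The response example itself was elided here; assuming the standard chat-completions schema plus DeepSeek's `reasoning_content` extension, a non-streaming response body would look roughly like (field values are illustrative):

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "First, 2 + 2 ... so the answer is 4.",
        "content": "The answer is 4."
      },
      "finish_reason": "stop"
    }
  ]
}
```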
Docs will be updated as soon as possible.
Checklist