
Reasoning parser #3859

Open · wants to merge 22 commits into main
Conversation

ShaoZhang0115

Motivation

Rewrite #3202

Modifications

  1. Add --enable-reasoning and --reasoning-parser options for DeepSeek R1 series models.
  2. Return reasoning_content as in the official API (ref: https://api-docs.deepseek.com/zh-cn/guides/reasoning_model) in both streaming and non-streaming chat completions.
    Example:
python -m sglang.launch_server --host 0.0.0.0 \
--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
--tp 1 --enable-reasoning --reasoning-parser deepseek-r1 
curl --location --request POST 'http://localhost:30000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
    "model": "default",
    "messages": [
        {
            "role": "user",
            "content": "Calculate 1 + 3"
        }
    ],
    "stream": false
}'

Get response:

{
    "id": "53de20f7f1244195826e7b52011c37a4",
    "object": "chat.completion",
    "created": 1740507802,
    "model": "default",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\n**Solution:**\n\nTo calculate \\(1 + 3\\), follow these easy steps:\n\n1. **Identify the numbers to add:**  \n   You have the number **1** and the number **3**.\n\n2. **Add the numbers together:**  \n   \\[\n   1 + 3 = 4\n   \\]\n\n3. **Final Answer:**  \n   \\[\n   \\boxed{4}\n   \\]",
                "reasoning_content": "To calculate the sum of 1 and 3, I will begin by identifying the two numbers involved in the addition. The first number is 1, and the second number is 3.\n\nNext, I will add these two numbers together. Adding 1 and 3 gives me a total of 4.\n\nTherefore, the result of 1 plus 3 is 4.\n",
                "tool_calls": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "matched_stop": 151643
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "total_tokens": 179,
        "completion_tokens": 168,
        "prompt_tokens_details": null
    }
}
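For reference, a minimal sketch of pulling both fields out of a response shaped like the one above (the payload here is a stub mirroring the example; the field names follow the official API):

```python
import json

# Stub of a chat-completion response carrying both "content" and
# "reasoning_content", shaped like the example above.
raw = json.dumps({
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "The answer is 4.",
                "reasoning_content": "1 + 3 = 4.",
            }
        }
    ]
})

response = json.loads(raw)
message = response["choices"][0]["message"]
answer = message["content"]
# reasoning_content may be absent when the parser is disabled, so use .get().
reasoning = message.get("reasoning_content")
```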

Docs will be updated as soon as possible.

Checklist

Comment on lines 32 to 33
self.think_start_token = "<think>"
self.think_end_token = "</think>"
Collaborator

Can we extend this to all reasoning models? Not just dpsk R1. There might be different thinking tokens.


I think different reasoning models need different parsers, and I added docs for it.

@xihuai18

  • Add Docs
  • Test with streaming and non-streaming cases, with truncated or non-truncated max-tokens for reasoning.
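To illustrate the truncation case in the second bullet: when max-tokens cuts generation off inside the think block, the closing tag never arrives, so everything seen so far should count as reasoning. This is a standalone sketch of that behavior, not sglang's actual parser:

```python
# Sketch of the truncated vs. non-truncated cases. Token strings follow the
# R1 defaults; the function shape is illustrative, not sglang's API.
def split_reasoning(text, start="<think>", end="</think>"):
    if text.startswith(start):
        text = text[len(start):]
    if end in text:
        reasoning, _, content = text.partition(end)
        return reasoning, content
    # Truncated by max-tokens: no end tag yet, so it is all reasoning.
    return text, ""

# Non-truncated: reasoning and content both present.
r1, c1 = split_reasoning("<think>add 1 and 3</think>4")
# Truncated mid-reasoning: no closing tag.
r2, c2 = split_reasoning("<think>add 1 and")
```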

@xihuai18

However, I cannot pass my tests with --enable-torch-compile, which is confusing.

@xihuai18

> However, I cannot pass my tests with --enable-torch-compile, which is confusing.

possible related issue: #3730 (comment)

self.think_start_token = think_start_token
self.think_end_token = think_end_token
self.pattern = re.compile(
    rf"{self.think_start_token}(.*?){self.think_end_token}", re.DOTALL
)
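For what it's worth, here is how that DOTALL pattern behaves on a typical R1 output (a self-contained rerun of the quoted lines, with the tag strings inlined):

```python
import re

think_start_token = "<think>"
think_end_token = "</think>"
# re.DOTALL lets the non-greedy (.*?) span newlines inside the think block.
pattern = re.compile(rf"{think_start_token}(.*?){think_end_token}", re.DOTALL)

text = "<think>step 1\nstep 2</think>final answer"
m = pattern.search(text)
reasoning = m.group(1)   # everything between the tags
content = text[m.end():] # everything after the closing tag
```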

The most recent tokenizer hardcodes the opening <think> tag: https://huggingface.co/deepseek-ai/DeepSeek-R1/commit/8a58a132790c9935686eb97f042afa8013451c9f

This means the text coming back from inference won't include <think>. This is why I updated #3202 to assume the model is reasoning until </think> is seen; it also strips out <think> to handle the old chat template.

Contributor

@tot0

The PR added the start token if it is missing:

            # Add the start token to the beginning of the text.
            text = self.think_start_token + text

You can see it in detect_and_parse

```bash
python -m sglang.launch_server --host 0.0.0.0 \
--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
--enable-reasoning --reasoning-parser deepseek-r1
```

Appreciate the docs I was too lazy to add!

Would you consider also supporting the separate_reasoning contract? For my use case we want inference users to be able to control whether reasoning_content is separated, rather than set it as default behavior on sglang launch, which I understand some sglang users will want to do.


You mean adding a separate_reasoning parameter to requests?


Separating reasoning and non-reasoning outputs is super useful, and would love for that to be a toggle rather than always on or always off.
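For concreteness, a per-request toggle could look like the payload below; the separate_reasoning field is the proposed contract under discussion, not an existing sglang request parameter:

```json
{
    "model": "default",
    "messages": [{"role": "user", "content": "Calculate 1 + 3"}],
    "stream": false,
    "separate_reasoning": true
}
```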


Happy to merge the great changes from this PR into #3202 to try to get the best of both worlds?
Or vice versa, @ShaoZhang0115?


Updated #3202 to combine functionality from this PR, and added some unit tests.

@maximegmd

How does that work with grammars? Does the grammar kick in only after the reasoning parser?

@tot0

tot0 commented Feb 27, 2025

> How does that work with grammars? Does the grammar kick in only after the reasoning parser?

Have a similar question about this as well, though I don't think it's specific to this PR or #3202: the reasoning parsers (and tool parsers) operate on the text coming out of the underlying engine into the API layer.
As far as I can tell (after taking a look at #3298), the grammar engine choice is passed down to the underlying engine via sampling_params, and enforcement is done there. This suggests that not enforcing grammar constraints until a reasoning model is done reasoning would involve exposing the ReasoningParser's knowledge of the "end reasoning" token (</think> for R1) to the underlying engine.

cc @JC1DA and @mmoskal

@maximegmd

> How does that work with grammars? Does the grammar kick in only after the reasoning parser?

> Have a similar question about this as well, though I don't think it's specific to this PR or #3202: the reasoning parsers (and tool parsers) operate on the text coming out of the underlying engine into the API layer. As far as I can tell (after taking a look at #3298), the grammar engine choice is passed down to the underlying engine via sampling_params, and enforcement is done there. This suggests that not enforcing grammar constraints until a reasoning model is done reasoning would involve exposing the ReasoningParser's knowledge of the "end reasoning" token (</think> for R1) to the underlying engine.

Ideally we would be able to pass a grammar for reasoning and a grammar for content, but I believe the default grammar behavior should apply only to the content.
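A conceptual sketch of the gating both comments point at: hold off grammar masking until the reasoning-end token has been emitted. Everything here (the matcher interface, the token id) is invented for illustration; a real integration would live in the engine's sampling loop.

```python
# Hypothetical token id for "</think>"; the real id depends on the tokenizer.
THINK_END_TOKEN_ID = 151649

def constrain_logits(token_ids, logits, matcher):
    """Apply the grammar mask only once reasoning has finished."""
    if THINK_END_TOKEN_ID not in token_ids:
        return logits  # still reasoning: leave sampling unconstrained
    return matcher.mask(logits)  # content phase: enforce the grammar

class AllowOnly:
    # Toy stand-in for a grammar matcher: permits a fixed set of token ids.
    def __init__(self, allowed):
        self.allowed = set(allowed)

    def mask(self, logits):
        return [l if i in self.allowed else float("-inf")
                for i, l in enumerate(logits)]

matcher = AllowOnly({0, 2})
# No end-of-reasoning token seen yet: logits pass through untouched.
free = constrain_logits([5, 7], [0.1, 0.2, 0.3], matcher)
# End-of-reasoning token seen: disallowed tokens are masked out.
masked = constrain_logits([5, THINK_END_TOKEN_ID], [0.1, 0.2, 0.3], matcher)
```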

for chunk in response:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
    elif chunk.choices[0].delta.reasoning_content:
Contributor

Is this functioning correctly now? When I tested this feature with vLLM, it triggered an error from the OpenAI Python client.

Please note that it is not compatible with the OpenAI Python client library. You can use the requests library to make streaming requests.
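Following that suggestion, the SSE stream can be consumed without the OpenAI client. This sketch stubs the wire lines for clarity; in practice they would come from requests.post(..., stream=True).iter_lines() against /v1/chat/completions, with the chunk shape assumed from the streaming deltas discussed in this thread:

```python
import json

def accumulate(sse_lines):
    """Collect content and reasoning_content from SSE-style data lines."""
    content, reasoning = "", ""
    for line in sse_lines:
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        delta = json.loads(line[len("data: "):])["choices"][0]["delta"]
        # Either field may be missing or null in any given chunk.
        content += delta.get("content") or ""
        reasoning += delta.get("reasoning_content") or ""
    return content, reasoning

# Stubbed stream: two reasoning chunks, then one content chunk.
sse_lines = [
    'data: {"choices": [{"delta": {"reasoning_content": "1 + 3 "}}]}',
    'data: {"choices": [{"delta": {"reasoning_content": "is 4."}}]}',
    'data: {"choices": [{"delta": {"content": "4"}}]}',
    "data: [DONE]",
]
content, reasoning = accumulate(sse_lines)
```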

tot0 pushed a commit to tot0/sglang that referenced this pull request Feb 28, 2025