Reasoning parser #3859
base: main
Changes from 21 commits
@@ -0,0 +1,127 @@
# Reasoning Parser

SGLang supports parsing the reasoning content from reasoning models such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) for convenient output processing in downstream applications.

Following the official [DeepSeek API design](https://api-docs.deepseek.com/guides/reasoning_model), SGLang offers both the reasoning content and the final conclusion:

- `reasoning_content`: The content of the CoT.
- `content`: The content of the final answer.
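For illustration, a parsed assistant message carries the two fields side by side. This is a hypothetical sketch of the shape, not actual server output:

```python
# Hypothetical parsed message, shown as a plain dict:
message = {
    "role": "assistant",
    "reasoning_content": "First, add 1 and 3. The sum is 4.",  # the CoT
    "content": "The answer is 4.",  # the final answer
}
```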
## Supported Models

Currently, SGLang supports the following reasoning models:
- [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d): The reasoning content is wrapped with `<think>` and `</think>` tags.
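For example, the raw completion text from such a model looks roughly like the following (a made-up sample; depending on the chat template, the opening `<think>` tag may already be stripped before the text reaches the parser):

```python
raw_output = (
    "<think>The user asks for 1 + 3. Adding them gives 4.</think>"
    "\n\nThe answer is **4**."
)
```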
## Usage

You need to enable the reasoning parser in the SGLang API server by setting the `--enable-reasoning` and `--reasoning-parser` options. The `--reasoning-parser` option specifies the reasoning parser used to extract the reasoning content and the final answer.

```bash
python -m sglang.launch_server --host 0.0.0.0 \
    --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
    --enable-reasoning --reasoning-parser deepseek-r1
```
### Non-streaming Request

Make a request to the reasoning model and get the reasoning content and the final answer.

Using the OpenAI Python API:
```python
import openai

client = openai.Client(base_url="http://localhost:30000/v1", api_key="None")

response = client.chat.completions.create(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Compute 1+3"}],
    max_tokens=1024,
    stream=False
)

response.choices[0].message.reasoning_content
# 'First, I recognize that the problem requires adding the numbers 1 and 3.\n\nNext, I identify the numbers to be added, which are 1 and 3.\n\nThen, I perform the addition operation: 1 plus 3 equals 4.\n\nFinally, I conclude that the sum of 1 and 3 is 4.\n'
response.choices[0].message.content
# '\n\nTo compute \\(1 + 3\\), follow these simple steps:\n\n1. **Identify the numbers to add:** \n   The numbers are **1** and **3**.\n\n2. **Add the numbers together:** \n   \\[\n   1 + 3 = 4\n   \\]\n\n3. **Write the final answer:** \n   The sum of \\(1 + 3\\) is \\(\\boxed{4}\\).'
```
### Streaming Request

`reasoning_content` is available in the `delta` field of the streaming response.

Using the OpenAI Python API:

```python
# ... Initialize the client as before ...

response = client.chat.completions.create(
    model="deepseek-r1:14b",
    messages=[{"role": "user", "content": "Compute 1+3"}],
    max_tokens=1024,
    stream=True
)
reasoning_content = ""
content = ""
for chunk in response:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
    elif chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content

reasoning_content
# 'I need to calculate the sum of 1 and 3. \n\nFirst, I identify the numbers involved in the addition: 1 and 3.\n\nNext, I add these two numbers together to find the total.\n\nFinally, the result of the addition is 4.\n'
content
# '\n\n**Solution:**\n\nWe need to compute the sum of 1 and 3.\n\n1. **Identify the numbers to add:**\n   - Number 1\n   - Number 3\n\n2. **Add the numbers together:**\n   \\[\n   1 + 3 = 4\n   \\]\n\n3. **Final Answer:**\n   \\[\n   \\boxed{4}\n   \\]'
```

> **Review comment** (on the `delta.reasoning_content` branch): Is this functioning correctly now? When I tested this feature with vLLM, it triggered an error from the OpenAI Python client.
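If a given version of the `openai` client rejects the non-standard `reasoning_content` field on `delta` (as the comment above suggests), reading it defensively with `getattr` avoids an `AttributeError`. This is a sketch under that assumption, not code from the PR:

```python
reasoning_content = ""
content = ""
for chunk in response:
    delta = chunk.choices[0].delta
    # getattr() returns None when the client's delta type
    # does not expose the non-standard reasoning_content field.
    if getattr(delta, "reasoning_content", None):
        reasoning_content += delta.reasoning_content
    elif delta.content:
        content += delta.content
```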
## Supporting More Reasoning Models

For future reasoning models, you can implement the reasoning parser as a subclass of `BaseReasoningParser` in `python/sglang/srt/reasoning_parser.py`.

```python
from typing import Optional, Tuple


class BaseReasoningParser:
    """Base class for reasoning parsers."""

    def __init__(self):
        self._buffer = ""

    def detect_and_parse(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        """Detect and parse the text, return reasoning_content and content."""
        raise NotImplementedError

    def parse_streaming_increment(
        self, new_text: str
    ) -> Tuple[Optional[str], Optional[str]]:
        """Parse the new text incrementally, return reasoning_content and content."""
        raise NotImplementedError
```
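For instance, here is a minimal sketch of a parser for a hypothetical model that wraps its reasoning in `<reasoning>` and `</reasoning>` tags (all names below are made up; streaming support is omitted for brevity):

```python
class MyReasoningParser(BaseReasoningParser):
    """Hypothetical parser for a model using <reasoning>...</reasoning> tags."""

    START, END = "<reasoning>", "</reasoning>"

    def detect_and_parse(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        # No end tag yet: the whole text is still reasoning.
        if self.END not in text:
            return text.replace(self.START, ""), ""
        reasoning, content = text.split(self.END, 1)
        return reasoning.replace(self.START, ""), content
```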
Then register the reasoning parser for the new reasoning model in `ReasoningParserDict` accordingly:
```python
class ReasoningParser:
    """Reasoning parser for different reasoning models."""

    # Specify the reasoning parser for each reasoning model here
    ReasoningParserDict: Dict[str, Type[BaseReasoningParser]] = {
        "deepseek-r1": DeepSeekR1ReasoningParser
    }

    def __init__(self, reasoning_parser: str):
        self.parser = self.ReasoningParserDict[reasoning_parser]()

    def parse_non_stream(self, full_text: str) -> Tuple[Optional[str], Optional[str]]:
        """
        Non-streaming parsing for reasoning models.
        Return: reasoning_content, content
        """
        return self.parser.detect_and_parse(full_text)

    def parse_stream_chunk(self, chunk_text: str):
        """
        Streaming parsing for reasoning models.
        Return: reasoning_content, content
        """
        return self.parser.parse_streaming_increment(chunk_text)
```
`python/sglang/srt/reasoning_parser.py`:

@@ -0,0 +1,93 @@
```python
import json
import logging
import re
from typing import Any, Dict, List, Optional, Tuple, Type


class BaseReasoningParser:
    """Base class for reasoning parser."""

    def __init__(self, think_start_token: str, think_end_token: str):
        self._buffer = ""
        self.think_start_token = think_start_token
        self.think_end_token = think_end_token
        self.pattern = re.compile(
            rf"{self.think_start_token}(.*?){self.think_end_token}", re.DOTALL
        )
        self.is_reasoning = True
```

> **Review comment** (on the pattern above): The most recent tokenizer hardcodes the opening `<think>` token, which means the text coming back from inference won't include `<think>`.
>
> **Reply:** The PR adds the start token if it is missing; you can see it in `detect_and_parse`.
```python
    # BaseReasoningParser (continued)

    def parse_streaming_increment(
        self, new_text: str
    ) -> Tuple[Optional[str], Optional[str]]:
        """Parse the new text incrementally, return reasoning_content and content."""
        # Should parse
        if self.is_reasoning:
            self._buffer += new_text

            # Reasoning continues
            if self.think_end_token not in self._buffer:
                return new_text, ""
            # Reasoning ends
            else:
                reasoning_part = new_text.split(self.think_end_token)[0]
                content_part = new_text.split(self.think_end_token)[1]

                self.is_reasoning = False
                self._buffer = ""

                return reasoning_part, content_part

        else:
            return "", new_text

    def detect_and_parse(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        """Detect and parse the text, return reasoning_content and content."""
        if self.think_end_token not in text:
            return text, ""
        else:
            # Add the start token to the beginning of the text.
            text = self.think_start_token + text

            reasoning_content = self.pattern.findall(text)[0]
            content = text[
                len(self.think_start_token)
                + len(reasoning_content)
                + len(self.think_end_token) :
            ]

            return reasoning_content, content


class DeepSeekR1ReasoningParser(BaseReasoningParser):
    """
    DeepSeekR1 reasoning parser, which uses "<think>" and "</think>" to detect the reasoning part.
    Refer to https://github.com/deepseek-ai/DeepSeek-R1?tab=readme-ov-file#usage-recommendations.
    """

    def __init__(self):
        super().__init__("<think>", "</think>")


class ReasoningParser:
    """Reasoning parser for different reasoning models."""

    ReasoningParserDict: Dict[str, Type[BaseReasoningParser]] = {
        "deepseek-r1": DeepSeekR1ReasoningParser
    }

    def __init__(self, reasoning_parser: str):
        self.parser = self.ReasoningParserDict[reasoning_parser]()

    def parse_non_stream(self, full_text: str) -> Tuple[Optional[str], Optional[str]]:
        """
        Non-streaming parsing for reasoning models.
        Return: reasoning_content, content
        """
        return self.parser.detect_and_parse(full_text)

    def parse_stream_chunk(self, chunk_text: str):
        """
        Streaming parsing for reasoning models.
        Return: reasoning_content, content
        """
        return self.parser.parse_streaming_increment(chunk_text)
```
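A quick usage sketch of the classes above (the sample strings are made up; the non-streaming path assumes the opening `<think>` tag was consumed by the chat template, so `detect_and_parse` re-adds it):

```python
parser = ReasoningParser("deepseek-r1")

# Non-streaming: the opening tag is absent; detect_and_parse re-adds it.
reasoning, answer = parser.parse_non_stream(
    "1 plus 3 equals 4.</think>The answer is 4."
)
# reasoning == "1 plus 3 equals 4."; answer == "The answer is 4."

# Streaming: feed increments as they arrive.
stream_parser = ReasoningParser("deepseek-r1")
for piece in ["1 plus 3 ", "equals 4.</think>", "The answer is 4."]:
    r, c = stream_parser.parse_stream_chunk(piece)
```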
---

**Comment:** Appreciate the docs I was too lazy to add! Would you consider also supporting the `separate_reasoning` contract? For my use case we want inference users to be able to control whether `reasoning_content` is separated, rather than set it as default behavior on sglang launch, which I understand some sglang users will want to do.

**Reply:** You mean adding a `separate_reasoning` parameter when sending requests?

**Comment:** Separating reasoning and non-reasoning outputs is super useful, and I would love for that to be a toggle rather than always on or always off.

**Comment:** Happy to merge the great changes from this PR into #3202 to try and get the best of both worlds? Or vice versa, @ShaoZhang0115?

**Reply:** Updated #3202 to combine functionality from this PR, and added some unit tests.

**Comment:** Microsoft has already shipped the `separate_reasoning` API to production and intends to keep it there, so we would very much like to have it merged into main instead of maintaining a fork.

**Reply:** Could you give a reference?