
[FEATURE] Count the tokens of the content passed to the model and dynamically limit it so it does not exceed max-model-len #5104

Open
TZJ12 opened this issue Nov 26, 2024 · 6 comments
Labels: enhancement (New feature or request), stale

Comments

TZJ12 commented Nov 26, 2024

Feature Description
Could the token count of the content sent to the model (referenced text, prompt, user question, etc.) be computed, with a dynamic limit applied when it exceeds the maximum number of tokens the model accepts, to prevent the error: request content exceeds the model's maximum token count?
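A minimal sketch of one way this could work (not chatchat's actual implementation): load the serving model's tokenizer, count the tokens of the fixed parts of the request (prompt and user question), and drop the lowest-ranked retrieved documents until the total fits within max-model-len minus a budget reserved for the reply. The tokenizer name, max-model-len value, and reserved output budget below are placeholder assumptions.

```python
# Hedged sketch: count prompt tokens with the model's tokenizer and trim the
# retrieved context until the request fits within max-model-len.
# Assumptions: the tokenizer name, MAX_MODEL_LEN, and RESERVED_OUTPUT_TOKENS
# are illustrative values, not taken from chatchat's config.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # assumed model
MAX_MODEL_LEN = 8192           # should match vllm's --max-model-len
RESERVED_OUTPUT_TOKENS = 1024  # budget left for the model's reply

def count_tokens(text: str) -> int:
    # Token count of a plain text segment, without special tokens.
    return len(tokenizer.encode(text, add_special_tokens=False))

def fit_to_context(system_prompt: str, docs: list[str], question: str) -> list[str]:
    """Drop the lowest-ranked retrieved docs until the prompt fits the window."""
    budget = MAX_MODEL_LEN - RESERVED_OUTPUT_TOKENS
    used = count_tokens(system_prompt) + count_tokens(question)
    kept = []
    for doc in docs:  # docs assumed to be ordered by relevance
        doc_tokens = count_tokens(doc)
        if used + doc_tokens > budget:
            break
        kept.append(doc)
        used += doc_tokens
    return kept
```

If even the fixed parts exceed the budget, the request would need to be rejected with a clear message instead of being forwarded to the backend.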

Problem Solved
Eliminates the error that the request content exceeds the model's maximum token count.

Implementation Suggestions

Alternative Solutions

Additional Information
(screenshot attachment: 324ECFE4-DBD9-4ec8-AA95-B9D79291F947)

TZJ12 added the enhancement (New feature or request) label Nov 26, 2024
@948024326

Dynamic limiting? If you are connecting the model directly from the xinfer third-party model serving platform, then you will need to change this where chatchat raises the error.


TZJ12 commented Nov 27, 2024

I started the model with vllm.

@948024326

> I started the model with vllm.

This needs to be changed on the chatchat side. I'll look into it later and get back to you once it's handled.


TZJ12 commented Nov 27, 2024

OK, thank you.


Zephyr69 commented Dec 5, 2024

Same question here; I'm looking into it as well. Surprisingly, no limit is applied in this part: once the input exceeds max-model-len, the request is simply allowed to fail with an error.


github-actions bot commented Jan 4, 2025

This issue has been marked as stale because it has had no activity for more than 30 days.

@github-actions github-actions bot added the stale label Jan 4, 2025