Issues: vllm-project/vllm
- #11371 [Bug]: Prefix caching doesn't work for LlavaOneVision (label: bug, opened Dec 20, 2024 by sleepwalker2017)
- #11366 [Bug]: Serving occasionally fails with RuntimeError: CUDA error: an illegal memory access was encountered (label: bug, opened Dec 20, 2024 by pangr)
- #11365 [Feature]: Add support for attention score output (label: feature request, opened Dec 20, 2024 by WoutDeRijck)
- #11364 [Misc]: What is 'residual' used for in the IntermediateTensor class? (label: misc, opened Dec 20, 2024 by oldcpple)
- #11363 [Bug]: The following fields were present in the request but ignored: {'schema'} (label: bug, opened Dec 20, 2024 by Quang-elec44)
- #11361 [Bug]: Priority scheduling doesn't improve token/s: higher-priority requests get no more token/s than requests without a priority set (label: bug, opened Dec 20, 2024 by kar9999; see the priority-scheduling sketch after this list)
- #11360 [Feature]: meta-llama/Prompt-Guard-86M usage raises a ValueError (label: feature request, opened Dec 20, 2024 by burakaktas35)
- #11356 [Bug]: vLLM 0.6.3.post1 crashes when deploying Qwen2-VL 72B (label: bug, opened Dec 20, 2024 by xxlight)
- #11352 [Bug]: V100 cannot use --enable-chunked-prefill with dtype float16, though it works with dtype float32 (label: bug, opened Dec 20, 2024 by warlockedward)
- #11347 [New Model]: answerdotai/ModernBERT-large (label: new model, opened Dec 19, 2024 by pooyadavoodi)
- #11346 [Bug]: No profiler output when VLLM_TORCH_PROFILER_DIR is enabled for vllm serve (label: bug, opened Dec 19, 2024 by ziyang-arch; see the profiling sketch after this list)
- #11345 [Performance]: 1P1D disaggregation performance (label: performance, opened Dec 19, 2024 by Jeffwan)
- #11343 [Bug]: PaliGemma 2 model loading error (label: bug, opened Dec 19, 2024 by mmderakhshani)
- #11342 [Bug]: Multi-node CPU inference on macOS fails when calling intel_extension_for_pytorch (label: bug, opened Dec 19, 2024 by MoSedkyy)
- #11340 [Bug]: CUDA illegal memory access in flash attention only for specific values of --max-num-seqs (with an AWQ model) (label: bug, opened Dec 19, 2024 by camfarineau)
- #11337 [Usage]: How to expand the inference context length (e.g. to 128k or 256k) on multimodal models (label: usage, opened Dec 19, 2024 by Wiselnn570; see the context-length sketch after this list)
- #11335 [Bug]: vLLM crashes under 20 concurrent requests with long content (~9k words) (label: bug, opened Dec 19, 2024 by Flynn-Zh)
- #11329 [Bug]: FP8 KV cache causes RuntimeError in the v1 engine (label: bug, opened Dec 19, 2024 by Nekofish-L)
- #11323 [Usage]: How to use torch.compile (label: usage, opened Dec 19, 2024 by chenglu66; see the torch.compile sketch after this list)
- #11322 [Usage]: What startup parameters are needed to use guided decoding, chunked prefill, and prefix caching simultaneously with a multimodal model? (label: usage, opened Dec 19, 2024 by wciq1208; see the combined-features sketch after this list)
- #11321 [Bug]: Error encountered while attempting a Python-only build (without compilation) for vLLM v0.6.5 (label: bug, opened Dec 19, 2024 by Leander-wang)
- #11320 [Performance]: Performance degradation due to a CPU bottleneck when serving embedding models on GPUs (label: performance, opened Dec 19, 2024 by ashgold)
- #11319 [Doc]: Update the default max_num_batch_tokens for chunked prefill (label: documentation, opened Dec 19, 2024 by toslunar)
- #11317 [Performance]: vLLM 0.6.5 with GLM4-9B-Chat and dynamically loaded LoRA: inference performance drops considerably on long inputs (label: performance, opened Dec 19, 2024 by zh19980310)
- #11312 [Bug]: Chat with n>1 breaks xgrammar (label: bug, opened Dec 18, 2024 by joerunde)
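A few hedged sketches for the how-to items above follow; each is illustrative, not a verified recipe.

For #11361, priority scheduling has to be enabled via the `scheduling_policy` engine argument, after which each request carries a `priority` value. A minimal sketch, where the model name is a placeholder and the assumption that lower values are scheduled earlier should be verified against your vLLM version:

```python
# Sketch for #11361: priority scheduling in vLLM's offline API.
# scheduling_policy="priority" is the documented engine arg; the per-request
# priorities below assume lower value = scheduled earlier.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", scheduling_policy="priority")

outputs = llm.generate(
    ["urgent prompt", "background prompt"],
    SamplingParams(max_tokens=64),
    priority=[0, 10],  # assumed: lower value = higher priority
)
for out in outputs:
    print(out.outputs[0].text)
```

Note that priority only reorders scheduling under contention; when the batch has spare capacity both requests run together, which may explain why per-request token/s looks unchanged in the report.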
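For #11346, the documented flow is to launch the server with VLLM_TORCH_PROFILER_DIR set and then toggle tracing over HTTP; traces are written only when profiling is stopped, which is a common reason for seeing no output. A sketch, assuming the default server address:

```python
# Sketch for #11346: driving the torch profiler on a running vLLM server.
# Assumes the server was started as, e.g.:
#   VLLM_TORCH_PROFILER_DIR=/tmp/vllm_traces vllm serve <model>
# The endpoints below only exist when that variable is set.
import requests

BASE = "http://localhost:8000"  # assumed default `vllm serve` address

requests.post(f"{BASE}/start_profile").raise_for_status()
# ... issue some completion requests here so there is activity to trace ...
requests.post(f"{BASE}/stop_profile").raise_for_status()  # traces flushed now
```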
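For #11337, the context window is capped by the max_model_len engine argument, and going past a model's native window generally also requires a RoPE-scaling override. A sketch in which the model choice and the YaRN-style rope_scaling keys are assumptions to check against the checkpoint:

```python
# Sketch for #11337: asking vLLM for a longer context window offline.
# max_model_len is a documented engine arg; the rope_scaling dict is a
# hypothetical YaRN-style override whose key names and values must match
# what the checkpoint supports (multimodal models may add constraints).
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct",  # illustrative model choice
    max_model_len=131072,               # ~128k tokens; needs enough KV-cache memory
    rope_scaling={
        "rope_type": "yarn",            # assumed key names
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
)
```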
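For #11323, vLLM's own torch.compile integration is version-specific (its knobs changed across the 0.6.x releases), so check the docs for your exact version; as a baseline, the underlying PyTorch API the title refers to is:

```python
# Sketch for #11323: the plain PyTorch torch.compile API that vLLM builds on.
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x) * 2.0

compiled_f = torch.compile(f)      # compiles lazily, on the first call
print(compiled_f(torch.randn(4)))
```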
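For #11322, chunked prefill and prefix caching are server startup flags, while guided decoding is requested per call through vLLM's guided_json extension field. A sketch with the model name and schema as placeholders, making no claim that all three interact cleanly on multimodal models (which is exactly what the issue asks):

```python
# Sketch for #11322, assuming the server was started with:
#   vllm serve <model> --enable-chunked-prefill --enable-prefix-caching
# guided_json is vLLM's OpenAI-compatible extension for guided decoding.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="<model>",  # placeholder: use the served model name
    messages=[{"role": "user", "content": "Answer as JSON."}],
    extra_body={
        "guided_json": {  # illustrative schema
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        }
    },
)
print(resp.choices[0].message.content)
```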