GPTQ Fused MoE class #8
Closed
ElizaWszola wants to merge 82 commits into marlin-moe-8-bit from gptq_fused_moe
+6,722 -1,916
Commits
Commits on Sep 3, 2024
Commits on Sep 4, 2024
Commits on Sep 5, 2024
[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (vllm-project#7962)
Commits on Sep 6, 2024
[Kernel] [Triton] Memory optimization for awq_gemm and awq_dequantize, 2x throughput (vllm-project#8248)
Commits on Sep 7, 2024
Commits on Sep 8, 2024
Commits on Sep 9, 2024
[Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (vllm-project#8272)