- Chengdu, China
-
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
-
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Dec 11, 2024
-
pytorch Public
Forked from pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Python Other Updated Nov 25, 2024
-
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Nov 18, 2024
-
triton Public
Forked from triton-lang/triton
Development repository for the Triton language and compiler
C++ MIT License Updated Nov 17, 2024
-
bitsandbytes Public
Forked from bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
Python MIT License Updated Nov 6, 2024
-