JL-Roger's starred repositories

Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.

Python 46 Updated Dec 14, 2024

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,097 520 Updated Jan 31, 2025

Apple AMX Instruction Set

C 1,033 50 Updated Dec 26, 2024

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 11,413 1,138 Updated Jan 31, 2025

The road to hacking SysML and becoming a system expert

Emacs Lisp 462 57 Updated Sep 25, 2024

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 96 8 Updated Dec 5, 2024

PyTorch native quantization and sparsity for training and inference

Python 1,801 211 Updated Feb 3, 2025

Advanced Quantization Algorithm for LLMs/VLMs.

Python 363 29 Updated Jan 27, 2025

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript 5,802 593 Updated Jan 8, 2025

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python 1,919 235 Updated Jan 20, 2025

Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team

Python 9,556 939 Updated Feb 3, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 6,466 459 Updated Feb 1, 2025

VPTQ, a flexible and extreme low-bit quantization algorithm

Python 574 39 Updated Jan 21, 2025

A bunch of coding tutorials for my Youtube videos on Neural Network Quantization.

Jupyter Notebook 14 6 Updated May 21, 2024

This repository accompanies the book "Grokking Deep Learning"

Jupyter Notebook 7,513 1,586 Updated Jun 1, 2024

Solve puzzles. Learn CUDA.

Jupyter Notebook 10,432 804 Updated Sep 1, 2024

Introduction to Machine Learning Systems

JavaScript 1,409 170 Updated Feb 3, 2025

Efficient Deep Learning Systems course materials (HSE, YSDA)

Jupyter Notebook 737 120 Updated Feb 2, 2025

My ComfyUI workflows collection

5,704 535 Updated Dec 20, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 2,853 229 Updated Jan 24, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 5,408 476 Updated Feb 2, 2025

A Chinese-language encyclopedia of modern C++, written by a team led by 小彭老师 (Teacher Xiaopeng)

Typst 801 56 Updated Jan 26, 2025

A summary of AI model serialization

51 9 Updated Jan 3, 2020

OpenMMLab Model Deployment Framework

Python 2,832 648 Updated Sep 30, 2024

Protocol Buffers - Google's data interchange format

C++ 66,506 15,610 Updated Feb 3, 2025

FlatBuffers: Memory Efficient Serialization Library

C++ 23,707 3,290 Updated Jan 25, 2025

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

Python 21,476 1,227 Updated Feb 1, 2025

CVNets: A library for training computer vision networks

Python 1,822 235 Updated Oct 30, 2023

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Python 4,546 657 Updated Jan 28, 2025