
[feature]: Support CPU accelerate by using GGUF #87

Open
Aisuko opened this issue May 2, 2024 · 2 comments


Aisuko commented May 2, 2024

https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf

Aisuko changed the title [feature]: Support CPU accelerate → [feature]: Support CPU accelerate by using GGUF on May 2, 2024

Aisuko commented Jun 23, 2024

Hugging Face transformers already supports GGUF, but only for a few model architectures. So we will run some tests first; if they pass, we should be able to support CPU acceleration smoothly. For more detail, see our discussion.
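Before wiring GGUF into the pipeline, it can help to sanity-check that a downloaded file really is GGUF. The sketch below reads the fixed header fields defined by the GGUF specification (4-byte `GGUF` magic, then a little-endian `uint32` version and `uint64` tensor count). This is an illustrative helper, not part of any repo mentioned here; `read_gguf_header` is a hypothetical name.

```python
import struct

GGUF_MAGIC = b"GGUF"  # 4-byte magic at the start of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count) from a GGUF file's fixed header.

    Raises ValueError if the file does not start with the GGUF magic.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # Per the GGUF spec: uint32 version, then uint64 tensor_count,
        # both little-endian, immediately after the magic.
        (version,) = struct.unpack("<I", f.read(4))
        (tensor_count,) = struct.unpack("<Q", f.read(8))
        return version, tensor_count
```

Once a file checks out, transformers can load supported architectures directly from it via `AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=filename)` (the model is dequantized on load, so memory use is that of the full-precision weights).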


Aisuko commented Jun 26, 2024

Currently, we support CPU inference acceleration using llama.cpp. However, we will keep working on the kimchima repo; we still need to implement CPT and fine-tuning in kimchima.
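For CPU inference via llama.cpp, the main knob is the thread count (`-t`). A minimal sketch of assembling such an invocation, assuming a llama.cpp build whose binary is named `llama-cli` (older builds shipped it as `main`) and a locally downloaded GGUF model; `build_llama_cpp_cmd` is a hypothetical helper, not kimchima code.

```python
import os


def build_llama_cpp_cmd(model_path, prompt, n_predict=128, threads=None):
    """Assemble a llama.cpp CLI invocation for CPU inference.

    Flags used: -m (GGUF model path), -p (prompt), -n (tokens to
    generate), -t (CPU threads). Defaults the thread count to the
    number of logical CPUs reported by the OS.
    """
    threads = threads or os.cpu_count() or 4
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "-n", str(n_predict),
        "-t", str(threads),
    ]
```

The resulting list can be handed to `subprocess.run(...)`; keeping `-t` at or below the physical core count usually gives the best throughput on CPU.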

Aisuko removed the llms label Aug 2, 2024