
Adding quantized Model #26

Open
SaddamBInSyed opened this issue Jun 27, 2024 · 1 comment
Comments

@SaddamBInSyed

Hi,
Thanks for this good work.

Is there a method to add a quantized LLM model so that it can be run on a GPU with under 10GB of VRAM, making it accessible to more users?

Note:
I am running llama3 via the ollama tool on my laptop, so once this option is available in this repo I can test it on my laptop directly.

Thank you
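For reference, here is a minimal sketch of one common way to do this outside of ollama: loading a 4-bit quantized model through Hugging Face transformers with bitsandbytes, which typically fits an 8B-parameter model in well under 10 GB of VRAM. The model id and library choice here are assumptions for illustration only, not this repo's actual loading path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed example model id; any causal LM on the Hub would work the same way.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization config from bitsandbytes.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU automatically
)

# Quick smoke test: generate a short completion.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```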

@SaschaHornauer

+1 Exactly that. Just being able to choose one smaller model would already be great; being able to pick from a model zoo would be even better, so people can trade off performance against hardware requirements. Personally, I would like to run this on my laptop's RTX GPU with 8GB of VRAM, where I know some small LLMs already perform surprisingly well.
