Unsloth reduces VRAM usage substantially and also speeds up training, even on low-end consumer GPUs. I'm using an NVIDIA RTX 3060 with 12 GB of VRAM to fine-tune this model, and it used only about 8 GB of VRAM with a per-device batch size of 32. You can see all the hyperparameters below:
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

STEP = 100
BATCH_SIZE = 8
MAX_LENGTH = 2048

training_args = TrainingArguments(
    output_dir="outputs",                               # placeholder output directory
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=int(16 / BATCH_SIZE),   # effective batch size of 16
    optim="adamw_bnb_8bit",                             # 8-bit AdamW from bitsandbytes saves optimizer memory
    learning_rate=2e-3,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    warmup_steps=STEP,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),                       # True on Ampere GPUs such as the RTX 3060
)
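The VRAM savings come from Unsloth loading the base model with 4-bit quantized weights and training only small LoRA adapters. Below is a minimal sketch of that loading step; the model name, sequence length, and LoRA settings are illustrative assumptions, not values taken from fine-tune.py:

from unsloth import FastLanguageModel

# Assumed base checkpoint -- replace with the model actually used in fine-tune.py.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # hypothetical placeholder
    max_seq_length=2048,
    dtype=None,          # let Unsloth pick fp16/bf16 automatically
    load_in_4bit=True,   # 4-bit quantized weights keep VRAM usage low
)

# Attach LoRA adapters so only a small fraction of the weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)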
- Install the dependencies listed in requirements.txt:
pip install -r requirements.txt
Note: You have to install CUDA and the other GPU dependencies to run this script on a GPU; otherwise training will be too slow to finish.
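A quick way to confirm that PyTorch can actually see the GPU before starting a run (a small sanity check, not part of the repo):

import torch

# Prints True plus the device name when CUDA is installed correctly;
# False means training would fall back to the (much slower) CPU.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))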
- Run:
python fine-tune.py
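The script wires the model, tokenizer, and the TrainingArguments above into a trainer. A rough sketch of that structure, assuming the common Unsloth + trl SFTTrainer setup and an older trl API that accepts these arguments directly; the dataset and its "text" column are assumptions, not details from fine-tune.py:

from trl import SFTTrainer

# `model`, `tokenizer`, and `training_args` come from the snippets above;
# `dataset` is assumed to be a Hugging Face Dataset with a "text" column.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,          # older trl API; newer releases use processing_class
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=MAX_LENGTH,
    args=training_args,
)
trainer.train()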