Dong Wang*,
Haris Šikić*,
Lothar Thiele,
Olga Saukh,
Graz University of Technology, Austria
Complexity Science Hub Vienna, Austria
ETH Zurich, Switzerland
*: Equal Contribution.
Model Folding is a data-free model compression technique that merges structurally similar neurons across layers, reducing model size without fine-tuning or access to training data. It preserves data statistics by leveraging k-means clustering, without any data-driven fine-tuning.
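To give an intuition, here is a toy sketch of the channel-merging idea, not the paper's exact algorithm: similar output channels of a layer are clustered and each cluster is replaced by a single representative weight vector. The function name and the use of scipy's kmeans2 are illustrative assumptions.

# Toy sketch of the channel-merging idea behind model folding.
# NOT the paper's exact algorithm: it folds a single linear layer only and omits
# the matching update of the next layer and the data-statistics preservation step.
import torch
from scipy.cluster.vq import kmeans2

def fold_linear_weight(weight: torch.Tensor, k: int):
    # weight: (out_features, in_features); cluster similar output channels
    rows = weight.detach().cpu().double().numpy()
    centroids, labels = kmeans2(rows, k, minit='++', seed=0)
    # each of the k centroids replaces all channels assigned to its cluster
    return torch.from_numpy(centroids).to(weight.dtype), torch.from_numpy(labels)

W = torch.randn(64, 128)                         # a layer with 64 output channels
W_folded, assignment = fold_linear_weight(W, k=32)
print(W_folded.shape)                            # torch.Size([32, 128])

In the actual method, the subsequent layer is merged consistently with this assignment and the activation statistics are repaired in a data-free way; see the paper for details.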
Mamba is recommended for faster installation, but you can also use Conda for all packages; the two are functionally equivalent.
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
conda init zsh
We use FFCV for faster data loading; note that FFCV only supports Python < 3.12. If you do not want to use FFCV, simply skip pip3 install ffcv and our code will fall back to the default torchvision data loader.
mamba create -y -n model_folding python=3.11 opencv pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
mamba activate model_folding
mamba install pandas scipy tqdm cupy pkg-config libjpeg-turbo numba -y
pip3 install torchopt ultralytics-thop wandb timm==0.9.16 torchist==0.2.3 matplotlib
pip3 install ffcv
pip3 install git+https://github.com/IBM/hartigan-kmeans.git
For computer vision datasets:
- If you installed FFCV, run prepare_dataset.py to generate .beton dataset files (see the sketch below).
- If you do not want to use FFCV, the image datasets will be downloaded automatically from torchvision the first time you run the code.
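For reference, writing a .beton file with FFCV typically looks like the following. This is a minimal sketch assuming CIFAR10 from torchvision; the exact fields and paths used by prepare_dataset.py in this repo may differ.

# Minimal FFCV example (illustrative; check prepare_dataset.py for the real settings).
from torchvision.datasets import CIFAR10
from ffcv.writer import DatasetWriter
from ffcv.fields import IntField, RGBImageField

ds = CIFAR10(root='./data', train=True, download=True)
writer = DatasetWriter('./data/cifar10_train.beton', {
    'image': RGBImageField(max_resolution=32),  # stores the images
    'label': IntField(),                        # stores the class labels
})
writer.from_indexed_dataset(ds)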
For LLM datasets, the data will be downloaded automatically from Hugging Face.
For computer vision models including ResNet, VGG, etc.
We use the torchvision implementations to load these models. For models trained on ImageNet, we reuse the pre-trained weights from torchvision. For models trained on CIFAR10, CIFAR100, etc., we train them from scratch (see train.py).
For LLMs, including LLaMA 1, LLaMA 2, etc.
We use Hugging Face Transformers to load these models. Please make sure you are logged in to your Hugging Face account and have been granted access to Meta's models.
For computer vision models trained on ImageNet, we reuse the pre-trained weights from torchvision; for models trained on CIFAR10, CIFAR100, etc., we train them from scratch (see train.py). For LLMs, we use the pre-trained weights released by Meta, loaded via Hugging Face Transformers.
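As a rough sketch of how such weights are usually obtained (the model identifiers below are examples; the repo's own loading code may differ):

# Illustrative only: loading torchvision and Hugging Face pre-trained weights.
import torchvision
from transformers import AutoModelForCausalLM, AutoTokenizer

# Computer vision: ImageNet weights shipped with torchvision.
resnet = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V1)

# LLM: Meta's gated checkpoints on Hugging Face (requires `huggingface-cli login`
# with an account that has been granted access to the meta-llama repositories).
model_id = 'meta-llama/Llama-2-7b-hf'
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = AutoModelForCausalLM.from_pretrained(model_id)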
You can use the following commands to train the models. The --wider_factor flag controls the width of the model; set it above 1 to train a wider model.
The trained models will be saved in the ./weights folder under the name {model}_{dataset}.pth or {model}_{wider_factor}Xwider_{dataset}.pth.
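For illustration, a checkpoint saved under this naming scheme could be loaded as follows. This is a sketch assuming the .pth file stores a plain state_dict for an unmodified torchvision architecture; adjust it if train.py saves a different format or a CIFAR-specific variant.

# Sketch of loading a checkpoint produced by train.py (format is an assumption).
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)            # CIFAR10 head
state = torch.load('./weights/resnet18_CIFAR10.pth', map_location='cpu')
model.load_state_dict(state)
model.eval()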
Training resnet18 on CIFAR10
CUDA_VISIBLE_DEVICES=6 python train.py --dataset=CIFAR10 --model=resnet18 --batch_size=128 --epochs=100 --learning_rate=0.1 --wider_factor=1
Training vgg11_bn on CIFAR10
CUDA_VISIBLE_DEVICES=6 python train.py --dataset=CIFAR10 --model=vgg11_bn --epochs=100 --learning_rate=0.1 --wider_factor=1
Training resnet50 on CIFAR100
CUDA_VISIBLE_DEVICES=1 python train.py --dataset=CIFAR100 --model=resnet50 --batch_size=512 --epochs=100 --learning_rate=0.1 --wider_factor=1
As model folding is ongoing research, we provide two versions of the code for you to try. One version reproduces the results in the paper for all computer vision models. The other version contains the latest code for ongoing research and covers both the LLMs evaluated in the paper and the computer vision models.
Option 1 (suggested): To reproduce the results in the paper for all computer vision models, you can go to ./modelfolding_cv.
Option 2: To use this universal folding code for most computer vision models, you can use the following command:
CUDA_VISIBLE_DEVICES=2 python folding.py --gpus=0 --dataset=CIFAR10 --wider_factor=1 --weight='./weights/resnet18_CIFAR10.pth' --model=resnet18 --model_perm_config_path='./config/resnet18_perm.json'
Note: the computer vision models used by both options are identical. You can train CV models with train.py and use the resulting weights with either option.
For example, to fold Llama-1-7b, you can use the following command:
CUDA_VISIBLE_DEVICES=0 python folding.py --gpus=0 --dataset=wikitext2 --model=llama_1_7b_hf --model_perm_config_path='./config/llama_1_7b_hf_perm.json'
You can adjust the folding ratio by changing the parameter pr in the llama_1_7b_hf_recipe.py file.
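The snippet below is purely hypothetical: only the parameter name pr and the file name come from this README, and the real structure of the recipe file may differ.

# Hypothetical excerpt of llama_1_7b_hf_recipe.py (structure is an assumption).
pr = 0.3   # folding ratio: larger values fold (merge) a larger fraction of the model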
If you want to run zero-shot evaluation, please install the evaluation code from the modified version of EleutherAI LM Harness provided in the Wanda GitHub repo and set --zero_shot_eval.
CUDA_VISIBLE_DEVICES=0 python folding.py --gpus=0 --dataset=wikitext2 --model=llama_1_7b_hf --model_perm_config_path='./config/llama_1_7b_hf_perm.json' --zero_shot_eval
@inproceedings{wang2025forget,
title = {Forget the Data and Fine-tuning!\\Just Fold the Network to Compress},
author = {Dong Wang and Haris \v{S}iki\'{c} and Lothar Thiele and Olga Saukh},
booktitle = {Proceedings of the International Conference on Learning Representations (ICLR)},
year = {2025},
url = {https://openreview.net/forum?id=W2Wkp9MQsF}
}
Model Folding Team w/ ❤️