Accepted as a conference paper at ICLR 2021 ([OpenReview](https://openreview.net/forum?id=IgIk8RRT-Z))
Note: This implementation is adapted from the source code of Once For All (Cai et al. 2019).
If you find CompOFA useful for your work, please cite it using:
@inproceedings{compofa-iclr21,
author = {Manas Sahni and Shreya Varshini and Alind Khare and Alexey Tumanov},
title = {{C}omp{OFA}: Compound Once-For-All Networks for Faster Multi-Platform Deployment},
booktitle = {Proc. of the 9th International Conference on Learning Representations},
series = {ICLR '21},
month = {May},
year = {2021},
url = {https://openreview.net/forum?id=IgIk8RRT-Z}
}
CompOFA is a model design space that exploits compound couplings between the model dimensions of a CNN to vastly simplify the search space while maintaining Pareto optimality. The smaller space can be trained in half the time, without progressive shrinking phases, and yields equally performant and diverse model families.
CompOFA networks consistently achieve comparable or higher ImageNet accuracies for similar latency and FLOP constraints on CPU, GPU, and mobile platforms.
Through experiments on ImageNet, we achieve a 2x reduction in training time and 216x speedup in model search time as compared to the state of the art, without loss of Pareto optimality!
| | OFA | CompOFA |
|---|---|---|
| Train Time (GPU Hours) | 978.3 | 493.5 |
| Train Cost | $2.4k | $1.2k |
| CO2 emission (lbs) | 277 | 128 |
| Search Time | 4.5 hours | 75 seconds |
CompOFA also yields a higher average accuracy, i.e. as a population it has a higher concentration of accurate models.
Tested with:
- Python 3.7
- torch 1.3.1
- torchvision 0.4.2
- horovod 0.19.3 (for multi-GPU training)

See requirements.txt for the complete list.
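To set up the environment, the pinned dependencies can typically be installed with a standard pip workflow (note that Horovod may need extra build flags depending on your CUDA/NCCL setup):

pip install -r requirements.txt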
Run the following two commands to train CompOFA with fixed kernel sizes and the compound heuristic (the bracketed horovodrun prefix is optional and only needed for multi-GPU training):
[horovodrun -np <num_gpus> -H <node1:num_gpus>,<node2:num_gpus>...] python train_ofa_net.py --task compound --phase 1 --fixed_kernel --heuristic simple
[horovodrun -np <num_gpus> -H <node1:num_gpus>,<node2:num_gpus>...] python train_ofa_net.py --task compound --phase 2 --fixed_kernel --heuristic simple
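For example, a single-node run on 4 GPUs (a hypothetical setup; adjust -np and -H for your cluster) would launch phase 1 as:

horovodrun -np 4 -H localhost:4 python train_ofa_net.py --task compound --phase 1 --fixed_kernel --heuristic simple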
All 243 subnets are trained together in both phases, but with different training hyperparameters. Note that this is different from progressive shrinking in OFA, where training across the different tasks progressively grows the number of trained networks.
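For intuition, below is a minimal sketch of how compound sampling yields 243 subnets. This is not the repo's actual API: the function name and the exact (depth, expand-ratio) pairings are assumptions based on OFA's standard depth choices {2, 3, 4} and expand-ratio choices {3, 4, 6}.

```python
import random

# CompOFA's compound heuristic couples depth and width per stage instead of
# choosing them independently. With 5 searchable stages and 3 coupled
# (depth, expand-ratio) options per stage, the space has 3**5 = 243 subnets,
# matching the 243 trained above.
# NOTE: the concrete pairings below are an assumption, not the repo's code.
COMPOUND_CHOICES = [(2, 3), (3, 4), (4, 6)]
NUM_STAGES = 5

def sample_compound_subnet():
    """Sample one of the 243 compound subnet configurations."""
    picks = [random.choice(COMPOUND_CHOICES) for _ in range(NUM_STAGES)]
    return {
        "d": [d for d, _ in picks],  # per-stage depth
        "e": [e for _, e in picks],  # per-stage expand ratio
    }

if __name__ == "__main__":
    assert len(COMPOUND_CHOICES) ** NUM_STAGES == 243
    print(sample_compound_subnet())  # e.g. {'d': [2, 4, 3, 2, 4], 'e': [3, 6, 4, 3, 6]}
```

Coupling the two dimensions per stage is what collapses OFA's combinatorial space down to the 243 configurations trained here.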
The ofa/checkpoints/ directory contains pre-trained models for CompOFA-MobileNetV3 with fixed kernel and with elastic kernel.
See eval_sampled_config.py for an example of sampling a random compound subnet of CompOFA and validating its top-1 accuracy:
python eval_sampled_config.py --net <PRETRAINED_PATH> --imagenet_path <IMAGENET_PATH>
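For instance, with a fixed-kernel checkpoint (both paths below are hypothetical placeholders, not files guaranteed to exist under those names):

python eval_sampled_config.py --net ofa/checkpoints/compofa_fixed_kernel.pth --imagenet_path /data/imagenet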
In the NAS directory, run the following command to execute the neural architecture search that finds the optimal sub-network for a given target latency (a full example invocation follows the flag descriptions below):
python run_NAS.py --net=<OFA_NETWORK> --target-hardware=<TARGET_HARDWARE> --imagenet-path <IMAGENET_PATH>
--net takes the name of the specific type of model to carry out NAS on:
- 'compofa': CompOFA with fixed kernel
- 'compofa-elastic': CompOFA with elastic kernel
- 'ofa_mbv3_d234_e346_k357_w1.0': OFA network
--target-hardware takes the type of deployment hardware that guides the latency-specific NAS:
- 'note10'
- 'gpu'
- 'cpu'
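For example, to search for a CompOFA (fixed kernel) sub-network targeting Note10 latency, with the ImageNet path left as a placeholder:

python run_NAS.py --net=compofa --target-hardware=note10 --imagenet-path <IMAGENET_PATH>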