(*Note: This is an ongoing project, hence the full code and strategy have not yet been open-sourced by the author.)
We present a new multi-task learning strategy using Vision Transformers (ViTs). Our approach exploits the class token and self-attention mechanism of Vision Transformers to train multiple tasks through a single ViT, improving efficiency under a limited computational budget.
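
Since the full strategy is not yet released, here is a minimal, hypothetical sketch of one plausible reading of the idea: a shared ViT encoder with one learnable class token per task, where each task's head reads only its own token after self-attention. The class names, dimensions, and layer choices below are illustrative assumptions, not the author's actual implementation.

```python
# Hypothetical sketch: one class token per task, a single shared ViT
# encoder, and a lightweight head per task. Assumed design, not the
# author's released code.
import torch
import torch.nn as nn


class MultiTaskViT(nn.Module):
    def __init__(self, num_tasks, task_num_classes, img_size=224,
                 patch_size=16, dim=384, depth=6, heads=6):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Shared patch embedding: conv with stride = patch size.
        self.patch_embed = nn.Conv2d(3, dim, patch_size, stride=patch_size)
        # One learnable class token per task instead of a single [CLS].
        self.task_tokens = nn.Parameter(torch.zeros(1, num_tasks, dim))
        self.pos_embed = nn.Parameter(
            torch.zeros(1, num_patches + num_tasks, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.norm = nn.LayerNorm(dim)
        # A small classification head per task.
        self.task_heads = nn.ModuleList(
            [nn.Linear(dim, c) for c in task_num_classes])

    def forward(self, x):
        b = x.size(0)
        patches = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, D)
        tokens = self.task_tokens.expand(b, -1, -1)               # (B, T, D)
        # Class tokens attend to all patches (and each other) through
        # the shared self-attention layers.
        z = torch.cat([tokens, patches], dim=1) + self.pos_embed
        z = self.norm(self.encoder(z))
        # Task t's prediction comes from its own attended class token.
        return [head(z[:, t]) for t, head in enumerate(self.task_heads)]


if __name__ == "__main__":
    model = MultiTaskViT(num_tasks=2, task_num_classes=[10, 5])
    outs = model(torch.randn(2, 3, 224, 224))
    print([o.shape for o in outs])  # [torch.Size([2, 10]), torch.Size([2, 5])]
```

Under this reading, all tasks share the encoder's parameters and compute, so adding a task costs only one extra token and one linear head rather than a full backbone.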