A minimal implementation of a transformer encoder for text classification, a transformer decoder for text generation, and a ViT for image classification (a diffusion transformer for image generation is in this repo).
The .py file contains code for:
- Word, character, and BPE tokenizers, with vocabulary generation,
- Dataset construction for text generation and text classification,
- Text and image embeddings,
- The encoder, decoder, and ViT models, with modules shared across them as much as possible,
- Training and evaluation, shared across all three tasks.
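As a rough illustration of the tokenizer/vocabulary step above, a character-level tokenizer can be sketched as follows. This is a minimal sketch only; the class and method names (`CharTokenizer`, `encode`, `decode`) are illustrative assumptions, not the repo's actual API.

```python
# Hypothetical character-level tokenizer with vocabulary generation.
# Names and special tokens are illustrative, not the repo's actual API.
class CharTokenizer:
    def __init__(self, corpus, specials=("<pad>", "<unk>")):
        # Build the vocabulary from all characters seen in the corpus.
        chars = sorted(set("".join(corpus)))
        self.itos = list(specials) + chars
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}
        self.unk = self.stoi["<unk>"]

    def encode(self, text):
        # Map each character to its id; unknown characters map to <unk>.
        return [self.stoi.get(ch, self.unk) for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer(["hello world"])
ids = tok.encode("hello")
assert tok.decode(ids) == "hello"
```

Word and BPE tokenizers follow the same encode/decode interface, differing only in how the vocabulary is built.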
The .ipynb files minimally illustrate training and evaluating the models on toy datasets (including MNIST for ViT) with lightweight transformers. The code in the .py file, however, should also support training scaled-up models on large datasets.
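The core operation shared by the encoder, decoder, and ViT is scaled dot-product attention; only the masking differs (the decoder adds a causal mask). A minimal NumPy sketch, assuming single-head attention and a `(seq_len, d_k)` layout (the repo's actual module names and tensor shapes may differ):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, causal=False):
    # q, k, v: (seq_len, d_k). Illustrative sketch, not the repo's code.
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)            # (seq_len, seq_len)
    if causal:
        # Decoder-style mask: position i may not attend to j > i.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q = k = v = np.eye(4)
out = scaled_dot_product_attention(q, k, v, causal=True)
# With a causal mask, the first position attends only to itself.
assert np.allclose(out[0], v[0])
```

Sharing this single attention function across the encoder (no mask), decoder (causal mask), and ViT (no mask, patch tokens) is what keeps the three models' module code largely common.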
References: