This project implements an image captioning model utilizing CNN and Transformer architectures. The model architecture is as follows:
-
CNN as Feature Extractor:
- We use EfficientNetB0 as the Convolutional Neural Network (CNN) to extract features from the images.
-
Transformer Encoder:
- The extracted features from the CNN are passed to the encoder, which consists of multiple Transformer layers.
- These layers process the features and prepare them for decoding.
-
Transformer Decoder:
- The decoder, which also comprises Transformer layers, generates captions for the images based on the features provided by the encoder.
-
Data
- Each input image is associated with five captions.
- Data preprocessing and augmentation techniques are applied to improve the model's robustness and performance.
-
Model Training
- Once the model is trained, the results, metrics, and the trained model itself are logged using MLFlow and DagsHub for efficient tracking and management of model development.
![Screenshot 2024-08-22 at 11 33 45 PM](https://private-user-images.githubusercontent.com/97504177/360816017-447953df-5d2c-4448-ba42-40ec00f7611f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2NTU5MjIsIm5iZiI6MTczOTY1NTYyMiwicGF0aCI6Ii85NzUwNDE3Ny8zNjA4MTYwMTctNDQ3OTUzZGYtNWQyYy00NDQ4LWJhNDItNDBlYzAwZjc2MTFmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDIxNDAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZlZDE5YWQ3ZDU3MWMxODNiMTUxZGI0NTlkZWZlZDNhZDY5MGFiZGE5NTAzMGVlNjdmZmFhODNhZjYyNTZmZjgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.jgS9qCrCVCXF8pKDJ3BTaCZfeY_fkKcCeWKp_vSQ39c)
![Screenshot 2024-08-22 at 11 33 34 PM](https://private-user-images.githubusercontent.com/97504177/360816001-b41c455c-915c-46e0-96b5-b8bcb5194c65.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk2NTU5MjIsIm5iZiI6MTczOTY1NTYyMiwicGF0aCI6Ii85NzUwNDE3Ny8zNjA4MTYwMDEtYjQxYzQ1NWMtOTE1Yy00NmUwLTk2YjUtYjhiY2I1MTk0YzY1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE1VDIxNDAyMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ3M2U1ZDZjMjMwNzBlYzkwNmExM2JmMzE3NmU3MDBlNDIyNGJmYWU3MzUwOTBjMzk2NzVhYzI0YTI4NTY4MTMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.p4LT8kkXjR5ewNiOuUE85EhetdTGjZ2iInrclqqU768)