This project implements an image captioning model utilizing CNN and Transformer architectures. The model architecture is as follows:
-
CNN as Feature Extractor:
- We use EfficientNetB0 as the Convolutional Neural Network (CNN) to extract features from the images.
-
Transformer Encoder:
- The extracted features from the CNN are passed to the encoder, which consists of multiple Transformer layers.
- These layers process the features and prepare them for decoding.
-
Transformer Decoder:
- The decoder, which also comprises Transformer layers, generates captions for the images based on the features provided by the encoder.
-
Data
- Each input image is associated with five captions.
- Data preprocessing and augmentation techniques are applied to improve the model's robustness and performance.
-
Model Training
- Once the model is trained, the results, metrics, and the trained model itself are logged using MLFlow and DagsHub for efficient tracking and management of model development.
![Screenshot 2024-08-22 at 11 33 45 PM](https://private-user-images.githubusercontent.com/97504177/360816017-447953df-5d2c-4448-ba42-40ec00f7611f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0NzEzMTIsIm5iZiI6MTczOTQ3MTAxMiwicGF0aCI6Ii85NzUwNDE3Ny8zNjA4MTYwMTctNDQ3OTUzZGYtNWQyYy00NDQ4LWJhNDItNDBlYzAwZjc2MTFmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDE4MjMzMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTBhYmRjNGYxODYzMTQxOWQ5YTQyNzNhZDJmY2U0ODU5NzY4YzQwNGY5NzZlZWU2MWRlNWQ5ZTc2NTc1ZTdkODAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.9xYx-725rkyBBDs4cbn6IPPoT2Pl9q9EBk-rFuNQOXE)
![Screenshot 2024-08-22 at 11 33 34 PM](https://private-user-images.githubusercontent.com/97504177/360816001-b41c455c-915c-46e0-96b5-b8bcb5194c65.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk0NzEzMTIsIm5iZiI6MTczOTQ3MTAxMiwicGF0aCI6Ii85NzUwNDE3Ny8zNjA4MTYwMDEtYjQxYzQ1NWMtOTE1Yy00NmUwLTk2YjUtYjhiY2I1MTk0YzY1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEzVDE4MjMzMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWVjMThiM2M5NTE4MWNmYzJlYWUyODY5MGIxNWZlMWUwZTI2ZTAxN2ZmNmMwODMyMGNiYTJhNDkyYmEyMTM3MjMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.fAdTf2vjd22AI3aFmUvkpbiu2RE_xvmrGF2z2_LOTTA)