Given a CT scan image, we have to classify whether the CT scan shows adenocarcinoma cancer or not.
Click the image below to see the video solution explanation.
- Understanding the problem and gaining information about the cancer.
- Data collection and uploading the zip file to Google Drive.
- Creation of a virtual environment.
- Performing experiments in a Jupyter notebook using the pretrained VGG-16 model (see the sketch after this list).
- Creation of the project structure and project packaging.
- Converting the Jupyter notebook code into modular code with exception handling and logging.
- Developing the training pipeline components and the pipeline itself.
- Integration of MLflow to track experiments and record parameters, results and performance metrics.
- Training the model using the training pipeline and tracking experiments with MLflow, using DagsHub as the remote repository.
- Storing the trained model in the local artifacts repository.
- Developing a prediction pipeline which classifies whether a lung has adenocarcinoma cancer or not from a chest/lung CT scan image.
- Developing an application using Streamlit which takes a CT scan image from the user, runs the trained model to predict the output, and renders the result back on the UI.
- Dockerizing the application to deploy it on the cloud.
- Deploying the lung cancer detection application on the AWS cloud.
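As a rough illustration of the pretrained-VGG-16 experiment, below is a minimal Keras sketch of adding a custom classifier head on top of the frozen base model. The input shape, layer sizes and learning rate are placeholders, not the project's exact values (those live in params.yaml).

```python
# Minimal sketch: custom classifier head on a frozen VGG-16 base.
# Input shape, layer sizes and learning rate are illustrative placeholders.
import tensorflow as tf

def build_custom_model(input_shape=(224, 224, 3), num_classes=2, learning_rate=0.01):
    base_model = tf.keras.applications.VGG16(
        include_top=False,        # drop the original ImageNet classifier head
        weights="imagenet",
        input_shape=input_shape,
    )
    base_model.trainable = False  # freeze the convolutional layers; train only the new head

    x = tf.keras.layers.Flatten()(base_model.output)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs=base_model.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

if __name__ == "__main__":
    build_custom_model().summary()
```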
- Update config.yaml
- Update secrets.yaml (Optional)
- Update params.yaml
- Update the entity
- Update the configuration manager in src config (see the entity/configuration-manager sketch after this list)
- Update the components
- Update the stages
- Update the Training pipeline
- Update the dvc.yaml
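To illustrate the entity and configuration-manager steps above, here is a minimal sketch assuming config.yaml has a data_ingestion section; the keys, paths and class names are illustrative and may differ from the actual code.

```python
# Sketch of the entity / configuration-manager pattern from the workflow above.
# The config keys, paths and class names are illustrative assumptions.
from dataclasses import dataclass
from pathlib import Path

import yaml

@dataclass(frozen=True)
class DataIngestionConfig:  # entity: an immutable data holder for one component
    root_dir: Path
    source_url: str
    local_data_file: Path
    unzip_dir: Path

class ConfigurationManager:
    def __init__(self, config_path: Path = Path("config/config.yaml")):
        with open(config_path) as f:
            self.config = yaml.safe_load(f)

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        c = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(c["root_dir"]),
            source_url=c["source_URL"],
            local_data_file=Path(c["local_data_file"]),
            unzip_dir=Path(c["unzip_dir"]),
        )
```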
Note : When we use MLflow, we have to set the MLflow environment variables before running the code.
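For example, with a DagsHub-hosted MLflow server the tracking variables can be set like this before running the pipeline (the values below are placeholders, not real credentials):

```python
# Set the MLflow tracking variables before running the code.
# URI, username and token below are placeholders.
import os

os.environ["MLFLOW_TRACKING_URI"] = "https://dagshub.com/<user>/<repo>.mlflow"
os.environ["MLFLOW_TRACKING_USERNAME"] = "<dagshub-username>"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<dagshub-token>"
```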
Note : When we do pipeline versioning, each stage must contain its own driver code.
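A stage module carrying its own driver code might look like the sketch below, so that DVC can run the stage as a standalone script; the stage name, import paths and method names are assumptions based on the project structure described later in this README.

```python
# Sketch of a stage with its own driver block (stage_01_data_ingestion.py).
# Import paths, class names and method names are illustrative assumptions.
from lung_cancer_classifier import logger
from lung_cancer_classifier.config.configuration import ConfigurationManager
from lung_cancer_classifier.components.data_ingestion import DataIngestion

STAGE_NAME = "Data Ingestion stage"

class DataIngestionTrainingPipeline:
    def main(self):
        config = ConfigurationManager().get_data_ingestion_config()
        data_ingestion = DataIngestion(config=config)
        data_ingestion.download_file()      # pull the zipped dataset from Google Drive
        data_ingestion.extract_zip_file()   # unzip it into the artifacts folder

if __name__ == "__main__":  # driver code: lets `dvc repro` run this stage directly
    try:
        logger.info(f">>> {STAGE_NAME} started <<<")
        DataIngestionTrainingPipeline().main()
        logger.info(f">>> {STAGE_NAME} completed <<<")
    except Exception as e:
        logger.exception(e)
        raise
```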
Case 1 : Adenocarcinoma cancer image
Case 2 : Normal image
I have used DVC for versioning the training pipeline.
The image below shows the pipeline DAG, which represents the dependencies between the components.
https://dagshub.com/DarshanRokkad/Chest_Cancer_Classification
I used MLflow to manage my deep learning life cycle by logging the evaluation metrics and plots.
I used DagsHub as a remote repository with MLflow to store the logs and artifacts.
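A minimal sketch of how the evaluation step can log parameters, metrics and the trained model to the DagsHub MLflow server is shown below; the tracking-URI pattern, parameter names and registered model name are assumptions.

```python
# Sketch: log evaluation results and the trained model to MLflow on DagsHub.
# The tracking-URI pattern, parameter names and model name are assumptions.
import mlflow
import mlflow.keras

def log_evaluation(model, scores: dict, params: dict):
    # Assumption: the DagsHub MLflow URI follows the usual "<repo URL>.mlflow" pattern.
    mlflow.set_tracking_uri(
        "https://dagshub.com/DarshanRokkad/Chest_Cancer_Classification.mlflow"
    )
    with mlflow.start_run():
        mlflow.log_params(params)   # e.g. the values read from params.yaml
        mlflow.log_metrics(scores)  # e.g. {"loss": ..., "accuracy": ...}
        mlflow.keras.log_model(model, "model", registered_model_name="VGG16Model")
```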
Below is the project pipeline.
I have used AWS ECR and AWS EC2 to deploy the application.
│
├── .dvc <-- used for data and pipeline versioning
│
├── .github/workflow <-- contains yml code to create CI-CD pipeline for github actions
│
├── artifacts (remote) <-- contains the dataset and trained models (in the remote repository)
│
├── config <-- contains yaml file where we mention the configuration of our project
│
├── images <-- contains images used in readme file
│
├── logs (remote) <-- contains logs created during running of pipelines and components
│
├── notebook <-- contains jupyter notebook where experiments and research work is done
│
├── secrets (remote) <-- contains a YAML file with API tokens, secret keys, passwords and so on
│
├── src
│ │
│ └── lung_cancer_classifier (package)
│ │
│ ├── components
│ │ │
│ │ ├── __init__.py
│ │ │
│ │ ├── data_ingestion.py <-- this module downloads the zipped dataset from Google Drive and extracts it on the local machine
│ │ │
│ │ ├── prepare_base_model.py <-- this module pulls the VGG-16 base model, adds custom layers at the end and then saves the custom model
│ │ │
│ │ ├── model_trainer.py <-- this module takes the custom model, trains it with the training data and validates it with the validation data
│ │ │
│ │ └── model_evaluation.py <-- this module tests the trained model with the test data and logs the evaluation metrics and artifacts to DagsHub using MLflow
│ │
│ ├── config <-- this folder contains the module with the configuration manager, which is used to manage the configuration of each component of the training pipeline
│ │
│ ├── constants <-- module containing the paths of the YAML configuration files
│ │
│ ├── entity <-- contains a Python file with the data classes for each component of the training pipeline
│ │
│ ├── pipeline
│ │ │
│ │ ├── __init__.py
│ │ │
│ │ ├── training_pipeline.py <-- module used to train the model in different stages
│ │ │
│ │ └── prediction_pipeline.py <-- module that takes the image from the user through the web application and returns the prediction
│ │
│ ├── training_stages <-- folder used to create stages by using the configuration manager and components
│ │ │
│ │ ├── __init__.py
│ │ │
│ │ ├── stage_01_data_ingestion.py <-- module used to create a data ingestion configuration object and then ingest the data onto the local machine
│ │ │
│ │ ├── stage_02_prepare_base_model.py <-- module used to create the custom model by using VGG-16 as the base model and modifying/adding a few fully connected layers at the end
│ │ │
│ │ ├── stage_03_model_trainer.py <-- module used to train the custom model using the training and validation data
│ │ │
│ │ └── stage_04_model_evaluation.py <-- module used to evaluate the trained model using test data
│ │
│ ├── utils <-- module which contains commonly used functions
│ │
│ └── __init__.py <-- this Python file contains the logger
│
├── .dvcignore <-- similar to .gitignore
│
├── .gitignore <-- used to ignore the unwanted file and folders
│
├── LICENSE <-- copyright license for the github repository
│
├── README.md <-- used to display the information about the project
│
├── app.py <-- contains the web page written in Streamlit
│
├── dvc.lock <-- this file is the output of pipeline versioning
│
├── dvc.yaml <-- this YAML file contains the stages needed to reproduce the training pipeline
│
├── params.yaml <-- this yaml file contains the parameters and values used during model training
│
├── requirements.txt <-- text file which contains the dependencies/packages used in the project
│
├── scores.json <-- contains the scores recorded after model evaluation
│
├── setup.py <-- Python script used to build the project as a Python package
│
└── template.py <-- program used to create the project structure
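As a rough sketch of how app.py can tie the trained model to the UI, here is a minimal Streamlit example; the model path, input size and class-label order are assumptions made for illustration.

```python
# Minimal Streamlit sketch of app.py: upload a CT scan, run the trained model,
# and render the prediction. Model path, input size and label order are assumptions.
import numpy as np
import streamlit as st
import tensorflow as tf
from PIL import Image

MODEL_PATH = "artifacts/training/model.h5"   # assumed location of the trained model
CLASS_NAMES = ["Adenocarcinoma", "Normal"]   # assumed label order

@st.cache_resource
def load_model():
    return tf.keras.models.load_model(MODEL_PATH)

st.title("Lung Cancer (Adenocarcinoma) Detection")
uploaded = st.file_uploader("Upload a chest CT scan image", type=["jpg", "jpeg", "png"])

if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded CT scan", use_column_width=True)

    # Preprocess to the input size the model was trained with (assumed 224x224).
    array = np.array(image.resize((224, 224)), dtype=np.float32) / 255.0
    prediction = load_model().predict(np.expand_dims(array, axis=0))

    st.success(f"Prediction: {CLASS_NAMES[int(np.argmax(prediction))]}")
```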