This repository contains the final Python notebooks used in the three Kaggle challenges proposed during the course. We used Colab and Kaggle servers to train our models; since those platforms keep their own file history, we did not always remember to update this repository, so some intermediate modifications to the files may be missing here. The datasets used in each challenge are contained in a separate repo, imported as a submodule. Artificial Neural Networks have shown impressive results in a broad range of application domains. The challenges are simply a set of problems taken from image processing, presented in an order chosen to progressively increase the complexity of the tasks.
The repo is organized as follows:
- DL-CompetitionsDatasets: contains the datasets;
- dataSetStatistics.py: used to evaluate some characteristics of the datasets;
- image_classification.ipynb: Python notebook for the first challenge;
- image_segmentation.ipynb: Python notebook for the second challenge;
- question_answering.ipynb: Python notebook for the third challenge;
- resize_on_disk.ipynb: Python notebook to transform the dataset of the third challenge.
-
The first competition consists of a classification problem. In an image classification problem, given an image, the goal is to predict the correct class to which the image belongs. The task requires categorizing 307 images into 20 different classes.
In this challenge we used: Convolutional Neural Networks, basic data augmentation techniques (zoom, rotation, horizontal and vertical flip), transfer learning with and without fine-tuning (ResNet, DenseNet201, InceptionV3, InceptionResNetV2, Xception), and ensembles with K-folding.
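The K-fold ensembling idea can be sketched as follows: train one model per fold and average the per-class probabilities the fold models assign to the test set. This is a minimal numpy/sklearn illustration with toy data; a simple nearest-centroid classifier stands in for the CNNs the notebooks actually trained, and all sizes are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Toy data standing in for image features: 100 samples, 20 classes.
X = rng.normal(size=(100, 32))
y = rng.integers(0, 20, size=100)
X_test = rng.normal(size=(10, 32))

def train_model(X_tr, y_tr):
    """Stand-in for fitting a CNN on one fold: per-class mean features."""
    return np.stack([X_tr[y_tr == c].mean(axis=0) if np.any(y_tr == c)
                     else np.zeros(X_tr.shape[1]) for c in range(20)])

def predict_proba(centroids, X_eval):
    """Softmax over negative squared distances to the class centroids."""
    d = ((X_eval[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = -d
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Train one model per fold and average their test-set predictions.
fold_probs = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = train_model(X[train_idx], y[train_idx])
    fold_probs.append(predict_proba(model, X_test))

ensemble = np.mean(fold_probs, axis=0)   # average the 5 folds' probabilities
predictions = ensemble.argmax(axis=1)
```

Averaging probabilities rather than hard labels lets confident folds outweigh uncertain ones, which is the usual reason this ensemble beats any single fold.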
For more information on the competition or on the techniques applied, take a look at the two links below.
-
In this second challenge we were asked to segment an image. Image segmentation can be seen as a classification problem applied to each pixel of the input figure. The dataset, most likely a subset of the Inria dataset, contains aerial orthorectified color images (you can see an example below). The challenge consists in determining which pixels belong to a building.
In this challenge we used: U-Net models; transfer learning with pretrained networks such as DenseUNet and ResUNet; data augmentation (horizontal/vertical flip, zoom); preprocessing and postprocessing with techniques from image analysis and computer vision (histogram equalization, Gaussian filters, morphological transformations provided by OpenCV); increasing the number of input channels with the edge map obtained through a Laplacian filter; and a custom data augmentation aimed at enriching the dataset by creating synthetic aerial images.
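Two of the preprocessing steps above can be sketched in plain numpy (the notebooks used the OpenCV equivalents, `cv2.equalizeHist` and `cv2.Laplacian`; the image here is random toy data):

```python
import numpy as np

def equalize_hist(channel):
    """Histogram equalization of a uint8 channel: spread the CDF over 0..255."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255 / max(cdf.max() - cdf.min(), 1)
    return cdf[channel].astype(np.uint8)

def laplacian(channel):
    """3x3 Laplacian filter: an edge map usable as an extra input channel."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)
    p = np.pad(channel.astype(np.float32), 1, mode="edge")
    h, w = channel.shape
    return sum(k[i, j] * p[i:i + h, j:j + w]
               for i in range(3) for j in range(3))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

eq = np.stack([equalize_hist(img[..., c]) for c in range(3)], axis=-1)
edges = laplacian(img.mean(axis=-1))
img4 = np.concatenate([eq, edges[..., None]], axis=-1)  # RGB + Laplacian channel
```

The 4-channel result is what a U-Net with an adapted first convolution would then consume.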
For more information on the competition or on the techniques applied, take a look at the two links below.
-
This was the most difficult challenge we faced. In this task the network takes two inputs: i) a synthetic scene containing several objects with different geometric shapes and/or finishes (colour, material), and ii) a question about the existence of something in the scene (e.g., 'Is there a yellow thing?') or about counting (e.g., 'How many big objects are there?'). The network has to produce a suitable answer by choosing from a set of predefined answers: yes, no, 0, 1, ..., 9. So in a certain sense, it can be seen as a classification problem.
An example. Q: What number of other matte objects are the same shape as the small rubber object? A: 1. Even though the challenge used a subset of CLEVR, the dataset was huge: more than 12 GB. As a consequence, the first thing we did was to accelerate the training procedure (a batch of 64 elements took 2 seconds to be processed). After reading A simple neural network module for relational reasoning, it became clear that the task could be solved using images with lower resolution. In this way, we reduced the time taken to process a batch by a factor of about 8, which also allowed us to exploit more efficient caching mechanisms.
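The resolution reduction can be illustrated with a simple block-average downsample (the actual resize_on_disk.ipynb presumably used an image library; sizes here are illustrative):

```python
import numpy as np

def downsample(img, factor):
    """Block-average downsampling: each factor x factor tile becomes one pixel."""
    h, w, c = img.shape
    h, w = h - h % factor, w - w % factor        # crop to a multiple of factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

img = np.arange(320 * 480 * 3, dtype=np.float32).reshape(320, 480, 3)
small = downsample(img, 4)   # 320x480 -> 80x120: 16x fewer pixels per image
```

Shrinking each side by 4 cuts the pixel count 16-fold, which is what makes both the per-batch compute drop and in-memory caching of the whole dataset feasible.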
The basic architecture that we used was a combination of three NNs. A CNN processed the image, while an embedding layer + LSTM examined the question. The two outputs were then combined and fed through dense layers to produce a one-hot encoded answer.
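A minimal Keras sketch of this two-branch architecture, assuming illustrative sizes (vocabulary, sequence length, 64x64 images, and 12 answers: yes, no, 0..9) that do not come from the original notebooks:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, SEQ_LEN, N_ANSWERS = 100, 20, 12   # illustrative hyperparameters

# Image branch: a small CNN producing a fixed-size feature vector.
img_in = layers.Input(shape=(64, 64, 3), name="scene")
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Question branch: word embedding followed by an LSTM.
q_in = layers.Input(shape=(SEQ_LEN,), name="question")
q = layers.Embedding(VOCAB, 64)(q_in)
q = layers.LSTM(64)(q)

# Fusion: concatenate both feature vectors, classify over the answers.
merged = layers.Concatenate()([x, q])
h = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(N_ANSWERS, activation="softmax")(h)

model = Model(inputs=[img_in, q_in], outputs=out)
```

Because the answer set is fixed, the final softmax turns the whole task into ordinary categorical classification.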
We tried several approaches: tackling counting and boolean questions with separate networks, GRUs, different pre-trained feature extractors, pretrained word embeddings, and attention mechanisms; we also designed a custom data generator to provide evenly distributed batches.
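The evenly distributed batch idea can be sketched like this: sample the same number of examples from every answer class for each batch. This is a toy numpy version (the real generator would also load and preprocess the corresponding images and questions):

```python
import numpy as np

def balanced_batches(labels, batch_size, rng):
    """Yield index batches with even representation of every answer class."""
    by_class = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    per_class = max(batch_size // len(by_class), 1)
    while True:
        batch = np.concatenate([rng.choice(idx, per_class, replace=True)
                                for idx in by_class.values()])
        rng.shuffle(batch)
        yield batch[:batch_size]

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=1000)     # 4 toy answer classes
gen = balanced_batches(labels, 64, rng)
batch = next(gen)
counts = np.bincount(labels[batch], minlength=4)   # 16 samples per class
```

Resampling this way keeps rare answers (e.g., high counts like 8 or 9) from being drowned out by the frequent yes/no questions during training.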
For more information on the competition or on the techniques applied, take a look at the two links below.