
Alternate Dataset for Fast.ai Lesson 1 of Deep Learning Part 1


tjohnson250/fastai_barbieswomen


A dataset and Jupyter notebook for exploring Lesson 1 of the Fast.ai Deep Learning 1 course using barbies vs. women instead of the original Kaggle Dogs and Cats dataset. This is a very small dataset, so it is difficult to get stable results. However, I can usually get 90% or better, and I sometimes find weights that give 96% on the validation set.

@semih suggested classifying photos of barbies vs. women: http://forums.fast.ai/t/wiki-lesson-1/9398/

barbieswomen.zip contains training and validation data set up using the folder structure required for fast.ai.

To use this with the version of fast.ai used in the courses, it is best to clone this repo and move the notebooks into one of the course folders, then unzip the data file and move the train and valid directories into the course's data folder.
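After unzipping, it can help to confirm that the folders landed where expected. The following is a minimal sketch, assuming the layout is `<root>/train/<class>/` and `<root>/valid/<class>/` with one subfolder per class; the function name `summarize_dataset` and the example path are hypothetical, not part of this repo or of fast.ai.

```python
from pathlib import Path

# Assumption: these extensions cover the images in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarize_dataset(root, splits=("train", "valid")):
    """Return {split: {class_name: image_count}} for a train/valid folder tree."""
    root = Path(root)
    summary = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each subdirectory of a split is treated as one class label.
        summary[split] = {
            class_dir.name: sum(
                1 for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary


# Example call (the path is an assumption; adjust to your course folder):
# print(summarize_dataset("data/barbieswomen"))
```

If a class folder shows zero images or a split is missing, the files were probably moved to the wrong level.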

The Barbie and Women Import Notebook contains sample code for creating the dataset.

I created the dataset using two Python scripts:

googleimagesdownload: https://github.com/hardikvasa/google-images-download

You can install this using pip install google-images-download

make_train_valid.py from https://github.com/prairie-guy/ai_utilities

googleimagesdownload requires a machine with the Chrome browser and the matching chromedriver (see the googleimagesdownload GitHub repo for instructions). Without them, you are limited to 100 images.

Download the images using these commands.

googleimagesdownload -k "woman" -o "barbieswomen" --format jpg --usage_rights labeled-for-reuse -l 150 --chromedriver ./chromedriver
googleimagesdownload -k "barbie" -o "barbieswomen" --format jpg --usage_rights labeled-for-reuse -l 150 --chromedriver ./chromedriver

Examine the images and remove incorrect ones. I removed all paintings and any images that were not clearly women or barbies. I also removed images that contained both women and barbies, since the model is forced to choose one classification or the other.

Use imagemagick to resize images for easier uploading and processing:

cd women
convert -resize '640' *.jpg woman.jpg

You will now see your original files and new files titled woman-n.jpg in the same directory. If you are happy with the resizing, delete the originals and convert the other directory of images.

If you are sure the resize is exactly as you need, you can also use mogrify instead of convert to resize and replace your originals:

mogrify -resize '640' *.jpg

Make the train and valid datasets/directory structure:

make_train_valid.py barbieswomen --train .80 --valid .20
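For readers who want to see what an 80/20 split of this kind involves, here is a rough sketch using only the Python standard library. It is not the implementation of make_train_valid.py; the function name `split_train_valid`, its parameters, and the choice to copy files into a separate destination folder are all assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path


def split_train_valid(src, dest, train_frac=0.8, seed=0):
    """Copy src/<class>/* into dest/train/<class> and dest/valid/<class>.

    Returns {class_name: (n_train, n_valid)}.
    """
    src, dest = Path(src), Path(dest)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(files)
        n_train = int(round(len(files) * train_frac))
        for i, f in enumerate(files):
            split = "train" if i < n_train else "valid"
            target = dest / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target / f.name)  # copy, leaving the originals intact
        counts[class_dir.name] = (n_train, len(files) - n_train)
    return counts


# Example call (both paths are assumptions):
# split_train_valid("downloads/barbieswomen", "barbieswomen")
```

Splitting each class separately keeps the class balance roughly the same in train and valid, which matters for a dataset this small.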

Now compress the directory and upload it to your VM. If using Paperspace through SSH, execute:

scp barbieswomen.zip paperspace@<your machine's public IP address>:./barbieswomen.zip
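The compression step that precedes the upload can be done with any zip tool. As one possibility, here is a small sketch using Python's standard `shutil.make_archive`; the function name `zip_dataset` is hypothetical, and keeping the top-level folder inside the archive is an assumption about how you want it to unpack.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir):
    """Create <dataset_dir>.zip next to the directory and return its path."""
    dataset_dir = Path(dataset_dir).resolve()
    archive = shutil.make_archive(
        base_name=str(dataset_dir),       # output path without the .zip suffix
        format="zip",
        root_dir=str(dataset_dir.parent),
        base_dir=dataset_dir.name,        # assumption: keep the top-level folder in the archive
    )
    return Path(archive)


# Example call (the directory name follows the commands above):
# zip_dataset("barbieswomen")  # writes barbieswomen.zip beside the folder
```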
