Lesson 5 Questions: Pet breed classification

  1. Presizing is fast.ai's approach of first resizing each image on the CPU and then performing the 'warping' augmentations (rotating, stretching, etc.) on the GPU. Applying data augmentation directly at the final size causes two problems. It can introduce empty zones at the edges. It also degrades the image, because each separate transform interpolates the pixels again. Presizing avoids both. It first resizes to a large size on the CPU, then takes random crops on the GPU, combining the other augmentations and the final resize into a single step. This avoids empty edge zones and performs fewer lossy operations on the data, so training images stay high quality while training remains fast.
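Presizing works because the random crop is taken from an image that was already resized larger than the final size, so the crop only ever contains real pixels. Below is a minimal stdlib sketch of that crop-selection step. It assumes a square 460-pixel presize and a 224-pixel final size; the helper `random_crop_box` is hypothetical and is not fastai's implementation.

```python
import random

def random_crop_box(src_size, min_scale=0.75, rng=random):
    """Pick a random square crop (left, top, side) lying entirely inside a
    src_size x src_size image and covering at least min_scale of its area.
    Hypothetical helper for illustration only."""
    # Sample the fraction of the area to keep, then convert it to a side length.
    scale = rng.uniform(min_scale, 1.0)
    side = int(src_size * scale ** 0.5)
    # Choose the top-left corner so the crop never extends past any edge,
    # which is why no empty (padded) zones can appear.
    left = rng.randint(0, src_size - side)
    top = rng.randint(0, src_size - side)
    return left, top, side

# Step 1 (CPU): every image is first resized to PRESIZE x PRESIZE (assumed size).
PRESIZE, FINAL = 460, 224
rng = random.Random(0)
# Step 2 (GPU): pick a crop per image; it would then be resized once to FINAL x FINAL.
boxes = [random_crop_box(PRESIZE, 0.75, rng) for _ in range(1000)]
print(boxes[:3])
```

In fastai this pattern is usually written as `item_tfms=Resize(460)` together with `batch_tfms=aug_transforms(size=224, min_scale=0.75)` in a `DataBlock`.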

  2. Completed the https://regexone.com/lesson/excluding_characters? tutorial
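In the pet breed example, regular expressions are used to pull the label out of each file name. Below is a small sketch with Python's `re` module. The file names are assumed examples of the `<breed>_<number>.jpg` pattern. This mirrors what fastai's `RegexLabeller` does, but it is not that class.

```python
import re

# Assumed example file names following the "<breed>_<number>.jpg" pattern
fnames = ["great_pyrenees_173.jpg", "Abyssinian_12.jpg", "american_pit_bull_terrier_9.jpg"]

# Capture everything before the final "_<digits>.jpg". The greedy (.+) backtracks
# just far enough for the rest to match, and "\." matches only a literal dot.
pattern = re.compile(r"(.+)_\d+\.jpg$")
labels = [pattern.match(f).group(1) for f in fnames]
print(labels)  # ['great_pyrenees', 'Abyssinian', 'american_pit_bull_terrier']
```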

  3. Most deep learning datasets provide data in one of two ways. The first is individual files, such as images, whose folder or file names encode information about the data, such as the target label. The second is a table of data, such as a CSV file, whose rows can link to data stored in other formats, such as text documents or images.
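The two layouts can be sketched with the standard library. All paths and CSV contents below are hypothetical. The point is that both layouts end up giving the same file-to-label mapping.

```python
import csv
import io
from pathlib import PurePosixPath

# Way 1: the label is encoded in the folder structure (hypothetical paths).
files = [PurePosixPath("train/cat/001.jpg"), PurePosixPath("train/dog/002.jpg")]
folder_labels = {f.name: f.parent.name for f in files}

# Way 2: a table links each file to its label (hypothetical CSV contents).
table = io.StringIO("fname,label\n001.jpg,cat\n002.jpg,dog\n")
table_labels = {row["fname"]: row["label"] for row in csv.DictReader(table)}

print(folder_labels == table_labels)  # True
```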

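  4. New methods for the fastai `L` class:

```python
from fastcore.foundation import L  # also available via `from fastai.vision.all import *`

l = L()
l.append(5)
l.append(4)
l.append(8)

l2 = L(range(50))

l.append(5)
l.append(5)

# unique() returns a new L without duplicate values
unique_l = l.unique()
unique_l

# filter() keeps the items for which the function returns True
l_filter = l2.filter(lambda num: num < 10)
l_filter

# map() applies the function to every item and returns a new L
l_mapped = l_filter.map(lambda num: num * 2)
l_mapped

independent = [1, 2, 3, 4]
target = ['a', 'b', 'c', 'd']

# Passing two lists gives an L of two items; zip() pairs them into
# (x, y) tuples, like the items of a Dataset
dset = L(independent, target).zip()
dset
```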
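  5. Using the pathlib module in Python:

```python
from pathlib import Path
from fastcore.foundation import L

# `path` is assumed to be an existing directory Path defined earlier;
# this snippet does not define it.
# path = path.relative_to('/root/.fastai/data/oxford-iiit-pet')

# Keep only the sub-directories of `path`
path_list = L([subdir for subdir in path.iterdir() if subdir.is_dir()])
# .root is the filesystem root ('/' for absolute paths); .name is the last component
path_roots = path_list.map(lambda p: p.root)
path_names = path_list.map(lambda p: p.name)
print(path_list)

# The third sub-directory (depends on the directory contents), joined with a file name
path_data = path_list[2]
mnist_small = path_data/'mnist_train_small.csv'
mnist_small.exists()
```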
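Below is a self-contained version of the same `pathlib` operations. It builds a small hypothetical directory layout inside a temporary folder, so it runs anywhere without the dataset.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: two sub-folders and one CSV file
    (root / "images").mkdir()
    (root / "annotations").mkdir()
    (root / "sample.csv").write_text("a,b\n1,2\n")

    # iterdir() lists direct children; is_dir() keeps only folders
    subdirs = sorted(p.name for p in root.iterdir() if p.is_dir())

    # The / operator joins path components; exists() checks the filesystem
    csv_path = root / "sample.csv"
    found = csv_path.exists()
    parts = (csv_path.name, csv_path.stem, csv_path.suffix)

    # relative_to() strips a leading directory, which makes printed paths shorter
    rel = csv_path.relative_to(root)

print(subdirs, found, parts, rel)
```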
  6. Image transforms can degrade the quality of the data in two ways. First, they can introduce empty regions in the image, such as the corners left uncovered by a rotation, which are useless for learning. Second, they compute new pixel values by interpolating between neighbouring pixels, which only approximates the original image. Each additional interpolation lowers the image quality further.
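The empty-region problem can be measured with a small sketch. It rotates an n x n image about its centre and counts output pixels whose source position falls outside the original image. These pixels would have to be padded, for example with zeros. The helper `empty_fraction` is hypothetical and uses simple inverse mapping, not fastai's transform code.

```python
import math

def empty_fraction(n, angle_deg):
    """Fraction of pixels in an n x n output with no source pixel after
    rotating an n x n image by angle_deg about its centre (hypothetical helper)."""
    a = math.radians(angle_deg)
    c = (n - 1) / 2
    empty = 0
    for y in range(n):
        for x in range(n):
            # Inverse-rotate each output pixel to find where it came from
            dx, dy = x - c, y - c
            sx = c + dx * math.cos(a) + dy * math.sin(a)
            sy = c - dx * math.sin(a) + dy * math.cos(a)
            if not (0 <= sx <= n - 1 and 0 <= sy <= n - 1):
                empty += 1  # this pixel would need padding
    return empty / n ** 2

print(empty_fraction(100, 0), empty_fraction(100, 45))
```

For a 45 degree rotation, roughly a sixth of the output contains no real image data.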

  7. After creating a `DataLoaders` object, you can call its `show_batch()` method (e.g. `dls.show_batch()`) to view the items in a batch. With `unique=True`, it shows the same training image repeated with different augmentations applied.

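  8. fast.ai provides the `summary(source)` method on a `DataBlock` to debug it. The method tries to build a batch from the given source (for example, a path) and prints every step in detail, which shows where an error occurs.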

  9. No, models should be trained as early as possible. A quick baseline shows whether a simple model is already good enough for your needs. Training early also reveals problems in the data, for example dirty data that prevents the model from learning well. We can then look at where the model goes wrong and use those results to guide data cleaning (as shown in lesson 2).

  10. Cross-entropy loss in PyTorch combines two parts. The first is `log_softmax()`, which applies the softmax function and then takes the log of the results. Softmax raises e to the power of each activation and divides by the sum of e raised to every activation in that row. Softmax outputs lie between 0 and 1, so their logs lie between negative infinity and zero. The second part is `nll_loss`, which selects the log-probability of the correct category for each item and negates it. This gives each item a loss between 0 and positive infinity, and by default these are averaged into a single value.

Below is some sample code testing the loss-function concepts:

```python
import torch
from torch import nn
import torch.nn.functional as F

# 12 random activations with standard deviation 3, reshaped to 4 items x 3 categories
test_tens = torch.randn(12) * 3
sample_acts = test_tens.view(-1, 3)
sample_acts

# calculate the softmax of each row
softmax_acts = sample_acts.softmax(dim=1)

# take the log (F.log_softmax does both steps in a more numerically stable way)
log_softmax_acts = softmax_acts.log()

# the correct category for each of the 4 items
targets = torch.tensor([0, 1, 2, 1])

# nll step: pick the log-probability of the correct category and negate it
cross_entropy_loss = -log_softmax_acts[range(sample_acts.shape[0]), targets]
cross_entropy_loss, cross_entropy_loss.mean()

# the same results with PyTorch's built-ins
F.nll_loss(F.log_softmax(sample_acts, dim=1), targets)
nn.CrossEntropyLoss(reduction='none')(sample_acts, targets), nn.CrossEntropyLoss()(sample_acts, targets)
```
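PyTorch's documentation notes that calling `softmax()` and then `log()` separately is numerically unstable, which is why `log_softmax` exists. Below is a stdlib-only sketch of the same cross-entropy computation. It uses the log-sum-exp trick, subtracting the row maximum before exponentiating. The function names are hypothetical, not PyTorch's internals.

```python
import math

def log_softmax(row):
    # log-sum-exp trick: subtract the max so exp() cannot overflow;
    # the result is mathematically unchanged
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]

def cross_entropy(acts, targets):
    # nll step: negate the log-probability of the correct class, then average
    losses = [-log_softmax(row)[t] for row, t in zip(acts, targets)]
    return losses, sum(losses) / len(losses)

# A naive softmax would fail on the first row: math.exp(1000.0) raises OverflowError
acts = [[1000.0, 0.0, -1000.0], [0.5, 2.0, -1.0]]
losses, mean_loss = cross_entropy(acts, [0, 1])
print(losses, mean_loss)
```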
  11. Softmax ensures two important properties. First, all the activations are between 0 and 1 and add up to 1, so the probabilities add up to 100%. Second, because of the exponential function, the largest activation is amplified, so one category tends to be strongly favoured over the rest.

  12. You might not want your activations to have these two properties in two cases. One is when an input at inference time might belong to none of the training categories, so no label should activate. The other is when an input can have multiple labels at once.

  13. Calculate the exp and softmax columns of Figure 5-3 yourself (e.g., in a spreadsheet, with a calculator, or in a notebook):

```python
import torch

outputs = torch.tensor([0.02, -2.49, 1.25])

exp = torch.exp(outputs)
softmax = exp / exp.sum()
softmax  # should match torch.softmax(outputs, dim=0)
```
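The two softmax properties, and the reason you might prefer something else, can be checked without PyTorch. This sketch recomputes the Figure 5-3 values with the `math` module. It then contrasts softmax with sigmoid, which scores each activation independently. The helper names are hypothetical.

```python
import math

def softmax(xs):
    # exponentiate every activation, then divide by the total
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

figure_outputs = [0.02, -2.49, 1.25]  # the activations from Figure 5-3
probs = softmax(figure_outputs)
print([round(p, 2) for p in probs])  # [0.22, 0.02, 0.76]

# Property 1: every value is in (0, 1) and they sum to 1
print(sum(probs))

# Property 2: the exponential exaggerates gaps. With activations 1, 2, 3,
# plain division by the sum would give the largest 3/6 = 0.5; softmax gives it more
print(softmax([1.0, 2.0, 3.0])[2])

# Sigmoid scores each activation on its own, so every label can be "off" at once
# (or several can be "on"), which softmax does not allow
print([round(sigmoid(x), 2) for x in [-3.0, -3.0, -3.0]])
```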
  14. We can't use `torch.where` to create a loss function for more than two labels because `torch.where` only selects between two options in an if-else manner. With three or more categories, we must pick the prediction for one specific category out of all the possible options. A two-way `where` cannot express this, so instead we index into the predictions with the target.
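Below is a plain-Python sketch of the difference, using a hypothetical list-based `where` that behaves like `torch.where(cond, a, b)` element-wise. The binary case follows the pattern of the `mnist_loss` from the fastbook MNIST chapter. The multi-class case needs indexing by the target instead.

```python
def where(cond, a, b):
    """Element-wise if/else on lists, mimicking torch.where(cond, a, b) (hypothetical helper)."""
    return [x if c else y for c, x, y in zip(cond, a, b)]

# Binary case: one probability per item (chance of class 1); targets are 0 or 1,
# so a two-way choice is enough
preds = [0.9, 0.4, 0.2]
targets = [1, 0, 1]
dist = where([t == 1 for t in targets], [1 - p for p in preds], preds)
binary_loss = sum(dist) / len(dist)

# Multi-class case: one probability per class, so we must pick one of N columns.
# Indexing with the target works for any N; a two-way where cannot do this
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]]
labels = [0, 2, 1]
picked = [row[t] for row, t in zip(probs, labels)]
print(binary_loss, picked)
```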

  15. log(-2) is not defined because the (real) logarithm is only defined for strictly positive numbers.
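Python's standard library shows this directly. `math.log` raises a `ValueError` for negative inputs. A log of -2 exists only if complex results are allowed, which is what `cmath.log` returns. PyTorch follows the floating-point convention instead: `torch.log` of a negative value gives `nan`.

```python
import cmath
import math

try:
    math.log(-2)
    log_failed = False
except ValueError as err:
    log_failed = True
    print("math.log(-2) raised ValueError:", err)

print(math.log(1e-12))  # very negative: log heads to -infinity as x approaches 0 from above
print(cmath.log(-2))    # complex result: ln(2) + i*pi
```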

  16. The learning rate finder trains on successive mini-batches. It starts with a very small learning rate and multiplies it by a constant factor after each mini-batch, until the loss starts getting worse instead of better. There are two rules of thumb for picking a learning rate from the resulting plot. One is to take the learning rate at which the minimum loss was reached and divide it by 10. The other is to choose the last point where the loss was still clearly decreasing (the steepest downward slope).
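Both rules can be sketched on a recorded list of (learning rate, loss) pairs. The losses below are made up for illustration, and the helper names are hypothetical. fastai's `lr_find` also smooths the losses before suggesting values, which this sketch skips.

```python
def lr_schedule(start, end, n):
    """Exponentially increasing learning rates, one per mini-batch."""
    mult = (end / start) ** (1 / (n - 1))
    return [start * mult ** i for i in range(n)]

def suggest_lrs(lrs, losses):
    # Rule 1: one order of magnitude below the learning rate with the lowest loss
    i_min = min(range(len(losses)), key=losses.__getitem__)
    lr_min_div10 = lrs[i_min] / 10
    # Rule 2: the learning rate where the loss falls fastest. The lrs are evenly
    # spaced on a log scale, so consecutive differences act as log-scale slopes
    slopes = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    i_steep = min(range(len(slopes)), key=slopes.__getitem__)
    return lr_min_div10, lrs[i_steep]

lrs = lr_schedule(1e-5, 1e0, 6)               # 1e-5, 1e-4, ..., 1e0
losses = [2.30, 2.25, 1.60, 1.10, 1.40, 5.00]  # made-up losses: fall, bottom out, blow up
lr_min_div10, lr_steep = suggest_lrs(lrs, losses)
print(lr_min_div10, lr_steep)
```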

  17. The `fine_tune` method first freezes the pretrained layers and trains only the new, randomly initialized head for one epoch by default (assuming transfer learning is being used). It then unfreezes all the layers and trains the whole model for the requested number of epochs.

  18. In a Jupyter notebook, put `??` after a function or method name (e.g. `learn.fine_tune??`) to display its source code.
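Outside a notebook, the standard library's `inspect.getsource` returns the same kind of source text. The sketch below uses a stdlib function, `textwrap.dedent`, so it runs without fastai installed.

```python
import inspect
import textwrap

# Retrieve the source code of a function, similar to `textwrap.dedent??` in Jupyter
src = inspect.getsource(textwrap.dedent)
print(src.splitlines()[0])  # the `def` line of the function
```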

  19. Discriminative learning rates come from the idea that, in transfer learning, different parts of a pretrained model need different learning rates. The earliest layers learn simple, general features that already work well, so they get a low learning rate. The later layers learn more complex, task-specific features that must change more for the new data, so they get higher learning rates. In fastai, this is done by passing a Python `slice` as the learning rate, e.g. `lr_max=slice(1e-6, 1e-4)`.

  20. fastai treats a Python `slice` passed as a learning rate as a spread of values. The earliest layer group gets the slice's lower bound and the last layer group gets its upper bound. The groups in between get learning rates that are multiplicatively equidistant between the two, i.e. evenly spaced on a log scale rather than incremented uniformly.
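"Multiplicatively equidistant" means each layer group's learning rate is a constant factor times the previous one. Below is a minimal sketch with a hypothetical helper `spread_lrs`. It follows the same idea as fastcore's `even_mults`, but it is not the library code.

```python
def spread_lrs(start, stop, n_groups):
    """Learning rates for n_groups layer groups, spaced evenly on a log scale
    from start (earliest layers) to stop (last layers). Hypothetical helper."""
    if n_groups == 1:
        return [stop]
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]

# e.g. lr_max=slice(1e-6, 1e-4) spread over three layer groups
lrs = spread_lrs(1e-6, 1e-4, 3)
print(lrs)  # approximately 1e-06, 1e-05, 1e-04 (tiny float rounding possible)
```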

  21. Early stopping is a poor choice with one-cycle training because training may stop before the learning rate has been annealed to the small values at the end of the cycle, which can provide further improvements. Instead, it is better to retrain the model from scratch for the number of epochs at which the previous best result was found.
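A sketch of the one-cycle schedule shows why stopping early skips the low-learning-rate phase. It assumes cosine warm-up and cosine annealing with `div=25`, `div_final=1e5`, and `pct_start=0.25`, which match fastai's `fit_one_cycle` defaults. The function itself is a hypothetical re-implementation, not fastai's scheduler.

```python
import math

def one_cycle_lr(pct, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pct in [0, 1]: cosine warm-up from
    lr_max/div to lr_max, then cosine annealing down to lr_max/div_final."""
    def cos_interp(start, end, t):
        # t=0 gives start, t=1 gives end, following a half cosine wave
        return end + (start - end) / 2 * (1 + math.cos(math.pi * t))
    if pct < pct_start:
        return cos_interp(lr_max / div, lr_max, pct / pct_start)
    return cos_interp(lr_max, lr_max / div_final, (pct - pct_start) / (1 - pct_start))

# Learning rate at the end of each of 10 epochs
lrs = [one_cycle_lr(epoch / 10) for epoch in range(1, 11)]
print([f"{lr:.1e}" for lr in lrs])
```

Stopping after epoch 5 would leave training at roughly three quarters of the maximum learning rate, never reaching the small values of the final epochs.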

  22. The number is the depth of the network: ResNet-50 has 50 layers, while ResNet-101 has 101 layers.

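  23. `to_fp16` enables mixed-precision training, where most operations use half-precision (16-bit) floating-point numbers instead of the standard single-precision (32-bit) numbers. This halves the memory those numbers use, which helps avoid CUDA out-of-memory errors. It also speeds up training on GPUs with fast fp16 support, at the cost of less numerical precision.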
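The precision trade-off can be seen with Python's `struct` module, which supports the IEEE 754 half-precision format (`"e"`) alongside single precision (`"f"`). This illustrates only the number format; it does not show how fastai manages mixed precision.

```python
import struct

def round_trip(x, fmt):
    """Store x in the given binary float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

print(struct.calcsize("e"), struct.calcsize("f"))  # 2 4: bytes per fp16 vs fp32 value
print(round_trip(0.1, "e"))     # 0.0999755859375: about 3 significant decimal digits
print(round_trip(0.1, "f"))     # 0.10000000149011612: about 7 significant decimal digits
print(round_trip(2049.0, "e"))  # 2048.0: fp16 cannot represent every integer above 2048
```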