This repository was archived by the owner on Nov 21, 2022. It is now read-only.

Current Design

malteserteresa edited this page Apr 30, 2019 · 3 revisions

Currently the project comprises:

  • No front-end
  • A sentiment analyser in TensorFlow: a simple neural net for binary text classification
  • Trained on a dataset of 20,000 troll tweets
  • A Flask back-end with one API endpoint (/predict)

Front-End: Browser Extension

TBC

Model: Sentiment Analyser in TensorFlow

Step 1 - Normalization

This step cleans the text by removing punctuation, Unicode characters and similar noise.
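A minimal sketch of what such a cleaning step could look like, using only the standard library. The function name `normalize` and the exact rules (lowercasing, dropping every non-ASCII character, turning punctuation into spaces) are assumptions for illustration; the project's actual rules may differ.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase text and strip accents, non-ASCII characters and punctuation."""
    # Split accented letters into base letter + combining mark, then drop
    # every character that cannot be encoded as ASCII (marks, emoji, ...).
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace anything that is not a letter, digit or whitespace with a space.
    no_punct = re.sub(r"[^a-z0-9\s]", " ", ascii_text.lower())
    # Collapse the runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", no_punct).strip()


print(normalize("Héllo, WORLD!!"))  # hello world
```

Replacing punctuation with a space rather than deleting it keeps words such as "nice,try" from being glued together.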

Step 2 - Create a dictionary

TensorFlow uses a ranked dictionary to map each word to an integer index, where the lowest index belongs to the most frequent word in the corpus. These indices are what the embedding step later looks up.
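The ranking idea can be sketched in plain Python as below. The helper names `build_word_index` and `encode`, the alphabetical tie-breaking and the tiny corpus are all assumptions made for this example. Ranks start at 1 so that 0 stays free for padding, which is also the convention the Keras `Tokenizer` follows.

```python
from collections import Counter


def build_word_index(sentences):
    """Rank words by frequency; rank 1 is the most frequent word."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    # Sort by descending count; break ties alphabetically so the result is stable.
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    # Start at 1 so that 0 stays free for padding.
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def encode(sentence, word_index):
    """Turn a sentence into a list of ranks, skipping unknown words."""
    return [word_index[word] for word in sentence.split() if word in word_index]


# Illustrative corpus, not taken from the project's dataset.
corpus = ["you are a troll", "you are kind", "you rock"]
word_index = build_word_index(corpus)
print(word_index["you"], word_index["are"])  # 1 2
print(encode("you are a troll", word_index))  # [1, 2, 3, 6]
```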

Step 3 - Padding

This step ensures that every tensor going into the model is the same size.
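As a rough illustration, padding can be written by hand as follows. The function `pad` is a hypothetical helper; it pads and truncates at the front, which matches the default behaviour of Keras's `pad_sequences`, though whether the project uses those defaults is an assumption.

```python
def pad(sequences, maxlen, value=0):
    """Left-pad with `value`, or keep the last `maxlen` items if too long."""
    if maxlen < 1:
        raise ValueError("maxlen must be at least 1")
    padded = []
    for seq in sequences:
        kept = seq[-maxlen:]  # drop items from the front when too long
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


print(pad([[1, 2, 3, 6], [1, 5]], maxlen=4))  # [[1, 2, 3, 6], [0, 0, 1, 5]]
print(pad([[1, 2, 3, 4, 5]], maxlen=3))  # [[3, 4, 5]]
```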

Step 4 - Embedding

Neural nets deal with numbers, so numbers we shall create. This is done by mapping the words to their position in word vector space (WVS). You can think of the WVS as a statistical representation of the corpus in space.
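Mechanically, an embedding is a lookup table with one vector per word index. The sketch below shows only the lookup, in plain Python; in a real model the vectors are learned during training, whereas here small random values stand in for them. The names `make_embedding_table` and `embed`, the table size, and the choice to pin the padding row to zeros are assumptions for illustration.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """One vector of `dim` floats per word index; row 0 is the padding row."""
    rng = random.Random(seed)
    table = [
        [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        for _ in range(vocab_size)
    ]
    table[0] = [0.0] * dim  # simplification: keep padding at the origin
    return table


def embed(sequence, table):
    """Look up the vector for every word index in a padded sequence."""
    return [table[index] for index in sequence]


table = make_embedding_table(vocab_size=7, dim=4)
vectors = embed([0, 0, 1, 5], table)
print(len(vectors), len(vectors[0]))  # 4 4
```

A padded sentence of 4 indices therefore becomes a 4 x 4 grid of numbers, which is the shape the rest of the network works on.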

Step 5 - Fit

At this step the model is trained using the training data.

Step 6 - Evaluate

The History object in Keras holds a record of training loss and metric values at successive epochs, as well as validation loss and validation metric values. These can be used to evaluate the model.
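The `history` attribute of a Keras History object is a dictionary mapping each metric name to a list with one value per epoch. A small sketch of how it might be read follows; the helper names are hypothetical and the numbers are placeholders for illustration, not results from this project.

```python
def best_epoch(history, metric="val_loss"):
    """Return the 1-based epoch with the lowest value for `metric`."""
    values = history[metric]
    return min(range(len(values)), key=values.__getitem__) + 1


def overfitting_gap(history):
    """Validation loss minus training loss in the final epoch."""
    return history["val_loss"][-1] - history["loss"][-1]


# Placeholder numbers for illustration only, not results from this project.
example_history = {
    "loss": [0.69, 0.52, 0.40, 0.31],
    "val_loss": [0.68, 0.55, 0.50, 0.53],
}
print(best_epoch(example_history))  # 3
```

A validation loss that starts rising while the training loss keeps falling, as in the placeholder data, is the usual sign that the model has begun to overfit.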

Step 7 - Predict

Step 8 - Hyperparameter Tuning

Hyperparameters are all the knobs and buttons that you can twist and turn to make your model work. Tuning can be done by trial and error, grid search, random search or Bayesian optimization.

I am currently trying a random search using hyperas, a wrapper around hyperopt (a Python library for distributed asynchronous hyperparameter optimization) - thanks maxpumperla!
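Random search itself is simple enough to sketch without any library. The code below is not the hyperas API; it is a plain-Python illustration of the idea. The search space, the parameter names and `toy_objective` (a stand-in for "train the model and return its validation loss") are all hypothetical.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample `n_trials` random configurations and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(options) for name, options in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; the names and values are illustrative only.
space = {
    "embedding_dim": [16, 32, 64],
    "dropout": [0.0, 0.2, 0.5],
    "learning_rate": [1e-4, 1e-3, 1e-2],
}


def toy_objective(params):
    """Stand-in for 'train the model and return its validation loss'."""
    return abs(params["dropout"] - 0.2) + abs(params["learning_rate"] - 1e-3)


best_params, best_score = random_search(toy_objective, space, n_trials=20)
print(best_params, best_score)
```

Unlike grid search, the number of trials is chosen up front and does not grow with the number of hyperparameters, which is why random search is a common first choice.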

Back-end: RESTful API

There is currently one API endpoint, which makes a prediction based on a sentence supplied as a query parameter in the browser. The API loads a saved model and analyses how harassing the sentence is. If the sentence is abusive, "That's not very nice." is returned; otherwise the response is "Ooo aren't you sweet.".
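The request-to-reply logic can be sketched independently of Flask, as below. The query parameter name `sentence`, the 0.5 threshold, and `stub_model` (a stand-in for the saved model) are assumptions for illustration; only the two reply strings come from the description above.

```python
from urllib.parse import parse_qs

ABUSIVE_REPLY = "That's not very nice."
FRIENDLY_REPLY = "Ooo aren't you sweet."


def handle_predict(query_string, predict_abuse, threshold=0.5):
    """Map a /predict query string to one of the two replies."""
    params = parse_qs(query_string)
    # "sentence" is a hypothetical parameter name.
    sentence = params.get("sentence", [""])[0]
    score = predict_abuse(sentence)
    return ABUSIVE_REPLY if score >= threshold else FRIENDLY_REPLY


def stub_model(sentence):
    """Stand-in for the saved model: flags one hard-coded word."""
    return 0.9 if "awful" in sentence.lower() else 0.1


print(handle_predict("sentence=you+are+awful", stub_model))  # That's not very nice.
print(handle_predict("sentence=you+are+lovely", stub_model))  # Ooo aren't you sweet.
```

In a Flask app, a function like this would be called from the view registered for /predict, with the real model's prediction function passed in place of the stub.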