Current Design
Currently the project comprises:
- No front-end
- A sentiment analyser in TensorFlow: a simple neural net performing binary text classification
- Trained on a dataset of 20,000 troll tweets
- A Flask back-end with one API endpoint (/predict)
TBC
This step cleans the text, removing punctuation, unicode characters, etc.
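A minimal cleaning sketch using only the standard library. Lower-casing and dropping apostrophes (so "you're" becomes "youre") are assumptions for illustration, not necessarily what the project does; `clean_text` is a hypothetical name.

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Lower-case, strip non-ASCII characters and punctuation, collapse whitespace."""
    # Split accented letters into base letter + accent, then drop everything non-ASCII
    # (the accents, emoji, other symbols)
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Assumption: join contractions rather than splitting them into two tokens
    text = text.replace("'", "")
    # Replace any remaining punctuation with a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()
```

For example, `clean_text("Caf\u00e9!!  You're SO   rude \U0001F621")` gives `"cafe youre so rude"`.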
A ranked dictionary is used in TensorFlow to map each word to an integer index, where the lowest number is the most frequent word in the corpus. These indices are what the embedding step later looks up.
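A pure-Python sketch of a frequency-ranked dictionary. It follows the same convention as Keras's `Tokenizer.word_index`: rank 1 is the most frequent word and 0 is left free for padding. The function names and the choice to skip unknown words are assumptions for illustration.

```python
from collections import Counter


def build_vocab(texts, num_words=None):
    """Map each word to its frequency rank (1 = most frequent)."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common sorts by descending count; ties keep first-seen order
    ranked = [word for word, _ in counts.most_common(num_words)]
    # Start at 1 so that index 0 can be reserved for padding
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def texts_to_sequences(texts, vocab):
    """Turn each text into a list of word indices, skipping unknown words."""
    return [[vocab[w] for w in text.split() if w in vocab] for text in texts]
```

With `["you are rude", "you are kind", "you"]` as the corpus, "you" gets index 1 and "are" gets index 2, so `"you are nice"` becomes `[1, 2]`.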
This step ensures that every tensor going into the model is the same size, by padding or truncating each sequence of word indices to a fixed length.
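A dependency-free sketch of fixed-length padding. Its defaults mirror those of Keras's `pad_sequences`: pad with 0 at the front and truncate from the front. The chosen `maxlen` is up to the model and is not specified here.

```python
def pad_sequences(sequences, maxlen, value=0, padding="pre", truncating="pre"):
    """Pad or truncate every sequence to exactly `maxlen` items."""
    if maxlen < 1:
        raise ValueError("maxlen must be at least 1")
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            # "pre" keeps the last maxlen items, "post" keeps the first maxlen
            seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
        filler = [value] * (maxlen - len(seq))
        padded.append(filler + seq if padding == "pre" else seq + filler)
    return padded
```

`pad_sequences([[1, 2], [3, 4, 5, 6]], maxlen=3)` returns `[[0, 1, 2], [4, 5, 6]]`.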
Neural nets deal with numbers, so numbers we shall create. This is done by mapping the words to their position in word vector space (WVS). You can think of the WVS as a statistical representation of the corpus in space.
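An embedding is a lookup table with one vector per word index. The sketch below builds an untrained table with small random values, in the spirit of Keras's default uniform initialisation in [-0.05, 0.05], and replaces each index with its vector. Cosine similarity measures how close two words sit in the vector space. Once the table is trained, words used in similar contexts tend to score higher. The table size and function names are illustrative assumptions.

```python
import math
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """One random vector per word index, like an untrained embedding layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def embed(sequence, table):
    """Replace each word index with its vector: shape (seq_len,) -> (seq_len, dim)."""
    return [table[index] for index in sequence]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```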
At this step the model is trained using the training data.
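The project trains a TensorFlow neural net. As a dependency-free stand-in, here is a single sigmoid unit (logistic regression, the smallest neural net for binary classification), trained by full-batch gradient descent on binary cross-entropy. The input feature vectors (for example, averaged word embeddings), the learning rate and the epoch count are assumptions for illustration. The per-epoch loss list is the same kind of record that Keras keeps during training.

```python
import math


def sigmoid(z):
    # Numerically stable form: never calls exp on a large positive number
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, w, b):
    """Probability that x belongs to the positive (abusive) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit weights w and bias b; return them with the mean loss per epoch."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    losses = []
    for _ in range(epochs):
        grad_w, grad_b, loss = [0.0] * len(w), 0.0, 0.0
        for xi, yi in zip(X, y):
            p = predict(xi, w, b)
            p_c = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
            loss -= yi * math.log(p_c) + (1 - yi) * math.log(1 - p_c)
            err = p - yi  # derivative of cross-entropy w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradients over the batch and step downhill
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        losses.append(loss / n)
    return w, b, losses
```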
The History object returned by Keras records the training loss and metric values at successive epochs, as well as the validation loss and validation metric values. These records can be used to evaluate the model.
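In Keras, `model.fit(...)` returns a History object whose `.history` attribute is a dict of per-epoch lists, such as `"loss"` and `"val_loss"`. The helper below is a hypothetical sketch that reads such a dict, finds the epoch with the lowest validation loss, and flags likely overfitting. It treats overfitting as validation loss rising after its minimum while training loss keeps falling. The numbers in the usage example are made up for illustration, not results from this project.

```python
def summarise_history(history):
    """Summarise a Keras-style history dict (lists indexed by epoch)."""
    train_loss, val_loss = history["loss"], history["val_loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    # Overfitting signal: validation loss went back up while training loss kept dropping
    overfitting = val_loss[-1] > val_loss[best] and train_loss[-1] < train_loss[best]
    return {"best_epoch": best + 1, "best_val_loss": val_loss[best], "overfitting": overfitting}


# Illustrative, made-up values in the shape of History.history
example = {"loss": [0.69, 0.52, 0.41, 0.33, 0.27],
           "val_loss": [0.68, 0.55, 0.50, 0.53, 0.58]}
print(summarise_history(example))
```

With a real model this would be called as `summarise_history(model.fit(...).history)`.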
Hyperparameters are all the knobs and buttons that you can twist and turn to make your model work. They can be tuned by trial and error, grid search, random search or Bayesian optimization.
I am currently trying a random search using hyperas, a wrapper for hyperopt (Distributed Asynchronous Hyperparameter Optimization in Python). Thanks, maxpumperla!
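Random search draws each hyperparameter independently from a search space, evaluates the model for every draw, and keeps the best result. The sketch below uses only the standard library. The search space, including the hyperparameter names and ranges, is hypothetical. The `objective` would normally train the model and return its validation loss. In hyperopt, the corresponding pieces are `hp.choice`, `hp.uniform` and `hp.loguniform` for the space, and `fmin(..., algo=rand.suggest, max_evals=...)` for the search loop.

```python
import math
import random

# Hypothetical search space: each entry samples one value for a hyperparameter
SEARCH_SPACE = {
    "embedding_dim": lambda rng: rng.choice([16, 32, 64, 128]),
    "dropout": lambda rng: rng.uniform(0.0, 0.5),
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -2),  # log-uniform in [1e-4, 1e-2]
}


def random_search(objective, space, max_evals=20, seed=0):
    """Return (best_params, best_score); lower objective values are better."""
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(max_evals):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)  # e.g. validation loss after training with params
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which is usually what matters for that parameter.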
There is currently one API endpoint, which makes a prediction based on the sentence supplied as a query parameter in the browser. It loads a saved model and assesses how harassing the sentence is. If the sentence is abusive, "That's not very nice." is returned; otherwise the response is "Ooo aren't you sweet.".
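A minimal sketch of how such an endpoint could be wired up in Flask. Several details are assumptions: the query-parameter name `sentence`, the 0.5 decision threshold, and the injected `score_fn`, which stands in for "load the saved model and return the probability that the text is abusive". The reply logic lives in a plain function so it can be checked without running a server. Flask is imported inside the app factory for the same reason.

```python
from typing import Callable

ABUSIVE_REPLY = "That's not very nice."
KIND_REPLY = "Ooo aren't you sweet."


def reply_for(score: float, threshold: float = 0.5) -> str:
    """Pick the response from the model's abuse probability (threshold is an assumption)."""
    return ABUSIVE_REPLY if score >= threshold else KIND_REPLY


def create_app(score_fn: Callable[[str], float]):
    """Build the Flask app; score_fn is a hypothetical wrapper around the saved model."""
    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/predict")
    def predict():
        # e.g. GET /predict?sentence=you%20are%20great
        sentence = request.args.get("sentence", "")
        return reply_for(score_fn(sentence))

    return app
```

In a browser this would be queried as `/predict?sentence=...`, with `score_fn` wrapping whatever preprocessing and model loading the project uses.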