BERT text classification SST2 using PyTorch

We train a BERT text classifier on the Stanford Sentiment Treebank - 2 (SST2) dataset using PyTorch.

Dataset

We use the Stanford Sentiment Treebank - 2 (SST2) dataset.

Setting up locally

  1. Install Python 3.7.4.

  2. Install the requirements.

    pip install -r tests/requirements.txt

  3. Verify the setup by running the tests.

    export PYTHONPATH=./src
    pytest

SST2

  1. Preprocess the data: split it into train, test and val sample files and save them to the processdata directory.

    export PYTHONPATH=src
    datadir=tmp
    
    python src/utils/sst2_split_utils.py \
        --sentencefile "$datadir/datasetSentences.txt" \
        --sentiment "$datadir/sentiment_labels.txt" \
        --dictionary "$datadir/dictionary.txt" \
        --split "$datadir/datasetSplit.txt" \
        --outdir processdata
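
To make the preprocessing step more concrete, here is a minimal Python sketch of how the four raw files might be joined into labelled train, test and val samples. It is not the code in `src/utils/sst2_split_utils.py`, which is not shown here. The function names `split_sst2` and `to_binary_label` are hypothetical. The sketch also assumes the following:

- File layouts: `datasetSentences.txt` is tab-separated with a header, `dictionary.txt` is `phrase|id` without a header, `sentiment_labels.txt` is `id|score` with a header, and `datasetSplit.txt` is `index,code` with a header.
- Split codes: 1 = train, 2 = test, 3 = val.
- Labels: scores up to 0.4 are negative, scores above 0.6 are positive, and neutral sentences are dropped. This is the commonly used binary convention.

```python
# Assumed split codes in datasetSplit.txt: 1 = train, 2 = test, 3 = dev/val.
SPLIT_NAMES = {"1": "train", "2": "test", "3": "val"}


def to_binary_label(score):
    """Map a sentiment score in [0, 1] to 0 or 1; return None for neutral."""
    if score <= 0.4:
        return 0
    if score > 0.6:
        return 1
    return None


def _data_lines(lines, header):
    """Strip newlines, skip blank lines, and drop the header row if present."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    return rows[1:] if header else rows


def split_sst2(sentence_lines, dictionary_lines, label_lines, split_lines):
    """Join the raw SST files and group (sentence, label) pairs by split."""
    # dictionary.txt: "phrase|phrase_id" (split on the last "|").
    phrase_to_id = {}
    for row in _data_lines(dictionary_lines, header=False):
        phrase, _, phrase_id = row.rpartition("|")
        phrase_to_id[phrase] = phrase_id

    # sentiment_labels.txt: "phrase_id|score".
    id_to_score = {}
    for row in _data_lines(label_lines, header=True):
        phrase_id, _, score = row.partition("|")
        id_to_score[phrase_id] = float(score)

    # datasetSplit.txt: "sentence_index,split_code".
    index_to_split = {}
    for row in _data_lines(split_lines, header=True):
        index, _, code = row.partition(",")
        index_to_split[index] = SPLIT_NAMES[code]

    # datasetSentences.txt: "sentence_index<TAB>sentence".
    result = {"train": [], "test": [], "val": []}
    for row in _data_lines(sentence_lines, header=True):
        index, _, sentence = row.partition("\t")
        # Sentences with no dictionary entry or no score are skipped.
        score = id_to_score.get(phrase_to_id.get(sentence))
        label = None if score is None else to_binary_label(score)
        if label is not None:
            result[index_to_split[index]].append((sentence, label))
    return result


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the four raw files (made-up examples).
    sentences = ["sentence_index\tsentence", "1\tA great film .", "2\tDull and slow ."]
    dictionary = ["A great film .|10", "Dull and slow .|11"]
    labels = ["phrase ids|sentiment values", "10|0.9", "11|0.1"]
    splits = ["sentence_index,splitset_label", "1,1", "2,2"]
    print(split_sst2(sentences, dictionary, labels, splits))
```

Because the function takes iterables of lines rather than paths, it can be tried on open file handles or on small in-memory lists, as in the example at the bottom.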