In the following, we briefly describe the components included in this project and the software required to run the experiments.
Of the three tools we used, two are open source and available here:
- EMTk: https://github.com/collab-uniba/Emotion_and_Polarity_SO
- SEntiMoji: https://github.com/SEntiMoji/SEntiMoji
Follow the instructions in their repositories to run those models. Next, we describe our code base and datasets.
The project includes the following files and folders:
- /dataset: A folder that contains the inputs used in the experiments.
  - /experiment_dataset: A folder that contains the annotated dataset and its train/test splits.
    - annotation-set.csv: A CSV file that contains 2,000 annotated GitHub instances.
    - ann-train.csv: The training split of the annotated data.
    - ann-test.csv: The test split of the annotated data.
  - /Augmented: A folder that contains the augmented datasets.
    - bart-unconstrained.csv: The dataset augmented with the Unconstrained strategy.
    - bart-lexicon.csv: The dataset augmented with the Lexicon strategy.
    - bart-polarity.csv: The dataset augmented with the Polarity strategy.
- /crawlers: A folder that contains the scripts we used for data crawling.
  - githubcrawler.py: The script used to crawl GitHub.
- /data_preprocessing: A folder that contains the data preprocessing steps described in Section 3.2 of the paper.
  - data_cleaner.py: Implements the various filters, such as code filtering, URL filtering, and stack trace removal (a sketch of this kind of filtering follows the list below).
  - github_modifier.py: Modifies the dataset using the functions implemented in data_cleaner.py.
- /data_augmentation: A folder that contains the data augmentation scripts.
  - data_augmenter-unconstrained.py: Implements the Unconstrained strategy.
  - data_augmenter-lexicon.py: Implements the Lexicon strategy.
  - data_augmenter-polarity.py: Implements the Polarity strategy.
- /results: A folder that contains the results of all experiments for all tools.
- Annotation Instructions.docx: The instructions that the annotators followed.
- requirements.txt: The Python libraries required to run the experiments.
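
As referenced in the data_cleaner.py entry above, here is a minimal sketch of the kind of filtering the script performs. The regular expressions and their ordering are our assumptions for illustration, not the script's actual implementation:

```python
# Minimal sketch of code/URL/stack-trace filtering (assumed patterns,
# not the actual implementation in data_cleaner.py).
import re

CODE_BLOCK = re.compile(r"```.*?```", re.DOTALL)    # fenced code blocks
INLINE_CODE = re.compile(r"`[^`]+`")                # inline code spans
URL = re.compile(r"https?://\S+")                   # bare URLs
STACK_FRAME = re.compile(r"^\s*at\s+\S+\(.*\)\s*$", re.MULTILINE)  # Java-style frames

def clean(text: str) -> str:
    """Remove code, URLs, and stack-trace lines, then normalize whitespace."""
    for pattern in (CODE_BLOCK, INLINE_CODE, URL, STACK_FRAME):
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean("See `foo()` at https://example.com\n    at com.app.Main(Main.java:10)"))
```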
To set up the environment:
- Set up a Python virtual environment and activate it.
- Install the dependencies: `pip install -r requirements.txt`
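
For example, using Python's built-in venv module (on Windows, the activation command is `venv\Scripts\activate`):

`python -m venv venv`
`source venv/bin/activate`
`pip install -r requirements.txt`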
To crawl data from GitHub, run:

`python githubcrawler.py --github_token GITHUB_TOKEN --type TYPE --repo_name REPO_NAME --output_path OUTPUT_PATH`
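
For example, with placeholder values (the token is your personal access token; see githubcrawler.py for the values `--type` accepts):

`python githubcrawler.py --github_token <YOUR_TOKEN> --type issues --repo_name tensorflow/tensorflow --output_path dataset/raw_issues.csv`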
To clean and modify the crawled data (the preprocessing described above), run:

`python github_modifier.py --input_path INPUT_PATH --output_path OUTPUT_PATH`
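
For example, chaining the crawler output from the previous step (hypothetical paths):

`python github_modifier.py --input_path dataset/raw_issues.csv --output_path dataset/clean_issues.csv`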
To apply the Unconstrained augmentation strategy, run:

`python data_augmenter-unconstrained.py --input_file INPUT_PATH --output_file OUTPUT_PATH --model_path facebook/bart-base`
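
The strategy itself is implemented in the script; purely for intuition, a minimal mask-and-fill sketch with facebook/bart-base might look like the following (the masking ratio and generation settings are assumptions, not the script's actual parameters):

```python
# Illustrative mask-and-fill augmentation with BART (assumed procedure;
# see data_augmenter-unconstrained.py for the actual implementation).
import random
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def augment(text: str, mask_ratio: float = 0.15) -> str:
    """Randomly mask words, then let BART rewrite the masked spans."""
    words = text.split()
    masked = [tokenizer.mask_token if random.random() < mask_ratio else w
              for w in words]
    inputs = tokenizer(" ".join(masked), return_tensors="pt", truncation=True)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(augment("This bug is really annoying and hard to reproduce."))
```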
To apply the Lexicon strategy, run:

`python data_augmenter-lexicon.py --input_file INPUT_PATH --output_file OUTPUT_PATH --model_path facebook/bart-base`

Note that the `--input_file` for data_augmenter-lexicon.py should be the `--output_file` produced by data_augmenter-unconstrained.py.
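
For intuition only: a lexicon-guided variant might restrict masking to sentiment-bearing words instead of masking at random, for example (the word list and matching rule below are invented for illustration):

```python
# Hypothetical lexicon-guided masking: only sentiment-bearing words
# are replaced with the mask token (toy lexicon for illustration).
SENTIMENT_LEXICON = {"annoying", "great", "broken", "love", "hate", "terrible"}

def mask_lexicon_words(text: str, mask_token: str = "<mask>") -> str:
    return " ".join(
        mask_token if word.lower().strip(".,!?") in SENTIMENT_LEXICON else word
        for word in text.split()
    )

print(mask_lexicon_words("I love this feature but the docs are terrible."))
```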
To apply the Polarity strategy, run:

`python data_augmenter-polarity.py --input_file INPUT_PATH --output_file OUTPUT_PATH --model_path facebook/bart-base`
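
Again for intuition only: one way a polarity constraint could be enforced is to keep an augmented sentence only if its sentiment polarity matches the original's. The VADER-based check below is our illustration, not necessarily the script's approach:

```python
# Hypothetical polarity check using NLTK's VADER scorer
# (requires: nltk.download("vader_lexicon")).
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

def same_polarity(original: str, augmented: str, threshold: float = 0.05) -> bool:
    """True if both texts fall on the same side of neutral."""
    def sign(text: str) -> int:
        score = sia.polarity_scores(text)["compound"]
        return 1 if score > threshold else (-1 if score < -threshold else 0)
    return sign(original) == sign(augmented)

print(same_polarity("This is great!", "This works wonderfully!"))
```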