
SPOLIN

CC BY-NC 4.0

SPOLIN: Spontaneanation Pairs of Learnable ImprovisatioN

This is the repo for the paper "Grounding Conversations with Improvised Dialogues" (ACL2020). SPOLIN is a collection of more than 68,000 "Yes, and" type dialogue pairs extracted from the Spontaneanation podcast by Paul F. Tompkins, the Cornell Movie-Dialogs Corpus, and the SubTle corpus. For more information, refer to our paper or our project page.

Available SPOLIN versions:

The core dataset used for the experiments in the paper includes only the yes-ands and non-yes-ands from Spontaneanation and most of those extracted from the Cornell Movie-Dialogs Corpus. After submitting the paper, we continued our iterative data augmentation process, running another iteration on the Cornell Movie-Dialogs Corpus and extracting from the SubTle corpus. This expanded version is also included in this repository. This latest version of SPOLIN was used to train the model in our demo.

In the data folder, we provide two versions of the SPOLIN training set:

  1. Version used for experiments in the ACL paper: data/spolin-train-acl.json
  2. Expanded version: data/spolin-train.json
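A quick way to check that both files load is to inspect their top-level structure. The sketch below is only an illustration: `describe_json` is a hypothetical helper, not part of this repository, and it deliberately makes no assumption about the JSON schema beyond the file being valid JSON.

```python
import json
from pathlib import Path


def describe_json(path):
    """Return a small summary of a JSON file without assuming its schema."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        # Number of list items or top-level keys.
        summary["size"] = len(data)
    if isinstance(data, dict):
        # Show at most ten top-level keys, sorted for stable output.
        summary["keys"] = sorted(data)[:10]
    return summary


if __name__ == "__main__":
    # Paths as listed above; assumes the script runs from the repository root.
    for name in ("data/spolin-train-acl.json", "data/spolin-train.json"):
        if Path(name).exists():
            print(name, describe_json(name))
        else:
            print(name, "not found; run from the repository root")
```

Once the structure is known, the individual pairs can be read with ordinary list or dictionary access.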

SPOLIN is available via:

Update (4/28/2020):

We are releasing the yes-and classifier from our last iteration, which filters out self-yes-ands, along with our fine-tuned DialoGPT models:

For instructions and details on training or running inference with these models, refer to the README in each respective folder. Please raise an issue if there are any problems with the links or the scripts for using these models.

Relevant links:

Latest SPOLIN:

| | yesands | non-yesands |
|---|---|---|
| Spontaneanation | 10,959 | 6,087* |
| Cornell | 16,926 | 18,810 |
| SubTle | 40,303 | 19,512 |
| Total | 68,188 | 44,409 |

*Artificially constructed by mixing and matching positive Spontaneanation samples to balance the dataset for training the classifier.

data/spolin-train.json

| | yesands | non-yesands |
|---|---|---|
| Spontaneanation | 10,459 | 5,587* |
| Cornell | 16,426 | 18,310 |
| SubTle | 40,303 | 19,512 |
| Total | 67,188 | 43,409 |

data/spolin-valid.json

| | yesands | non-yesands |
|---|---|---|
| Spontaneanation | 500 | 500 |
| Cornell | 500 | 500 |
| Total | 1,000 | 1,000 |
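The train and validation counts above add up to the "Latest SPOLIN" figures. The following sketch checks that arithmetic using only numbers copied from the tables; the helper names `combine` and `totals` are illustrative and not part of this repository.

```python
# Counts copied from the tables above: (yes-ands, non-yes-ands) per source.
train = {
    "Spontaneanation": (10459, 5587),
    "Cornell": (16426, 18310),
    "SubTle": (40303, 19512),
}
valid = {
    "Spontaneanation": (500, 500),
    "Cornell": (500, 500),
}


def combine(train, valid):
    """Add validation counts to training counts, source by source."""
    full = {}
    for source, (yes, no) in train.items():
        v_yes, v_no = valid.get(source, (0, 0))  # SubTle has no validation split
        full[source] = (yes + v_yes, no + v_no)
    return full


def totals(table):
    """Sum yes-and and non-yes-and counts across all sources."""
    yes_total = sum(yes for yes, _ in table.values())
    no_total = sum(no for _, no in table.values())
    return yes_total, no_total


full = combine(train, valid)
print(full["Spontaneanation"])  # (10959, 6087)
print(totals(full))             # (68188, 44409)
```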

ACL Presentation

Video recording

Citation

If you use data or code in this repository, please cite our ACL2020 paper:

@inproceedings{cho2020spolin,
    title={Grounding Conversations with Improvised Dialogues},
    author={Cho, Hyundong and May, Jonathan},
    booktitle ={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
    publisher = {Association for Computational Linguistics}, 
    location =  {Seattle, Washington, USA},
    year={2020}
}  

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
