SPOLIN

SPOLIN: Spontaneanation Pairs of Learnable ImprovisatioN

This is the repo for the paper "Grounding Conversations with Improvised Dialogues" (ACL2020). SPOLIN is a collection of more than 68,000 "Yes, and" type dialogue pairs extracted from the Spontaneanation podcast by Paul F. Tompkins, the Cornell Movie-Dialogs Corpus, and the SubTle corpus. For more information, refer to our paper or our project page.

Available SPOLIN versions:

The core dataset used for the experiments in the paper includes only the yes-ands and non-yes-ands from Spontaneanation and most of those extracted from the Cornell Movie-Dialogs Corpus. After submitting the paper, we continued our iterative data augmentation process, running another iteration on the Cornell Movie-Dialogs Corpus and extracting from the SubTle corpus. This expanded version is also included in this repository. This latest version of SPOLIN was used to train the model used in our demo.

In the data folder, we provide two versions of the SPOLIN training set:

Version used for experiments in the ACL paper: data/spolin-train-acl.json
Expanded version: data/spolin-train.json
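
The JSON schema is not described in this README, so the following is only a minimal sketch for inspecting a training file before using it. It assumes the file is valid JSON built from nested objects and lists; the helper name `summarize` is hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def summarize(obj, prefix=""):
    """Map each path of nested dict keys to the length of the list found there."""
    if isinstance(obj, list):
        return {prefix or "<root>": len(obj)}
    counts = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}/{key}" if prefix else str(key)
            counts.update(summarize(value, path))
    return counts  # scalars contribute nothing


if __name__ == "__main__":
    path = Path("data/spolin-train.json")  # path as listed above
    if path.exists():
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Print how many items sit under each nested key, without assuming a schema.
        for key_path, n in summarize(data).items():
            print(f"{key_path}: {n} items")
```

Swapping in data/spolin-train-acl.json gives the same overview for the version used in the paper.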

SPOLIN is available via:

Update (4/28/2020):

We are making available our yes-and classifier from our last iteration, which filters out self-yes-ands, and our fine-tuned DialoGPT models:

Yes-and classifier
Fine-tuned GPT-2 model weights
Reverse GPT-2 model weights (from the DialoGPT repo): rename small_reverse.pkl to medium_reverse.pkl before using them with the scripts in this repo.
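
The rename of the reverse model weights can be done by hand or with a few lines of Python. This is a minimal sketch: the function name `rename_reverse_weights` is hypothetical, and the directory where the scripts expect the file is not stated here, so it is passed in as an argument.

```python
from pathlib import Path


def rename_reverse_weights(model_dir):
    """Rename small_reverse.pkl to medium_reverse.pkl inside model_dir."""
    src = Path(model_dir) / "small_reverse.pkl"
    dst = Path(model_dir) / "medium_reverse.pkl"
    if dst.exists():
        return dst  # already renamed; nothing to do
    if not src.exists():
        raise FileNotFoundError(f"expected downloaded weights at {src}")
    src.rename(dst)
    return dst
```

Call it with the folder that holds the downloaded file, for example `rename_reverse_weights("models")`, where `"models"` is a placeholder directory name.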

For instructions and details on training or running inference with these models, refer to the README in each respective folder. Please raise an issue if there are any problems with the links or the scripts for using these models.

Relevant links:

Project page: https://justin-cho.com/spolin
Demo: https://spolin.isi.edu
Paper: https://arxiv.org/abs/2004.09544

Latest SPOLIN:

	yesands	non-yesands
Spontaneanation	10,959	6,087*
Cornell	16,926	18,810
SubTle	40,303	19,512
Total	68,188	44,409

*Artificially constructed by mixing and matching positive Spontaneanation samples to balance the dataset for training the classifier.
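
The footnote does not spell out the exact mix-and-match procedure. One plausible reading, sketched here purely as an assumption, is to pair each prompt with the response from a different positive pair. The function name `mix_and_match` and the (prompt, response) tuple layout are hypothetical.

```python
import random


def mix_and_match(pairs, seed=0):
    """Build mismatched (prompt, response) pairs from positive pairs.

    The pairs are shuffled, then each prompt is matched with the response
    of the next pair in the shuffled order, so no prompt keeps its own response.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to mismatch")
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n = len(shuffled)
    return [(shuffled[i][0], shuffled[(i + 1) % n][1]) for i in range(n)]
```

This sketch yields one mismatched pair per positive pair. The table reports fewer Spontaneanation non-yes-ands than yes-ands, so the procedure actually used was evidently not identical.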

data/spolin-train.json

	yesands	non-yesands
Spontaneanation	10,459	5,587*
Cornell	16,426	18,310
SubTle	40,303	19,512
Total	67,188	43,409

data/spolin-valid.json

	yesands	non-yesands
Spontaneanation	500	500
Cornell	500	500
Total	1,000	1,000
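
Reading the larger of the two tables above as data/spolin-train.json and the smaller as data/spolin-valid.json, their per-source counts add up to the "Latest SPOLIN" table. The check below uses numbers copied from the tables; the helper names `totals` and `combine` are illustrative only.

```python
# (yesands, non-yesands) per source, copied from the tables above
latest = {"Spontaneanation": (10959, 6087), "Cornell": (16926, 18810), "SubTle": (40303, 19512)}
train = {"Spontaneanation": (10459, 5587), "Cornell": (16426, 18310), "SubTle": (40303, 19512)}
valid = {"Spontaneanation": (500, 500), "Cornell": (500, 500)}


def totals(table):
    """Column sums: (total yesands, total non-yesands)."""
    return tuple(sum(col) for col in zip(*table.values()))


def combine(a, b):
    """Add counts per source; a source missing from b counts as (0, 0)."""
    return {
        src: tuple(x + y for x, y in zip(counts, b.get(src, (0, 0))))
        for src, counts in a.items()
    }


assert totals(latest) == (68188, 44409)
assert totals(train) == (67188, 43409)
assert totals(valid) == (1000, 1000)
assert combine(train, valid) == latest  # train + valid reproduces the latest release
```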

ACL Presentation

Video recording

Citation

If you use data or code in this repository, please cite our ACL2020 paper:

@inproceedings{cho2020spolin,
    title = {Grounding Conversations with Improvised Dialogues},
    author = {Cho, Hyundong and May, Jonathan},
    booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
    publisher = {Association for Computational Linguistics},
    location = {Seattle, Washington, USA},
    year = {2020}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
