This code reproduces the experiments on ROC Stories in the ACL paper Toward Better Storylines with Sentence-Level Language Models. It contains scripts to download the checkpoints used to reproduce the accuracy numbers in Tables 1 and 2 as well as Figure 1 of the paper.
The training and eval code is written in Python 3 and uses TensorFlow 2. Install the needed dependencies into a virtual environment with:
python3 -m venv pyenv_tf2
source pyenv_tf2/bin/activate
pip install --upgrade pip
pip3 install -r requirements.train.txt
The dataset consists of the mean BERT embedding for each sentence in the ROC Stories dataset. These are stored as a TFDS dataset.
Since computing embeddings for ~400k sentences can be slow, we have made the mean BERT embedding dataset used in the paper available for download.
To download the dataset run the following commands from the base directory of this repository:
wget https://storage.googleapis.com/gresearch/better_storylines/roc_stories_embeddings.zip
mkdir tfds_datasets
unzip roc_stories_embeddings.zip -d tfds_datasets
rm roc_stories_embeddings.zip
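Once the data is unpacked into tfds_datasets/, you can load it with TFDS and inspect an example. This is a minimal sketch; the builder name and split below are assumptions, so check the dataset directory for the actual registered name and available splits.

```python
import tensorflow_datasets as tfds

# Builder name and split are assumptions; check tfds_datasets/ for the
# actual registered name and available splits.
ds = tfds.load('roc_stories_embeddings', split='train',
               data_dir='tfds_datasets')

for example in ds.take(1):
  # Each example should contain the mean BERT embedding(s) for a story.
  print({k: v.shape for k, v in example.items()})
```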
Optionally, to generate the dataset from scratch, run the following commands from the base directory of this repository. If you run them locally (without Apache Beam), generation can take a long time:
python3 -m venv pyenv_tf1
source pyenv_tf1/bin/activate
pip install --upgrade pip
pip install -r requirements.datagen.txt
# The following line is needed only if you'd like to use frequency-weighted embeddings.
wget https://storage.googleapis.com/gresearch/better_storylines/vocab_frequencies
sh scripts/build_tfds_dataset.sh
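For intuition, a mean BERT embedding is just the average of BERT's final-layer token vectors for a sentence. The sketch below uses the Hugging Face `transformers` library purely as an illustration; the actual pipeline in this repo is driven by scripts/build_tfds_dataset.sh and the TF1 requirements above.

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel  # illustrative only

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertModel.from_pretrained('bert-base-uncased')

def mean_bert_embedding(sentence):
  inputs = tokenizer(sentence, return_tensors='tf')
  outputs = model(inputs)
  # Average the final-layer token representations into one sentence vector.
  return tf.reduce_mean(outputs.last_hidden_state, axis=1)[0]

print(mean_bert_embedding('She opened the door.').shape)  # (768,)
```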
The following pre-trained checkpoints are available for download.
Checkpoint Name | Link | Description |
---|---|---|
mlp_best_largescale_cl | link | Best MLP checkpoint for the largescale ranking task (CSLoss). |
mlp_best_largescale_nocl | link | Best MLP checkpoint for the largescale ranking task (no CSLoss). |
mlp_best_story_cloze_cl | link | Best MLP checkpoint for the Story Cloze task (CSLoss). |
mlp_best_story_cloze_nocl | link | Best MLP checkpoint for the Story Cloze task (no CSLoss). |
resmlp_best_largescale_cl | link | Best residual MLP checkpoint for the largescale ranking task (CSLoss). |
resmlp_best_largescale_nocl | link | Best residual MLP checkpoint for the largescale ranking task (no CSLoss). |
resmlp_best_story_cloze_cl | link | Best residual MLP checkpoint for the Story Cloze task (CSLoss). |
resmlp_best_story_cloze_nocl | link | Best residual MLP checkpoint for the Story Cloze task (no CSLoss). |
First download a checkpoint to evaluate:
wget https://storage.googleapis.com/gresearch/better_storylines/mlp_best_largescale_cl.zip
mkdir trained_models
unzip mlp_best_largescale_cl.zip -d trained_models
rm mlp_best_largescale_cl.zip
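If you want to sanity-check the download before running any evals, the checkpoints can be inspected directly, assuming they are standard TensorFlow checkpoints; the directory below is an assumption about where the zip unpacks.

```python
import tensorflow as tf

# Adjust the path to wherever the checkpoint files actually land.
ckpt = tf.train.latest_checkpoint('trained_models/mlp_best_largescale_cl')
print(ckpt)
for name, shape in tf.train.list_variables(ckpt):
  print(name, shape)
```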
The following script evaluates all checkpoints in the provided directory. Validation accuracy for each checkpoint is written to a CSV file, `all_metrics.csv`. Run this script before any of the other eval scripts, since they use `all_metrics.csv` to select a checkpoint.
sh scripts/evaluate_all_checkpoints.sh trained_models/mlp_best_largescale_cl
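You can also inspect `all_metrics.csv` yourself, e.g. to see which checkpoint scored best. The column name below is a placeholder, so check the CSV header first.

```python
import pandas as pd

metrics = pd.read_csv('trained_models/mlp_best_largescale_cl/all_metrics.csv')
print(metrics.columns.tolist())  # see which metrics were actually logged
# 'valid_accuracy' is a placeholder column name; substitute the real one.
print(metrics.sort_values('valid_accuracy', ascending=False).head())
```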
The following script outputs the accuracy of the best checkpoint in the provided directory on the Story Cloze 2016 test set. (The 2018 test set can only be evaluated via submission to the CodaLab leaderboard.)
sh scripts/evaluate_best_story_cloze_test.sh trained_models/mlp_best_largescale_cl
The following script outputs the accuracy and MRR of the best checkpoint in the provided directory on the largescale reranking task.
sh scripts/evaluate_ranking_task.sh trained_models/mlp_best_largescale_cl
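For reference, mean reciprocal rank (MRR) is the average of 1/rank of the true fifth sentence among all candidates. A small self-contained sketch with toy scores:

```python
import numpy as np

def mean_reciprocal_rank(scores, true_indices):
  """scores: [num_examples, num_candidates]; true_indices: [num_examples]."""
  true_scores = scores[np.arange(len(scores)), true_indices][:, None]
  # Rank of the true candidate = 1 + number of candidates scored higher.
  ranks = 1 + (scores > true_scores).sum(axis=1)
  return np.mean(1.0 / ranks)

# Toy example: two contexts, three candidate next sentences each.
scores = np.array([[0.1, 0.9, 0.3],
                   [0.8, 0.2, 0.5]])
print(mean_reciprocal_rank(scores, np.array([1, 2])))  # (1/1 + 1/2) / 2 = 0.75
```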
To run qualitative evaluation, you will first need to download the CSVs for the validation and train sets, which can be requested from the ROC Stories website. The following script outputs the highest-scoring next sentences on the largescale reranking task.
sh scripts/evaluate_ranking_qualitative.sh path/to/rocstories/csvs trained_models/mlp_best_largescale_cl
The following script launches training for the residual model.
sh scripts/train_residual.sh
@inproceedings{ippolito2020toward,
title={Toward Better Storylines with Sentence-Level Language Models},
author={Ippolito, Daphne and Grangier, David and Eck, Douglas and Callison-Burch, Chris},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}
}