The models below provide initial baseline scores for this text similarity task. All results are reproducible by running each evaluation as detailed below.
Each score in the table is the average over 10 runs of the corresponding model.
Model | # of parameters | Dev-set Pearson | Dev-set Spearman | Test-set Pearson | Test-set Spearman |
---|---|---|---|---|---|
RNN | 16.7M | 0.7342 | 0.7349 | 0.6744 | 0.6662 |
Romanian BERT v1 (uncased) | 124M | 0.8453 | 0.8417 | 0.8156 | 0.8075 |
Romanian BERT v1 (cased) | 124M | 0.8477 | 0.8447 | 0.7985 | 0.7897 |
Multilingual BERT (uncased) | 167M | 0.8237 | 0.8235 | 0.7690 | 0.7650 |
Multilingual BERT (cased) | 167M | 0.8071 | 0.8077 | 0.7664 | 0.7641 |
Distill BERT (cased) | 81M | 0.7737 | 0.7726 | 0.7253 | 0.7167 |
Readerbench/RoGPT-base | 124M | 0.8210 | 0.8185 | 0.7848 | 0.7729 |
Readerbench/RoGPT-medium | 354M | 0.8400 | 0.8387 | 0.7954 | 0.7867 |
xlm-roberta-base | 278M | 0.8215 | 0.8233 | 0.7783 | 0.7756 |
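
For readers unfamiliar with the two metrics in the table, here is a minimal pure-Python sketch of how Pearson and Spearman correlations can be computed between predicted and gold similarity scores. It is an illustration only and is not taken from the evaluation scripts in this folder, which may rely on a library implementation instead; the function names `pearson`, `average_ranks` and `spearman` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either input is constant.
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Pearson measures how linearly the predictions track the gold scores, while Spearman only looks at whether the two orderings agree, which is why the two columns can differ slightly for the same model.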
If you would like to run these models yourself, first run `pip install -r requirements.txt` from this folder.
The transformer model encodes each sentence separately, then mean-pools the output token vectors. The two resulting sentence representations are compared using cosine similarity. The gold scores were rescaled from the 0-5 interval to 0-1 (to be compatible with the cosine similarity), and the MSE loss was used. This is a standard recipe for STS/NLI tasks, and we provide it here as a baseline.
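
The recipe above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in `transformer_model.py`: it uses lists of floats in place of tensors, the attention mask that skips padding tokens is an assumption, and the helper names (`mean_pool`, `cosine_similarity`, `scale_label`, `mse_loss`) are hypothetical.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        return [0.0] * dim
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two sentence vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def scale_label(score, max_score=5.0):
    """Rescale a gold similarity score from the 0-5 interval to 0-1."""
    return score / max_score


def mse_loss(predictions, targets):
    """Mean squared error between predicted and rescaled gold similarities."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


# Toy example: two "sentences" with 2-dimensional token vectors.
sent_a = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])  # last token is padding
sent_b = mean_pool([[2.0, 2.0]], [1])
prediction = cosine_similarity(sent_a, sent_b)
loss = mse_loss([prediction], [scale_label(4.0)])
```

In an actual training loop the same steps would run on batched tensors so that the loss can be backpropagated through the encoder.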
The transformer baseline was run with the Romanian BERT (`--model_name dumitrescustefan/bert-base-romanian-uncased-v1`) and with the multilingual BERT (`--model_name bert-base-multilingual-uncased`). Both the cased and uncased versions of each transformer were used for comparison.
Run the model as `python transformer_model.py` with the appropriate `--model_name` parameter to obtain the results in the table above. Use `--help` to see the available parameters that you can change. For example, to run a single iteration, use `--experiment_iterations 1`.
The RNN model uses stacked bidirectional LSTMs to encode each sentence. Each sentence representation is then passed through an attention layer. As in the transformer model, the two representations are compared with cosine similarity and trained with the same MSE loss. Note that while the transformer model comes with its own tokenizer and vocabulary, the RNN model has to create them first, and it does so automatically if they do not already exist: it builds a WordPiece tokenizer based on the full dataset.
Run the model as `python rnn_model.py` without any parameters to obtain the results in the table above. Use `--help` to see the available parameters that you can change. For example, to run this model on a CPU, use `--gpus 0`, since by default the model will attempt to use one GPU.