SloBENCH official evaluation scripts

This repository contains the evaluation scripts used by the leaderboards of the SloBENCH tool - https://slobench.cjvt.si.

Submission evaluation methodology

The SloBENCH tool expects the user to upload a submission.zip file whose contents follow the rules of the specific leaderboard.

Note: the zip file must not contain any files that the system does not expect (e.g., __MACOSX entries). To make sure your submission is sound, you can use the zip command from the command line, for example: zip submission.zip ./*.txt
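
As an alternative to the zip command, a small script can produce an equally clean archive. The following is a minimal Python sketch (not part of this repository); the pack_submission helper name and the system_output directory are placeholders:

# Hypothetical helper script, not part of this repository.
# Creates a flat submission.zip from the *.txt files in a directory, so that
# no extra entries (e.g., __MACOSX, .DS_Store) end up in the archive.
import zipfile
from pathlib import Path

def pack_submission(output_dir: str, archive_name: str = "submission.zip") -> None:
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(output_dir).glob("*.txt")):
            # arcname keeps the archive flat, without leading directories.
            zf.write(path, arcname=path.name)

if __name__ == "__main__":
    pack_submission("system_output")  # "system_output" is a placeholder directory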

The uploaded submission file is automatically extracted along with the reference_dataset.zip file.

The run.py script unzips the ground truth data into the /data-reference path and the submitted data into /data-submission.

It then runs the task's eval.py evaluation script, which compares the contents of the two paths and returns a dictionary of metricName:metricScore pairs, for example:

{
    'overall': 88.2,
    'metric1': 100.0,
    'metric2': 32.1,
    'metric3': 123.33
}
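
As an illustration of this contract only: a simplified eval.py could look roughly like the sketch below. The evaluate function name, the labels.txt file, and the accuracy metric are assumptions made for the example, not the interface of the actual scripts in evaluation_scripts/.

# Illustrative sketch only - the real eval.py scripts in evaluation_scripts/
# define their own file formats and metrics.
from pathlib import Path

def evaluate(reference_dir: str = "/data-reference",
             submission_dir: str = "/data-submission") -> dict:
    # "labels.txt" is an assumed file name, used here just for illustration.
    ref = Path(reference_dir, "labels.txt").read_text().splitlines()
    sub = Path(submission_dir, "labels.txt").read_text().splitlines()
    correct = sum(r == s for r, s in zip(ref, sub))
    accuracy = 100.0 * correct / len(ref) if ref else 0.0
    # run.py expects a dictionary of metricName:metricScore pairs.
    return {"overall": accuracy, "accuracy": accuracy}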

The run.py script wraps the evaluation results (or, possibly, an error) together with some running metadata into a Task Submission Evaluation Object, which is passed on to the SloBENCH Web Server.
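
Purely to illustrate the kind of information mentioned above, such an object could carry the metric dictionary together with an error field and runtime metadata; the field names in the following sketch are hypothetical:

# Hypothetical shape only - the actual fields of the Task Submission Evaluation
# Object are defined by run.py and the SloBENCH Web Server, not by this sketch.
evaluation_object = {
    "results": {"overall": 88.2, "metric1": 100.0},  # metric scores from eval.py
    "error": None,                                   # or an error message on failure
    "metadata": {"evaluation_time_s": 12.4},         # example of running metadata
}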

Leaderboard evaluation

Each evaluation script is packaged into its own Docker image - see the specific Dockerfile for your target leaderboard.

When a new task evaluation script is added to this repo, a Docker image is built and pushed to the slobench/eval:[TASK-NAME] Docker repository.

Building and running an evaluation locally

Build the Docker image from the root directory of your local clone of this repository, as follows:

docker buildx build --platform linux/amd64 -t eval:TASK_NAME -f evaluation_scripts/TASK_NAME/Dockerfile .

Test your evaluation as follows:

docker run -it --name eval-container --rm \
-v $PWD/DATA_WITH_LABELS.zip:/ground_truth.zip \
-v $PWD/YOUR_SYSTEM_OUTPUT_DATA.zip:/submission.zip \
eval:TASK_NAME ground_truth.zip submission.zip

Change TASK_NAME accordingly and provide the paths to your sample of ground truth/reference data (i.e., DATA_WITH_LABELS.zip) and to your system's output for that reference data (i.e., YOUR_SYSTEM_OUTPUT_DATA.zip).

For more information, check the README file of the selected leaderboard.

Pushing an image to DockerHub

This repository is accompanied by the DockerHub repository https://hub.docker.com/r/slobench/eval. Images are pushed from local builds using the following commands:

docker login
docker tag eval:TASK_NAME slobench/eval:TASK_NAME_VERSION
docker push slobench/eval:TASK_NAME_VERSION

Currently supported tasks

This repository supports the following tasks:

  • eval_question_answering: Evaluation of selected SuperGLUE-like QA tasks.
  • eval_sequence_tagging_conllu: General CoNLL-U-based evaluation tasks.
  • eval_sequence_tagging_tab: General sequence labelling evaluation tasks.
  • eval_conll2002: CoNLL 2002 NER evaluation.
  • eval_summarization: Text summarization evaluation.
  • eval_translation_en: Machine translation evaluation (English target).
  • eval_translation_sl: Machine translation evaluation (Slovene target).
  • eval_sequence_pair_classification: Sequence pair classification evaluation.
  • eval_speech_recognition: Automatic speech recognition evaluation.

The SloBENCH tool was developed as a [Clarin.si 2021 project](https://www.clarin.si/info/storitve/projekti).
