Initiating a paradigm shift in reporting and helping to make ML advances more considerate of sustainability and trustworthiness.

STREP - Sustainable and Trustworthy Reporting for ML

Software repository for more sustainable and trustworthy reporting of machine learning results; the associated research paper is published in Data Mining and Knowledge Discovery (open access). With our publicly available Exploration tool, you can investigate all results - no code needs to run on your machine!

Framework Overview

Note that we continue to advance our software - it is work in progress and subject to change, so you might encounter delays, downtimes, and slight differences from our paper. Check out the paper branch for a frozen state of the software as of the paper's publication.

Explore your own databases

Instead of exploring the pre-assembled databases, you can also investigate your own custom results by following these steps:

  1. Prepare your database as a pandas DataFrame (each row lists one model performance result on some data set, with the different measures as columns) - see the sketch after the snippet below for one way to assemble such a file.
  2. Store it in a directory and optionally add some JSON meta information (check our databases folder for examples and follow these naming conventions).
  3. Clone the repo and install the necessary libraries via pip install -r requirements.txt (tested on Python 3.10).
  4. Either run python main.py --custom path/to/database.pkl, or use the following code snippet:
from strep.index_scale import load_database, scale_and_rate
from strep.elex.app import Visualization

fname = 'path/to/your/database.pkl'
# load database and meta information (if available)
database, meta = load_database(fname)
# index-scale and rate database
rated_database = scale_and_rate(database, meta)
# start the interactive exploration tool
app = Visualization(rated_database)
app.run_server()
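
For reference, here is one way such a database file could be assembled. The column names (model, dataset, accuracy, power_draw) and the structure of the meta information are purely hypothetical - check the databases folder for the actual naming conventions:

import json
import pandas as pd

# each row holds one model's performance result on one data set,
# with the individual measures as columns (hypothetical names)
database = pd.DataFrame([
    {'model': 'ResNet50', 'dataset': 'ImageNet', 'accuracy': 0.76, 'power_draw': 68.5},
    {'model': 'MobileNetV2', 'dataset': 'ImageNet', 'accuracy': 0.72, 'power_draw': 31.2},
])
database.to_pickle('path/to/your/database.pkl')

# optional meta information, e.g. which measures should be maximized
# (this JSON structure is an assumption, not necessarily the repo's exact format)
meta = {'accuracy': {'maximize': True}, 'power_draw': {'maximize': False}}
with open('path/to/your/meta.json', 'w') as f:
    json.dump(meta, f, indent=2)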

News and Release History

  • 13 January 2025 - Many fixes, (re-)added Papers with Code and EdgeAccUSB databases
  • 2 October 2024 - Greatly improved index scaling (15x speed-up), added / updated result databases from MetaQuRe and AutoXPCR (Forecasting)
  • 11 September 2024 - Presented our paper and repository at ECML-PKDD '24
  • 16 August 2024 - Merged a lot of functionality that was developed for other works
  • 30 April 2024 - Paper published in Data Mining and Knowledge Discovery (open access), alongside the initial version of this repository

Contributing

We firmly believe that reporting in a more sustainable and trustworthy fashion is a community effort. If you perform large-scale benchmark experiments, stress-test a lot of models, or have any other important results to report - get in touch! Our contact info is given in our papers. We would love to showcase other resource-aware reports here, so if you send us your own performance databases, we will gladly add them and highlight your work as a significant contribution.

Currently available databases:

Citing

If you appreciate our work and code, please cite our paper as given by Springer:

Fischer, R., Liebig, T. & Morik, K. Towards more sustainable and trustworthy reporting in machine learning. Data Min Knowl Disc 38, 1909–1928 (2024). https://doi.org/10.1007/s10618-024-01020-3

or use the BibTeX entry below:

@article{fischer_towards_2024,
	title = {Towards more sustainable and trustworthy reporting in machine learning},
	volume = {38},
	issn = {1573-756X},
	url = {https://doi.org/10.1007/s10618-024-01020-3},
	doi = {10.1007/s10618-024-01020-3},
	number = {4},
	journal = {Data Mining and Knowledge Discovery},
	author = {Fischer, Raphael and Liebig, Thomas and Morik, Katharina},
	month = jul,
	year = {2024},
	pages = {1909--1928},
}

Repository Structure

  • databases contains the gathered evaluation databases of ML reports, including scripts to assemble some of them.
  • strep contains the code that processes the databases, calculates index values and compound scores, and visualizes them (the index-scaling idea is sketched below).
  • materials contains some figures used in the paper.
  • the top-level main.py script runs the application locally, while deploy_on_render.py is used for the website (hosted on Render).
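
For intuition, index scaling relates each measured value to the best value observed for that measure, turning differently scaled measures into comparable relative scores. The following is a minimal sketch of this idea, not the exact implementation in strep.index_scale - it assumes the best observed value serves as the reference:

import pandas as pd

def index_scale(values: pd.Series, maximize: bool) -> pd.Series:
    # relate every value to the best observed one, yielding scores in (0, 1]
    best = values.max() if maximize else values.min()
    return values / best if maximize else best / values

# hypothetical measures: higher accuracy is better, lower power draw is better
print(index_scale(pd.Series([0.76, 0.72]), maximize=True))   # -> 1.000, 0.947
print(index_scale(pd.Series([68.5, 31.2]), maximize=False))  # -> 0.455, 1.000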

Terms of Use

Copyright (c) 2025 Raphael Fischer
