SoBigData DCI Dataset

This repository hosts demo datasets for use with the DCI (Distributional Correspondence Indexing) transfer learning method, which is published on the Virtual Research Environment (VRE) of the SoBigData project.

The DCI method has been published in:

The original Webis-CLS-10 dataset has been created by Peter Prettenhofer and Benno Stein and published in:

The original dataset is published on Zenodo under a CC BY 4.0 license.

The version published in this repository (Webis-CLS-10-SBD) is a subset of the original, rearranged to work with the interface of the methods available on the VRE. Japanese documents have been removed because the VRE implementation lacks a suitable tokenization method for them. The German unlabeled set has been reduced to fit the 100 MB size limit on GitHub. Webis-CLS-10-SBD is intended only to show how to set up your data to run transfer learning experiments.