
Repository of demo datasets for the DCI transfer-learning method published on the SoBigData VRE


HLT-ISTI/sobigdata-dci-datasets


SoBigData DCI Dataset

This repository hosts demo datasets that can be used with the DCI transfer learning method published on the Virtual Research Environment (VRE) of the SoBigData project.

The DCI method has been published in:

The original Webis-CLS-10 dataset has been created by Peter Prettenhofer and Benno Stein and published in:

The original dataset is published on Zenodo under a CC BY 4.0 license.

The version published in this repository (Webis-CLS-10-SBD) is a rearranged subset of the original, adapted to the interface of the methods available on the VRE. Japanese documents have been removed because the VRE implementation lacks a suitable tokenization method for Japanese. The German unlabeled set has been reduced to fit GitHub's 100 MB file size limit. Webis-CLS-10-SBD is intended only to show how to set up your data to run transfer learning experiments.
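The two subsetting steps described above (dropping one language entirely and capping one unlabeled set by size) can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `Doc` record with `lang`, `split` and `text` fields and hypothetical language codes; it is not the script used to build Webis-CLS-10-SBD, and it does not reproduce the repository's actual file layout.

```python
import random
from dataclasses import dataclass


@dataclass
class Doc:
    lang: str   # hypothetical language code, e.g. "en", "de", "fr", "jp"
    split: str  # hypothetical split name, e.g. "train", "test", "unlabeled"
    text: str


def make_subset(docs, max_de_unlabeled_bytes, seed=0):
    """Drop Japanese documents and cap the German unlabeled set by UTF-8 size."""
    # Step 1: remove every Japanese document.
    kept = [d for d in docs if d.lang != "jp"]

    # Step 2: separate the German unlabeled documents from everything else.
    is_de_unlabeled = lambda d: d.lang == "de" and d.split == "unlabeled"
    de_unlabeled = [d for d in kept if is_de_unlabeled(d)]
    others = [d for d in kept if not is_de_unlabeled(d)]

    # Shuffle with a fixed seed so the reduced set is reproducible, then keep
    # the longest prefix of the shuffled list that fits in the byte budget.
    rng = random.Random(seed)
    rng.shuffle(de_unlabeled)
    used, sampled = 0, []
    for d in de_unlabeled:
        size = len(d.text.encode("utf-8"))
        if used + size > max_de_unlabeled_bytes:
            break
        sampled.append(d)
        used += size

    return others + sampled


# Usage on a toy corpus (made-up documents, for illustration only).
docs = [
    Doc("en", "train", "good"),
    Doc("jp", "train", "x"),
    Doc("de", "unlabeled", "aaaa"),
    Doc("de", "unlabeled", "bbbb"),
    Doc("de", "train", "gut"),
]
subset = make_subset(docs, max_de_unlabeled_bytes=5)
```

Capping by byte size rather than by document count mirrors the stated motivation (a size limit on GitHub); the fixed seed keeps the reduced set reproducible across runs.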
