
jasonfrankenstein/MLForRecords


Text Classification for Records Management

By Jason Franks

Supervisors: Greg Rolan, Lan Du

This repository contains the source code for the paper Text Classification for Records Management.

All code was developed on Google Colab (https://colab.research.google.com/) and is intended to run there.

To run these experiments, you need your data in a tab-separated (.tsv) file with two columns: 'label', containing the category name, and 'text', containing the raw text. Every category in the data file should have at least 10 records.
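Before you upload a file, you can check it against these requirements. The sketch below is a hypothetical pre-flight check and is not part of the notebooks. It uses only the standard library, reads a TSV with a header row naming the `label` and `text` columns, and rejects any category with fewer than 10 records. It assumes the text fields contain no embedded tabs or newlines. The function name `load_records` is illustrative.

```python
import csv
from collections import Counter

MIN_RECORDS_PER_LABEL = 10  # minimum stated in this README

def load_records(path):
    """Read a tab-separated file with 'label' and 'text' columns.

    Returns a list of (label, text) pairs and raises ValueError if a column
    is missing or any category has fewer than MIN_RECORDS_PER_LABEL records.
    (Hypothetical helper; assumes no tabs/newlines inside the text field.)
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        missing = {"label", "text"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing column(s): {sorted(missing)}")
        records = [(row["label"], row["text"]) for row in reader]

    # Count records per category and report any that fall below the minimum.
    counts = Counter(label for label, _ in records)
    too_small = sorted(lbl for lbl, n in counts.items() if n < MIN_RECORDS_PER_LABEL)
    if too_small:
        raise ValueError(
            f"categories with fewer than {MIN_RECORDS_PER_LABEL} records: {too_small}"
        )
    return records
```

Running `load_records("records.tsv")` on a well-formed file returns the pairs. A file with an undersized category fails early with the offending category names, so the problem surfaces before any notebook runs.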

The notebooks load the data file from Google Drive and must be given the path to mount (mount_path) and the name of the file containing your text data (data_file). The mount path must contain a folder named output, into which the notebooks write their output metrics.
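On Colab, Google Drive is usually mounted with `drive.mount('/content/drive')` from `google.colab`. After mounting, a quick check can confirm that the layout matches what the notebooks expect. The sketch below is a minimal example and makes one assumption: that `data_file` sits directly inside `mount_path`. The README does not state this. The helper name `resolve_paths` is hypothetical. Only `mount_path`, `data_file` and the `output` folder come from this README.

```python
import os

def resolve_paths(mount_path, data_file):
    """Return (data_path, output_dir) and check that both exist.

    Hypothetical helper: assumes data_file lives directly under mount_path,
    and checks for the 'output' folder this README requires.
    """
    data_path = os.path.join(mount_path, data_file)
    output_dir = os.path.join(mount_path, "output")
    if not os.path.isfile(data_path):
        raise FileNotFoundError(f"data file not found: {data_path}")
    if not os.path.isdir(output_dir):
        raise FileNotFoundError(f"expected an 'output' folder in {mount_path}")
    return data_path, output_dir
```

Running this check before an experiment starts catches a missing `output` folder early, before the notebooks try to write metrics into it.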

The notebooks install any software that was missing from the Colab environment as of June 2020.
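The Colab image has changed since June 2020, so a package that was preinstalled then may now be missing, or the reverse. The sketch below shows one way to detect and install missing modules. It is an illustration and not the notebooks' actual install cells. Any package list you pass in is hypothetical. The approach assumes the import name matches the pip name, which is not always true (for example, `sklearn` is installed as `scikit-learn`).

```python
import importlib.util
import subprocess
import sys

def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

def install(pip_names):
    """Install packages with pip into the running interpreter's environment."""
    if pip_names:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *pip_names])
```

For example, `install(missing_modules(["some_package"]))` installs `some_package` only if it is absent, where `some_package` is a placeholder name. Using `sys.executable -m pip` means the package is installed into the same interpreter the notebook is running.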

About

Machine Learning Classification for Records Management
