Data Set Requirements and Organization

mitzimorris edited this page Sep 28, 2013 · 5 revisions

As part of this repository, we are sharing open-access data sets. If you have a data set we can share, please let us know (carp@alias-icom).

Location of Data

The data sets are in the data subdirectory at the top level of the repository.

Layout of Data

Each data set is in its own subdirectory. When adding a data set, include as much context from the original distribution as possible:

  • the raw data itself, exactly as originally distributed with no modifications, in a top-level directory original

  • in the original directory, license information in a separate file license.txt. If the raw data is distributed as a tarball that contains license information, either as part of the documentation or in its own file, copy that information into license.txt.

  • in the original directory, any available data set descriptions in a subdirectory notes; this should include any web pages or PDFs describing the data (or links to them if the web pages or PDFs are not themselves open access)

In addition to the original distribution, a data set directory can contain:

  • a subdirectory src for any source code used to manipulate the data

  • a top-level build.xml file for Ant or a makefile for make

  • a directory munged for any munged form of the data, which should also contain documentation of the data format
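
The layout above can be checked mechanically before a data set is committed. The following is a minimal sketch in Python, not part of the repository: the function name check_layout and the split into required and optional entries are assumptions made for illustration (in particular, license.txt is treated as required and notes as optional).

```python
from pathlib import Path

# Paths are relative to one data set's own subdirectory.
# Assumption: the raw data and its license file are treated as required.
REQUIRED = ["original", "original/license.txt"]
# Assumption: everything else in the layout is treated as optional.
OPTIONAL = ["original/notes", "src", "munged", "build.xml", "makefile"]


def check_layout(dataset_dir):
    """Return (missing_required, present_optional) for one data set directory."""
    root = Path(dataset_dir)
    missing = [p for p in REQUIRED if not (root / p).exists()]
    present = [p for p in OPTIONAL if (root / p).exists()]
    return missing, present
```

Calling check_layout on a data set directory returns two lists: the required entries that are missing, and the optional entries that were found. An empty first list means the original directory and its license.txt are both in place.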