
trait list

Florian Schneider edited this page May 11, 2017 · 3 revisions

This page is to develop a structure for our trait list. We should answer the following questions first:

  • Do we develop one list of accepted traits or several (one per organism group)? Separate lists could be useful if trait definitions vary between taxonomic groups.
  • Can we find a typology/classification of traits (e.g. morphometric, phenological, life-history, ...)? That could be useful for trait queries and for listing the accepted traits for data holders.
  • What is the process for adding a new trait to the list, or for correcting or extending existing entries and definitions? Can we provide a submission form? Will there be a simple review process? Who maintains the trait list?
  • With which existing trait databases should we provide compatibility? That would require trait definitions that are compatible with the other database and a mapping table for translating our template into the format of the other DB.

trait lists

One product of our project will be the preparation of trait lists that define the accepted traits and standardise their format and unit for use with the BExIS template.

For each organism group, a trait list ('traitlist') will be provided. It has the format of a simple data table (xls, comma-separated txt or csv) and will be available for download on BExIS. The lists should be maintained by the special interest groups within the Exploratories. For use in the template, a list will be referred to by a BExIS ID, a dataset name or a DOI.

The following columns must be present in such a 'traitlist' object:

  • traitName : contains a short and unique name of the trait (no spaces or points; the underscore _ is the only separator allowed)
  • traitID : a numeric identifier for each trait
  • type_of_data : defines the type of data to expect. Must be one of: numeric, factor, character, logical, integer.
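
The rules for the required columns can be checked automatically. The following is a minimal sketch in Python, used here only for illustration (the project's own R script is a separate thing). The function name validate_traitlist, the example trait names and the exact reading of "no spaces or points" as "letters, digits and underscores only" are assumptions, not part of an agreed specification.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ("traitName", "traitID", "type_of_data")
ALLOWED_TYPES = {"numeric", "factor", "character", "logical", "integer"}
# Assumption: a valid traitName consists of letters, digits and underscores.
TRAIT_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def validate_traitlist(csv_text):
    """Return a list of problems found in a trait list given as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    seen_names = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        name = row["traitName"] or ""
        trait_id = row["traitID"] or ""
        data_type = row["type_of_data"] or ""
        if not TRAIT_NAME_PATTERN.fullmatch(name):
            problems.append(f"line {line_no}: invalid traitName {name!r}")
        if name in seen_names:
            problems.append(f"line {line_no}: duplicate traitName {name!r}")
        seen_names.add(name)
        if not trait_id.isdigit():
            problems.append(f"line {line_no}: traitID {trait_id!r} is not numeric")
        if data_type not in ALLOWED_TYPES:
            problems.append(f"line {line_no}: unknown type_of_data {data_type!r}")
    return problems


# Hypothetical trait list with two deliberate mistakes.
example = """traitName,traitID,type_of_data,measurementUnit
body_length,1,numeric,mm
feeding.guild,2,factor,
wing_form,3,categorical,
"""

for problem in validate_traitlist(example):
    print(problem)
```

With the hypothetical input above, the sketch reports the point in feeding.guild and the unsupported type categorical, and accepts the first row.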

Optional columns are:

  • measurementUnit : highly recommended. Contains the expected unit, e.g. mm. This will be matched
  • factorLevels : highly recommended for factorial traits, to constrain the set of values that can be entered.
  • MaxAllowedValue & MinAllowedValue : recommended for proportion data (set to 0 and 1) or other constrained numerical data.
  • traitDescription : a short description of the trait
  • extendedDescription : a more detailed definition of the trait.
  • example : a couple of examples of entries, particularly useful for traits saved as a character string.
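
To illustrate how the optional constraint columns could be used, here is a hedged sketch in Python (chosen for illustration only). It checks a single, already typed value against factorLevels and against MinAllowedValue / MaxAllowedValue. The function name check_value, the example traits and the use of ";" to separate factor levels within one cell are assumptions; the page does not define a storage format for factorLevels.

```python
def check_value(trait, value):
    """Check one already typed value against a trait's optional constraints.

    `trait` is a dict holding one row of the trait list.
    Returns an error message, or None if the value passes.
    """
    if trait["type_of_data"] == "factor" and trait.get("factorLevels"):
        # Assumption: all levels sit in one cell, separated by ";".
        levels = [lvl.strip() for lvl in trait["factorLevels"].split(";")]
        if value not in levels:
            return f"{value!r} is not one of {levels}"
    if trait["type_of_data"] in ("numeric", "integer"):
        low = trait.get("MinAllowedValue")
        high = trait.get("MaxAllowedValue")
        # Empty or absent bounds mean "no constraint".
        if low not in (None, "") and value < float(low):
            return f"{value} is below MinAllowedValue {low}"
        if high not in (None, "") and value > float(high):
            return f"{value} is above MaxAllowedValue {high}"
    return None


# Hypothetical trait definitions, as they might be read from a trait list.
proportion = {
    "traitName": "leaf_damage_proportion",
    "type_of_data": "numeric",
    "MinAllowedValue": "0",
    "MaxAllowedValue": "1",
}
guild = {
    "traitName": "feeding_guild",
    "type_of_data": "factor",
    "factorLevels": "herbivore; predator; detritivore",
}

print(check_value(proportion, 0.35))   # passes: None
print(check_value(proportion, 1.2))    # violates MaxAllowedValue
print(check_value(guild, "omnivore"))  # not a defined level
```

Traits without these optional columns pass unchecked, which matches their status as recommended rather than required.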

Further columns can be present, e.g. traitClass or traitType for a hierarchical categorisation of traits; these will not be handled by the R script. They allow compatibility with hierarchical trait thesauri (such as TOP or T-SITA).

Some traits would be offered in multiple variants: where the same trait can be reported either numerically or as a factor, two separate traits would be defined.

One issue I encountered when testing the data upload in BExIS is that the author has to define every column as an integer or real number or as a character value. Different string lengths can be chosen, and boolean is also offered as an option. What about factorial values? Does BExIS not treat them as factors? Are people supposed to mark them as integer? This field is really confusing to a data-type-conscious user. To me, storing the column content as a character string when it is actually a factor with just a couple of levels seems rather expensive. However, when I download a file from BExIS as txt, the information on column type is lost anyway and must be inferred from the content. R will usually do that more or less okay, so there is nothing to worry about (except that these data are difficult to crunch server-side).

For our data template, that pointed me to a major problem with a multiple-value column: our initially intended column "value" is supposed to take all observed trait values across all traits, whether factorial or numerical, integer or real, with pre-defined or open factor levels. That column is a real mess. As a consequence, it must be stored as a long character string and taken apart by the user in post-processing (ideally via a predefined R script). One option that I have seen in BETSI is to store numerical and factorial values in different columns of the template. But this could cause confusion and errors. Also, factors cannot be handled appropriately this way, since each trait has very different factor levels and we would need to fall back to character strings anyway. So there is no gain in splitting things up. How do other trait databases handle this problem? Is it a problem at all? Basically, we know what to expect in the fields if the trait ID is set correctly.
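
The last point, that the trait ID tells us what to expect in the "value" field, can be sketched as a lookup followed by a type cast. The example below is in Python for illustration only; an R script could do the same. The function name cast_value, the trait IDs, the sample observations and the TRUE/FALSE spelling of logical values are all assumptions.

```python
def cast_value(raw, type_of_data):
    """Convert one raw string from the shared "value" column to a typed value."""
    text = raw.strip()
    if type_of_data == "integer":
        return int(text)
    if type_of_data == "numeric":
        return float(text)
    if type_of_data == "logical":
        # Assumption: logical values are written R-style as TRUE / FALSE.
        lookup = {"TRUE": True, "FALSE": False}
        return lookup[text.upper()]
    # "factor" and "character" values stay strings; factor levels would be
    # checked against the trait list in a separate step.
    return text


# Hypothetical trait list, reduced to traitID -> type_of_data.
trait_types = {1: "numeric", 2: "factor", 3: "logical", 4: "integer"}

# Hypothetical rows of a long-format template: (traitID, value as string).
observations = [(1, "12.5"), (2, "herbivore"), (3, "TRUE"), (4, "7")]

typed = [
    (trait_id, cast_value(value, trait_types[trait_id]))
    for trait_id, value in observations
]
print(typed)  # [(1, 12.5), (2, 'herbivore'), (3, True), (4, 7)]
```

In this sketch a value that does not fit its declared type raises an error (ValueError or KeyError), so a wrongly set trait ID would surface at the casting step instead of silently producing a mis-typed value.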

I feel there are two issues here. The first is to understand how BExIS stores primary data and metadata and how we can make them available for local computation by the user. The second is to find a way to handle the multi-format data columns that we are going to generate with our trait data collection (on BExIS servers as well as in local use).
