fix schema
khaled196 committed Mar 15, 2024
1 parent 119aa21 commit 68fae0e
Showing 6 changed files with 9 additions and 7 deletions.
1 change: 0 additions & 1 deletion topics/fair/tutorials/fair-access/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
title: Access
abbreviations:
FAIR: Findable, Accessible, Interoperable, Reusable
GTN: Galaxy Training Network
zenodo_link: ''
questions:
- What is data access in the context of FAIR
1 change: 0 additions & 1 deletion topics/fair/tutorials/fair-data-registration/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
title: Data Registration
abbreviations:
FAIR: Findable, Accessible, Interoperable, Reusable
GTN: Galaxy Training Network
zenodo_link: ''
questions:
- What is data registration?
7 changes: 7 additions & 0 deletions topics/fair/tutorials/fair-metadata/tutorial.bib
@@ -18,4 +18,11 @@ @article{Sarkans2021
year = {2021},
month = may,
pages = {1418–1422}
}

@online{diseaseontology,
author = {Disease Ontology},
title = {The Disease Ontology is a formal ontology of human disease.},
url = {https://disease-ontology.org/?id=DOID:9352},
urldate = {2024-03-15}
}
5 changes: 2 additions & 3 deletions topics/fair/tutorials/fair-metadata/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
title: Metadata
abbreviations:
FAIR: Findable, Accessible, Interoperable, Reusable
GTN: Galaxy Training Network
zenodo_link: ''
questions:
- What is metadata?
@@ -98,7 +97,7 @@ A project URL should be used where possible, ideally one that can act as a **per
Terms of access and reuse are missing which could be rectified by including a [data licence](https://rdmkit.elixir-europe.org/licensing#what-licence-should-you-apply-to-your-research-data), which often appears as part of the metadata, usually at the bottom of a webpage hosting data.
There could be ambiguity around the acronym, “BPM”, used in the third column header, so this should be defined within a glossary of acronyms and or ideally hyperlinked to a definition in an existing ontology.

There are issues too with the data, as well as the metadata. The second column DISEASE TYPE could be better designed. Two pieces of information (data) are depicted in the same column: disease type (diabetes) and disease stage (early/late). Ideally these should be in 2 separate columns allowing researchers to subset on stage and disease type independently for downstream analysis. There are also four different terms used for diabetes (“Diabetes Mellitus II”, “Diabetes”, “Diabetes Mellitus” and “Diabetes Mellitus I”), which again does not allow a researcher to subset data efficiently. To fix this you would use defined terms within an existing vocabulary or ontology. The following accesses a [disease ontology](https://disease-ontology.org/?id=DOID:9352) we could use, where each term (for example, “type 2 diabetes mellitus”) is described and assigned a unique ID. In the example above you would use this unique ID, or the associated descriptive term, to tag all patients with the same disease, identically. This then makes the data sub-setable and machine-readable.
There are issues too with the data, as well as the metadata. The second column DISEASE TYPE could be better designed. Two pieces of information (data) are depicted in the same column: disease type (diabetes) and disease stage (early/late). Ideally these should be in 2 separate columns allowing researchers to subset on stage and disease type independently for downstream analysis. There are also four different terms used for diabetes (“Diabetes Mellitus II”, “Diabetes”, “Diabetes Mellitus” and “Diabetes Mellitus I”), which again does not allow a researcher to subset data efficiently. To fix this you would use defined terms within an existing vocabulary or ontology. The following accesses a {% cite diseaseontology %} we could use, where each term (for example, “type 2 diabetes mellitus”) is described and assigned a unique ID. In the example above you would use this unique ID, or the associated descriptive term, to tag all patients with the same disease, identically. This then makes the data sub-setable and machine-readable.
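
The normalisation described in the changed paragraph above can be sketched in Python. This is a minimal illustration and not part of the commit: the cell format (a disease label with an optional "early"/"late" word), the function name `normalise`, and the `TERM_MAP` table are all assumptions. Only `DOID:9352` appears in the tutorial text; the other two IDs should be checked against the Disease Ontology before use.

```python
import re

# Hypothetical mapping from free-text labels to (ontology ID, preferred term).
# DOID:9352 is the ID linked in the tutorial; the other IDs are assumptions
# to verify against the Disease Ontology before use.
TERM_MAP = {
    "diabetes mellitus ii": ("DOID:9352", "type 2 diabetes mellitus"),
    "diabetes mellitus i": ("DOID:9744", "type 1 diabetes mellitus"),
    "diabetes mellitus": ("DOID:9351", "diabetes mellitus"),
    "diabetes": ("DOID:9351", "diabetes mellitus"),
}

# Assumed stage vocabulary, taken from the "early/late" wording in the text.
STAGE_PATTERN = re.compile(r"\b(early|late)\b", re.IGNORECASE)


def normalise(raw_value):
    """Split a combined DISEASE TYPE cell into disease ID, term and stage."""
    match = STAGE_PATTERN.search(raw_value)
    stage = match.group(1).lower() if match else None
    # Drop the stage word, then strip punctuation and extra whitespace.
    label = STAGE_PATTERN.sub("", raw_value).lower()
    label = re.sub(r"[^a-z ]", " ", label)
    label = " ".join(label.split())
    # Unmapped labels raise KeyError so they are reviewed rather than guessed.
    disease_id, disease_term = TERM_MAP[label]
    return {"disease_id": disease_id, "disease_term": disease_term, "stage": stage}


if __name__ == "__main__":
    for cell in ["Diabetes Mellitus II (late)", "Early Diabetes", "Diabetes Mellitus I"]:
        print(cell, "->", normalise(cell))
```

Keeping the stage and the ontology ID in separate fields is what allows a researcher to subset on either one independently.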


> <question-title></question-title>
@@ -115,7 +114,7 @@ There are issues too with the data, as well as the metadata. The second column D

# Writing FAIR metadata

We have discussed already how rich metadata enables a dataset to be reused and interpreted correctly. In the context of the FAIR principles, the previous exercise illustrates two of these, namely that _“(Meta)data are richly described with a plurality of accurate and relevant attributes”_ (FAIR Principle R1) and that _“(Meta)data are associated with detailed provenance”_ (FAIR Principle R1,2). Further to this, the suggested use of the published disease ontology for data, illustrates a further three principles, where _“(Meta)data use **vocabularies**,Vocabularies: (or controlled vocabulary) is a dictionary of terms you can use when producing (meta)data, that follow FAIR principles”_ (FAIR Principle I2), and _“(Meta)data meet domain-relevant **community standards**, Community standards: standard guidelines used to structure and exchange data, usually supported by community-developed resources and/or software, (FAIR Principle R1.3). The use of hyperlinks specifically to terms in the ontology means that Metadata include *qualified references*, Qualified references: terms used to describe relationships to pieces of (meta)data. , to other Metadata (FAIR Principle I3). From the previous exercise, the [disease ontology](https://disease-ontology.org/) provides the vocabulary for the different types of diabetes: [type 1 diabetes mellits](https://disease-ontology.org/?id=DOID:9352) and [type 2 diabetes mellitus](https://disease-ontology.org/?id=DOID:9352).
We have discussed already how rich metadata enables a dataset to be reused and interpreted correctly. In the context of the FAIR principles, the previous exercise illustrates two of these, namely that _“(Meta)data are richly described with a plurality of accurate and relevant attributes”_ (FAIR Principle R1) and that _“(Meta)data are associated with detailed provenance”_ (FAIR Principle R1,2). Further to this, the suggested use of the published {% cite diseaseontology %} for data, illustrates a further three principles, where _“(Meta)data use **vocabularies**,Vocabularies: (or controlled vocabulary) is a dictionary of terms you can use when producing (meta)data, that follow FAIR principles”_ (FAIR Principle I2), and _“(Meta)data meet domain-relevant **community standards**, Community standards: standard guidelines used to structure and exchange data, usually supported by community-developed resources and/or software, (FAIR Principle R1.3). The use of hyperlinks specifically to terms in the ontology means that Metadata include *qualified references*, Qualified references: terms used to describe relationships to pieces of (meta)data. , to other Metadata (FAIR Principle I3). From the previous exercise, the [disease ontology](https://disease-ontology.org/) provides the vocabulary for the different types of diabetes: type 1 diabetes mellits {% cite diseaseontology %} and type 2 diabetes mellitus {% cite diseaseontology %}.

The FAIR Guiding Principles also highlight the importance of providing rich metadata to enable researchers to **find** datasets such that _“Data are described with rich metadata”_ (FAIR Principle F2). More often than not, a researcher will find data through searching its metadata, usually via an online or a database search. Information on how this can be achieved is discussed in the next episode on data registration.

1 change: 0 additions & 1 deletion topics/fair/tutorials/fair-origin/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
title: FAIR and its Origins
abbreviations:
FAIR: Findable, Accessible, Interoperable, Reusable
GTN: Galaxy Training Network
zenodo_link: ''
questions:
- What is FAIR and the FAIR Guiding Principles?
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
title: Persistent Identifiers
abbreviations:
FAIR: Findable, Accessible, Interoperable, Reusable
GTN: Galaxy Training Network
zenodo_link: ''
questions:
- What is a persistent identifier?