From 68fae0e6884c4e44cb8b4cdeef703f963e3fa596 Mon Sep 17 00:00:00 2001
From: khaled196
Date: Fri, 15 Mar 2024 16:12:29 +0000
Subject: [PATCH] fix schema

---
 topics/fair/tutorials/fair-access/tutorial.md                 | 1 -
 topics/fair/tutorials/fair-data-registration/tutorial.md      | 1 -
 topics/fair/tutorials/fair-metadata/tutorial.bib              | 7 +++++++
 topics/fair/tutorials/fair-metadata/tutorial.md               | 5 ++---
 topics/fair/tutorials/fair-origin/tutorial.md                 | 1 -
 .../fair/tutorials/fair-persistent-identifiers/tutorial.md    | 1 -
 6 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/topics/fair/tutorials/fair-access/tutorial.md b/topics/fair/tutorials/fair-access/tutorial.md
index fdb823e1bbb482..7240ab97f66f4b 100644
--- a/topics/fair/tutorials/fair-access/tutorial.md
+++ b/topics/fair/tutorials/fair-access/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
 title: Access
 abbreviations:
   FAIR: Findable, Accessible, Interoperable, Reusable
-  GTN: Galaxy Training Network
 zenodo_link: ''
 questions:
 - What is data access in the context of FAIR
diff --git a/topics/fair/tutorials/fair-data-registration/tutorial.md b/topics/fair/tutorials/fair-data-registration/tutorial.md
index 972a79891d0033..e72d28d7b8c9dd 100644
--- a/topics/fair/tutorials/fair-data-registration/tutorial.md
+++ b/topics/fair/tutorials/fair-data-registration/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
 title: Data Registration
 abbreviations:
   FAIR: Findable, Accessible, Interoperable, Reusable
-  GTN: Galaxy Training Network
 zenodo_link: ''
 questions:
 - What is data registration?
diff --git a/topics/fair/tutorials/fair-metadata/tutorial.bib b/topics/fair/tutorials/fair-metadata/tutorial.bib
index b7d32dcdb2d152..48bd23ddcb6586 100644
--- a/topics/fair/tutorials/fair-metadata/tutorial.bib
+++ b/topics/fair/tutorials/fair-metadata/tutorial.bib
@@ -18,4 +18,11 @@ @article{Sarkans2021
   year = {2021},
   month = may,
   pages = {1418–1422}
+}
+
+@online{diseaseontology,
+  author = {Disease Ontology},
+  title = {The Disease Ontology is a formal ontology of human disease.},
+  url = {https://disease-ontology.org/?id=DOID:9352},
+  urldate = {2024-03-15}
 }
\ No newline at end of file
diff --git a/topics/fair/tutorials/fair-metadata/tutorial.md b/topics/fair/tutorials/fair-metadata/tutorial.md
index f1d74a8941d7ed..d988209abb6a6e 100644
--- a/topics/fair/tutorials/fair-metadata/tutorial.md
+++ b/topics/fair/tutorials/fair-metadata/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
 title: Metadata
 abbreviations:
   FAIR: Findable, Accessible, Interoperable, Reusable
-  GTN: Galaxy Training Network
 zenodo_link: ''
 questions:
 - What is metadata?
@@ -98,7 +97,7 @@ A project URL should be used where possible, ideally one that can act as a **per
 Terms of access and reuse are missing which could be rectified by including a [data licence](https://rdmkit.elixir-europe.org/licensing#what-licence-should-you-apply-to-your-research-data), which often appears as part of the metadata, usually at the bottom of a webpage hosting data. There could be ambiguity around the acronym, “BPM”, used in the third column header, so this should be defined within a glossary of acronyms and or ideally hyperlinked to a definition in an existing ontology.
-There are issues too with the data, as well as the metadata. The second column DISEASE TYPE could be better designed. Two pieces of information (data) are depicted in the same column: disease type (diabetes) and disease stage (early/late). Ideally these should be in 2 separate columns allowing researchers to subset on stage and disease type independently for downstream analysis. There are also four different terms used for diabetes (“Diabetes Mellitus II”, “Diabetes”, “Diabetes Mellitus” and “Diabetes Mellitus I”), which again does not allow a researcher to subset data efficiently. To fix this you would use defined terms within an existing vocabulary or ontology. The following accesses a [disease ontology](https://disease-ontology.org/?id=DOID:9352) we could use, where each term (for example, “type 2 diabetes mellitus”) is described and assigned a unique ID. In the example above you would use this unique ID, or the associated descriptive term, to tag all patients with the same disease, identically. This then makes the data sub-setable and machine-readable.
+There are issues too with the data, as well as the metadata. The second column DISEASE TYPE could be better designed. Two pieces of information (data) are depicted in the same column: disease type (diabetes) and disease stage (early/late). Ideally these should be in 2 separate columns allowing researchers to subset on stage and disease type independently for downstream analysis. There are also four different terms used for diabetes (“Diabetes Mellitus II”, “Diabetes”, “Diabetes Mellitus” and “Diabetes Mellitus I”), which again does not allow a researcher to subset data efficiently. To fix this you would use defined terms within an existing vocabulary or ontology. The following accesses a {% cite diseaseontology %} we could use, where each term (for example, “type 2 diabetes mellitus”) is described and assigned a unique ID. In the example above you would use this unique ID, or the associated descriptive term, to tag all patients with the same disease, identically. This then makes the data sub-setable and machine-readable.
 >
@@ -115,7 +114,7 @@ There are issues too with the data, as well as the metadata. The second column D
 # Writing FAIR metadata
-We have discussed already how rich metadata enables a dataset to be reused and interpreted correctly. In the context of the FAIR principles, the previous exercise illustrates two of these, namely that _“(Meta)data are richly described with a plurality of accurate and relevant attributes”_ (FAIR Principle R1) and that _“(Meta)data are associated with detailed provenance”_ (FAIR Principle R1,2). Further to this, the suggested use of the published disease ontology for data, illustrates a further three principles, where _“(Meta)data use **vocabularies**,Vocabularies: (or controlled vocabulary) is a dictionary of terms you can use when producing (meta)data, that follow FAIR principles”_ (FAIR Principle I2), and _“(Meta)data meet domain-relevant **community standards**, Community standards: standard guidelines used to structure and exchange data, usually supported by community-developed resources and/or software, (FAIR Principle R1.3). The use of hyperlinks specifically to terms in the ontology means that Metadata include *qualified references*, Qualified references: terms used to describe relationships to pieces of (meta)data. , to other Metadata (FAIR Principle I3). From the previous exercise, the [disease ontology](https://disease-ontology.org/) provides the vocabulary for the different types of diabetes: [type 1 diabetes mellits](https://disease-ontology.org/?id=DOID:9352) and [type 2 diabetes mellitus](https://disease-ontology.org/?id=DOID:9352).
+We have discussed already how rich metadata enables a dataset to be reused and interpreted correctly. In the context of the FAIR principles, the previous exercise illustrates two of these, namely that _“(Meta)data are richly described with a plurality of accurate and relevant attributes”_ (FAIR Principle R1) and that _“(Meta)data are associated with detailed provenance”_ (FAIR Principle R1,2). Further to this, the suggested use of the published {% cite diseaseontology %} for data, illustrates a further three principles, where _“(Meta)data use **vocabularies**,Vocabularies: (or controlled vocabulary) is a dictionary of terms you can use when producing (meta)data, that follow FAIR principles”_ (FAIR Principle I2), and _“(Meta)data meet domain-relevant **community standards**, Community standards: standard guidelines used to structure and exchange data, usually supported by community-developed resources and/or software, (FAIR Principle R1.3). The use of hyperlinks specifically to terms in the ontology means that Metadata include *qualified references*, Qualified references: terms used to describe relationships to pieces of (meta)data. , to other Metadata (FAIR Principle I3). From the previous exercise, the [disease ontology](https://disease-ontology.org/) provides the vocabulary for the different types of diabetes: type 1 diabetes mellits {% cite diseaseontology %} and type 2 diabetes mellitus {% cite diseaseontology %}.
 The FAIR Guiding Principles also highlight the importance of providing rich metadata to enable researchers to **find** datasets such that _“Data are described with rich metadata”_ (FAIR Principle F2). More often than not, a researcher will find data through searching its metadata, usually via an online or a database search. Information on how this can be achieved is discussed in the next episode on data registration.
diff --git a/topics/fair/tutorials/fair-origin/tutorial.md b/topics/fair/tutorials/fair-origin/tutorial.md
index abc7e206d827ff..c8c331101b8c03 100644
--- a/topics/fair/tutorials/fair-origin/tutorial.md
+++ b/topics/fair/tutorials/fair-origin/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
 title: FAIR and its Origins
 abbreviations:
   FAIR: Findable, Accessible, Interoperable, Reusable
-  GTN: Galaxy Training Network
 zenodo_link: ''
 questions:
 - What is FAIR and the FAIR Guiding Principles?
diff --git a/topics/fair/tutorials/fair-persistent-identifiers/tutorial.md b/topics/fair/tutorials/fair-persistent-identifiers/tutorial.md
index ae3e89ca4a77b2..dc1b7d8634047f 100644
--- a/topics/fair/tutorials/fair-persistent-identifiers/tutorial.md
+++ b/topics/fair/tutorials/fair-persistent-identifiers/tutorial.md
@@ -3,7 +3,6 @@ layout: tutorial_hands_on
 title: Persistent Identifiers
 abbreviations:
   FAIR: Findable, Accessible, Interoperable, Reusable
-  GTN: Galaxy Training Network
 zenodo_link: ''
 questions:
 - What is a persistent identifier?