Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start of a new tutorial: FAIR-ego. DRAFT #4351

Open
wants to merge 34 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
34 commits
Select commit Hold shift + click to select a range
1ca3d5c
Create tutorial.md
korneelhens Sep 19, 2023
847d2bc
Add files via upload
korneelhens Sep 19, 2023
1af21b1
Update tutorial.md
korneelhens Sep 19, 2023
194ca5a
Update tutorial.md
korneelhens Sep 19, 2023
41d9dd9
Add files via upload
korneelhens Sep 19, 2023
da2954d
Update tutorial.md
korneelhens Sep 19, 2023
5ec46ee
Update tutorial.md
korneelhens Sep 20, 2023
78c6bd9
Update tutorial.md
korneelhens Sep 21, 2023
c6e676e
Add files via upload
korneelhens Sep 21, 2023
0681036
Update tutorial.md
korneelhens Sep 21, 2023
698efcb
Update tutorial.md
korneelhens Sep 21, 2023
5978a76
Create tutorial.bib
korneelhens Sep 21, 2023
e853b70
Update tutorial.md
korneelhens Sep 21, 2023
dec46b2
Update tutorial.md
korneelhens Sep 21, 2023
397f6e7
Update tutorial.md
korneelhens Sep 22, 2023
99d9596
Update tutorial.bib
korneelhens Sep 22, 2023
fcb219b
Update tutorial.md
korneelhens Sep 22, 2023
ea5a9ee
Update tutorial.md
korneelhens Sep 22, 2023
a08291e
Update tutorial.bib
korneelhens Sep 22, 2023
43f146c
Update tutorial.md
korneelhens Sep 22, 2023
a3cc117
Update tutorial.bib
korneelhens Sep 25, 2023
ee256d0
Update tutorial.md
korneelhens Sep 25, 2023
91bb633
Update tutorial.md
korneelhens Sep 25, 2023
eeed2ea
Update topics/fair/tutorials/fair-ego/tutorial.md
korneelhens Oct 26, 2023
bb503b0
Update tutorial.md
korneelhens Oct 26, 2023
36f0853
Merge branch 'main' into korneelhens-patch-1
korneelhens Oct 26, 2023
e540839
Merge pull request #1 from korneelhens/korneelhens-patch-1
korneelhens Oct 26, 2023
284c1f8
Merge branch 'galaxyproject:main' into main
korneelhens Nov 9, 2023
eb17a65
Update topics/fair/tutorials/fair-ego/tutorial.md
korneelhens Nov 9, 2023
23c3a75
add korneelhens
korneelhens Nov 16, 2023
9a64452
Update CONTRIBUTORS.yaml
korneelhens Nov 16, 2023
d58c7ba
Update CONTRIBUTORS.yaml
hexylena Nov 16, 2023
6906f2f
Update topics/fair/tutorials/fair-ego/tutorial.md
korneelhens Nov 30, 2023
bbdd536
Update topics/fair/tutorials/fair-ego/tutorial.md
korneelhens Nov 30, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file not shown.
Binary file added topics/fair/tutorials/fair-ego/emailresponse.jpg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added topics/fair/tutorials/fair-ego/flybrain.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
46 changes: 46 additions & 0 deletions topics/fair/tutorials/fair-ego/tutorial.bib
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
@article{Vines_2014,
doi = {10.1016/j.cub.2013.11.014},
url = {https://doi.org/10.1016%2Fj.cub.2013.11.014},
year = 2014,
month = {jan},
publisher = {Elsevier {BV}},
volume = {24},
number = {1},
pages = {94--97},
author = {Timothy~H. Vines and Arianne~Y.K. Albert and Rose~L. Andrew and Florence D{\'{e}}barre and Dan~G. Bock and Michelle~T. Franklin and Kimberly~J. Gilbert and Jean-S{\'{e}}bastien Moore and S{\'{e}}bastien Renaut and Diana~J. Rennison},
title = {The Availability of Research Data Declines Rapidly with Article Age},
journal = {Current Biology}
}
@article{Baker2016,
doi = {10.1038/533452a},
url = {https://doi.org/10.1038/533452a},
year = {2016},
month = may,
publisher = {Springer Science and Business Media {LLC}},
volume = {533},
number = {7604},
pages = {452--454},
author = {Monya Baker},
title = {1, 500 scientists lift the lid on reproducibility},
journal = {Nature}
}
@misc{FAIR-in-a-nutshell,
url = {https://training.galaxyproject.org/training-material//topics/fair/tutorials/fair-intro/tutorial.html},
note = {Accessed 2023-09-22},
title = {FAIR in a nutshell}
}
@misc{FAIR-data-management-solutions,
url = {https://training.galaxyproject.org/training-material//topics/fair/tutorials/data-management/tutorial.html},
note = {Accessed 2023-09-22},
title = {FAIR data management solutions}
}
@misc{RDMkit-Data-management-plan,
url = {https://rdmkit.elixir-europe.org/data_management_plan},
note = {Accessed 2023-09-22},
title = {RDMkit Data management plan}
}
@misc{elixir-belgium-About-DMP,
url = {https://rdm.elixir-belgium.org/about_DMP},
note = {Accessed 2023-09-22},
title = {elixir belgium About DMP}
}
210 changes: 210 additions & 0 deletions topics/fair/tutorials/fair-ego/tutorial.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,210 @@
---
layout: tutorial_hands_on
title: FAIR and RDM from a selfish perspective
abbreviations:
FAIR: Findable, Accessible, Interoperable, Reusable
GTN: Galaxy Training Network
RDM: Research Data Management
level: Introductory
zenodo_link: ''
questions:
- Why should other researchers adhere to FAIR principles?
- How can I profit from good RDM?
objectives:
- Learn about the benefits of good data management
time_estimation: "30M"
key_points:
- RDM can lead save time and money.
tags:
- fair
- gtn
- rdm
contributions:
authorship:
- korneelhens
- nomadscientist
funding:
- elixir-fair-data
subtopic: fair-data

requirements:
- type: "internal"
topic_name: fair
tutorials:
- fair-intro

draft: true

---


# Introduction

**You** are obviously too important to bother with this research data management (RDM) nonsense. Just copy the data management plan from a previous grant application and stop wasting time.
korneelhens marked this conversation as resolved.
Show resolved Hide resolved
However, there might be some value in persuading your colleagues to do their RDM properly and to adhere to FAIR standards. In this tutorial, we explore the purely selfish reasons for encouraging good data management practices based on some real-life examples.


> <agenda-title></agenda-title>
>
> In this tutorial, we will cover:
>
> 1. TOC
> {:toc}
>
{: .agenda}

# Excercise 1: Find the experiment

## The Problem

Imagine the following situation: An undergrad student wants to do a lab-based project with you. This is great because your post-doc has recently left the lab and there are several promising leads from their work that can be followed up. You propose the following topic to the student:

**Effect of mutations in the gene CHES-I-like on feeding behaviour in _Drosophila melanogaster_**
korneelhens marked this conversation as resolved.
Show resolved Hide resolved

The only problem is, you can't remember what your post-doc has done so far. This is no problem, as you have followed your data management plan to the letter: _"lab books will be preserved for 10 years in the PI’s office"._ Let's see how effective this strategy actually is.

<question-title>1</question-title>
The PDF below contains a scan of the lab books in this example. Somewhere, there is an experiment to show the effect of the knock-down of a gene called CHES-I-like on the feeding behaviour of fruit flies. Try to find this information in the lab book.

korneelhens marked this conversation as resolved.
Show resolved Hide resolved
> [5 years worth of lab books](Lab-book_excercise1.pdf)
korneelhens marked this conversation as resolved.
Show resolved Hide resolved

> > <solution-title></solution-title>
> > The information can be found on page XXX of the pdf.
> {: .solution}
>
{: .question}

Too difficult? Maybe try something a bit more easy

> <question-title></question-title>

> The information can be found on page XXX of the lab book. Can you tell me how this experiment was designed?

> [5 years worth of lab books](Lab-book_excercise1.pdf)

> > <solution-title></solution-title>
> > No you can't. Neither can I. Obviously our data management plan did not do us any favours here.
> {: .solution}
>
{: .question}

## The solution

The problem descibed above could have been avoided by creating a Data Management Plan (DPM) before the start of the project. A DMP is a document that describes how research data is going to be managed during and after the project. Importantly, A DMP is a living document that may need to be altered during the project. Any time the research plan changes, the DMP should be reviewed and updated.
Learn more about DPMs from these excellent resources: {% cite RDMkit-Data-management-plan %} and {% cite elixir-belgium-About-DMP %}

# Excercise 2: Reproduce the experiment

## The Problem

Imagine the following situation: You have got great data. The editors of Nature, Science and Cell are all knocking on your door to publish in their journals. The only thing standing between you and a Nobel prize is one picture of an immunostaining that is not publication quality and the immunostaining needs to be redone.

![Wonky immunostaining](flybrain.png "Immunostaining")

> <question-title>2</question-title>
> This is an immunostaining of a fruitfly brain. Can you list 15 more pieces of information that you would need to be able to reproduce this image?
> > <solution-title></solution-title>
> > 1. Which fly strain
> > 2. Which developmental stage
> > 3. Which genotype
> > 4. Time of day
> > 5. Feeding conditions
> > 6. Sex
> > 7. Mating condition
> > 8. Fixation time and conditions
> > 9. Primary antibody
> > 10. Secondary antibody
> > 11. Dilutions of antibodies
> > 12. Staining protocol
> > 13. Which microscope
> > 14. Which laser lines
> > 15. What exposure time
> > 16. Filter settings
> > 17. Magnification
> > 18. Pinhole size
> > 19. Number of z slices
> > 20. ...
> >
> {: .solution}
>
{: .question}

## The solution

This excercise illustrates how much information you need to keep to be able to reproduce experiments, be it your own data or that of other scientists.
More often than not, this level of detail is not recorded / published, which may explain part of the **reproducibility crisis** {% cite Baker2016 %}.

This information about the data is called **metadata** and is arguably as important as the data itself.

# Excercise 3: Get the data

## The Problem

Imagine the following situation: it has been a slow couple of years in terms of publications and grant income. You really need to get papers out, but you are short of research money. No worries! There is tons of data pit there for you to re-analyse, interpret, and integrate with your own data. You have found a particularly useful study in a paper and you want to use the data to get your research going again. In the data sharing statement, you find the following sentence:

_**The data that support the findings of this study are available from the corresponding author upon reasonable request.**_
korneelhens marked this conversation as resolved.
Show resolved Hide resolved

> <question-title>3</question-title>
> Can you list a couple of reasons an author might have not to publish the data he is reporting?
> > <solution-title></solution-title>
> >
> > 1. Conflicts of privacy and confidentiality e.g. patient data
> > 2. Economically sensitive data e.g. patent applications
> {: .solution}
>
{: .question}

> <question-title>4</question-title>
> Since the authors will provide the data upon **reasonable** request, give an example of an **unreasonable** request
> > <solution-title></solution-title>
> > I don't think there are unreasonable requests?
> {: .solution}
>
{: .question}

Recently, the following has appeared in severeal journals:

_**Contact the authors for detailed protocol**_

> <question-title>5</question-title>
> If the publisher won't accomodate lengthy protocols in the main paper, what would be a good place to publish detailed protocols?
> > <solution-title></solution-title>
> >
> > 1. Supplementary Data
> > 2. Github
> > 3. Dedicated Methods paper
> {: .solution}
>
{: .question}

You are resigned to the fact that you will have to contact the authors to get the data. Let's have a look at this paper {% cite Vines_2014 %} to get an idea how likely it is that you will actually be succesful.

Figure 1 in this paper is especially revealing:

![Alt text (Four graphs showing the decline in responses from authors with age of the paper; the likelihood of getting a response; the likelihood that that is useful and the likelihood that the data still exists.)](emailresponse.jpg "Obstacles to receiving data from the authors")

In this figure, four obstacles to receiving data from the authors are identified
1. the probability that the paper had at least one apparently working e-mail. Worryingly, even in the first year, this is only 80 % and drops off as the paper ages. Academics usually put their university email address on the paper, but they also move around from institute to institute causing the emails to get lost.
2. probability of receiving a response, given that at least one e-mail was apparently working. This is disappointingly low to start with, but remains stable over the years
3. probability of receiving a response giving the status of the data, given that we received a response. This is also stable at around 80%
4. probability that the data were extant (either “shared” or “exist but unwilling to share”) given that we received a useful response.

> <question-title>6</question-title>
> From the previous graph, calculate the likelihood of getting hold of 10 year old data
> > <solution-title></solution-title>
> >
> > 0.75 x 0.5 x 0.8 x 0.75 = 0.225
> > You have a 22.5% chance of getting the data!
> {: .solution}
>
{: .question}

## The solution

The problems described above all relate to (lack of) data reuse possibilities by researchers. The FAIR principles of research data management have been established precicely to increase data reuse.
Learn more about the FAIR principles in these excellent GTN tutorials: {% cite FAIR-in-a-nutshell %} and {% cite FAIR-data-management-solutions %}

# Conclusion

The Galaxy Training Network is an example of a robust, effective Community of Practice.
For more information please look at this great article {% cite hiltemann2023galaxy %}, the corresponding FAIR guidelines {% cite fair-training-materials %} and follow [short introduction to FAIR data stewardship](http://fellowship.elixiruknode.org/).