diff --git a/.github/workflows/render_page.yml b/.github/workflows/render_page.yml index 69d6ee32..bd5d8bba 100644 --- a/.github/workflows/render_page.yml +++ b/.github/workflows/render_page.yml @@ -1,17 +1,36 @@ -name: render page -on: +on: push: - branches: - - main + branches: main + workflow_dispatch: + +name: Quarto Publish + jobs: - deploy: + build-deploy: runs-on: ubuntu-latest + permissions: + contents: write steps: - - uses: actions/checkout@v3 - - uses: actions/setup-python@v4 + - name: Install libcurl on Linux + if: runner.os == 'Linux' + run: sudo apt-get update -y && sudo apt-get install -y libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev + + - name: Check out repository + uses: actions/checkout@v4 + + - name: Set up Quarto + uses: quarto-dev/quarto-actions/setup@v2 with: - python-version: 3.x - - run: | - pip install mkdocs-material mkdocs-table-reader-plugin mkdocs-bibtex neoteroi-mkdocs mkdocs-video mkdocs-minify-plugin mkdocs-jupyter mkdocs-git-revision-date-localized-plugin - mkdocs gh-deploy --force - + version: '1.3.450' + tinytex: true + +# - name: Install R +# uses: r-lib/actions/setup-r@v2 + +# - name: Install R Dependencies +# uses: r-lib/actions/setup-renv@v2 + + - name: Render and Publish + uses: quarto-dev/quarto-actions/publish@v2 + with: + target: gh-pages \ No newline at end of file diff --git a/.gitignore b/.gitignore index 0a71c91a..e2f4f77c 100644 --- a/.gitignore +++ b/.gitignore @@ -39,4 +39,6 @@ vignettes/*.pdf .Renviron # Other -*.DS_Store \ No newline at end of file +*.DS_Store +/.quarto/ +_site \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml new file mode 100644 index 00000000..027b99ad --- /dev/null +++ b/_quarto.yml @@ -0,0 +1,44 @@ +project: + type: website + +website: + title: 'Health Data Science Sandbox' + navbar: + logo: /img/logo.png + left: + - href: index.qmd + text: Home + - href: news.qmd + text: News + - href: access/index.qmd + text: HPC access + - href: modules/index.qmd + text: Modules + - 
text: Datasets + menu: + - href: datasets/datapolicy.qmd + - href: datasets/synthdata.qmd + - href: about/about.qmd + text: About +# - href: contact/contact.qmd +# text: Contact + + right: + - icon: github + href: https://github.com/hds-sandbox + aria-label: GitHub + - icon: linkedin + href: https://www.linkedin.com/company/ucph-heads + aria-label: LinkedIn + - icon: twitter + href: https://twitter.com/ucph_heads + aria-label: Twitter + +format: + html: + theme: + light: [materia, css/materialight.scss] + dark: darkly + toc: true +# include-in-header: +# - file: "resources/bioschema.html" \ No newline at end of file diff --git a/develop/about/about.md b/about/about.qmd similarity index 57% rename from develop/about/about.md rename to about/about.qmd index 790d29e4..4edcd495 100644 --- a/develop/about/about.md +++ b/about/about.qmd @@ -1,33 +1,50 @@ ---- -permalink: /about/ -hide: - - navigation - - toc - - footer ---- - -

About the Sandbox

-

an infrastructure project for
health data science training and research in Denmark

- -The National Health Data Science Sandbox project kicked off in 2021 with 5 years of funding via the Data Science Research Infrastructure initiative from the Novo Nordisk Foundation. Health data science experts at five Danish universities are contributing to the Sandbox with coordination from the Center for Health Data Science under lead PI Professor Anders Krogh. [Data scientists](https://hds-sandbox.github.io/contact/contact.html) hosted in the research groups of each PI are building infrastructure and training modules on Computerome and UCloud, the primary academic high performance computing (HPC) platforms in Denmark. - -
- ![workflow](../assets/images/Sandbox_PIs.png){ width="60%" } -
- -Our computational 'sandbox' allows data scientists to explore datasets, tools and analysis pipelines in the same high performance computing environments where real research projects are conducted. Rather than a single, hefty environment, we're deploying modularized topical environments tailored for independent use on each HPC platform. We aim to support three key user groups based at Danish universities: - -- trainees: use our training modules to learn analysis techniques with some guidance and guardrails - for your data type of interest AND for general good practices for HPC environments -- researchers: prototype your tools and algorithms with an array of good quality datasets that are GDPR compliant and free to access -- educators: develop your next course with computational assignments in the HPC environment your students will use for their research - -Activity developing independent training modules and hosting workshops has centered on UCloud, while collaborative construction of a flexible Course Platform has been completed on Computerome for use by the Sandbox and independent educators. Publicly sourced datasets are being used in training modules on UCloud, while generation of synthetic data is an ongoing project at Computerome. Sandbox resources are under active construction, so check out our other pages for the current status on [HPC Access](https://hds-sandbox.github.io/access/index.html), [Datasets](https://hds-sandbox.github.io/datasets/datapolicy.html), and [Modules](https://hds-sandbox.github.io/modules/index.html). We run workshops using completed training modules on a regular basis and provide active support for Sandbox-hosted courses through a slack workspace. See our [Contact](https://hds-sandbox.github.io/contact/contact.html) page for more information. - ---- - -
**Partner with the Sandbox**
-The Sandbox welcomes proposals for new courses, modules, and prototyping projects from researchers and educators. We'd like to partner with lecturers engaged with us in developing needed materials collaboratively - we would love to have input from subject experts or help promote exciting new tools and analysis methods via modules! Please contact us with your ideas at [nhds_sandbox@sund.ku.dk](mailto:nhds_sandbox@sund.ku.dk). - ---- - -**We thank the Novo Nordisk Foundation for funding support. If you use the Sandbox for research or reference it in text or presentations, please acknowledge the Health Data Science Sandbox project and its funder the Novo Nordisk Foundation (grant number NNF20OC0063268).** +--- +title: "About the Sandbox" +listing: + fields: [image, title, description] + contents: ../cards/*.qmd + type: default + sort: + - "date desc" +--- + +**An infrastructure project for health data science training and research in Denmark** + +The National Health Data Science Sandbox project kicked off in 2021 with 5 years of funding via the Data Science Research Infrastructure initiative from the Novo Nordisk Foundation. Health data science experts at five Danish universities are contributing to the Sandbox with coordination from the Center for Health Data Science under lead PI Professor Anders Krogh. [Data scientists](../contact/contact.qmd) hosted in the research groups of each PI are building infrastructure and training modules on Computerome and UCloud, the primary academic high performance computing (HPC) platforms in Denmark. + + +![](../images/Sandbox_PIs.png){fig-align="center" width="60%"} + +Our computational 'sandbox' allows data scientists to explore datasets, tools and analysis pipelines in the same high performance computing environments where real research projects are conducted. Rather than a single, hefty environment, we're deploying modularized topical environments tailored for independent use on each HPC platform. 
We aim to support three key user groups based at Danish universities: + +- trainees: use our training modules to learn analysis techniques with some guidance and guardrails - for your data type of interest AND for general good practices for HPC environments +- researchers: prototype your tools and algorithms with an array of good quality datasets that are GDPR compliant and free to access +- educators: develop your next course with computational assignments in the HPC environment your students will use for their research + +Activity developing independent training modules and hosting workshops has centered on UCloud, while collaborative construction of a flexible Course Platform has been completed on Computerome for use by the Sandbox and independent educators. Publicly sourced datasets are being used in training modules on UCloud, while generation of synthetic data is an ongoing project at Computerome. Sandbox resources are under active construction, so check out our other pages for the current status on [HPC Access](../access/index.qmd), [Datasets](../datasets/datapolicy.qmd), and [Modules](../modules/index.qmd). We run workshops using completed training modules on a regular basis and provide active support for Sandbox-hosted courses through a slack workspace. See our [Contact](../contact/contact.qmd) page for more information. + +--- + +# Partner with the Sandbox + +The Sandbox welcomes proposals for new courses, modules, and prototyping projects from researchers and educators. We'd like to partner with lecturers engaged with us in developing needed materials collaboratively - we would love to have input from subject experts or help promote exciting new tools and analysis methods via modules! Please contact us with your ideas at [nhds_sandbox@sund.ku.dk](mailto:nhds_sandbox@sund.ku.dk). + +--- + +**We thank the Novo Nordisk Foundation for funding support. 
If you use the Sandbox for research or reference it in text or presentations, please acknowledge the Health Data Science Sandbox project and its funder the Novo Nordisk Foundation (grant number NNF20OC0063268).** + +# Contact the sandbox team + +The Health Data Science Sandbox is coordinated by the [Center for Health Data Science](https://heads.ku.dk/) +at the University of Copenhagen (KU). Sandbox data scientists are also placed in collaborating groups at +the Technical University of Denmark (DTU), University of Southern Denmark (SDU), Aarhus University (AU), +and Aalborg University (AAU). + +To get in touch with the Sandbox or be connected with Sandbox staff at your university, +please [email us](mailto:nhds_sandbox@sund.ku.dk). To obtain module material for use in your +own compute environment, see our GitHub organization page +at [hds-sandbox](https://github.com/hds-sandbox/). + +We appreciate the contributions of previous team members José Alejandro Romero Herrera (KU), Conor O'Hare (KU), Sander Boisen Valentin (AAU) and Peter Husen (SDU). + +You can find all the team members and their contacts below: diff --git a/develop/access/Computerome.md b/access/Computerome.qmd similarity index 98% rename from develop/access/Computerome.md rename to access/Computerome.qmd index e9af0c28..dbece0e8 100644 --- a/develop/access/Computerome.md +++ b/access/Computerome.qmd @@ -11,7 +11,7 @@ hide: # Accessing the Sandbox on Computerome -We do not currently support independent use of Sandbox materials on Computerome. Access is supported via courses collaborating with the Sandbox and run on Computerome's [Course Platform](https://computerome.dk/solutions/course-platform). Check [here](https://hds-sandbox.github.io/access/index.html) for more info. +We do not currently support independent use of Sandbox materials on Computerome. 
Access is supported via courses collaborating with the Sandbox and run on Computerome's [Course Platform](https://computerome.dk/solutions/course-platform). Check [here](index.qmd) for more info. *The below instructions are provided as reference for course participants.* diff --git a/develop/access/UCloud.md b/access/UCloud.qmd similarity index 100% rename from develop/access/UCloud.md rename to access/UCloud.qmd diff --git a/access/genomedk.qmd b/access/genomedk.qmd new file mode 100644 index 00000000..1689899e --- /dev/null +++ b/access/genomedk.qmd @@ -0,0 +1 @@ +sss \ No newline at end of file diff --git a/develop/access/index.md b/access/index.qmd similarity index 83% rename from develop/access/index.md rename to access/index.qmd index 576f4947..15927a5a 100644 --- a/develop/access/index.md +++ b/access/index.qmd @@ -1,31 +1,44 @@ ---- -layout: default -title: HPC Access -has_children: true -nav_order: 3 -permalink: /access -hide: - - footer - - toc ---- - -

HPC Access

- -The Sandbox is collaborating with the two major academic high performance computing platforms in Denmark. Computerome is located at the Technical University of Denmark (and co-owned by the University of Copenhagen) while UCloud is owned by the University of Southern Denmark. These HPC platforms each have their own strengths which we leverage in the Sandbox in different ways. - ---- - -

Computerome

- -[Computerome](https://computerome.dk/) is the home of many sensitive health datasets via collaborations between DTU, KU, Rigshospitalet, and other major health sector players in the Capital Region of Denmark. Computerome has recently launched their secure cloud platform, [DELPHI](https://computerome.dk/solutions/delphi), and in collaboration with the Sandbox has built a [Course Platform](https://computerome.dk/solutions/course-platform) on the same backbone such that courses and training can be conducted in the same environment as real research would be performed in the secure cloud. The Sandbox is supporting courses in the Course Platform, but it is also available for independent use by educators at Danish universities. Please see their website for more information on independent use and pricing, and [contact us](mailto:nhds_sandbox@sund.ku.dk) if you'd like to collaborate on hosting a course on Computerome. We can help with tool installation, environment testing, and user support (ranging from using the environment to course content if we have Sandbox staff with matching expertise). - -Participants in courses co-hosted by the Sandbox can check [here](https://hds-sandbox.github.io/access/Computerome.html) for access instructions. - ---- - -

UCloud

- -[UCloud](https://cloud.sdu.dk/app/login) is a relatively new HPC platform that can be accessed by students at Danish universities (via a WAYF university login). It has a user friendly graphical user interface that supports straightforward project, user, and resource management. UCloud provides access to many tools via selectable Apps matched with a range of flexible compute resources, and the Sandbox is deploying training modules in this form such that any UCloud user can easily access Sandbox materials independently. The Sandbox is also hosting workshops and training events on UCloud in conjunction with in-person training. - -Check out UCloud's extensive user docs [here](https://docs.cloud.sdu.dk/index.html) and instructions for how to access Sandbox apps [here](https://hds-sandbox.github.io/access/UCloud.html). - +--- +title: "HPC access" +format: html +date-modified: last-modified +date-format: long +date: 2024-01-01 +--- + + +The Sandbox is collaborating with the two major academic high performance computing platforms in Denmark. Computerome is located at the Technical University of Denmark (and co-owned by the University of Copenhagen) while UCloud is owned by the University of Southern Denmark. These HPC platforms each have their own strengths which we leverage in the Sandbox in different ways. + +--- + +## Computerome + +[Computerome](https://computerome.dk/) is the home of many sensitive health datasets via collaborations between DTU, KU, Rigshospitalet, and other major health sector players in the Capital Region of Denmark. Computerome has recently launched their secure cloud platform, [DELPHI](https://computerome.dk/solutions/delphi), and in collaboration with the Sandbox has built a [Course Platform](https://computerome.dk/solutions/course-platform) on the same backbone such that courses and training can be conducted in the same environment as real research would be performed in the secure cloud. 
The Sandbox is supporting courses in the Course Platform, but it is also available for independent use by educators at Danish universities. Please see their website for more information on independent use and pricing, and [contact us](mailto:nhds_sandbox@sund.ku.dk) if you'd like to collaborate on hosting a course on Computerome. We can help with tool installation, environment testing, and user support (ranging from using the environment to course content if we have Sandbox staff with matching expertise). + +Participants in courses co-hosted by the Sandbox can check [here](Computerome.qmd) for access instructions. + +--- + +## UCloud + +[UCloud](https://cloud.sdu.dk/app/login) is a relatively new HPC platform that can be accessed by students at Danish universities (via a WAYF university login). It has a user-friendly graphical user interface that supports straightforward project, user, and resource management. UCloud provides access to many tools via selectable Apps matched with a range of flexible compute resources, and the Sandbox is deploying training modules in this form such that any UCloud user can easily access Sandbox materials independently. The Sandbox is also hosting workshops and training events on UCloud in conjunction with in-person training. + +Check out UCloud's extensive user docs [here](https://docs.cloud.sdu.dk/index.html) and instructions for how to access Sandbox apps [here](UCloud.qmd). 
+ +--- + +## GenomeDK + +TBA soon + +--- + +## Any other computing cluster + +TBA soon + +--- + +## Your local PC + +TBA soon \ No newline at end of file diff --git a/access/other.qmd b/access/other.qmd new file mode 100644 index 00000000..1689899e --- /dev/null +++ b/access/other.qmd @@ -0,0 +1 @@ +sss \ No newline at end of file diff --git a/cards/AlbaMartinez.qmd b/cards/AlbaMartinez.qmd new file mode 100644 index 00000000..4a0045db --- /dev/null +++ b/cards/AlbaMartinez.qmd @@ -0,0 +1,14 @@ +--- +title: "Alba Refoyo Martinez" +description: "Data Scientist, Copenhagen University" +#https://emojidb.org/quarto-emojis for emojis copypaste +--- + +![](alba.jpg){width=300} + + +[![](LinkedIN.png){width=50}](https://dk.linkedin.com/in/albarefoyo) [![](Github_black.png){width=50}](https://github.com/albarema) [![](Outlook.png){width=50}](mailto:alba.martinez@sund.ku.dk?subject=Test) [![](mapPin.png){width=50}](https://heads.ku.dk) + + +Alba is a Sandbox data scientist based at the University of Copenhagen. During her PhD and postdoc she developed solid expertise in large-scale genomics and pipeline development on computing clusters. 
+ diff --git a/cards/Github.png b/cards/Github.png new file mode 100644 index 00000000..c10eda31 Binary files /dev/null and b/cards/Github.png differ diff --git a/cards/Github_black.png b/cards/Github_black.png new file mode 100644 index 00000000..68024f40 Binary files /dev/null and b/cards/Github_black.png differ diff --git a/cards/JacobHansen.qmd b/cards/JacobHansen.qmd new file mode 100644 index 00000000..57f9d03f --- /dev/null +++ b/cards/JacobHansen.qmd @@ -0,0 +1,15 @@ +--- +title: "Jacob Fredegaard Hansen" +description: "Data Scientist, University of Southern Denmark" +#https://emojidb.org/quarto-emojis for emojis copypaste +--- + +![](jaCob.jpg){width=300} + + +[![](LinkedIN.png){width=50}](https://www.linkedin.com/in/jacobfh) [![](Github_black.png){width=50}](https://github.com/jacobfh1) [![](Outlook.png){width=50}](mailto:jfredegaard@bmb.sdu.dk?subject=Test) [![](mapPin.png){width=50}](https://www.sdu.dk/en/forskning/biomedical-ms) [![](www.png){width=50}](https://jacobfh.com/) + + +Jacob is a Sandbox data scientist based at the University of Southern Denmark in Odense, and he specializes in proteomics applications. + + diff --git a/cards/JakobSkelmose.qmd b/cards/JakobSkelmose.qmd new file mode 100644 index 00000000..87a70184 --- /dev/null +++ b/cards/JakobSkelmose.qmd @@ -0,0 +1,13 @@ +--- +title: "Jakob Skelmose" +description: "Data Scientist, Aalborg University" +#https://emojidb.org/quarto-emojis for emojis copypaste +--- + +![](jaKob.jpg){width=300} + + +[![](Github_black.png){width=50}](https://github.com/tzuV) [![](Outlook.png){width=50}](mailto:jbks@dcm.aau.dk?subject=Test) [![](mapPin.png){width=50}](https://www.klinisk.aau.dk/forskning/forskningsenheder-og-centre/center-for-clinical-data-science) [![](www.png){width=50}](https://vbn.aau.dk/da/persons/148971) + + +Jakob is a Sandbox data scientist based at Aalborg University. 
His work is mainly focused on applications related to synthetic data, both for research and teaching purposes. 
\ No newline at end of file diff --git a/cards/JenniferBartell.qmd b/cards/JenniferBartell.qmd new file mode 100644 index 00000000..ab6c5d8c --- /dev/null +++ b/cards/JenniferBartell.qmd @@ -0,0 +1,12 @@ +--- +title: "Jennifer Bartell" +description: "Data Scientist and Project coordinator, Copenhagen University" +#https://emojidb.org/quarto-emojis for emojis copypaste +--- + +![](jennie.jpg){width=300} + + +[![](LinkedIN.png){width=50}](https://dk.linkedin.com/in/jagbartell) [![](Outlook.png){width=50}](mailto:bartell@sund.ku.dk?subject=Test) [![](mapPin.png){width=50}](https://heads.ku.dk/) + +Jennifer is the Sandbox project coordinator as well as a data scientist. She is based at the University of Copenhagen. Jennifer has extensive experience in bacterial genomics and metabolomics, transcriptomics and pathway analysis. \ No newline at end of file diff --git a/cards/LinkedIN.png b/cards/LinkedIN.png new file mode 100644 index 00000000..e4020d9d Binary files /dev/null and b/cards/LinkedIN.png differ diff --git a/cards/Outlook.png b/cards/Outlook.png new file mode 100644 index 00000000..97ac8ab6 Binary files /dev/null and b/cards/Outlook.png differ diff --git a/cards/SamueleSoraggi.qmd b/cards/SamueleSoraggi.qmd new file mode 100644 index 00000000..954c3840 --- /dev/null +++ b/cards/SamueleSoraggi.qmd @@ -0,0 +1,13 @@ +--- +title: "Samuele Soraggi" +description: "Data Scientist, Aarhus University" +from: "markdown+emoji" +#https://emojidb.org/quarto-emojis for emojis copypaste +--- + +![](samuele.jpg){width=300} + + +[![](LinkedIN.png){width=50}](https://www.linkedin.com/in/samuelesoraggi/) [![](Github_black.png){width=50}](https://github.com/SamueleSoraggi) [![](Outlook.png){width=50}](mailto:samuele@birc.au.dk?subject=Test) [![](mapPin.png){width=50}](https://birc.au.dk/) [![](www.png){width=50}](https://samuelesoraggi.github.io) + +Samuele is a Sandbox data scientist based at Aarhus University. 
During his academic career he has gained experience in population genomics, transcriptomics and single-cell multiomics, and his knowledge spans various themes in advanced computational statistics. diff --git a/cards/alba.jpg b/cards/alba.jpg new file mode 100644 index 00000000..94bd3b19 Binary files /dev/null and b/cards/alba.jpg differ diff --git a/cards/alexJose.jpg b/cards/alexJose.jpg new file mode 100644 index 00000000..10032161 Binary files /dev/null and b/cards/alexJose.jpg differ diff --git a/cards/jaCob.jpg b/cards/jaCob.jpg new file mode 100644 index 00000000..6f0adfb1 Binary files /dev/null and b/cards/jaCob.jpg differ diff --git a/cards/jaKob.jpg b/cards/jaKob.jpg new file mode 100644 index 00000000..8f0ddbf6 Binary files /dev/null and b/cards/jaKob.jpg differ diff --git a/cards/jarh.jpeg b/cards/jarh.jpeg new file mode 100644 index 00000000..cf24b123 Binary files /dev/null and b/cards/jarh.jpeg differ diff --git a/cards/jennie.jpg b/cards/jennie.jpg new file mode 100644 index 00000000..55b81815 Binary files /dev/null and b/cards/jennie.jpg differ diff --git a/cards/mapPin.png b/cards/mapPin.png new file mode 100644 index 00000000..fef21f07 Binary files /dev/null and b/cards/mapPin.png differ diff --git a/cards/samuele.jpg b/cards/samuele.jpg new file mode 100644 index 00000000..76445d03 Binary files /dev/null and b/cards/samuele.jpg differ diff --git a/cards/www.png b/cards/www.png new file mode 100644 index 00000000..baee2722 Binary files /dev/null and b/cards/www.png differ diff --git a/develop/contact/contact.md b/contact/contact.md similarity index 98% rename from develop/contact/contact.md rename to contact/contact.md index 0e3b015d..5fea32e2 100644 --- a/develop/contact/contact.md +++ b/contact/contact.md @@ -1,36 +1,36 @@ ---- -layout: pages -title: Contact -permalink: /contact -nav_order: 6 -hide: - - footer - - toc - - navigation ---- - -
-# Contact the Sandbox -
- -The Health Data Science Sandbox is coordinated by the [Center for Health Data Science](https://heads.ku.dk/) -at the University of Copenhagen (KU). Sandbox data scientists are also placed in collaborating groups at -the Technical University of Denmark (DTU), University of Southern Denmark (SDU), Aarhus University (AU), -and Aalborg University (AAU). - -To get in touch with the Sandbox or be connected with Sandbox staff at your university, -please [email us](mailto:nhds_sandbox@sund.ku.dk). To obtain module material for use in your -own compute environment, see our GitHub organization page -at [hds-sandbox](https://github.com/hds-sandbox/). - -| Member | Role | Institution | PI | -|-------------------------------|--------------------------------------|------------------------------------|------------------------------------| -| Jennifer Bartell | Project Coordinator / Data Scientist | Center for Health Data Science, KU |Anders Krogh | -| Alba Refoyo Martinez | Data Scientist | Center for Health Data Science, KU |Anders Krogh | -| Jakob Skelmose | Data Scientist | Department of Clinical Medicine, AAU|Martin Boegsted | -| Samuele Soraggi | Data Scientist | Bioinformatics Research Centre, AU |Mikkel Schierup | -| Jesper Roy Christiansen | Data Scientist | Computerome, DTU |Peter Loengreen | -| Jacob Fredegaard Hansen | Data Scientist | Department of Biochemistry and Molecular Biology, SDU|Ole Noerregaard Jensen| - - -We appreciate the contributions of previous team members José Alejandro Romero Herrera (KU), Conor O'Hare (KU), Sander Boisen Valentin (AAU) and Peter Husen (SDU). +--- +layout: pages +title: Contact +permalink: /contact +nav_order: 6 +hide: + - footer + - toc + - navigation +--- + +
+# Contact the Sandbox +
+ +The Health Data Science Sandbox is coordinated by the [Center for Health Data Science](https://heads.ku.dk/) +at the University of Copenhagen (KU). Sandbox data scientists are also placed in collaborating groups at +the Technical University of Denmark (DTU), University of Southern Denmark (SDU), Aarhus University (AU), +and Aalborg University (AAU). + +To get in touch with the Sandbox or be connected with Sandbox staff at your university, +please [email us](mailto:nhds_sandbox@sund.ku.dk). To obtain module material for use in your +own compute environment, see our GitHub organization page +at [hds-sandbox](https://github.com/hds-sandbox/). + +| Member | Role | Institution | PI | +|-------------------------------|--------------------------------------|------------------------------------|------------------------------------| +| Jennifer Bartell | Project Coordinator / Data Scientist | Center for Health Data Science, KU |Anders Krogh | +| Alba Refoyo Martinez | Data Scientist | Center for Health Data Science, KU |Anders Krogh | +| Jakob Skelmose | Data Scientist | Department of Clinical Medicine, AAU|Martin Boegsted | +| Samuele Soraggi | Data Scientist | Bioinformatics Research Centre, AU |Mikkel Schierup | +| Jesper Roy Christiansen | Data Scientist | Computerome, DTU |Peter Loengreen | +| Jacob Fredegaard Hansen | Data Scientist | Department of Biochemistry and Molecular Biology, SDU|Ole Noerregaard Jensen| + + +We appreciate the contributions of previous team members José Alejandro Romero Herrera (KU), Conor O'Hare (KU), Sander Boisen Valentin (AAU) and Peter Husen (SDU). 
diff --git a/develop/contributors.md b/contributors.md similarity index 100% rename from develop/contributors.md rename to contributors.md diff --git a/css/materiadark.scss b/css/materiadark.scss new file mode 100644 index 00000000..b7654ec8 --- /dev/null +++ b/css/materiadark.scss @@ -0,0 +1,7 @@ +/*-- scss:defaults --*/ + +// Base document colors + +//$navbar-bg: #4266A1; +//$sidebar-bg: white; +//$footer-bg: #4266A1; \ No newline at end of file diff --git a/css/materialight.scss b/css/materialight.scss new file mode 100644 index 00000000..de4eadae --- /dev/null +++ b/css/materialight.scss @@ -0,0 +1,7 @@ +/*-- scss:defaults --*/ + +// Base document colors + +$navbar-bg: #4266A1; +$sidebar-bg: white; +$footer-bg: #4266A1; \ No newline at end of file diff --git a/css/styles.css b/css/styles.css new file mode 100644 index 00000000..e69de29b diff --git a/develop/datasets/datapolicy.md b/datasets/datapolicy.qmd similarity index 90% rename from develop/datasets/datapolicy.md rename to datasets/datapolicy.qmd index d7f9b644..bb3b00eb 100644 --- a/develop/datasets/datapolicy.md +++ b/datasets/datapolicy.qmd @@ -1,40 +1,36 @@ ---- -layout: page -title: Data policy -parent: Datasets -has_children: true -nav_order: 1 -hide: - - footer - - toc ---- - -

Data policy

-

with respect to person-specific datasets

- -A priority of the Sandbox is to guide health data science learning using real-world-similar datasets. A major component is addressing how to analyze and leverage person-specific data, such as electronic health records, without invading personal privacy or straying from GDPR guidelines on sensitive data use. We are therefore focused on using either publicly accessible datasets (that are generally well anonymized to enable such release) or we are using/creating synthetic datasets that mimic real-world datasets without replicating real people's data such that they can be identified. In either case, it is essential for Sandbox users to treat person-specific data respectfully and be aware of the additional responsibility and limitations of working with this type of data as part of their career in health data science. - -We recommend that users interested in this type of data **complete an ethics course on research using health datasets** before digging into any analysis. A well regarded course that is also often required for using public databases that contain person-specific data is the Human Subject and Data Research Ethics course designed by the Massachusetts Institute of Technology. The course is hosted at [CITI](https://about.citiprogram.org/), the Collaborative Institutional Training Initiative. Completing the course is free of charge and provides you with a certificate which you may need to upload to certain databases to gain access. Set up an account at CITI, add an Institutional affiliation with 'Massachusetts Institute of Technology Affiliates', and then find and complete the course titled 'Data or Specimens Only Research' to obtain a certificate (in pdf form). - ---- - -## Public domain data -The intended scope of the Sandbox is broad, and we will be pulling from many different public access databases (especially for training modules on omics analysis). 
Databases can be topically broad, giant repositories or field-specific, and each may have its own usage rules. We plan to provide our own copies of publically available datasets where allowed to ensure compatibility with the linked module is preserved, but some datasets may need to be downloaded by users themselves under specific access / distribution restrictions. Many omics datasets do not present significant data sensitivity concerns in comparison to real-world data such as electronic health records (EHRs) and clinical trial datasets. - -There are large public de-identified EHR datasets that serve as benchmark resources for teaching and comparing new methods with old, but these are not numerous and often have restricted usage and sharing terms in addition to being quite dated. Historical approaches to dataset anonymization and de-identification have been substantially challenged in the age of digitalized healthcare and increasing data integration, which means meaningfully large 'anonymized' datasets are now rarely released. - ---- - -## Synthetic data - - - - - -Via our collaborators and broader network, the Sandbox has the opportunity to simulate/synthesize data resembling different databases and registries from the Danish health sector. We are exploring methods of creating useful synthetic datasets with national and EU-level data access policies and GDPR restrictions in mind, while developing initial datasets using publicly available data from Danish research studies and other resources. - -Ultimately, a new era of synthetic data is rapidly developing. The funded Sandbox proposal focused on generating synthetic data using mechanistic models, agent-based models, or draws from multivariate distributions (such as copulas), which are methods that do not present any significant GDPR-related concerns with sharing the produced datasets as they are derived from population-level characteristics and prior knowledge. 
However, new deep learning-based methods of data synthesis can theoretically learn complex, nonlinear patterns within a sensitive dataset and generate a synthetic dataset that replicates these patterns. This is a really promising approach for sharing high utility synthetic datasets, but it also elevates risk of accidentally sharing too much about the real dataset and skirting the boundaries of GDPR and ethical data handling. There is an inherent trade-off between privacy preservation and similarity of the synthetic dataset to the original dataset, with method development focused on moving closer to the ideal zone of high privacy AND high similarity. The figure at right is a rough approximation of this relationship versus current families of synthesis methods. - -Please see [Synthetic Data](https://hds-sandbox.github.io/datasets/synthdata.html) for more information about our approach to this technology. +--- +title: "Data policy" +format: html +date-modified: last-modified +date-format: long +date: 2024-01-01 +--- + +## With respect to person-specific datasets + +A priority of the Sandbox is to guide health data science learning using real-world-similar datasets. A major component is addressing how to analyze and leverage person-specific data, such as electronic health records, without invading personal privacy or straying from GDPR guidelines on sensitive data use. We are therefore focused on using either publicly accessible datasets (that are generally well anonymized to enable such release) or we are using/creating synthetic datasets that mimic real-world datasets without replicating real people's data such that they can be identified. In either case, it is essential for Sandbox users to treat person-specific data respectfully and be aware of the additional responsibility and limitations of working with this type of data as part of their career in health data science. 
+ +We recommend that users interested in this type of data **complete an ethics course on research using health datasets** before digging into any analysis. A well-regarded course that is also often required for using public databases that contain person-specific data is the Human Subject and Data Research Ethics course designed by the Massachusetts Institute of Technology. The course is hosted at [CITI](https://about.citiprogram.org/), the Collaborative Institutional Training Initiative. Completing the course is free of charge and provides you with a certificate which you may need to upload to certain databases to gain access. Set up an account at CITI, add an institutional affiliation with 'Massachusetts Institute of Technology Affiliates', and then find and complete the course titled 'Data or Specimens Only Research' to obtain a certificate (in PDF form). + +--- + +## Public domain data +The intended scope of the Sandbox is broad, and we will be pulling from many different public access databases (especially for training modules on omics analysis). Databases can be topically broad, giant repositories or field-specific, and each may have its own usage rules. We plan to provide our own copies of publicly available datasets where allowed to ensure that compatibility with the linked module is preserved, but some datasets may need to be downloaded by users themselves under specific access / distribution restrictions. Many omics datasets do not present significant data sensitivity concerns in comparison to real-world data such as electronic health records (EHRs) and clinical trial datasets. + +There are large public de-identified EHR datasets that serve as benchmark resources for teaching and comparing new methods with old, but these are not numerous and often have restricted usage and sharing terms in addition to being quite dated.
Historical approaches to dataset anonymization and de-identification have been substantially challenged in the age of digitalized healthcare and increasing data integration, which means meaningfully large 'anonymized' datasets are now rarely released. + +--- + +## Synthetic data + +![](../images/Tradeoff_base.png){width=400 fig-align="right"} + + + +Via our collaborators and broader network, the Sandbox has the opportunity to simulate/synthesize data resembling different databases and registries from the Danish health sector. We are exploring methods of creating useful synthetic datasets with national and EU-level data access policies and GDPR restrictions in mind, while developing initial datasets using publicly available data from Danish research studies and other resources. + +Ultimately, a new era of synthetic data is rapidly developing. The funded Sandbox proposal focused on generating synthetic data using mechanistic models, agent-based models, or draws from multivariate distributions (such as copulas), which are methods that do not present any significant GDPR-related concerns with sharing the produced datasets as they are derived from population-level characteristics and prior knowledge. However, new deep learning-based methods of data synthesis can theoretically learn complex, nonlinear patterns within a sensitive dataset and generate a synthetic dataset that replicates these patterns. This is a promising approach for sharing high-utility synthetic datasets, but it also elevates the risk of accidentally sharing too much about the real dataset and skirting the boundaries of GDPR and ethical data handling. There is an inherent trade-off between privacy preservation and similarity of the synthetic dataset to the original dataset, with method development focused on moving closer to the ideal zone of high privacy AND high similarity. The figure at right is a rough approximation of this relationship versus current families of synthesis methods.
+ +Please see [Synthetic Data](synthdata.qmd) for more information about our approach to this technology. diff --git a/develop/datasets/datasets.md b/datasets/datasets.md similarity index 95% rename from develop/datasets/datasets.md rename to datasets/datasets.md index 9ba23953..718c27c8 100644 --- a/develop/datasets/datasets.md +++ b/datasets/datasets.md @@ -1,11 +1,11 @@ ---- -layout: default -title: Datasets -has_children: true -nav_order: 3 -permalink: /datasets ---- - -# Datasets - -Here we provide details of datasets used in our various modules as well as a specific guide on using electronic health record datasets. +--- +layout: default +title: Datasets +has_children: true +nav_order: 3 +permalink: /datasets +--- + +# Datasets + +Here we provide details of datasets used in our various modules as well as a specific guide on using electronic health record datasets. diff --git a/develop/datasets/index.md b/datasets/index.md similarity index 99% rename from develop/datasets/index.md rename to datasets/index.md index c1244694..7844fe50 100644 --- a/develop/datasets/index.md +++ b/datasets/index.md @@ -1,26 +1,26 @@ ---- -layout: default -title: Datasets -has_children: true -nav_order: 3 -permalink: /datasets -hide: - - footer - - toc ---- - -# Data policy with respect to person-specific datasets - -A priority of the Sandbox is to guide health data science learning using real-world-similar datasets. A major component is addressing how to analyze and leverage person-specific data, such as electronic health records, without invading personal privacy or straying from GDPR guidelines on sensitive data use. We are therefore focused on using either publicly accessible datasets (that are generally well anonymized to enable such release) or we are using/creating synthetic datasets that mimic real-world datasets without replicating real people's data such that they can be identified. 
In either case, it is essential for Sandbox users to treat person-specific data respectfully and be aware of the additional responsibility and limitations of working with this type of data as part of their career in health data science. - -We recommend that users interested in this type of data **complete an ethics course on research using health datasets** before digging into any analysis. A well regarded course that is also often required for using public databases that contain person-specific data is the Human Subject and Data Research Ethics course designed by the Massachusetts Institute of Technology. The course is hosted at [CITI](https://about.citiprogram.org/), the Collaborative Institutional Training Initiative. Completing the course is free of charge and provides you with a certificate which you may need to upload to certain databases to gain access. Set up an account at CITI, add an Institutional affiliation with 'Massachusetts Institute of Technology Affiliates', and then find and complete the course titled 'Data or Specimens Only Research' to obtain a certificate (in pdf form). - -## Public domain data -The intended scope of the Sandbox is broad, and we will be pulling from many different public access databases in our development of teaching modules. There are classical datasets that serve as benchmark resources for teaching and comparing new methods with old, and also brand new datasets that will support modules on emerging technologies (such as spatial single cell RNA-seq analysis). Databases can be topically broad giant repositories or field-specific, and each may have its own usage rules. We plan to provide our own copies of publically available datasets where allowed to ensure compatibility with the linked module is preserved, but some datasets may need to be downloaded by users themselves under specific access / distribution restrictions. 
- -## Synthetic/Simulated data -The Sandbox is focused on supporting Danish health data science education and research. Via our collaborators and broader network, we have the opportunity to simulate/synthesize data resembling different databases and registries from the Danish health sector in addition to using traditional data simulation techniques to replicate general datasets. We are exploring methods of creating useful synthetic datasets with local access guidelines/GDPR restrictions in mind, while developing initial datasets using published data from Danish studies and publically available resources. - -
- ![workflow](../assets/images/SynthDataQualities.png) -
+--- +layout: default +title: Datasets +has_children: true +nav_order: 3 +permalink: /datasets +hide: + - footer + - toc +--- + +# Data policy with respect to person-specific datasets + +A priority of the Sandbox is to guide health data science learning using real-world-similar datasets. A major component is addressing how to analyze and leverage person-specific data, such as electronic health records, without invading personal privacy or straying from GDPR guidelines on sensitive data use. We are therefore focused on using either publicly accessible datasets (that are generally well anonymized to enable such release) or we are using/creating synthetic datasets that mimic real-world datasets without replicating real people's data such that they can be identified. In either case, it is essential for Sandbox users to treat person-specific data respectfully and be aware of the additional responsibility and limitations of working with this type of data as part of their career in health data science. + +We recommend that users interested in this type of data **complete an ethics course on research using health datasets** before digging into any analysis. A well regarded course that is also often required for using public databases that contain person-specific data is the Human Subject and Data Research Ethics course designed by the Massachusetts Institute of Technology. The course is hosted at [CITI](https://about.citiprogram.org/), the Collaborative Institutional Training Initiative. Completing the course is free of charge and provides you with a certificate which you may need to upload to certain databases to gain access. Set up an account at CITI, add an Institutional affiliation with 'Massachusetts Institute of Technology Affiliates', and then find and complete the course titled 'Data or Specimens Only Research' to obtain a certificate (in pdf form). 
+ +## Public domain data +The intended scope of the Sandbox is broad, and we will be pulling from many different public access databases in our development of teaching modules. There are classical datasets that serve as benchmark resources for teaching and comparing new methods with old, and also brand new datasets that will support modules on emerging technologies (such as spatial single cell RNA-seq analysis). Databases can be topically broad giant repositories or field-specific, and each may have its own usage rules. We plan to provide our own copies of publicly available datasets where allowed to ensure that compatibility with the linked module is preserved, but some datasets may need to be downloaded by users themselves under specific access / distribution restrictions. + +## Synthetic/Simulated data +The Sandbox is focused on supporting Danish health data science education and research. Via our collaborators and broader network, we have the opportunity to simulate/synthesize data resembling different databases and registries from the Danish health sector in addition to using traditional data simulation techniques to replicate general datasets. We are exploring methods of creating useful synthetic datasets with local access guidelines/GDPR restrictions in mind, while developing initial datasets using published data from Danish studies and publicly available resources. + +
+ ![workflow](../assets/images/SynthDataQualities.png) +
diff --git a/develop/datasets/synthdata.md b/datasets/synthdata.qmd similarity index 82% rename from develop/datasets/synthdata.md rename to datasets/synthdata.qmd index bbd21906..ae8c5a11 100644 --- a/develop/datasets/synthdata.md +++ b/datasets/synthdata.qmd @@ -1,50 +1,50 @@ ---- -layout: page -title: Synthetic data -parent: Datasets -has_children: true -nav_order: 1 -hide: - - footer - - toc ---- - -

Synthetic data
-progress and challenges

- - -## Defining synthetic data -It is necessary to clarify what we mean when we refer to synthetic data within the Sandbox project. While the term has been used for decades to describe all kinds of 'non-real' data including those derived from models and simulations, developments in deep generative modeling have dramatically expanded our understanding of what synthetic data can be. In the age of deepfakes and news articles written entirely by ChatGPT, synthetic data derived from deep learning is in a wholly different class from data simulated with a mechanistic or agent-based model. - -The Sandbox is actually interested in any form of synthetic data - our highest priority is providing safe-to-use data to trainees and researchers that does not raise any concerns about sensitive data with respect to the EU's General Data Protection Regulation (GDPR) and local Danish data regulations. So, we are using both old school and new school forms of data synthesis. However, the discussion on this page is heavily weighted towards our interest in new school synthesis - with our connections to generative modeling researchers and high quality data, we are naturally interested in figuring out a safe way to deploy synthetic datasets derived from deep learning and other high similarity approaches. - -??? info "The TLDR for synthetic data in the Sandbox" - - The development of synthetic datasets should be viewed as a research project. The technology is generally untested with few examples of public roll-out, and its deployment should be future-proofed as much as possible against attacks and potential sensitive data disclosure. - - Synthetic data generation and evaluation approaches should be tailored to each dataset of interest. 
With current technology, it is unlikely that high quality, safe-to-share datasets will be produced at any kind of production scale without a massive effort devoted to pre-processing, data harmonization, and customized routines for different families of datasets. - - The Sandbox is not openly sharing any synthetic datasets generated from person-specific sensitive data. We think these datasets will be useful to approved researchers that ideally gain access via an approved data portal with registration and data use agreements with relevant data authorities. We are not currently that portal. - -## Generating synthetic data -We have explored the performance of copulas, multiple imputation, sequential synthesis, and several generative adversarial network (GAN) approaches with a cancer dataset which we were developing for a course in the MS in Personal Medicine program at University of Copenhagen. We quickly discovered that factors such as missingness, collinearity, and the ratio of patients to features cause just as many problems for synthetic data generation as they do in predictive modeling. We are currently evaluating the above techniques as well as additional deep learning approaches such as variational autoencoders (VAEs) and Bayesian graphs against a collection of benchmark health datasets to better understand the positives and negatives of each technique when faced with common challenges in real world health data. - -Recently, a few interesting libraries / pipelines have been released that enable testing of different synthetic data generation approaches alongside a range of evaluation metrics. We are actively exploring these tools as we test different generation approaches and examining their implementation of evaluation metrics. We plan to add additional components and features as we resolve challenges with different target datasets. 
- -## Evaluating synthetic data -There are 3 key principles to consider when judging the overall quality of a synthetic dataset: fidelity to the original dataset, risk to privacy, and prediction utility. Fidelity and utility are often grouped together as similarity to the original data which exists in a trade-off with privacy - the more similar your synthetic dataset to the original, the higher your risk to patient privacy. However, the distinction between them is important as they can be achieved independently of each other depending on the project frame. Fidelity refers to reproduction of the multivariate shape and structure of the original data (including complex nonlinear relationships) while utility refers to how well the synthetic dataset matches the predictive accuracy of the original dataset. Risk to privacy includes both risk of patient reidentification and risk of sensitive information disclosure about a patient. There are many proposed evaluation metrics for measuring different aspects of these three qualities. We are actively investigating the performance of these metrics against our different datasets. - -
- ![workflow](../assets/images/SynthDataQualities.png) -
- -We should point out that while using quantitative metrics to assess privacy preservation is a critical step in creating a synthetic dataset, positive results do not absolve us from any concerns regarding risk to privacy in the synthetic data. Regulatory guidelines regarding the safety of synthetic data and the ability to openly share it are extremely unclear. No authorities have specified quantitative cut-offs using these metrics that enable open release, for example. For this reason, we have developed our own internal guidelines for how to handle this aspect of the project, which are based on a comprehensive examination of relevant EU and Danish legislation (i.e. the GDPR, the Artificial Intelligence Act, the Danish Health Law, and the Danish Data Protection Act). We continue work on synthesis with hope that new legislation such as the development of the European Health Data Space will provide further guidance in the future. - -## Rules for synthetic data in the Sandbox -We are currently focused on exploring methods and metrics by developing reproducible, well documented examples and use cases of synthetic data in partnership with other researchers, legal advisors, and data authorities. We're relying primarily on publicly available tabular health datasets in this exploration phase, but we will also work with sensitive data in the future. Our rules aim to preserve the trust of the public in how their health data is handled by data authorities and researchers. - -!!! tip "Sandbox Rules for Synthetic Data" - 1. Creation of synthetic data involves processing sensitive data, and this requires obtaining project approvals from data authorities when performing this work on sensitive data. Any synthetic data work with restricted-access, sensitive data by the Sandbox will only be conducted with these approvals in place in the frame of a research project. - 2. 
Goals for each synthetic dataset project should be defined at project initiation: how will the synthetic dataset be used, who is the intended audience, and how might it be shared? This frame should govern every consequent decision for that dataset and be shared alongside the final dataset. - 3. Quantitative metrics for fidelity, utility, and privacy preservation should be implemented for each dataset and shared alongside the final dataset. - 4. A cost-benefit analysis should be performed after the project is completed - is any risk to privacy appropriately balanced by value of the dataset in achieving its stated aims and contributing to the public good? - 5. Data authorities with ethical and strategic stakes in who accesses the synthetic dataset should be included in decisions about how it is used and who is allowed to access it. - 6. Synthetic datasets created from person-specific sensitive data rather than population characteristics can still pose privacy risks, and any users of the dataset should be approved and registered. The Sandbox will not release any such datasets publicly and will instead work with appropriate data authorities to decide how such datasets should be governed in a responsible way. - +--- +title: "Synthetic data" +format: html +date-modified: last-modified +date-format: long +date: 2024-01-01 +--- + +# progress and challenges + + +## Defining synthetic data +It is necessary to clarify what we mean when we refer to synthetic data within the Sandbox project. While the term has been used for decades to describe all kinds of 'non-real' data including those derived from models and simulations, developments in deep generative modeling have dramatically expanded our understanding of what synthetic data can be. In the age of deepfakes and news articles written entirely by ChatGPT, synthetic data derived from deep learning is in a wholly different class from data simulated with a mechanistic or agent-based model. 
+ +The Sandbox is actually interested in any form of synthetic data - our highest priority is providing safe-to-use data to trainees and researchers that does not raise any concerns about sensitive data with respect to the EU's General Data Protection Regulation (GDPR) and local Danish data regulations. So, we are using both old school and new school forms of data synthesis. However, the discussion on this page is heavily weighted towards our interest in new school synthesis - with our connections to generative modeling researchers and high quality data, we are naturally interested in figuring out a safe way to deploy synthetic datasets derived from deep learning and other high similarity approaches. + +:::{.callout-note title="The TLDR for synthetic data in the Sandbox"} + - The development of synthetic datasets should be viewed as a research project. The technology is generally untested with few examples of public roll-out, and its deployment should be future-proofed as much as possible against attacks and potential sensitive data disclosure. + - Synthetic data generation and evaluation approaches should be tailored to each dataset of interest. With current technology, it is unlikely that high quality, safe-to-share datasets will be produced at any kind of production scale without a massive effort devoted to pre-processing, data harmonization, and customized routines for different families of datasets. + - The Sandbox is not openly sharing any synthetic datasets generated from person-specific sensitive data. We think these datasets will be useful to approved researchers that ideally gain access via an approved data portal with registration and data use agreements with relevant data authorities. We are not currently that portal. 
+::: + +## Generating synthetic data + +We have explored the performance of copulas, multiple imputation, sequential synthesis, and several generative adversarial network (GAN) approaches with a cancer dataset which we were developing for a course in the MS in Personal Medicine program at University of Copenhagen. We quickly discovered that factors such as missingness, collinearity, and the ratio of patients to features cause just as many problems for synthetic data generation as they do in predictive modeling. We are currently evaluating the above techniques as well as additional deep learning approaches such as variational autoencoders (VAEs) and Bayesian graphs against a collection of benchmark health datasets to better understand the positives and negatives of each technique when faced with common challenges in real world health data. + +Recently, a few interesting libraries / pipelines have been released that enable testing of different synthetic data generation approaches alongside a range of evaluation metrics. We are actively exploring these tools as we test different generation approaches and examining their implementation of evaluation metrics. We plan to add additional components and features as we resolve challenges with different target datasets. + +## Evaluating synthetic data + +There are 3 key principles to consider when judging the overall quality of a synthetic dataset: fidelity to the original dataset, risk to privacy, and prediction utility. Fidelity and utility are often grouped together as similarity to the original data which exists in a trade-off with privacy - the more similar your synthetic dataset to the original, the higher your risk to patient privacy. However, the distinction between them is important as they can be achieved independently of each other depending on the project frame. 
Fidelity refers to reproduction of the multivariate shape and structure of the original data (including complex nonlinear relationships) while utility refers to how well the synthetic dataset matches the predictive accuracy of the original dataset. Risk to privacy includes both risk of patient reidentification and risk of sensitive information disclosure about a patient. There are many proposed evaluation metrics for measuring different aspects of these three qualities. We are actively investigating the performance of these metrics against our different datasets. + + +![](../images/SynthDataQualities.png) + + +We should point out that while using quantitative metrics to assess privacy preservation is a critical step in creating a synthetic dataset, positive results do not absolve us from any concerns regarding risk to privacy in the synthetic data. Regulatory guidelines regarding the safety of synthetic data and the ability to openly share it are extremely unclear. No authorities have specified quantitative cut-offs using these metrics that enable open release, for example. For this reason, we have developed our own internal guidelines for how to handle this aspect of the project, which are based on a comprehensive examination of relevant EU and Danish legislation (i.e. the GDPR, the Artificial Intelligence Act, the Danish Health Law, and the Danish Data Protection Act). We continue work on synthesis with hope that new legislation such as the development of the European Health Data Space will provide further guidance in the future. + +## Rules for synthetic data in the Sandbox + +We are currently focused on exploring methods and metrics by developing reproducible, well documented examples and use cases of synthetic data in partnership with other researchers, legal advisors, and data authorities. We're relying primarily on publicly available tabular health datasets in this exploration phase, but we will also work with sensitive data in the future. 
Our rules aim to preserve the trust of the public in how their health data is handled by data authorities and researchers. + +:::{.callout-tip title="Sandbox Rules for Synthetic Data"} + 1. Creation of synthetic data involves processing sensitive data, and this requires obtaining project approvals from data authorities when performing this work on sensitive data. Any synthetic data work with restricted-access, sensitive data by the Sandbox will only be conducted with these approvals in place in the frame of a research project. + 2. Goals for each synthetic dataset project should be defined at project initiation: how will the synthetic dataset be used, who is the intended audience, and how might it be shared? This frame should govern every consequent decision for that dataset and be shared alongside the final dataset. + 3. Quantitative metrics for fidelity, utility, and privacy preservation should be implemented for each dataset and shared alongside the final dataset. + 4. A cost-benefit analysis should be performed after the project is completed - is any risk to privacy appropriately balanced by value of the dataset in achieving its stated aims and contributing to the public good? + 5. Data authorities with ethical and strategic stakes in who accesses the synthetic dataset should be included in decisions about how it is used and who is allowed to access it. + 6. Synthetic datasets created from person-specific sensitive data rather than population characteristics can still pose privacy risks, and any users of the dataset should be approved and registered. The Sandbox will not release any such datasets publicly and will instead work with appropriate data authorities to decide how such datasets should be governed in a responsible way. 
+::: diff --git a/develop/access/.DS_Store b/develop/access/.DS_Store deleted file mode 100644 index 5008ddfc..00000000 Binary files a/develop/access/.DS_Store and /dev/null differ diff --git a/develop/access/UCloud_tutorial.md b/develop/access/UCloud_tutorial.md deleted file mode 100644 index 090bb342..00000000 --- a/develop/access/UCloud_tutorial.md +++ /dev/null @@ -1,72 +0,0 @@ -# Data analysis Tutorial - -Here you have the instructions to start working on the tutorial. - -## Starting a jupyterlab sessions on uCloud - -You can use these instructions to open jupyterlab **also for the analysis of your own data and for the integration analysis**. - -* Log onto ucloud at the address [http://cloud.sdu.dk](http://cloud.sdu.dk) using the university credentials. - - -* When you are logged in, be sure to choose the project for the NNF course (red circle). Then click on the Apps button (green circle). -![](./img/dashboard.png) - -* Find the app `Jupyterlab` (red circle), which is under the title `Featured`. -![](./img/chooseapp.png) - -* Click on the app button. You will get into the settings window. Load the application settings following the illustrations below. - -![](./img/setup_1.png) - -![](./img/setup_2.png) - -![](./img/setup_3.png) - -* Now, click on the button `Submit` (red circle). - -![](./img/submit.png) - -* Wait to go through the queue. When the session starts, the timer begins to count down (red circle). In a couple of minutes you should be able to open the interface through the button (green circle). - -![](./img/running.png) - -## Work on the tutorial - -To work on the tutorial you need to go on a personal folder, which contains also the dataset you will filter after the tutorial. 
Each student has its own folder as in the table below: - -| Name | Sample folder | -|---|---| -| Andersen, Albert Lund | Gifu_ctr_1 | -| Bagger, Andreas | Gifu_ctr_2 | -| Hansen, Mads Würgler | Gifu_ctr_ 3 | -| Milo, Lasse | Gifu_ctr_4 | -| Reimick, Sebastian Haunstrup | Gifu_R7A_1 | -| Hemmingsen, Jonas Klejs | Gifu_R7A_2 | -| Skovmøller, Emma Hvitfeldt | Gifu_R7A_3 | -| Sørensen, Emma Frasez | Gifu_R7A_4 | -|Agersnap, Simon Nørregaard | HAR1_ctr_1 | -| Schmidt, Alina | HAR1_ctr_2 | -|Henriksen, Frederik Oskar | HAR1_ctr_3 | -|Lundby, Josephine Marie | HAR1_ctr_4 | -|Nørholm, Anne | HAR1_R7A_1 | -|Odgaard, Louise Nyrup | HAR1_R7A_2 | -|Sørensen, Sara Sejer | HAR1_R7A_3 | -|Lønskov, Jonas | HAR1_R7A_4 | -|Niklassen, Jacob Hansen | Gifu_ctr_1_bis | -|Øllgaard, Ann Mai Brøndum Holm | Gifu_ctr_2_bis | -|Overgaard, Morten Øgelund | Gifu_R7A_1_bis | -|Sørensen, Elisabeth Asta | Gifu_R7A_2_bis | -|Rey, Isabel | HAR1_ctr_1_bis | - -* When you open jupyterlab, you need to use the browser on your left to go to the folder `426401/Students_analysis/Folder_name`, where you find your personal folder (from the table) instead of `Folder_name`. -![](./img/browser.png) - -* Here, you have the notebook `tutorial.ipynb`. Open that to start working on the tutorial -* When you open the notebook, on the top-right corner you should have `R scrna` (red circle). If not, click there and choose `R scrna` from the menu that appears. - -![](./img/notebook.png) - -* Now you can start working on the tutorial. There is a lot of text to read and of course code to run. - -When you are finished with the tutorial, you are ready to go on to use the tutorial code for the [filtering session](./filtering.md). 
\ No newline at end of file diff --git a/develop/assets/.DS_Store b/develop/assets/.DS_Store deleted file mode 100644 index 275c16e4..00000000 Binary files a/develop/assets/.DS_Store and /dev/null differ diff --git a/develop/assets/images/.DS_Store b/develop/assets/images/.DS_Store deleted file mode 100644 index c023bc72..00000000 Binary files a/develop/assets/images/.DS_Store and /dev/null differ diff --git a/develop/assets/upcomingcourses.csv b/develop/assets/upcomingcourses.csv deleted file mode 100644 index f46ce3c0..00000000 --- a/develop/assets/upcomingcourses.csv +++ /dev/null @@ -1,9 +0,0 @@ -Event title,Dates,Location,Organizers,Sign-up link -Workshop: Intro to the Health Data Science Sandbox,18-19 April 2024,"AAU SUND, Aalborg University",HDS Sandbox,email [the Sandbox]((mailto:nhds_sandbox@sund.ku.dk)) -"Course: Single-cell, Single-molecule: The Next Level in Cell Biology",Spring Semester 2024,Aarhus University,"Stig Andersen, Mikkel Schierup, Victoria Birkedal, and Thomas Boesen with computational support from the HDS Sandbox (Samuele Soraggi)",[AU course catalog entry](https://kursuskatalog.au.dk/en/course/118020/Single-cell-Single-molecule-The-Next-Level-in-Cell-Biology) -"Course: Protein structure, dynamics and modelling (BMB834)",Spring Semester 2024,University of Southern Denmark,"Ole Nørregaard Jensen, Himanshu Khandelia, and Jacob Kongsted with computational support and material from the HDS Sandbox (Jacob Fredegaard Hansen)",[SDU course catalog entry](https://odin.sdu.dk/sitecore/index.php?a=searchfagbesk&internkode=bmb834&lang=en) -Course: Intro to NGS data analysis summer school,1-5 July 2024,Aarhus University,"Stig Andersen, Mikkel Schierup, Samuele Soraggi, with SS also supporting through HDS Sandbox",[AU course catalog entry](https://kursuskatalog.au.dk/en/course/124516/Next-Generation-Sequencing) -Workshop: Intro to NGS workflows and pipeline management,TBD - Fall Semester 2024,University of Copenhagen,HDS Sandbox (Alba Refoyo Martinez and 
Samuele Soraggi),email [the Sandbox]((mailto:nhds_sandbox@sund.ku.dk)) -Workshop: Research data management for NGS data,TBD - Fall Semester 2024,University of Copenhagen,HDS Sandbox (Alba Refoyo Martinez and Jennifer Bartell),email [the Sandbox]((mailto:nhds_sandbox@sund.ku.dk)) -Workshop: Intro to Multi-Omics analysis using the Health Data Science Sandbox,TBD - Fall Semester 2024,Aarhus University,HDS Sandbox (Samuele Soraggi) together with the OMICS focus group (Joanna Kalucka and Christian Damgaard),email [Samuele Soraggi]((mailto:Samuele@birc.au.dk)) -Workshop: Intro to UCloud for trainees and trainers,TBD - Fall Semester 2024,University of Copenhagen,HDS Sandbox (Alba Refoyo Martinez and Jennifer Bartell) and the SUND DataLab,email [the Sandbox]((mailto:nhds_sandbox@sund.ku.dk)) \ No newline at end of file diff --git a/develop/images/favicon.png b/develop/images/favicon.png deleted file mode 100644 index 9c41b500..00000000 Binary files a/develop/images/favicon.png and /dev/null differ diff --git a/develop/keywords.md b/develop/keywords.md deleted file mode 100644 index 77fef42d..00000000 --- a/develop/keywords.md +++ /dev/null @@ -1,4 +0,0 @@ - -Here's a lit of used keywords: - -[TAGS] \ No newline at end of file diff --git a/develop/modules/index.md b/develop/modules/index.md deleted file mode 100644 index 8d74b20e..00000000 --- a/develop/modules/index.md +++ /dev/null @@ -1,76 +0,0 @@ ---- -layout: default -title: Modules -has_children: true -nav_order: 3 -permalink: /modules -hide: - - footer - - toc ---- - -
-# Training Modules -
- -Sandbox resources have been organized as training modules focused on key topics in health data science. We are constantly adding additional resources and have plans to create additional modules on medical imaging and wearable device data. Feel free to adapt these resources for your own purposes (with credit to the National Health Data Science Sandbox project and other projects they acknowledge in the specific materials). - -You can **access our training modules** through: - -+ **In-person workshops and courses** at host universities (check [News](https://hds-sandbox.github.io/news/news.html) for announcements) -+ Course/workshop repositories on our **[GitHub page](https://github.com/hds-sandbox)** - some tool assembly may be required! -+ Independently accessible **Sandbox apps on [UCloud](https://cloud.sdu.dk)**, the academic HPC at University of Southern Denmark -+ Virtual machines deployed on the **[Course Platform](https://www.computerome.dk/solutions/course-platform) at Computerome**, the academic HPC at the Technical University of Denmark (Sandbox rollout still under development!) - -Available resources within each training module are listed below, including **:octicons-code-review-24: tutorials and guides :octicons-code-review-24:** and **:octicons-tools-24: popular tools for analysis and visualization :octicons-tools-24:**. [Email us](mailto:nhds_sandbox@sund.ku.dk) with any questions, comments or suggestions for new workshops! - -## Genomics -![Genomics](../assets/images/genomics2.png){ align=right width="30%" } Genomics is the study of genomes, the complete set of an organism's DNA. Genomics research now encompasses functional and structural studies, epigenomics, and metagenomics, and genomic medicine is under active implementation and extension in the health sector. 
- -#### Use the [Genomics Sandbox App](https://cloud.sdu.dk/app/jobs/create?app=genomics&version=2023.03.01) on UCloud to explore the resources below: - -+ [:octicons-code-review-24: Introduction to Next Generation Sequencing data](https://hds-sandbox.github.io/NGS_summer_course_Aarhus/) (last update: June 2023) -+ [:octicons-code-review-24: Introduction to Population Genomics](https://hds-sandbox.github.io/PopulationGenomicsCourse/) (implementation of a course by Prof. Kasper Munch of Aarhus University) (last update: March 2023) -+ [:octicons-code-review-24: Introduction to GWAS](https://hds-sandbox.github.io/GWAS_course/) (last update: March 2023) -+ :octicons-tools-24: Interactive Genomic Browser (a popular visualization tools for genomics analysis) -> - -## Transcriptomics -![Transcriptomics](../assets/images/transcriptomics.png){ align=right width="30%" } Transcriptomics is the study of transcriptomes, which investigates RNA transcripts within a cell or tissue to determine what genes are being expressed and in what proportion. These RNA transcripts include mRNAs, tRNA, rRNA and other non-coding RNA presents in a cell. - -#### Use the [Transcriptomics Sandbox App](https://cloud.sdu.dk/app/jobs/create?app=transcriptomics&version=2023.03) on UCloud to explore these resources: - -+ [:octicons-code-review-24: Bulk RNAseq](https://hds-sandbox.github.io/bulk_RNAseq_course) (last update: June 2023) -+ [:octicons-code-review-24: Single-Cell RNAseq](https://hds-sandbox.github.io/scRNASeq_course/) (last update: May 2023) -+ :octicons-tools-24: Cirrocumulus (a popular tool for visualizing different types of RNA-seq data and downstream analysis) -+ :octicons-tools-24: RNAseq in RStudio (RStudio session with pre-installed RNAseq analysis packages for exploring with your own uploaded data) - - -## Proteomics -![proteomics](../assets/images/proteomics.png){ align=right width="30%" } Proteomics is the study of proteins that are produced by an organism. 
Proteomics allows us to analyse protein compositon and structure, which have great importance in determining their function. - -#### Use the [Proteomics Sandbox App](https://cloud.sdu.dk/app/jobs/create?app=proteomics&version=Mar2023) on UCloud to explore pre-installed tools for proteomics analysis and other resources: - -+ [:octicons-tools-24: Proteomics Sandbox Documentation](https://hds-sandbox.github.io/proteomics-sandbox/index.html) (last update: May 2023) -+ :octicons-code-review-24: Introduction to Clinical Proteomics (course under development) - -We also offer a tutorial on UCloud's [ColabFold app](https://cloud.sdu.dk/app/jobs/create?app=colabfold&version=1.5.2), a tool that allows predictions with AlphaFold2 or RoseTTAFold. - -+ [:octicons-code-review-24: ColabFold Intro](https://hds-sandbox.github.io/proteomics-sandbox/colabfold.html) (last update: October 2022) - - -## Electronic Health Records -![EHRs](../assets/images/EHRs.png){ align=right width="30%" } Electronic health records (EHRs) are digital records kept in the public health sector that record the medical histories of individuals, and access is normally highly restricted to preserve patient privacy. This data is sometimes also shared (partly or in full) in secondary patient registries that support research of a specific disease or condition (such as breast cancer or cystic fibrosis). These datasets are extraordinarily valuable in the development of predictive models used in precision medicine. - -The chronic lymphocytic leukemia synthetic dataset listed below is generated solely from public data and is of low utility, so we don't recommend its use beyond the course it was designed for (with much explanation for the students on its construction and caveats). Please see [Synthetic Data](https://hds-sandbox.github.io/datasets/synthdata.html) for more information. 
- -+ :octicons-tools-24: Chronic Lymphocytic Leukemia synthetic dataset created for use in "Fra realworld data til personlig medicin", a course from University of Copenhagen's [MS in Personlig Medicin](https://personligmedicin.ku.dk/) (last update: January 2023) -+ :octicons-code-review-24: Intro to EHR analysis (workshop under development) - -## Data Carpentry and management -![HPC-Lab](../assets/images/HPC.png){ align=right width="30%" } Computing skills are an important foundation for health data science (and using the above training modules), but formal training is often lacking as biomedical researchers navigate increasingly difficult computational tasks in their projects. These skills range from programming to use of high performance computers (hpc) to proper research data management. - -+ [:octicons-code-review-24: HPC Startup Guide](https://hds-sandbox.github.io/access/index.html) (instructions for accessing and navigating compute resources at Computerome and UCloud) -+ [:octicons-code-review-24: HeaDS DataLab workshop materials](https://center-for-health-data-science.github.io/index.html) (workshops for programming and good practices developed by the Center for Health Data Science at University of Copenhagen, which are sometimes co-taught by Sandbox staff! Includes **R**, **python**, **bash**, and **git**!) -[:octicons-code-review-24: RDM for NGS data](https://hds-sandbox.github.io/RDM_NGS_course) (workshop on how to handle NGS data following simple guidelines to increase the FAIRability of your data) -+ :octicons-code-review-24: Intro to HPC (workshop under development) \ No newline at end of file diff --git a/develop/news/news.md b/develop/news/news.md deleted file mode 100644 index 421fb254..00000000 --- a/develop/news/news.md +++ /dev/null @@ -1,116 +0,0 @@ ---- -layout: page -title: News -permalink: /news/ -nav_order: 2 -hide: - - navigation - - footer ---- - -
-# News and Events -
- -Sandbox data scientists routinely lead or contribute to courses and workshops at host universities in Denmark. Check out upcoming events in the table below! - -

Upcoming Sandbox Training Events

- -{{ read_csv('../assets/upcomingcourses.csv') }} - ---- - -

News & Updates

- -#### 9/2/2024 - The Proteomics Sandbox supports a new course at SDU! -During the Spring Semester 2024, Sandbox data scientist Jacob Fredegaard Hansen will be assisting with teaching and tools in the course BMB834: Protein Structure, Dynamics, and Modelling at the University of Southern Denmark. Here, Jacob will provide Sandbox support, and materials will be used for applying computational methods for protein structure retrieval and visualization, as well as for applying high-performance computing (HPC) methods for protein structure modeling. - -#### 1/2/2024 - The Sandbox attends the DDSA PhD Meetup and 2024 D3A conference at Nyborg Strand -Several Sandbox staff represented the project at both the DDSA PhD Meetup with a practical presentation on research data management and with posters at D3A, the national data science meeting. It was nice to meet up in person and also make new connections with the excellent cadre of conference attendees. Thanks to the DDSA secretariat for their invitation and organizational efforts. - -#### 31/1/2024 - New posted manuscript - 'A primer for synthetic health data' -In collaboration with Prof. Henning Langberg at KU Public Health and funding from Erhvervsfyrtaarn Sund Vaegt, Jennifer Bartell has developed a manuscript that discusses technical, regulatory, and deployment solutions and challenges for synthetic health data from a broad perspective. This manuscript was developed in collaboration with Sandbox partners Sander Boisen Valentin and Martin Boegsted of AAU, and we plan to submit it to a journal soon. For now, check out [our manuscript on arXiv](https://arxiv.org/abs/2401.17653)! - -#### 12/12/2023 - NNF Collaborative Data Science award news: the SE3D project! -Today we got the news that we will be able to hire 5 new research staff focused on synthetic health data over the next 4 years. 
The SE3D project - Synthetic health data: ethical development and deployment via deep learning approaches - will be led by Sandbox PIs Martin Boegsted (AAU) and Anders Krogh (KU) alongside Sandbox project lead Jennifer Bartell (KU) and a new collaborator, Prof. Jan Trzaskowski from AAU Law. We're really excited to set up this research arm that shares so many Sandbox interests and potential for interaction. The project starts from 1 May 2024, with much thanks to the NNF for their continued support of our ideas. Look out for job ads in the spring from KU and AAU! - -#### 7/11/2023 - 'From Data Chaos to Data Harmony: Managing NGS Data in a Wetlab' at the annual DeiC conference -Sandbox data scientist Jose Alejandro Romero Herrera gave a talk in the Data Management speaker track at the annual Danish E-Infrastructure Consortium (DeiC) conference in Kolding, Denmark. The talk was well received at the biggest DeiC conference ever (250 participants). - -#### 9/11/2023 - Updated Proteomics Sandbox App supporting of a Biostatistics course at SDU -The Proteomics Sandbox Application has recently undergone a significant update, enhancing its security features to ensure safer usage for its users. In this latest iteration, Sandbox data scientist Jacob Fredegaard Hansen has expanded the app's software suite by introducing two new tools: DIA-NN and MZmine, catering to the metabolomics field. Furthermore, the pre-existing software within the application has been refreshed and updated to the latest versions, ensuring that the Proteomics Sandbox Application remains at the cutting-edge of the field. Excitingly, this application will be actively utilized in the course "BMB831: Biostatistics in R II" at the University of Southern Denmark throughout this autumn, showcasing its relevance and applicability in academic settings. 
- -#### 25/10/2023 - 'Research Data Management for NGS Data' with DeiC (Technical University of Denmark) -Sandbox data scientist Jose Alejandro Romero Herrera ran the first instance of a new module on research data management practices he developed specifically for NGS data. Twelve participants were hosted in conjunction with DeiC at DTU, and were exposed to tools like bash, conda, git, and cookie cutter in their quest to organize their omics data. - -#### 07/09/2023 - 'Digging into the Health Data Science Sandbox' at the Danish Bioinformatics Conference 2023 (Aarhus University) -The full team of Sandbox data scientists hosted a 4 hour workshop at the Danish Bioinformatics conference where they gave a taster session of each of our 3 omics apps. We learned that multi-omics analysis were a substantial draw for the crowd at the DBC and are making plans to address this interest in future events. - -#### 29/08/2023 - 3 days of Sandbox demos at Aarhus University -Sandbox data scientist Samuele Soraggi hosted a three day speed run through Sandbox apps at the Bioinformatics Research Center. The 26 participants joined for genomics, transcriptomics, and/or proteomics app demos depending on their interests. This thorough omics demo had maxed out participant sign-ups and an enthusiastic crew enjoyed the sessions alongside a bit of networking across disciplines. We plan to host more of these type of workshops given the event's success! - -#### 19/06/2023 - 3 days of bulk RNA-seq at the University of Copenhagen -Our teaching team (from the Sandbox, the HeaDS DataLab, and reNEW's genomics platform) hosted another 3 day workshop on bulk RNA-seq. The 34 participants used the updated version of the UCloud Transcriptomics App which provided the smoothest experience yet for both trainers and trainees. A new goal for the next course run is to add a student project to support independent implementation and exploration of the course content. 
- -#### 31/05/2023 - Sandbox App updates on UCloud rolled out -New versions of the Genomics Sandbox App, the Transcriptomics Sandbox App, and the Proteomics Sandbox App have all been launched on UCloud this month! Check out the new components in the training modules such as a GWAS module in Genomics and new tools in Transcriptomics and Proteomics. Updates were also informed by the different courses supported during Spring 2023. With these courses wrapping up this month, the associated new training materials have also been included in the new versions of the apps. - -#### 18/01/2023 - Second bulk RNA-seq course at the University of Copenhagen - -On 18th of January we taught the second iteration of our bulk RNA-seq course to researchers (from PhD students to professors) at SUND at the University of Copenhagen. We had ~50 workshop participants joining us for three days of lectures and exercises on UCloud. This time, we introduced preprocessing theory (read QC, alignment and quantification) and the use of automated workflows using the [nf-core rnaseq pipeline](https://nf-co.re/rnaseq). - -For those that could not enroll for this session, you can find the updated material [here](https://hds-sandbox.github.io/bulk_RNAseq_course/). We have moved the datasets and slides to a [zenodo repository](https://zenodo.org/record/7565997) - -We'd like to extend our thanks to our workshop collaborators, data scientists from the SUND DataLab at KU's Center for Health Data Science as well as the genomics platform at the NNF Center for Stem Cell Medicine (reNEW). - - -#### 10/01/2023 - Sandbox support for Spring 2023 courses - -##### Sandbox support for 'Population Genomics' - -Exercises for an [MS course on Population Genomics](https://kursuskatalog.au.dk/en/course/117821/Population-Genomics) taught by Prof. Kasper Munch at Aarhus University are being implemented on UCloud by Sandbox data scientist Samuele Soraggi. 
Students will explore the training materials on UCloud during the Spring 2023 semester, after which the materials will be accessible to any UCloud user via the Genomics Sandbox App. - -##### 'Fra real-world data til personlig medicin' with Course Platform & Sandbox support -The second round of the course ['Fra real-world data til personlig medicin'](https://personligmedicin.ku.dk/kursus/realworld/) in KU's MS in Personlig Medicin begins in January with an introduction to CLL-TIM, a predictive model for chronic lymphocytic leukemia deployed by Prof. Carsten Niemann, an introduction by Sandbox coordinator Jennifer Bartell to the new Course Platform at Computerome built with Sandbox help for hosting courses with HPC resources, and an introduction to building predictive models using TidyModels in R by Prof. Rasmus Broendum. The course will run through April with 10 continuing education students building their own predictive models using a new and improved synthetic CLL dataset developed by Sandbox data scientist Sander Boisen Valentin. Jennifer and Rasmus are also manning the Sandbox Slack workspace to field student questions about the dataset and their model building. - -##### Sandbox support for 'Single-cell, Single-Molecule: The Next Level in Cell Biology' -An NNF-funded course, ['Single-cell, Single-Molecule: The Next Level in Cell Biology'](https://kursuskatalog.au.dk/en/course/118020/Single-cell-Single-molecule-The-Next-Level-in-Cell-Biology) combining experimental and computational approaches to RNA sequencing is starting at Aarhus University. In addition to course-responsible professor Stig Andersen and co-teachers Victoria Birkedal and Thomas Boesen, Sandbox PI Mikkel Schierup will be contributing along with Sandbox data scientist Samuele Soraggi. Samuele is adapting the Transcriptomics App material on UCloud to supply tutorials and exercises for this hefty course as well as serving as a teaching assistant. 
The course materials will be available to all users of the Transcriptomics Sandbox App on UCloud in the future. - -#### 8/01/2023 - Soft launch of the new Course Platform at Computerome -Sandbox data scientist Jesper Roy Christiansen has been integral to the development of a new ['Course Platform'](https://www.computerome.dk/solutions/course-platform) at Computerome, the HPC platform at the Technical University of Denmark. Built as a collaboration between the Sandbox and Computerome, the Course Platform will host its first users, students in 'Fra real-world data til personlig medicin', a course of KU's MS in Personlig Medicin. Sandbox coordinator Jennifer Bartell and Sandbox PI Martin Boegsted have also been involved in testing this new system during course setup. See the above link as well as [HPC Access](https://hds-sandbox.github.io/access) for more details on this platform and how you can also use this new platform to host courses (with or without Sandbox involvement!). - - -#### 30/11/2022 - Sandbox support for 'Advanced Statistical Learning' -Sandbox data scientist Samuele Soraggi spent two weeks teaching for the Fall 2023 course ['Advanced Statistical Learning'](https://kursuskatalog.au.dk/da/course/115396/Advanced-Statistical-Learning) taught by Prof. Asger Hobolth at Aarhus University. - - -#### 15/11/2022 - Sandbox support for a workshop in the series 'Workshops in Applied Bioinformatics' at SDU -Sandbox data scientist Jacob Fredegaard Hansen created a tutorial on how to use ColabFold for a one day workshop as part of the ['Workshops in Applied Bioinformatics'](https://odin.sdu.dk/sitecore/index.php?a=searchfagbesk&internkode=bmb209&lang=en) series taught by Sandbox collaborator Veit Schwammle. The material is accessible on the Sandbox website (see [Modules](https://hds-sandbox.github.io/modules/), Proteomics) for any UCloud user alongside the UCloud ColabFold App. - - -#### 10/12/2022 - Transcriptomics Sandbox app launched on UCloud! 
- -We have deployed our second standalone Sandbox app on UCloud! Please see the [Access](https://hds-sandbox.github.io/access/UCloud) page for instructions on how to find our Sandbox apps on UCloud - This one is titled 'Transcriptomics Sandbox' and module documentation is linked from the UCloud app page as well as here in [Modules](https://hds-sandbox.github.io/modules). - - -#### 06/09/2022 - Genomics Sandbox app launched on UCloud! - -We have deployed our first standalone Sandbox app on UCloud! Please see the [Access](https://hds-sandbox.github.io/access/UCloud) page for instructions on how to find our Sandbox apps on UCloud - this first one is titled 'Genomics Sandbox' and module documentation is linked from the UCloud app page as well as here in [Modules](https://hds-sandbox.github.io/modules). - -#### 18/08/2022 - Bulk RNA-seq course at University of Copenhagen - -Today we began teaching our brand new bulk RNA-seq course to researchers (from PhD students to professors) at SUND at the University of Copenhagen. We had 32 workshop participants join us for two days of lectures and exercises on UCloud. We'd like to extend our thanks to our workshop collaborators, data scientists from the SUND DataLab at KU's Center for Health Data Science as well as the genomics platform at the NNF Center for Stem Cell Medicine (reNEW). - -For those that could not enroll for this session, you can find the relevant material [here](https://hds-sandbox.github.io/bulk_RNAseq_course/). - -#### 01/06/2022 - Genomics course at Aarhus University - -A month-long course in Genomics taught by Professors Mikkel Schierup and Stig Andersen has started with lead supercomputing support on UCloud by Sandbox data scientist and course instructor Samuele Soraggi. Computational exercises in NGS analysis were deployed in a UCloud project for use by 47 graduate students with primarily molecular biology and clinical backgrounds and no prior supercomputing experience! 
Post-course update: We received many positive reviews on use of the Genomics Sandbox training materials on UCloud! - -#### 22/04/2022 - Basics of Personalized Medicine - final wrap-up - -Our first course, Basics of Personalized Medicine, wrapped up this month with student project presentations which described their approaches to analysis of the synthetic Chronic Lymphocytic Leukemia dataset created for the course. Course reviews highlighted the helpfulness of Sandbox staff in troubleshooting R problems and the tremendous amount that students learned about predictive modeling. - -#### 04/01/2022 - Basics of Personalized Medicine - MS in Personal Medicine program - -The first course supported by the Sandbox is launching this month - 'Basics of Personalized Medicine' - where students in the new Master in Personal Medicine program at University of Copenhagen are introduced to predictive modeling using electronic health records. diff --git a/develop/sitemaps.xml b/develop/sitemaps.xml deleted file mode 100644 index 96981381..00000000 --- a/develop/sitemaps.xml +++ /dev/null @@ -1,20 +0,0 @@ - - -https://hds-sandbox.github.io/sitemap.xml - - -https://hds-sandbox.github.io/bulk_RNAseq_course/sitemap.xml - - -https://hds-sandbox.github.io/scRNASeq_course/sitemap.xml - - -https://hds-sandbox.github.io/ELIXIR-workshop/sitemap.xml - - -https://hds-sandbox.github.io/proteomics-sandbox/sitemap.xml - - -https://hds-sandbox.github.io/RDM_NGS_course/sitemap.xml - - \ No newline at end of file diff --git a/develop/stylesheets/.DS_Store b/develop/stylesheets/.DS_Store deleted file mode 100644 index 5008ddfc..00000000 Binary files a/develop/stylesheets/.DS_Store and /dev/null differ diff --git a/develop/stylesheets/extra.css b/develop/stylesheets/extra.css deleted file mode 100644 index 23187dca..00000000 --- a/develop/stylesheets/extra.css +++ /dev/null @@ -1,91 +0,0 @@ -[data-md-color-scheme="brightness"] { - /* Primary color shades */ - --md-primary-fg-color: #4266A1; - 
--md-primary-fg-color--light: #b4dbf3; - --md-primary-fg-color--dark: #b4dbf3; - --md-primary-bg-color: hsla(0, 0%, 100%, 1); - --md-primary-bg-color--light: hsla(0, 0%, 100%, 1); - --md-text-link-color: #4266A1 -/* --md-text-link-color: hsla(231, 48%, 48%, 1); */ - /* Accent color shades */ - --md-accent-fg-color: #b4dbf3; - --md-accent-fg-color--transparent: #b4dbf3; - --md-accent-bg-color: hsla(0, 0%, 100%, 1); - --md-accent-bg-color--light: hsla(0, 0%, 100%, 1); - /* Code block color shades */ - --md-code-bg-color: hsla(0, 0%, 96%, 1); - --md-code-fg-color: hsla(200, 18%, 26%, 1); - /* Footer */ - --md-footer-bg-color: #4266A1; - --md-footer-bg-color--dark: hsla(0, 0%, 0%, 0.32); - --md-footer-fg-color: hsla(0, 0%, 100%, 1); - --md-footer-fg-color--light: hsla(0, 0%, 100%, 0.7); - --md-footer-fg-color--lighter: hsla(0, 0%, 100%, 0.3); -} - -[data-md-color-scheme="slate"] { - --md-primary-fg-color: #b4dbf3; - --md-primary-fg-color--light: #b4dbf3; - --md-primary-fg-color--dark: #b4dbf3; - --md-primary-bg-color: hsla(0, 0%, 100%, 1); - --md-primary-bg-color--light: hsla(0, 0%, 100%, 0.7); - --md-text_link-color: #4266A1 - /* --md-text-link-color: hsla(231, 48%, 48%, 1);*/ - /* Accent color shades */ - --md-accent-fg-color: #b4dbf3; - --md-accent-fg-color--transparent: #b4dbf3; - --md-accent-bg-color: hsla(0, 0%, 100%, 1); - --md-accent-bg-color--light: hsla(0, 0%, 100%, 0.7); - --md-hue: 210; - /* Footer */ - --md-footer-bg-color: #b4dbf3; - --md-footer-bg-color--dark: hsla(0, 0%, 0%, 0.32); - --md-footer-fg-color: hsla(0, 0%, 100%, 1); - --md-footer-fg-color--light: hsla(0, 0%, 100%, 0.7); - --md-footer-fg-color--lighter: hsla(0, 0%, 100%, 0.3); -} - -/* Admonition font size */ - -.md-typeset .admonition, -.md-typeset details { - font-size: 16px -} - -/*--------*/ -/* Images */ -/*--------*/ -img[alt=sandbox_v2] { - width: 30%; -} - - img[alt=workflow] { - width: 60%; - display: block; - margin-left: auto; - margin-right: auto; -} - - img[alt=moduleshorizontal] 
{ - width: 70%; - display: block; - margin-left: auto; - margin-right: auto; - 67 } - -img[src*='#right'] { - float: right; - width: 35%; -} - - figcaption { - text-align: center; - } - -/* -img[alt=genomics] { - width: 35%; - display: block; - margin-left: auto; - margin-right: auto; -} */ diff --git a/develop/assets/images/EHRs.png b/images/EHRs.png similarity index 100% rename from develop/assets/images/EHRs.png rename to images/EHRs.png diff --git a/develop/assets/images/HPC.png b/images/HPC.png similarity index 100% rename from develop/assets/images/HPC.png rename to images/HPC.png diff --git a/develop/assets/images/Modules_horizontal.png b/images/Modules_horizontal.png similarity index 100% rename from develop/assets/images/Modules_horizontal.png rename to images/Modules_horizontal.png diff --git a/develop/assets/images/Sandbox_PIs.png b/images/Sandbox_PIs.png similarity index 100% rename from develop/assets/images/Sandbox_PIs.png rename to images/Sandbox_PIs.png diff --git a/develop/assets/images/Sandbox_workflow.png b/images/Sandbox_workflow.png similarity index 100% rename from develop/assets/images/Sandbox_workflow.png rename to images/Sandbox_workflow.png diff --git a/develop/assets/images/Sandbox_workflow_ai_horizontal2.png b/images/Sandbox_workflow_ai_horizontal2.png similarity index 100% rename from develop/assets/images/Sandbox_workflow_ai_horizontal2.png rename to images/Sandbox_workflow_ai_horizontal2.png diff --git a/develop/assets/images/Sandbox_workflow_ai_horizontal2w-01 2.png b/images/Sandbox_workflow_ai_horizontal2w-01 2.png similarity index 100% rename from develop/assets/images/Sandbox_workflow_ai_horizontal2w-01 2.png rename to images/Sandbox_workflow_ai_horizontal2w-01 2.png diff --git a/develop/assets/images/Sandbox_workflow_ai_horizontal2w-01.png b/images/Sandbox_workflow_ai_horizontal2w-01.png similarity index 100% rename from develop/assets/images/Sandbox_workflow_ai_horizontal2w-01.png rename to 
images/Sandbox_workflow_ai_horizontal2w-01.png diff --git a/develop/assets/images/SynthDataQualities.png b/images/SynthDataQualities.png similarity index 100% rename from develop/assets/images/SynthDataQualities.png rename to images/SynthDataQualities.png diff --git a/develop/assets/images/Tradeoff_base.png b/images/Tradeoff_base.png similarity index 100% rename from develop/assets/images/Tradeoff_base.png rename to images/Tradeoff_base.png diff --git a/develop/assets/images/Tradeoff_base.svg b/images/Tradeoff_base.svg similarity index 100% rename from develop/assets/images/Tradeoff_base.svg rename to images/Tradeoff_base.svg diff --git a/develop/assets/images/apps.png b/images/apps.png similarity index 100% rename from develop/assets/images/apps.png rename to images/apps.png diff --git a/develop/assets/images/browser.png b/images/browser.png similarity index 100% rename from develop/assets/images/browser.png rename to images/browser.png diff --git a/develop/assets/images/chooseapp.png b/images/chooseapp.png similarity index 100% rename from develop/assets/images/chooseapp.png rename to images/chooseapp.png diff --git a/develop/assets/images/configure_NGS.png b/images/configure_NGS.png similarity index 100% rename from develop/assets/images/configure_NGS.png rename to images/configure_NGS.png diff --git a/develop/assets/images/dashboard.png b/images/dashboard.png similarity index 100% rename from develop/assets/images/dashboard.png rename to images/dashboard.png diff --git a/develop/assets/favicon.png b/images/favicon copy.png similarity index 100% rename from develop/assets/favicon.png rename to images/favicon copy.png diff --git a/develop/assets/images/favicon.png b/images/favicon.png similarity index 100% rename from develop/assets/images/favicon.png rename to images/favicon.png diff --git a/develop/assets/images/genomics2.png b/images/genomics2.png similarity index 100% rename from develop/assets/images/genomics2.png rename to images/genomics2.png diff --git 
a/develop/assets/images/illustration.png b/images/illustration.png
similarity index 100%
rename from develop/assets/images/illustration.png
rename to images/illustration.png
diff --git a/develop/assets/images/interface_jupyterlab.png b/images/interface_jupyterlab.png
similarity index 100%
rename from develop/assets/images/interface_jupyterlab.png
rename to images/interface_jupyterlab.png
diff --git a/develop/images/logo.png b/images/logo.png
similarity index 100%
rename from develop/images/logo.png
rename to images/logo.png
diff --git a/develop/assets/images/notebook.png b/images/notebook.png
similarity index 100%
rename from develop/assets/images/notebook.png
rename to images/notebook.png
diff --git a/develop/assets/images/openning_notebook.png b/images/openning_notebook.png
similarity index 100%
rename from develop/assets/images/openning_notebook.png
rename to images/openning_notebook.png
diff --git a/develop/assets/images/proteomics.png b/images/proteomics.png
similarity index 100%
rename from develop/assets/images/proteomics.png
rename to images/proteomics.png
diff --git a/develop/assets/images/reuse_setup.png b/images/reuse_setup.png
similarity index 100%
rename from develop/assets/images/reuse_setup.png
rename to images/reuse_setup.png
diff --git a/develop/assets/images/running.png b/images/running.png
similarity index 100%
rename from develop/assets/images/running.png
rename to images/running.png
diff --git a/develop/assets/images/running_NGS.png b/images/running_NGS.png
similarity index 100%
rename from develop/assets/images/running_NGS.png
rename to images/running_NGS.png
diff --git a/develop/assets/images/sandbox_courses.png b/images/sandbox_courses.png
similarity index 100%
rename from develop/assets/images/sandbox_courses.png
rename to images/sandbox_courses.png
diff --git a/develop/assets/images/sandbox_v2_med.png b/images/sandbox_v2_med.png
similarity index 100%
rename from develop/assets/images/sandbox_v2_med.png
rename to images/sandbox_v2_med.png
diff --git a/develop/assets/images/setup_1.png b/images/setup_1.png similarity index 100% rename from develop/assets/images/setup_1.png rename to images/setup_1.png diff --git a/develop/assets/images/setup_2.png b/images/setup_2.png similarity index 100% rename from develop/assets/images/setup_2.png rename to images/setup_2.png diff --git a/develop/assets/images/setup_3.png b/images/setup_3.png similarity index 100% rename from develop/assets/images/setup_3.png rename to images/setup_3.png diff --git a/develop/assets/images/submit.png b/images/submit.png similarity index 100% rename from develop/assets/images/submit.png rename to images/submit.png diff --git a/develop/assets/images/transcriptomics.png b/images/transcriptomics.png similarity index 100% rename from develop/assets/images/transcriptomics.png rename to images/transcriptomics.png diff --git a/develop/assets/images/workspace.png b/images/workspace.png similarity index 100% rename from develop/assets/images/workspace.png rename to images/workspace.png diff --git a/develop/assets/images/zelda-dark-world.png b/images/zelda-dark-world.png similarity index 100% rename from develop/assets/images/zelda-dark-world.png rename to images/zelda-dark-world.png diff --git a/develop/assets/images/zelda-light-world.png b/images/zelda-light-world.png similarity index 100% rename from develop/assets/images/zelda-light-world.png rename to images/zelda-light-world.png diff --git a/img/logo.png b/img/logo.png new file mode 100644 index 00000000..2970f746 Binary files /dev/null and b/img/logo.png differ diff --git a/develop/index.md b/index.qmd similarity index 83% rename from develop/index.md rename to index.qmd index 862b8197..c55afda7 100644 --- a/develop/index.md +++ b/index.qmd @@ -1,46 +1,40 @@ ---- -# Feel free to add content and custom Front Matter to this file. -# To modify the layout, see https://jekyllrb.com/docs/themes/#overriding-theme-defaults - -layout: home -hide: - - navigation - - toc - - footer ---- - - -

Welcome to the Health Data Science Sandbox

-

a collaborative project with team members spanning five Danish universities

- -
- ![workflow](../assets/images/Sandbox_workflow_ai_horizontal2w-01.png){ width="60%" } -
- ---- - -The Health Data Science Sandbox is a national project coordinated by the [Center for Health Data Science](https://heads.ku.dk/) at the University of Copenhagen. We're working with a network of health data science experts to build training resources on academic supercomputers for students and researchers in Denmark. Our Sandbox contains [training modules](https://hds-sandbox.github.com/modules/index.html) that pair topical datasets with recommended analysis tools, pipelines, and learning materials/tutorials in a portable, containerized format. - -
- ![moduleshorizontal](../assets/images/Modules_horizontal.png) -
- -

To get involved as a trainee, researcher, or educator in Denmark:

-
-TRAINEES: [join](https://hds-sandbox.github.io/news/news.html) our next scheduled workshop or a supported university course - -TRAINEES/RESEARCHERS: explore [training modules](https://hds-sandbox.github.io/modules/index.html) independently on UCloud - -RESEARCHERS: adapt [training modules](https://hds-sandbox.github.io/modules/index.html) or [code repositories](https://github.com/hds-sandbox) to your research - -EDUCATORS: [host](https://hds-sandbox.github.io/access/) a training event or course in the Sandbox with our support -
---- - -
**A note on Sandbox data policy**
- -The Sandbox aims to be a resource for learning new analysis approaches and tools for health data science on useful, interesting, and safe-to-share datasets. All person-specific datasets in the Sandbox are non-sensitive and GDPR-safe because they are 1) sourced from public databases, 2) fully anonymous/non-sensitive from a GDPR perspective, and/or 3) synthetic. To learn more, check out [Datasets](https://hds-sandbox.github.io/datasets/datapolicy.html) where we explain our data policy in detail and our approach to synthetic data generation. - ---- - -**Thanks to the Novo Nordisk Foundation for funding the National Health Data Science Project! Please give credit if you use our open-source materials in any form (NNF grant number NNF20OC0063268).** +--- +title: "Danish Health Data Science Sandbox" +format: html +date-modified: last-modified +date-format: long +date: 2024-01-01 +bibliography: resources/references.bib +--- + +

Welcome to the Health Data Science Sandbox

+

a collaborative project with team members spanning five Danish universities

+ +![](images/Sandbox_workflow_ai_horizontal2w-01.png){width="60%" fig-align="center"} + +--- + +The Health Data Science Sandbox is a national project coordinated by the [Center for Health Data Science](https://heads.ku.dk/) at the University of Copenhagen. We're working with a network of health data science experts to build training resources on academic supercomputers for students and researchers in Denmark. Our Sandbox contains [training modules](https://hds-sandbox.github.com/modules/index.html) that pair topical datasets with recommended analysis tools, pipelines, and learning materials/tutorials in a portable, containerized format. + + +![](images/Modules_horizontal.png) + +

To get involved as a trainee, researcher, or educator in Denmark:

+
+TRAINEES: [join](https://hds-sandbox.github.io/news/news.html) our next scheduled workshop or a supported university course + +TRAINEES/RESEARCHERS: explore [training modules](https://hds-sandbox.github.io/modules/index.html) independently on UCloud + +RESEARCHERS: adapt [training modules](https://hds-sandbox.github.io/modules/index.html) or [code repositories](https://github.com/hds-sandbox) to your research + +EDUCATORS: [host](https://hds-sandbox.github.io/access/) a training event or course in the Sandbox with our support +
+--- + +
**A note on Sandbox data policy**
+ +The Sandbox aims to be a resource for learning new analysis approaches and tools for health data science on useful, interesting, and safe-to-share datasets. All person-specific datasets in the Sandbox are non-sensitive and GDPR-safe because they are 1) sourced from public databases, 2) fully anonymous/non-sensitive from a GDPR perspective, and/or 3) synthetic. To learn more, check out [Datasets](https://hds-sandbox.github.io/datasets/datapolicy.html) where we explain our data policy in detail and our approach to synthetic data generation. + +--- + +**Thanks to the Novo Nordisk Foundation for funding the National Health Data Science Project! Please give credit if you use our open-source materials in any form (NNF grant number NNF20OC0063268).** diff --git a/mkdocs.yml b/mkdocs.yml deleted file mode 100755 index 02f97029..00000000 --- a/mkdocs.yml +++ /dev/null @@ -1,190 +0,0 @@ -# Project information -site_name: 'Health Data Science Sandbox' -site_url: 'https://hds-sandbox.github.io/' # This should be the GitHub Pages -site_description: 'Supporting training and research in health data science' - -# Repository -repo_name: hds-sandbox -repo_url: https://github.com/hds-sandbox - -# Configuration -theme: - - # Use the Material for MkDocs theme - # url: https://squidfunk.github.io/mkdocs-material/ - - name: material - custom_dir: overrides - - # Necessary for search to work properly - include_search_page: false - search_index_only: true - - # Default values, taken from mkdocs_theme.yml - language: en - features: - - content.code.annotate - # - content.tabs.link - - content.tooltips - # - header.autohide - # - navigation.expand - - navigation.indexes - # - navigation.instant - # - navigation.prune - - navigation.sections - - navigation.footer - - navigation.tabs - - navigation.tabs.sticky - - navigation.top - - navigation.tracking - - search.highlight - - search.share - - search.suggest - - toc.follow - # - toc.integrate - - palette: - # Sandbox colours - "brightness" 
and "slate" - are defined in stylesheets/extra.css - # Palette toggle for light mode - - scheme: brightness - toggle: - icon: material/brightness-7 - name: Switch to dark mode - - # Palette toggle for dark mode - - scheme: slate - toggle: - icon: material/brightness-4 - name: Switch to light mode - - font: - text: Roboto - code: Roboto Mono - - favicon: images/favicon.png - logo: images/logo.png - -# Changes to website colours and image parameters -extra_css: - - stylesheets/extra.css - -extra_javascript: - - javascripts/mathjax.js - - https://polyfill.io/v3/polyfill.min.js?features=es6 - - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js - -plugins: - - tags: - tags_file: keywords.md # Keywords file to use for mkdocs tags - #- bibtex: - # bib_file: references.bib # Reference file in bibtex format - - search - - mkdocs-video # embed videos to the docs - - minify: # minify HTML of a page before it is written to disk - minify_html: true - - mkdocs-jupyter - - table-reader - - git-revision-date-localized: - enable_creation_date: false - -# Analytics and social media -extra: - analytics: - provider: google - property: !ENV GOOGLE_ANALYTICS_KEY - social: - - icon: fontawesome/brands/github - link: https://github.com/hds-sandbox - - icon: fontawesome/brands/twitter - link: https://twitter.com/ucph_heads - homepage: http://hds-sandbox.github.io/ - -# Extensions - -markdown_extensions: - - abbr # abbreviations - - admonition - - attr_list # Add HTML/CSS to Markdown elements - - def_list # Definition lists - - footnotes - - md_in_html - - toc: # Table of contents - permalink: true # Adds anchor link, can customise symbol with emoji - - - tables - - - # Python extensions - can see descriptions - # at https://facelessuser.github.io/pymdown-extensions/ - - - pymdownx.arithmatex: # LaTeX - generic: true - - pymdownx.betterem: # improves emphasis (bold/italic) handling - smart_enable: all - - pymdownx.critic # useful for marking .md file without changes to html - - 
pymdownx.caret # improved functionality for caret symbol - - pymdownx.details # collapsable elements that hide content - - pymdownx.emoji: - emoji_index: !!python/name:material.extensions.emoji.twemoji - emoji_generator: !!python/name:material.extensions.emoji.to_svg - options: - custom_icons: - - overrides/.icons - - pymdownx.highlight: # code highlighting - anchor_linenums: true - - pymdownx.inlinehilite # inline code highlight - - pymdownx.keys # to make entering and styling keyboard key presses easier - - pymdownx.mark # highlight/mark text - - pymdownx.smartsymbols # special characters (e.g. arrows, tm, fractions) - - pymdownx.superfences # arbitrary nesting of code and content blocks inside each other - - pymdownx.tabbed: # add tabs to .md file - alternate_style: true - - pymdownx.tasklist: # checkbox task list - custom_checkbox: true - - pymdownx.tilde # delete and subscript - -# Extra configs -use_directory_urls: false -docs_dir: develop # source md files should be saved in the develop folder -site_dir: docs # built webpage will be created in the docs folder - -# Page tree -# Include your page titles and associating .md files here - -nav: - - Home: index.md - - About: about/about.md - - News: news/news.md - - HPC access: - - access/index.md - - 'Computerome': access/Computerome.md - - 'UCloud': access/UCloud.md - - Datasets: - # - datasets/index.md - - 'Data Policy': datasets/datapolicy.md - - 'Synthetic Data': datasets/synthdata.md - # - 'Assembled Datasets': datasets/datasets.md - - Modules: - - modules/index.md - - 'Genomics': - - 'Introduction to Next Generation Sequencing Data': https://hds-sandbox.github.io/NGS_summer_course_Aarhus/ - - 'Introduction to Population Genomics': https://hds-sandbox.github.io/PopulationGenomicsCourse/ - - 'Introduction to GWAS': https://hds-sandbox.github.io/GWAS_course/ - - 'Interactive Genomic Browser': https://hds-sandbox.github.io/modules/index.html - - 'Transcriptomics': # modules/transcriptomics.md - - 'Bulk RNAseq': 
https://hds-sandbox.github.io/bulk_RNAseq_course/ - - 'Single-Cell RNAseq': https://hds-sandbox.github.io/scRNASeq_course/ - - 'Cirrocumulus': https://hds-sandbox.github.io/modules/index.html - - 'RNAseq in Rstudio': https://hds-sandbox.github.io/modules/index.html - - 'Proteomics': - - 'Proteomics Sandbox': https://hds-sandbox.github.io/proteomics-sandbox/index.html - - 'ColabFold': https://hds-sandbox.github.io/proteomics-sandbox/colabfold.html - - 'Electronic Health Records': - - 'CLL dataset': https://hds-sandbox.github.io/modules/index.html - - 'HPC Lab': - - 'HeaDS DataLab': https://center-for-health-data-science.github.io/index.html - - 'RDM for NGS data': https://hds-sandbox.github.io/RDM_NGS_course - - Contact: contact/contact.md - - HeaDS website: https://heads.ku.dk/ - - Workshop: workshop/workshop.md - diff --git a/develop/modules/AlphaFold_0122.md b/modules/AlphaFold_0122.md similarity index 100% rename from develop/modules/AlphaFold_0122.md rename to modules/AlphaFold_0122.md diff --git a/develop/modules/EHRs.md b/modules/EHRs.md similarity index 97% rename from develop/modules/EHRs.md rename to modules/EHRs.md index 06caaa5e..e2810d2a 100644 --- a/develop/modules/EHRs.md +++ b/modules/EHRs.md @@ -1,17 +1,17 @@ ---- -layout: page -title: EHRs -parent: Modules -has_children: true -nav_order: 2 -hide: - - navigation - - footer - - toc ---- - -# Electronic Health Records - -Electronic health records (EHRs) are digital records kept in the public health sector that record the medical histories of individuals, and access is normally highly restricted to preserve patient privacy. This data is sometimes also shared (partly or in full) in secondary patient registries that support research of a specific disease or condition (such as cystic fibrosis). These datasets are extraordinarily valuable in the development of predictive models used in precision medicine. - -Modules linked to EHR analysis are currently under development. 
+--- +layout: page +title: EHRs +parent: Modules +has_children: true +nav_order: 2 +hide: + - navigation + - footer + - toc +--- + +# Electronic Health Records + +Electronic health records (EHRs) are digital records kept in the public health sector that record the medical histories of individuals, and access is normally highly restricted to preserve patient privacy. This data is sometimes also shared (partly or in full) in secondary patient registries that support research of a specific disease or condition (such as cystic fibrosis). These datasets are extraordinarily valuable in the development of predictive models used in precision medicine. + +Modules linked to EHR analysis are currently under development. diff --git a/develop/modules/bulk_rnaseq.md b/modules/bulk_rnaseq.md similarity index 100% rename from develop/modules/bulk_rnaseq.md rename to modules/bulk_rnaseq.md diff --git a/develop/modules/clinProteomics_0122.md b/modules/clinProteomics_0122.md similarity index 100% rename from develop/modules/clinProteomics_0122.md rename to modules/clinProteomics_0122.md diff --git a/develop/modules/course_template.md b/modules/course_template.md similarity index 100% rename from develop/modules/course_template.md rename to modules/course_template.md diff --git a/develop/modules/genomics.md b/modules/genomics.md similarity index 97% rename from develop/modules/genomics.md rename to modules/genomics.md index 1a66fdb8..5b4353e0 100644 --- a/develop/modules/genomics.md +++ b/modules/genomics.md @@ -1,13 +1,13 @@ ---- -layout: page -title: Genomics -parent: Modules -has_children: true -nav_order: 2 ---- - -# Genomics - -Genomics is the study of genomes, the complete set of an organism's DNA. Genomics research now encompasses functional and structural studies, epigenomics, and metagenomics, and genomic medicine is under active implementation and extension in the health sector. - -Modules linked to genomics topics are currently under construction. 
+---
+layout: page
+title: Genomics
+parent: Modules
+has_children: true
+nav_order: 2
+---
+
+# Genomics
+
+Genomics is the study of genomes, the complete set of an organism's DNA. Genomics research now encompasses functional and structural studies, epigenomics, and metagenomics, and genomic medicine is under active implementation and extension in the health sector.
+
+Modules linked to genomics topics are currently under construction.
diff --git a/modules/index.qmd b/modules/index.qmd
new file mode 100644
index 00000000..db34f1a9
--- /dev/null
+++ b/modules/index.qmd
@@ -0,0 +1,82 @@
+---
+title: "Training modules"
+format: html
+date-modified: last-modified
+date-format: long
+date: 2024-01-01
+---
+
+Sandbox resources have been organized as training modules focused on key topics in health data science. We are constantly adding additional resources and have plans to create additional modules on medical imaging and wearable device data. Feel free to adapt these resources for your own purposes (with credit to the National Health Data Science Sandbox project and other projects they acknowledge in the specific materials).
+
+You can **access our training modules** through:
+
++ **In-person workshops and courses** at host universities (check [News](../news.qmd) for announcements)
++ Course/workshop repositories on our **[GitHub page](https://github.com/hds-sandbox)** - some tool assembly may be required!
++ Independently accessible **Sandbox apps on [UCloud](https://cloud.sdu.dk)**, the academic (interactive) HPC at University of Southern Denmark
++ Independently accessible **Sandbox apps on [GenomeDK](https://genome.au.dk)**, the bioinformatics high-throughput HPC at University of Aarhus
++ Virtual machines deployed on the **[Course Platform](https://www.computerome.dk/solutions/course-platform) at Computerome**, the academic HPC at the Technical University of Denmark (Sandbox rollout still under development!)
+**tutorials and guides** and **popular tools for analysis and visualization**. [Email us](mailto:nhds_sandbox@sund.ku.dk) with any questions, comments or suggestions for new workshops!
+
+## Genomics
+
+![](../images/genomics2.png){fig-align="right" width="20%"}
+
+Genomics is the study of genomes, the complete set of an organism's DNA. Genomics research now encompasses functional and structural studies, epigenomics, and metagenomics, and genomic medicine is under active implementation and extension in the health sector.
+
+**Use the [Genomics Sandbox App](https://cloud.sdu.dk/app/jobs/create?app=genomics&version=2023.03.01) on UCloud to explore the resources below:**
+
+- [Introduction to Next Generation Sequencing data](https://hds-sandbox.github.io/NGS_summer_course_Aarhus/) (last update: June 2023)
+- [Introduction to Population Genomics](https://hds-sandbox.github.io/PopulationGenomicsCourse/) (implementation of a course by Prof. Kasper Munch of Aarhus University) (last update: March 2023)
+- [Introduction to GWAS](https://hds-sandbox.github.io/GWAS_course/) (last update: March 2023)
+
+## Transcriptomics
+
+![](../images/transcriptomics.png){fig-align="right" width="20%"}
+
+Transcriptomics is the study of transcriptomes: it investigates the RNA transcripts within a cell or tissue to determine which genes are being expressed and in what proportion. These RNA transcripts include mRNA, tRNA, rRNA and other non-coding RNAs present in a cell.
+
+**Use the [Transcriptomics Sandbox App](https://cloud.sdu.dk/app/jobs/create?app=transcriptomics&version=2023.03) on UCloud to explore these resources:**
+
+- [Bulk RNAseq](https://hds-sandbox.github.io/bulk_RNAseq_course) (last update: June 2023)
+- [Single-Cell RNAseq](https://hds-sandbox.github.io/scRNASeq_course/) (last update: May 2023)
+- Cirrocumulus (a popular tool for visualizing different types of RNA-seq data and downstream analysis)
+- RNAseq in RStudio (RStudio session with pre-installed RNAseq analysis packages for exploring with your own uploaded data)
+
+
+## Proteomics
+
+![](../images/proteomics.png){fig-align="right" width="20%"}
+
+Proteomics is the study of proteins that are produced by an organism. Proteomics allows us to analyse protein composition and structure, which have great importance in determining their function.
+
+**Use the [Proteomics Sandbox App](https://cloud.sdu.dk/app/jobs/create?app=proteomics&version=Mar2023) on UCloud to explore pre-installed tools for proteomics analysis and other resources:**
+
+- [Proteomics Sandbox Documentation](https://hds-sandbox.github.io/proteomics-sandbox/index.html) (last update: May 2023)
+- Introduction to Clinical Proteomics (course under development)
+
+We also offer a tutorial on UCloud's [ColabFold app](https://cloud.sdu.dk/app/jobs/create?app=colabfold&version=1.5.2), a tool that allows predictions with AlphaFold2 or RoseTTAFold.
+
+- [ColabFold Intro](https://hds-sandbox.github.io/proteomics-sandbox/colabfold.html) (last update: October 2022)
+
+
+## Electronic Health Records
+
+![](../images/EHRs.png){fig-align="right" width="20%"}
+
+Electronic health records (EHRs) are digital records kept in the public health sector that record the medical histories of individuals, and access is normally highly restricted to preserve patient privacy. This data is sometimes also shared (partly or in full) in secondary patient registries that support research of a specific disease or condition (such as breast cancer or cystic fibrosis). These datasets are extraordinarily valuable in the development of predictive models used in precision medicine.
+
+The chronic lymphocytic leukemia synthetic dataset listed below is generated solely from public data and is of low utility, so we don't recommend its use beyond the course it was designed for (with much explanation for the students on its construction and caveats). Please see [Synthetic Data](../datasets/synthdata.qmd) for more information.
+
+- Chronic Lymphocytic Leukemia synthetic dataset created for use in "Fra realworld data til personlig medicin", a course from University of Copenhagen's [MS in Personlig Medicin](https://personligmedicin.ku.dk/) (last update: January 2023)
+- Intro to EHR analysis (workshop under development)
+
+## Data Carpentry and management
+
+![](../images/HPC.png){fig-align="right" width="20%"}
+
+Computing skills are an important foundation for health data science (and using the above training modules), but formal training is often lacking as biomedical researchers navigate increasingly difficult computational tasks in their projects. These skills range from programming to use of high performance computers (HPC) to proper research data management.
+
+- [HPC Startup Guide](https://hds-sandbox.github.io/access/index.html) (instructions for accessing and navigating compute resources at Computerome and UCloud)
+- [HeaDS DataLab workshop materials](https://center-for-health-data-science.github.io/index.html) (workshops for programming and good practices developed by the Center for Health Data Science at University of Copenhagen, which are sometimes co-taught by Sandbox staff! Includes **R**, **Python**, **bash**, and **git**!)
+- [RDM for NGS data](https://hds-sandbox.github.io/RDM_NGS_course) (workshop on how to handle NGS data following simple guidelines to increase the FAIRability of your data)
+- Intro to HPC (workshop under development)
\ No newline at end of file
diff --git a/develop/modules/proteomics.md b/modules/proteomics.md
similarity index 97%
rename from develop/modules/proteomics.md
rename to modules/proteomics.md
index 70ed4a7c..0d4be27d 100644
--- a/develop/modules/proteomics.md
+++ b/modules/proteomics.md
@@ -1,11 +1,11 @@
----
-layout: page
-title: Proteomics
-parent: Modules
-has_children: true
-nav_order: 3
----
-
-# Proteomics
-
-Proteomics is the study of proteins summed across a complete sample (ranging from a single cell to a whole organism). High-throughput measurement is conducted using mass spectrometry techniques and protein arrays, and provides insight into protein expression profiles and interactions.
+---
+layout: page
+title: Proteomics
+parent: Modules
+has_children: true
+nav_order: 3
+---
+
+# Proteomics
+
+Proteomics is the study of proteins summed across a complete sample (ranging from a single cell to a whole organism). High-throughput measurement is conducted using mass spectrometry techniques and protein arrays, and provides insight into protein expression profiles and interactions.
diff --git a/develop/modules/transcriptomics.md b/modules/transcriptomics.md
similarity index 96%
rename from develop/modules/transcriptomics.md
rename to modules/transcriptomics.md
index 47cb0239..8abf743d 100644
--- a/develop/modules/transcriptomics.md
+++ b/modules/transcriptomics.md
@@ -1,11 +1,11 @@
----
-layout: page
-title: Transcriptomics
-parent: Modules
-has_children: true
-nav_order: 2
----
-
-# Transcriptomics
-
-Transcriptomics is the study of RNA transcripts and provides insight into gene expression patterns. State-of-the-art approaches rely on high-throughput sequencing of transcripts sampled by various methods.
+---
+layout: page
+title: Transcriptomics
+parent: Modules
+has_children: true
+nav_order: 2
+---
+
+# Transcriptomics
+
+Transcriptomics is the study of RNA transcripts and provides insight into gene expression patterns. State-of-the-art approaches rely on high-throughput sequencing of transcripts sampled by various methods.
diff --git a/news.qmd b/news.qmd
new file mode 100644
index 00000000..7b2a5ede
--- /dev/null
+++ b/news.qmd
@@ -0,0 +1,12 @@
+---
+title: "News"
+listing:
+  fields: [date, title, author]
+  contents: news/*.qmd
+  type: table
+  sort:
+    - "date desc"
+---
+
+
+Sandbox data scientists routinely lead or contribute to courses and workshops at host universities in Denmark. Check out upcoming events in the table below!
\ No newline at end of file
diff --git a/news/2022-01-04-basicpm.qmd b/news/2022-01-04-basicpm.qmd
new file mode 100644
index 00000000..8e67d131
--- /dev/null
+++ b/news/2022-01-04-basicpm.qmd
@@ -0,0 +1,9 @@
+---
+title: "Basics of Personalized Medicine - MSc course"
+description: "Basics of Personalized Medicine - MS in Personal Medicine program"
+author: Jennifer Bartell
+date: 2022-06-01
+categories: [personalized-medicine, AAU]
+---
+
+The first course supported by the Sandbox is launching this month - 'Basics of Personalized Medicine' - where students in the new Master in Personal Medicine program at University of Copenhagen are introduced to predictive modeling using electronic health records.
\ No newline at end of file
diff --git a/news/2022-04-22-basicpm-wrapup.qmd b/news/2022-04-22-basicpm-wrapup.qmd
new file mode 100644
index 00000000..c81cc96e
--- /dev/null
+++ b/news/2022-04-22-basicpm-wrapup.qmd
@@ -0,0 +1,9 @@
+---
+title: "Basics of Personalized Medicine - final wrap-up"
+description: "Basics of Personalized Medicine - final wrap-up"
+author: Jennifer Bartell
+date: 2022-06-01
+categories: [personalized-medicine, AAU]
+---
+
+Our first course, Basics of Personalized Medicine, wrapped up this month with student project presentations which described their approaches to analysis of the synthetic Chronic Lymphocytic Leukemia dataset created for the course. Course reviews highlighted the helpfulness of Sandbox staff in troubleshooting R problems and the tremendous amount that students learned about predictive modeling.
\ No newline at end of file
diff --git a/news/2022-06-01-genomics-au.qmd b/news/2022-06-01-genomics-au.qmd
new file mode 100644
index 00000000..6fbc41a2
--- /dev/null
+++ b/news/2022-06-01-genomics-au.qmd
@@ -0,0 +1,9 @@
+---
+title: "Genomics course at Aarhus University"
+description: "Genomics course at Aarhus University"
+author: Samuele Soraggi
+date: 2022-06-01
+categories: [genomics, AU]
+---
+
+A month-long course in Genomics taught by Professors Mikkel Schierup and Stig Andersen has started with lead supercomputing support on UCloud by Sandbox data scientist and course instructor Samuele Soraggi. Computational exercises in NGS analysis were deployed in a UCloud project for use by 47 graduate students with primarily molecular biology and clinical backgrounds and no prior supercomputing experience! Post-course update: We received many positive reviews on use of the Genomics Sandbox training materials on UCloud!
\ No newline at end of file
diff --git a/news/2022-08-18-bulk-ku.qmd b/news/2022-08-18-bulk-ku.qmd
new file mode 100644
index 00000000..9b383f4b
--- /dev/null
+++ b/news/2022-08-18-bulk-ku.qmd
@@ -0,0 +1,11 @@
+---
+title: "Bulk RNA-seq course at University of Copenhagen"
+description: "Bulk RNA-seq course at University of Copenhagen"
+author: Jennifer Bartell
+date: 2022-08-18
+categories: [transcriptomics, course, KU, HeaDS, reNEW]
+---
+
+Today we began teaching our brand new bulk RNA-seq course to researchers (from PhD students to professors) at SUND at the University of Copenhagen. We had 32 workshop participants join us for two days of lectures and exercises on UCloud. We'd like to extend our thanks to our workshop collaborators, data scientists from the SUND DataLab at KU's Center for Health Data Science as well as the genomics platform at the NNF Center for Stem Cell Medicine (reNEW).
+
+For those that could not enroll for this session, you can find the relevant material [here](https://hds-sandbox.github.io/bulk_RNAseq_course/).
\ No newline at end of file
diff --git a/news/2022-09-06-genomics-launch.qmd b/news/2022-09-06-genomics-launch.qmd
new file mode 100644
index 00000000..fc01c551
--- /dev/null
+++ b/news/2022-09-06-genomics-launch.qmd
@@ -0,0 +1,9 @@
+---
+title: "Genomics Sandbox app launched on UCloud!"
+description: "Genomics Sandbox app launched on UCloud!"
+author: Samuele Soraggi
+date: 2022-09-06
+categories: [genomics, ucloud]
+---
+
+We have deployed our first standalone Sandbox app on UCloud! Please see the [Access](https://hds-sandbox.github.io/access/UCloud) page for instructions on how to find our Sandbox apps on UCloud - this first one is titled 'Genomics Sandbox' and module documentation is linked from the UCloud app page as well as here in [Modules](https://hds-sandbox.github.io/modules).
\ No newline at end of file diff --git a/news/2022-11-15-support-bioinf-sdu.qmd b/news/2022-11-15-support-bioinf-sdu.qmd new file mode 100644 index 00000000..61c8a29c --- /dev/null +++ b/news/2022-11-15-support-bioinf-sdu.qmd @@ -0,0 +1,9 @@ +--- +title: "Sandbox support within 'Workshops in Applied Bioinformatics' at SDU" +description: "Sandbox support for a workshop in the series 'Workshops in Applied Bioinformatics' at SDU" +author: Jacob Fredegaard Hansen +date: 2022-11-15 +categories: [bioinformatics, SDU] +--- + +Sandbox data scientist Jacob Fredegaard Hansen created a tutorial on how to use ColabFold for a one day workshop as part of the ['Workshops in Applied Bioinformatics'](https://odin.sdu.dk/sitecore/index.php?a=searchfagbesk&internkode=bmb209&lang=en) series taught by Sandbox collaborator Veit Schwammle. The material is accessible on the Sandbox website (see [Modules](https://hds-sandbox.github.io/modules/), Proteomics) for any UCloud user alongside the UCloud ColabFold App. \ No newline at end of file diff --git a/news/2022-11-30-advancedstatlearning.qmd b/news/2022-11-30-advancedstatlearning.qmd new file mode 100644 index 00000000..873f6754 --- /dev/null +++ b/news/2022-11-30-advancedstatlearning.qmd @@ -0,0 +1,9 @@ +--- +title: "Sandbox support for 'Advanced Statistical Learning'" +description: "Sandbox support for 'Advanced Statistical Learning'" +author: Samuele Soraggi +date: 2022-11-30 +categories: [AU, statistics, ML] +--- + +Sandbox data scientist Samuele Soraggi spent two weeks teaching for the Fall 2023 course ['Advanced Statistical Learning'](https://kursuskatalog.au.dk/da/course/115396/Advanced-Statistical-Learning) taught by Prof. Asger Hobolth at Aarhus University. 
\ No newline at end of file diff --git a/news/2022-12-10-transcriptomics-launch.qmd b/news/2022-12-10-transcriptomics-launch.qmd new file mode 100644 index 00000000..1c44b84a --- /dev/null +++ b/news/2022-12-10-transcriptomics-launch.qmd @@ -0,0 +1,9 @@ +--- +title: "Transcriptomics Sandbox app launched on UCloud!" +description: "Transcriptomics Sandbox app launched on UCloud!" +author: Jose AR Herrera +date: 2022-11-15 +categories: [transcriptomics, ucloud] +--- + +We have deployed our second standalone Sandbox app on UCloud! Please see the [Access](https://hds-sandbox.github.io/access/UCloud) page for instructions on how to find our Sandbox apps on UCloud - This one is titled 'Transcriptomics Sandbox' and module documentation is linked from the UCloud app page as well as here in [Modules](https://hds-sandbox.github.io/modules). \ No newline at end of file diff --git a/news/2023-01-08-platform-computerome.qmd b/news/2023-01-08-platform-computerome.qmd new file mode 100644 index 00000000..d69dc9c8 --- /dev/null +++ b/news/2023-01-08-platform-computerome.qmd @@ -0,0 +1,9 @@ +--- +title: "Soft launch of the new Course Platform at Computerome" +description: "Soft launch of the new Course Platform at Computerome" +author: Jesper R Christiansen +date: 2023-01-08 +categories: [computerome] +--- + +Sandbox data scientist Jesper Roy Christiansen has been integral to the development of a new ['Course Platform'](https://www.computerome.dk/solutions/course-platform) at Computerome, the HPC platform at the Technical University of Denmark. Built as a collaboration between the Sandbox and Computerome, the Course Platform will host its first users, students in 'Fra real-world data til personlig medicin', a course of KU's MS in Personlig Medicin. Sandbox coordinator Jennifer Bartell and Sandbox PI Martin Boegsted have also been involved in testing this new system during course setup. 
See the above link as well as [HPC Access](https://hds-sandbox.github.io/access) for more details on this platform and how you can also use this new platform to host courses (with or without Sandbox involvement!). \ No newline at end of file diff --git a/news/2023-01-10-spring-support.qmd b/news/2023-01-10-spring-support.qmd new file mode 100644 index 00000000..d68e91b2 --- /dev/null +++ b/news/2023-01-10-spring-support.qmd @@ -0,0 +1,21 @@ +--- +title: "Sandbox support for Spring 2023 courses" +description: "Sandbox support for Spring 2023 courses" +author: Jennifer Bartell +date: 2023-01-10 +categories: [courses, population-genomics, personalized-medicine, single-cell, HeaDS, reNEW, transcriptomics] +--- + +The Health Data Science sandbox is working with the following courses during spring 2023: + + + +- **Sandbox support for Population Genomics** + +Exercises for an [MS course on Population Genomics](https://kursuskatalog.au.dk/en/course/117821/Population-Genomics) taught by Prof. Kasper Munch at Aarhus University are being implemented on UCloud by Sandbox data scientist Samuele Soraggi. Students will explore the training materials on UCloud during the Spring 2023 semester, after which the materials will be accessible to any UCloud user via the Genomics Sandbox App. + +- **Fra real-world data til personlig medicin with Course Platform & Sandbox support** +The second round of the course ['Fra real-world data til personlig medicin'](https://personligmedicin.ku.dk/kursus/realworld/) in KU's MS in Personlig Medicin begins in January with an introduction to CLL-TIM, a predictive model for chronic lymphocytic leukemia deployed by Prof. Carsten Niemann, an introduction by Sandbox coordinator Jennifer Bartell to the new Course Platform at Computerome built with Sandbox help for hosting courses with HPC resources, and an introduction to building predictive models using TidyModels in R by Prof. Rasmus Broendum. 
The course will run through April with 10 continuing education students building their own predictive models using a new and improved synthetic CLL dataset developed by Sandbox data scientist Sander Boisen Valentin. Jennifer and Rasmus are also manning the Sandbox Slack workspace to field student questions about the dataset and their model building. + + - **Sandbox support for 'Single-cell, Single-Molecule: The Next Level in Cell Biology'** +An NNF-funded course, ['Single-cell, Single-Molecule: The Next Level in Cell Biology'](https://kursuskatalog.au.dk/en/course/118020/Single-cell-Single-molecule-The-Next-Level-in-Cell-Biology) combining experimental and computational approaches to RNA sequencing is starting at Aarhus University. In addition to course-responsible professor Stig Andersen and co-teachers Victoria Birkedal and Thomas Boesen, Sandbox PI Mikkel Schierup will be contributing along with Sandbox data scientist Samuele Soraggi. Samuele is adapting the Transcriptomics App material on UCloud to supply tutorials and exercises for this hefty course as well as serving as a teaching assistant. The course materials will be available to all users of the Transcriptomics Sandbox App on UCloud in the future. \ No newline at end of file diff --git a/news/2023-01-18-bulk-KU.qmd b/news/2023-01-18-bulk-KU.qmd new file mode 100644 index 00000000..418ea8ab --- /dev/null +++ b/news/2023-01-18-bulk-KU.qmd @@ -0,0 +1,13 @@ +--- +title: "Second bulk RNA-seq course at the University of Copenhagen" +description: "Second bulk RNA-seq course at the University of Copenhagen" +author: Jennifer Bartell +date: 2023-01-18 +categories: [workshop, course, KU, nextflow] +--- + +On 18th of January we taught the second iteration of our bulk RNA-seq course to researchers (from PhD students to professors) at SUND at the University of Copenhagen. We had ~50 workshop participants joining us for three days of lectures and exercises on UCloud. 
This time, we introduced preprocessing theory (read QC, alignment and quantification) and the use of automated workflows using the [nf-core rnaseq pipeline](https://nf-co.re/rnaseq). + +For those that could not enroll for this session, you can find the updated material [here](https://hds-sandbox.github.io/bulk_RNAseq_course/). We have moved the datasets and slides to a [zenodo repository](https://zenodo.org/record/7565997) + +We'd like to extend our thanks to our workshop collaborators, data scientists from the SUND DataLab at KU's Center for Health Data Science as well as the genomics platform at the NNF Center for Stem Cell Medicine (reNEW). \ No newline at end of file diff --git a/news/2023-05-31-rollout-ucloud.qmd b/news/2023-05-31-rollout-ucloud.qmd new file mode 100644 index 00000000..4c574e82 --- /dev/null +++ b/news/2023-05-31-rollout-ucloud.qmd @@ -0,0 +1,9 @@ +--- +title: "Sandbox App updates on UCloud rolled out" +description: "Sandbox App updates on UCloud rolled out" +author: Jennifer Bartell +date: 2023-05-31 +categories: [genomics, transcriptomics, proteomics, ucloud, KU, HeaDS, reNEW, transcriptomics] +--- + +New versions of the Genomics Sandbox App, the Transcriptomics Sandbox App, and the Proteomics Sandbox App have all been launched on UCloud this month! Check out the new components in the training modules such as a GWAS module in Genomics and new tools in Transcriptomics and Proteomics. Updates were also informed by the different courses supported during Spring 2023. With these courses wrapping up this month, the associated new training materials have also been included in the new versions of the apps. 
\ No newline at end of file diff --git a/news/2023-06-19-KU-bulk.qmd b/news/2023-06-19-KU-bulk.qmd new file mode 100644 index 00000000..bfef23b3 --- /dev/null +++ b/news/2023-06-19-KU-bulk.qmd @@ -0,0 +1,9 @@ +--- +title: "Workshop on bulkRNA-seq data" +description: "3 days of bulk RNA-seq at the University of Copenhagen" +author: Jennifer Bartell +date: 2023-06-19 +categories: [workshop, KU, HeaDS, reNEW, transcriptomics] +--- + +Our teaching team (from the Sandbox, the HeaDS DataLab, and reNEW's genomics platform) hosted another 3 day workshop on bulk RNA-seq. The 34 participants used the updated version of the UCloud Transcriptomics App which provided the smoothest experience yet for both trainers and trainees. A new goal for the next course run is to add a student project to support independent implementation and exploration of the course content. \ No newline at end of file diff --git a/news/2023-08-29-aarhus-workshop.qmd b/news/2023-08-29-aarhus-workshop.qmd new file mode 100644 index 00000000..71b5f78c --- /dev/null +++ b/news/2023-08-29-aarhus-workshop.qmd @@ -0,0 +1,10 @@ +--- +title: "Sandbox workshop in Aarhus" +description: "3 days of Sandbox demos at Aarhus University" +author: Samuele Soraggi +date: 2023-08-29 +categories: [workshop, AU] +--- + + +Sandbox data scientist Samuele Soraggi hosted a three day speed run through Sandbox apps at the Bioinformatics Research Center. The 26 participants joined for genomics, transcriptomics, and/or proteomics app demos depending on their interests. This thorough omics demo had maxed out participant sign-ups and an enthusiastic crew enjoyed the sessions alongside a bit of networking across disciplines. We plan to host more of these type of workshops given the event's success! 
\ No newline at end of file diff --git a/news/2023-09-07-workshop-conference.qmd b/news/2023-09-07-workshop-conference.qmd new file mode 100644 index 00000000..deec1abb --- /dev/null +++ b/news/2023-09-07-workshop-conference.qmd @@ -0,0 +1,10 @@ +--- +title: "'Digging into the Health Data Science Sandbox' workshop" +description: "'Digging into the Health Data Science Sandbox' at the Danish Bioinformatics Conference 2023 (Aarhus University)" +author: Jennifer Bartell +date: 2023-09-07 +categories: [workshop, conference, AU] +--- + + +The full team of Sandbox data scientists hosted a 4-hour workshop at the Danish Bioinformatics Conference where they gave a taster session of each of our 3 omics apps. We learned that multi-omics analyses were a substantial draw for the crowd at the DBC and are making plans to address this interest in future events. \ No newline at end of file diff --git a/news/2023-10-25-RDM_NGS.qmd b/news/2023-10-25-RDM_NGS.qmd new file mode 100644 index 00000000..16921a7b --- /dev/null +++ b/news/2023-10-25-RDM_NGS.qmd @@ -0,0 +1,11 @@ +--- +title: "A course on RDM for NGS data" +description: "'Research Data Management for NGS Data' with DeiC (Technical University of Denmark)" +author: Jose AR Herrera +date: 2023-11-07 +categories: [course, data-management, DTU] +--- + + + +Sandbox data scientist Jose Alejandro Romero Herrera ran the first instance of a new module on research data management practices he developed specifically for NGS data. Twelve participants were hosted in conjunction with DeiC at DTU, and were exposed to tools like bash, conda, git, and cookiecutter in their quest to organize their omics data. 
\ No newline at end of file diff --git a/news/2023-11-07-RDMtalk.qmd b/news/2023-11-07-RDMtalk.qmd new file mode 100644 index 00000000..90649d30 --- /dev/null +++ b/news/2023-11-07-RDMtalk.qmd @@ -0,0 +1,9 @@ +--- +title: "From Data Chaos to Data Harmony" +description: "From Data Chaos to Data Harmony: Managing NGS Data in a Wetlab - at the annual DeiC conference" +author: Jennifer Bartell +date: 2023-11-07 +categories: [conference, data-management, talk] +--- + +Sandbox data scientist Jose Alejandro Romero Herrera gave a talk in the Data Management speaker track at the annual Danish E-Infrastructure Consortium (DeiC) conference in Kolding, Denmark. The talk was well received at the biggest DeiC conference ever (250 participants). \ No newline at end of file diff --git a/news/2023-11-09-proteomics_biostat_SDU.qmd b/news/2023-11-09-proteomics_biostat_SDU.qmd new file mode 100644 index 00000000..37b23066 --- /dev/null +++ b/news/2023-11-09-proteomics_biostat_SDU.qmd @@ -0,0 +1,9 @@ +--- +title: "Updates from SDU" +description: "Updated Proteomics Sandbox App supporting of a Biostatistics course at SDU" +author: Jacob Fredegaard Hansen +date: 2023-11-09 +categories: [proteomics, course, SDU] +--- + +The Proteomics Sandbox Application has recently undergone a significant update, enhancing its security features to ensure safer usage for its users. In this latest iteration, Sandbox data scientist Jacob Fredegaard Hansen has expanded the app's software suite by introducing two new tools: DIA-NN and MZmine, catering to the metabolomics field. Furthermore, the pre-existing software within the application has been refreshed and updated to the latest versions, ensuring that the Proteomics Sandbox Application remains at the cutting-edge of the field. Excitingly, this application will be actively utilized in the course "BMB831: Biostatistics in R II" at the University of Southern Denmark throughout this autumn, showcasing its relevance and applicability in academic settings. 
\ No newline at end of file diff --git a/news/2023-12-12-SE3D.qmd b/news/2023-12-12-SE3D.qmd new file mode 100644 index 00000000..d7840c49 --- /dev/null +++ b/news/2023-12-12-SE3D.qmd @@ -0,0 +1,9 @@ +--- +title: "NNF Collaborative Data Science award news: the SE3D project!" +#description: "Data Scientist, Copenhagen University" +author: Jennifer Bartell +date: 2023-12-12 +categories: [novo-nordisk-foundation, grant, AAU, KU] +--- + +Today we got the news that we will be able to hire 5 new research staff focused on synthetic health data over the next 4 years. The SE3D project - Synthetic health data: ethical development and deployment via deep learning approaches - will be led by Sandbox PIs Martin Boegsted (AAU) and Anders Krogh (KU) alongside Sandbox project lead Jennifer Bartell (KU) and a new collaborator, Prof. Jan Trzaskowski from AAU Law. We're really excited to set up this research arm that shares so many Sandbox interests and potential for interaction. The project starts from 1 May 2024, with much thanks to the NNF for their continued support of our ideas. Look out for job ads in the spring from KU and AAU! \ No newline at end of file diff --git a/news/2024-01-31-manuscript.qmd b/news/2024-01-31-manuscript.qmd new file mode 100644 index 00000000..5b781fa9 --- /dev/null +++ b/news/2024-01-31-manuscript.qmd @@ -0,0 +1,9 @@ +--- +title: "A primer for Synthetic health data" +#description: "Data Scientist, Copenhagen University" +author: Jennifer Bartell +date: 2024-01-31 +categories: [manuscript, synthetic-data, KU, AAU] +--- + +In collaboration with Prof. Henning Langberg at KU Public Health and funding from Erhvervsfyrtaarn Sund Vaegt, Jennifer Bartell has developed a manuscript that discusses technical, regulatory, and deployment solutions and challenges for synthetic health data from a broad perspective. 
This manuscript was developed in collaboration with Sandbox partners Sander Boisen Valentin and Martin Boegsted of AAU, and we plan to submit it to a journal soon. For now, check out [our manuscript on arXiv](https://arxiv.org/abs/2401.17653)! \ No newline at end of file diff --git a/news/2024-02-01-DDSAD3A.qmd b/news/2024-02-01-DDSAD3A.qmd new file mode 100644 index 00000000..cd0d2bec --- /dev/null +++ b/news/2024-02-01-DDSAD3A.qmd @@ -0,0 +1,9 @@ +--- +title: "DDSA PhD meetup and D3A conference" +#description: "Data Scientist, Copenhagen University" +author: Jennifer Bartell +date: 2024-02-01 +categories: [conference, DDSA] +--- + +Several Sandbox staff represented the project at both the DDSA PhD Meetup with a practical presentation on research data management and with posters at D3A, the national data science meeting. It was nice to meet up in person and also make new connections with the excellent cadre of conference attendees. Thanks to the DDSA secretariat for their invitation and organizational efforts. \ No newline at end of file diff --git a/news/2024-02-09-proteomics-sandbox.qmd b/news/2024-02-09-proteomics-sandbox.qmd new file mode 100644 index 00000000..82d68b52 --- /dev/null +++ b/news/2024-02-09-proteomics-sandbox.qmd @@ -0,0 +1,9 @@ +--- +title: "Course support at SDU" +#description: "Data Scientist, Copenhagen University" +author: Jacob Fredegaard Hansen +date: 2024-02-09 +categories: [proteomics, SDU, course] +--- + +During the Spring Semester 2024, Sandbox data scientist Jacob Fredegaard Hansen will be assisting with teaching and tools in the course BMB834: Protein Structure, Dynamics, and Modelling at the University of Southern Denmark. Here, Jacob will provide Sandbox support, and materials will be used for applying computational methods for protein structure retrieval and visualization, as well as for applying high-performance computing (HPC) methods for protein structure modeling. 
\ No newline at end of file diff --git a/overrides/.icons/custom/orcid.svg b/overrides/.icons/custom/orcid.svg deleted file mode 100644 index 2bddf44f..00000000 --- a/overrides/.icons/custom/orcid.svg +++ /dev/null @@ -1,17 +0,0 @@ - - - - - - - - - - - \ No newline at end of file diff --git a/overrides/main.html b/overrides/main.html deleted file mode 100644 index dbb20687..00000000 --- a/overrides/main.html +++ /dev/null @@ -1,16 +0,0 @@ -{% extends "base.html" %} - -{% block site_meta %} - -{% endblock %} \ No newline at end of file diff --git a/develop/recommended/recommended.md b/recommended/recommended.md similarity index 98% rename from develop/recommended/recommended.md rename to recommended/recommended.md index 366aaf6b..0ac9ae43 100644 --- a/develop/recommended/recommended.md +++ b/recommended/recommended.md @@ -1,17 +1,17 @@ ---- -layout: page -title: Recommended -permalink: /recommended/ -nav_order: 5 -hide: - - footer - - toc - - navigation ---- - - -## Recommended Resources in Health Data Science - -Many outside resources are available to support education in health data science, ranging from beginner-level introductions to R or Python (the primary languages of health data science) to other teaching resources and tutorials created at universities and life science organizations. - -We encourage you to explore the training platform provided by [ELIXIR](https://elixir-europe.org/) (Europe's distributed infrastructure for life-science data). On this platform ([TeSS](https://tess.elixir-europe.org/)), you can find a registry of training materials as well as webinars, workshops, and in-person courses in bioinformatics, modelling, data management, and life science database usage among other topics. 
+--- +layout: page +title: Recommended +permalink: /recommended/ +nav_order: 5 +hide: + - footer + - toc + - navigation +--- + + +## Recommended Resources in Health Data Science + +Many outside resources are available to support education in health data science, ranging from beginner-level introductions to R or Python (the primary languages of health data science) to other teaching resources and tutorials created at universities and life science organizations. + +We encourage you to explore the training platform provided by [ELIXIR](https://elixir-europe.org/) (Europe's distributed infrastructure for life-science data). On this platform ([TeSS](https://tess.elixir-europe.org/)), you can find a registry of training materials as well as webinars, workshops, and in-person courses in bioinformatics, modelling, data management, and life science database usage among other topics. diff --git a/renv.lock b/renv.lock new file mode 100644 index 00000000..7a1495b2 --- /dev/null +++ b/renv.lock @@ -0,0 +1,237 @@ +{ + "R": { + "Version": "4.3.2", + "Repositories": [ + { + "Name": "CRAN", + "URL": "https://cloud.r-project.org" + } + ] + }, + "Packages": { + "curl": { + "Package": "curl", + "Version": "5.2.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R" + ], + "Hash": "ce88d13c0b10fe88a37d9c59dba2d7f9" + }, + "dbplyr": { + "Package": "dbplyr", + "Version": "2.4.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "DBI", + "R", + "R6", + "blob", + "cli", + "dplyr", + "glue", + "lifecycle", + "magrittr", + "methods", + "pillar", + "purrr", + "rlang", + "tibble", + "tidyr", + "tidyselect", + "utils", + "vctrs", + "withr" + ], + "Hash": "59351f28a81f0742720b85363c4fdd61" + }, + "dplyr": { + "Package": "dplyr", + "Version": "1.1.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "R6", + "cli", + "generics", + "glue", + "lifecycle", + "magrittr", + "methods", + "pillar", + "rlang", + "tibble", + 
"tidyselect", + "utils", + "vctrs" + ], + "Hash": "fedd9d00c2944ff00a0e2696ccf048ec" + }, + "ggplot2": { + "Package": "ggplot2", + "Version": "3.4.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "MASS", + "R", + "cli", + "glue", + "grDevices", + "grid", + "gtable", + "isoband", + "lifecycle", + "mgcv", + "rlang", + "scales", + "stats", + "tibble", + "vctrs", + "withr" + ], + "Hash": "313d31eff2274ecf4c1d3581db7241f9" + }, + "gt": { + "Package": "gt", + "Version": "0.10.1", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "base64enc", + "bigD", + "bitops", + "cli", + "commonmark", + "dplyr", + "fs", + "glue", + "htmltools", + "htmlwidgets", + "juicyjuice", + "magrittr", + "markdown", + "reactable", + "rlang", + "sass", + "scales", + "tidyselect", + "vctrs", + "xml2" + ], + "Hash": "03009c105dfae79460b8eb9d8cf791e4" + }, + "gtable": { + "Package": "gtable", + "Version": "0.3.4", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "cli", + "glue", + "grid", + "lifecycle", + "rlang" + ], + "Hash": "b29cf3031f49b04ab9c852c912547eef" + }, + "knitr": { + "Package": "knitr", + "Version": "1.45", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "evaluate", + "highr", + "methods", + "tools", + "xfun", + "yaml" + ], + "Hash": "1ec462871063897135c1bcbe0fc8f07d" + }, + "markdown": { + "Package": "markdown", + "Version": "1.12", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "commonmark", + "utils", + "xfun" + ], + "Hash": "765cf53992401b3b6c297b69e1edb8bd" + }, + "rmarkdown": { + "Package": "rmarkdown", + "Version": "2.25", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "bslib", + "evaluate", + "fontawesome", + "htmltools", + "jquerylib", + "jsonlite", + "knitr", + "methods", + "stringr", + "tinytex", + "tools", + "utils", + "xfun", + "yaml" + ], + "Hash": "d65e35823c817f09f4de424fcdfa812a" + }, + 
"tidyverse": { + "Package": "tidyverse", + "Version": "2.0.0", + "Source": "Repository", + "Repository": "CRAN", + "Requirements": [ + "R", + "broom", + "cli", + "conflicted", + "dbplyr", + "dplyr", + "dtplyr", + "forcats", + "ggplot2", + "googledrive", + "googlesheets4", + "haven", + "hms", + "httr", + "jsonlite", + "lubridate", + "magrittr", + "modelr", + "pillar", + "purrr", + "ragg", + "readr", + "readxl", + "reprex", + "rlang", + "rstudioapi", + "rvest", + "stringr", + "tibble", + "tidyr", + "xml2" + ], + "Hash": "c328568cd14ea89a83bd4ca7f54ae07e" + } + } +} \ No newline at end of file diff --git a/resources/bioschema.html b/resources/bioschema.html new file mode 100644 index 00000000..b315d249 --- /dev/null +++ b/resources/bioschema.html @@ -0,0 +1,34 @@ + \ No newline at end of file diff --git a/resources/bioschema_example.html b/resources/bioschema_example.html new file mode 100644 index 00000000..a4cf9733 --- /dev/null +++ b/resources/bioschema_example.html @@ -0,0 +1,93 @@ + \ No newline at end of file diff --git a/references.bib b/resources/references.bib similarity index 99% rename from references.bib rename to resources/references.bib index 9cde9542..0f32c935 100644 --- a/references.bib +++ b/resources/references.bib @@ -20,4 +20,4 @@ @misc{creative_commons_2022 url={https://creativecommons.org/}, publisher={Creative Commons}, note = {Accessed: 2022-08-11} -} +} \ No newline at end of file diff --git a/resources/samplesheet.csv b/resources/samplesheet.csv new file mode 100644 index 00000000..35df78b0 --- /dev/null +++ b/resources/samplesheet.csv @@ -0,0 +1,9 @@ +sample,fastq_1,fastq_2,strandedness,condition +Control_3,778339/merge/Irrel_kd_3.fastq.gz,,unstranded,control +Control_2,778339/merge/Irrel_kd_2.fastq.gz,,unstranded,control +Control_1,778339/merge/Irrel_kd_1.fastq.gz,,unstranded,control +Mov10_oe_3,778339/merge/Mov10_oe_3.fastq.gz,,unstranded,MOV10_overexpression +Mov10_oe_2,778339/merge/Mov10_oe_2.fastq.gz,,unstranded,MOV10_overexpression 
+Mov10_oe_1,778339/merge/Mov10_oe_1.fastq.gz,,unstranded,MOV10_overexpression +Mov10_kd_3,778339/merge/Mov10_kd_3.fastq.gz,,unstranded,MOV10_knockdown +Mov10_kd_2,778339/merge/Mov10_kd_2.fastq.gz,,unstranded,MOV10_knockdown diff --git a/develop/workshop/img/addedfolder_binfconf2023.png b/workshop/img/addedfolder_binfconf2023.png similarity index 100% rename from develop/workshop/img/addedfolder_binfconf2023.png rename to workshop/img/addedfolder_binfconf2023.png diff --git a/develop/workshop/img/addedfolder_transcriptomics.png b/workshop/img/addedfolder_transcriptomics.png similarity index 100% rename from develop/workshop/img/addedfolder_transcriptomics.png rename to workshop/img/addedfolder_transcriptomics.png diff --git a/develop/workshop/img/addfile2_transcriptomics.png b/workshop/img/addfile2_transcriptomics.png similarity index 100% rename from develop/workshop/img/addfile2_transcriptomics.png rename to workshop/img/addfile2_transcriptomics.png diff --git a/develop/workshop/img/addfolder.png b/workshop/img/addfolder.png similarity index 100% rename from develop/workshop/img/addfolder.png rename to workshop/img/addfolder.png diff --git a/develop/workshop/img/addfolder_transcriptomics.png b/workshop/img/addfolder_transcriptomics.png similarity index 100% rename from develop/workshop/img/addfolder_transcriptomics.png rename to workshop/img/addfolder_transcriptomics.png diff --git a/develop/workshop/img/chooseapp_transcriptomics.png b/workshop/img/chooseapp_transcriptomics.png similarity index 100% rename from develop/workshop/img/chooseapp_transcriptomics.png rename to workshop/img/chooseapp_transcriptomics.png diff --git a/develop/workshop/img/choosefolder_binfconf2023.png b/workshop/img/choosefolder_binfconf2023.png similarity index 100% rename from develop/workshop/img/choosefolder_binfconf2023.png rename to workshop/img/choosefolder_binfconf2023.png diff --git a/develop/workshop/img/project.png b/workshop/img/project.png similarity index 100% rename from 
develop/workshop/img/project.png rename to workshop/img/project.png diff --git a/develop/workshop/img/rstudio_transcriptomics.png b/workshop/img/rstudio_transcriptomics.png similarity index 100% rename from develop/workshop/img/rstudio_transcriptomics.png rename to workshop/img/rstudio_transcriptomics.png diff --git a/develop/workshop/img/rsutdio2_transcriptomics.png b/workshop/img/rsutdio2_transcriptomics.png similarity index 100% rename from develop/workshop/img/rsutdio2_transcriptomics.png rename to workshop/img/rsutdio2_transcriptomics.png diff --git a/develop/workshop/img/running_transcriptomics.png b/workshop/img/running_transcriptomics.png similarity index 100% rename from develop/workshop/img/running_transcriptomics.png rename to workshop/img/running_transcriptomics.png diff --git a/develop/workshop/img/setting_transcriptomics.png b/workshop/img/setting_transcriptomics.png similarity index 100% rename from develop/workshop/img/setting_transcriptomics.png rename to workshop/img/setting_transcriptomics.png diff --git a/develop/workshop/workshop.md b/workshop/workshop.md similarity index 98% rename from develop/workshop/workshop.md rename to workshop/workshop.md index 4aaf92f1..7c2a57ff 100644 --- a/develop/workshop/workshop.md +++ b/workshop/workshop.md @@ -1,146 +1,146 @@ ---- -permalink: /workshop/ -hide: - - navigation - - toc - - footer ---- - -

Sandbox Workshop

-

Autumn 2023

- -

Welcome to the homepage for in-person workshops introducing the Health Data Science Sandbox to potential users. Thanks for joining us!

- -!!! info "Upcoming Workshop at AAU" - Intro to the Health Data Science Sandbox at Aalborg University - - Interested in adding **new analysis techniques** to your health data science skill set? Curious about how to use Danish **supercomputing resources**? Looking for how to host **coding material for your own course** as an academic educator? Join our workshop for a demo of the Health Data Science Sandbox, a platform for training and research being built collaboratively across 5 universities. We will introduce our mission, network, and resources and then give you an opportunity to try out our omics-focused training module with **single cell RNA sequencing data** in a guided live session on UCloud (experience with R is a plus but not required). The workshop is open for students, researchers, and educators. - - - Instructors: Sandbox Data Scientists Jakob Skelmose (AAU CLINDA) and Samuele Soraggi (AU BiRC) - - Time: 6 December 2023, 1200-1600 - - Location: SLV249 moedelokale 11.01.032 at AAU SUND - - Prerequisites: anyone can join for the first hour to hear about Sandbox resources, how to use them, and plans for the future. Some experience with R or Python will help you if you join the following transcriptomics app demo for the rest of the session (but all are welcome). - - [Sign up](https://www.moodle.aau.dk/course/view.php?id=50047) on AAU's Moodle or [email the Sandbox](mailto:nhds_sandbox@sund.ku.dk) with questions - - - -

Agenda

- 1. The Sandbox Concept - 30 minutes - 2. Accessing Sandbox resources - 10 minutes - 3. Try out our transcriptomics module - 2 hours - 4. Discussion and feedback - 20 minutes - -## The Sandbox concept -The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and those generally curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our [GitHub page](https://github.com/hds-sandbox) and much more information is available on the rest of the website you are currently visiting! We work with both UCloud and Computerome (major Danish academic supercomputers) - see our [HPC Access page](https://hds-sandbox.github.io/access/index.html) for more info on each setup. - -## Access Sandbox resources -We currently provide training materials and resources as topical apps on UCloud, the supercomputer located at the University of Southern Denmark. To use these resources, you'll need the following: - - 1. A UCloud login: log on at the address http://cloud.sdu.dk using your university credentials. - 2. The ability to navigate in Linux / RStudio / Jupyter. You don't need to be an expert, but it is beyond our ambitions (and course material) to teach you how to code and how to run analyses simultaneously. We recommend a basic R or Python course before diving in. - - **Note:** - - 1. To use Sandbox materials outside of the workshop, you can request a project by clicking on `apply for resources` in your UCloud dashboard. - 2. If you are a BSc or MSc student, you need a supervisor to apply on your behalf, or you can try to apply yourself, mentioning the supervisor's approval in the application. - 3. Remember, however, that you have 1000 kr. of computing credit and around 50 GB of free storage to work with on UCloud. 
- -## Try out our transcriptomics module -So our Sandbox data scientists have finished their intro at the workshop? Great, now the brave ones in the audience can try out one of our apps in a live session. Today we are demoing: - -![Transcriptomics](../assets/images/transcriptomics.png){ align=left width="10%" } -### Transcriptomics -If you're interested in bulk or single cell RNA sequencing analysis and visualization, join Sandbox Data Scientist Samuele Soraggi from Aarhus University in testing out our Transcriptomics Sandbox app. - -Follow these instructions to try our app: - -1. Click on the button below to join the project for today: - - - - - Green Button - - - - - Go to Link - - -You should see a message on your browser where you have to accept the invitation to the project. This will add you to a project on uCloud, where we have data and extra computing credit for the course. - -2. Be sure you have joined the project. Check if you have the project OMICS workshop from the project menu (red circle). Afterwards, click on the App menu (green circle) -![](./img/project.png) - -3. * Find the app `Transcriptomics Sandbox` (red circle), which is under the title `Featured`. - -![](./img/chooseapp_transcriptomics.png) - -* Click on it. You will get into the settings window. Choose any Job Name (Nr 1 in the figure below), how many hours you want to use for the job (Nr 2; choose at least 3 hours, you can increase this later), and how many CPUs (Nr 3, choose at least 4 CPUs). Choose the course `RNAseq in RStudio` from the drop-down menu (Nr 4). Finally, click on the blue button `Add Folder`. - -![](./img/setting_transcriptomics.png) - -* Now, click on the browsing bar that appears (red circle). - -![](./img/addfolder.png) - -* In the appearing window, you should see already a folder called `Intro_to_scRNAseq_R`. 
Click on `Use` to its right (red circle). - -![](./img/choosefolder_binfconf2023.png) - - -* Afterwards, you should have something like this in the settings page: - -![](./img/addedfolder_binfconf2023.png) - -* Now, click on Submit to start the app (the button is on the right side of the settings page). - -* You will now enter a waiting queue. When the session starts, the timer begins to count down (red circle), and you should be able to open the interface through the button (green circle). Note the buttons to add time to your session (blue circle) and the button to stop the session when you are done (pink circle). - -![](./img/running_transcriptomics.png) - -* Open the interface by clicking on the button (green circle in the figure above). If you are warned of a missing connection, simply refresh the page. You will enter `RStudio`, a well-known interface for coding in `R`. - -* Run the following command to download the tutorial: `download.file("https://raw.githubusercontent.com/hds-sandbox/ELIXIR-workshop/main/Notebooks/scRNAseq_Tutorial_R.Rmd", "tutorial_scrna.Rmd")` - -* Open the file `tutorial_scrna.Rmd` that should now appear in the file browser of RStudio. Click on `visual` (on the toolbar) if you want to see the tutorial in a more readable format. - -* The executable code is inside chunks (called cells), to be executed in order from first to last using the little green arrow on the right side of each code cell. - -* Read carefully through the tutorial and execute the code cells. You will see the outputs appear as you proceed. - - -## Discussion and feedback -We hope you enjoyed the live demo. If you have broader questions, suggestions, or concerns, now is the time to raise them! 
If you are totally toast for the day, remember that you can check out longer versions of our tutorials as well as other topics and tools in each of the [Sandbox modules](https://hds-sandbox.github.io/modules/index.html) or join us for a multi-day in person [course](https://hds-sandbox.github.io/news/news.html). - -As data scientists, we also would be really happy for some quantifiable info and feedback - we want to build things that the Danish health data science community is excited to use. Please answer these [5 questions](survey link) for us before you head out for the day (*link activated on day of the workshop*). - - -

Nice meeting you and we hope to see you again!

- - - - - - - - +--- +permalink: /workshop/ +hide: + - navigation + - toc + - footer +--- + +

Sandbox Workshop

+

Autumn 2023

+ +

Welcome to the homepage for in-person workshops introducing the Health Data Science Sandbox to potential users. Thanks for joining us!

+ +!!! info "Upcoming Workshop at AAU" + Intro to the Health Data Science Sandbox at Aalborg University + + Interested in adding **new analysis techniques** to your health data science skill set? Curious about how to use Danish **supercomputing resources**? Looking for how to host **coding material for your own course** as an academic educator? Join our workshop for a demo of the Health Data Science Sandbox, a platform for training and research being built collaboratively across 5 universities. We will introduce our mission, network, and resources and then give you an opportunity to try out our omics-focused training module with **single cell RNA sequencing data** in a guided live session on UCloud (experience with R is a plus but not required). The workshop is open for students, researchers, and educators. + + - Instructors: Sandbox Data Scientists Jakob Skelmose (AAU CLINDA) and Samuele Soraggi (AU BiRC) + - Time: 6 December 2023, 1200-1600 + - Location: SLV249 moedelokale 11.01.032 at AAU SUND + - Prerequisites: anyone can join for the first hour to hear about Sandbox resources, how to use them, and plans for the future. Some experience with R or Python will help you if you join the following transcriptomics app demo for the rest of the session (but all are welcome). + - [Sign up](https://www.moodle.aau.dk/course/view.php?id=50047) on AAU's Moodle or [email the Sandbox](mailto:nhds_sandbox@sund.ku.dk) with questions + + + +

Agenda

+ 1. The Sandbox Concept - 30 minutes + 2. Accessing Sandbox resources - 10 minutes + 3. Try out our transcriptomics module - 2 hours + 4. Discussion and feedback - 20 minutes + +## The Sandbox concept +The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and those generally curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our [Github page](https://github.com/hds-sandbox) and much more information is available on the rest of the website you are currently visiting! We work with both UCloud and Computerome (major Danish academic supercomputers) - see our [HPC Access page](https://hds-sandbox.github.io/access/index.html) for more info on each set up. + +## Access Sandbox resources +We currently provide training materials and resources as topical apps on UCloud, the supercomputer located at the University of Southern Denmark. To use these resources, you'll need the following: + + 1. Log onto UCloud at the address http://cloud.sdu.dk using your university credentials. + 2. the ability to navigate in linux / RStudio / Jupyter. You don't need to be an expert, but it is beyond our ambitions (and course material) to teach you how to code and how to run analyses simultaneously. We recommend a basic R or Python course before diving in. + + **Note:** + + 1. To use Sandbox materials outside of the workshop, you can request a project by clicking on `apply for resources` in your uCloud dashboard. + 2. If you are a BSc or MSc student, you need a supervisor to apply on your behalf, or you can try to apply yourself mentioning the supervisor approval in the application. + 3. Remember, however, that you have 1000Kr of computing credit, and around 50GB of free storage to work on uCLoud. 
+ +## Try out our transcriptomics module +So our Sandbox data scientists have finished their intro at the workshop? Great, now the brave ones in the audience can try out one of our apps in a live session. Today we are demoing: + +![Transcriptomics](../assets/images/transcriptomics.png){ align=left width="10%" } +### Transcriptomics +If you're interested in bulk or single cell RNA sequencing analysis and visualization, join Sandbox Data Scientist Samuele Soraggi from Aarhus University in testing out our Transcriptomics Sandbox app. + +Follow these instructions to try our app: + +1. Click on the button below to join the project for today: + + + + + Green Button + + + + + Go to Link + + +You should see a message on your browser where you have to accept the invitation to the project. This will add you to a project on uCloud, where we have data and extra computing credit for the course. + +2. Be sure you have joined the project. Check if you have the project OMICS workshop from the project menu (red circle). Afterwards, click on the App menu (green circle) +![](./img/project.png) + +3. * Find the app `Transcriptomics Sandbox` (red circle), which is under the title `Featured`. + +![](./img/chooseapp_transcriptomics.png) + +* Click on it. You will get into the settings window. Choose any Job Name (Nr 1 in the figure below), how many hours you want to use for the job (Nr 2; choose at least 3 hours, you can increase this later), and how many CPUs (Nr 3, choose at least 4 CPUs). Choose the course `RNAseq in RStudio` from the drop-down menu (Nr 4). Finally, click on the blue button `Add Folder`. + +![](./img/setting_transcriptomics.png) + +* Now, click on the browsing bar that appears (red circle). + +![](./img/addfolder.png) + +* In the appearing window, you should see already a folder called `Intro_to_scRNAseq_R`. 
Click on `Use` to its right (red circle). + +![](./img/choosefolder_binfconf2023.png) + + +* Afterwards, you should have something like this in the settings page: + +![](./img/addedfolder_binfconf2023.png) + +* Now, click on Submit to start the app (the button is on the right side of the settings page). + +* You will now enter a waiting queue. When the session starts, the timer begins to count down (red circle), and you should be able to open the interface through the button (green circle). Note the buttons to add time to your session (blue circle) and the button to stop the session when you are done (pink circle). + +![](./img/running_transcriptomics.png) + +* Open the interface by clicking on the button (green circle in the figure above). If you are warned of a missing connection, simply refresh the page. You will enter `RStudio`, a well-known interface for coding in `R`. + +* Run the following command to download the tutorial: `download.file("https://raw.githubusercontent.com/hds-sandbox/ELIXIR-workshop/main/Notebooks/scRNAseq_Tutorial_R.Rmd", "tutorial_scrna.Rmd")` + +* Open the file `tutorial_scrna.Rmd` that should now appear in the file browser of RStudio. Click on `visual` (on the toolbar) if you want to see the tutorial in a more readable format. + +* The executable code is inside chunks (called cells), to be executed in order from first to last using the little green arrow on the right side of each code cell. + +* Read carefully through the tutorial and execute the code cells. You will see the outputs appear as you proceed. + + +## Discussion and feedback +We hope you enjoyed the live demo. If you have broader questions, suggestions, or concerns, now is the time to raise them! 
If you are totally toast for the day, remember that you can check out longer versions of our tutorials as well as other topics and tools in each of the [Sandbox modules](https://hds-sandbox.github.io/modules/index.html) or join us for a multi-day in person [course](https://hds-sandbox.github.io/news/news.html). + +As data scientists, we also would be really happy for some quantifiable info and feedback - we want to build things that the Danish health data science community is excited to use. Please answer these [5 questions](survey link) for us before you head out for the day (*link activated on day of the workshop*). + + +

Nice meeting you and we hope to see you again!

+ + + + + + + + diff --git a/develop/workshop/workshop_3demos.md b/workshop/workshop_3demos.md similarity index 98% rename from develop/workshop/workshop_3demos.md rename to workshop/workshop_3demos.md index f8d484c5..2f08ac52 100644 --- a/develop/workshop/workshop_3demos.md +++ b/workshop/workshop_3demos.md @@ -1,62 +1,62 @@ ---- -permalink: /workshop/ -hide: - - navigation - - toc - - footer ---- - -

Sandbox Workshop

-

Autumn 2023

- -

Welcome to the homepage for in-person workshops introducing the Health Data Science Sandbox to potential users. Thanks for joining us!

- -

Agenda

- + The Sandbox Concept - 30 minutes - + Accessing Sandbox resources - 10 minutes - + Try out our transcriptomics module - 2 hours - + Discussion and feedback - 20 minutes - -## The Sandbox concept -The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and those generally curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our [Github page](https://github.com/hds-sandbox) and much more information is available on the rest of the website you are currently visiting! We work with both UCloud and Computerome (major Danish academic supercomputers) - see our [HPC Access page](https://hds-sandbox.github.io/access/index.html) for more info on each set up. - -## Access Sandbox resources -We currently provide training materials and resources as topical apps on UCloud, the supercomputer located at the University of Southern Denmark. To use these resources, you'll need the following: - - 1. a Danish university ID so you can sign on to UCloud via WAYF. See [this guide](https://hds-sandbox.github.io/access/UCloud.html) and/or follow along with our live demo. - 2. the ability to navigate in linux / RStudio / Jupyter. You don't need to be an expert, but it is beyond our ambitions (and course material) to teach you how to code and how to run analyses simultaneously. We recommend a basic R or Python course before diving in. - 3. our invite link to the correct UCloud project that will be shared on the day of the workshop. This way, we can provide you compute resources for the active sessions of the workshop. To use Sandbox materials outside of the workshop, you'll need to check with the local DeiC office at your university about how to request compute hours on UCloud. - -## Try out a module -So our Sandbox data scientists have finished their intro at the workshop? 
Great, now it's time to choose your poison *(cough)* topic of interest for today. Your options are below: - -![Genomics](../assets/images/genomics2.png){ align=left width="10%" } -### Genomics -If you're interested in NGS technologies and applications ranging from genome assembly to variant calling to metagenomics, join Sandbox Data Scientist Samuele Soraggi in testing out our Genomics Sandbox app. This app supports a semester-length course on NGS as well as a Population Genomics course run regularly at Aarhus University. Sign into UCloud and then click this [invite link](not active yet). - - -![Transcriptomics](../assets/images/transcriptomics.png){ align=left width="10%" } -### Transcriptomics -If you're interested in bulk or single cell RNA sequencing analysis and visualization, join Sandbox Data Scientist Jose Alejandro Romero Herrera (Alex) in testing out our Transcriptomics Sandbox app. This app supports regular 3-4 day workshops at University of Copenhagen and provides stand-alone visualisation tools. Sign into UCloud and then click this [invite link](not active yet). - - -![proteomics](../assets/images/proteomics.png){ align=left width="10%" } -### Proteomics -Interested in modern methods for protein structure prediction? Join Sandbox Data Scientist Jacob Fredegaard Hansen as he walks you through how to use ColabFold on UCloud. Jacob can also demo our Proteomics Sandbox, which contains a suite of proteomics analysis tools that will support a future course in clinical proteomics but is already available on UCloud for interested users. Sign into UCloud and then click this [invite link](not active yet). - - -## Discussion and feedback -We hope you enjoyed the live demo. If you have broader questions, suggestions, or concerns, now is the time to raise them! 
If you are totally toast for the day, remember that you can check out longer versions of our tutorials as well as other topics and tools in each of the [Sandbox modules](https://hds-sandbox.github.io/modules/index.html) or join us for a multi-day in person [course](https://hds-sandbox.github.io/news/news.html). - -As data scientists, we also would be really happy for some quantifiable info and feedback - we want to build things that the Danish health data science community is excited to use. Please answer these [5 questions](survey link) for us before you head out for the day (*link activated on day of the workshop*). - - -

Nice meeting you and we hope to see you again!

- - - - - - - - +--- +permalink: /workshop/ +hide: + - navigation + - toc + - footer +--- + +

Sandbox Workshop

+

Autumn 2023

+ +

Welcome to the homepage for in-person workshops introducing the Health Data Science Sandbox to potential users. Thanks for joining us!

+ +

Agenda

+ + The Sandbox Concept - 30 minutes + + Accessing Sandbox resources - 10 minutes + + Try out our transcriptomics module - 2 hours + + Discussion and feedback - 20 minutes + +## The Sandbox concept +The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and those generally curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our [Github page](https://github.com/hds-sandbox) and much more information is available on the rest of the website you are currently visiting! We work with both UCloud and Computerome (major Danish academic supercomputers) - see our [HPC Access page](https://hds-sandbox.github.io/access/index.html) for more info on each set up. + +## Access Sandbox resources +We currently provide training materials and resources as topical apps on UCloud, the supercomputer located at the University of Southern Denmark. To use these resources, you'll need the following: + + 1. a Danish university ID so you can sign on to UCloud via WAYF. See [this guide](https://hds-sandbox.github.io/access/UCloud.html) and/or follow along with our live demo. + 2. the ability to navigate in linux / RStudio / Jupyter. You don't need to be an expert, but it is beyond our ambitions (and course material) to teach you how to code and how to run analyses simultaneously. We recommend a basic R or Python course before diving in. + 3. our invite link to the correct UCloud project that will be shared on the day of the workshop. This way, we can provide you compute resources for the active sessions of the workshop. To use Sandbox materials outside of the workshop, you'll need to check with the local DeiC office at your university about how to request compute hours on UCloud. + +## Try out a module +So our Sandbox data scientists have finished their intro at the workshop? 
Great, now it's time to choose your poison *(cough)* topic of interest for today. Your options are below: + +![Genomics](../assets/images/genomics2.png){ align=left width="10%" } +### Genomics +If you're interested in NGS technologies and applications ranging from genome assembly to variant calling to metagenomics, join Sandbox Data Scientist Samuele Soraggi in testing out our Genomics Sandbox app. This app supports a semester-length course on NGS as well as a Population Genomics course run regularly at Aarhus University. Sign into UCloud and then click this [invite link](not active yet). + + +![Transcriptomics](../assets/images/transcriptomics.png){ align=left width="10%" } +### Transcriptomics +If you're interested in bulk or single cell RNA sequencing analysis and visualization, join Sandbox Data Scientist Jose Alejandro Romero Herrera (Alex) in testing out our Transcriptomics Sandbox app. This app supports regular 3-4 day workshops at University of Copenhagen and provides stand-alone visualisation tools. Sign into UCloud and then click this [invite link](not active yet). + + +![proteomics](../assets/images/proteomics.png){ align=left width="10%" } +### Proteomics +Interested in modern methods for protein structure prediction? Join Sandbox Data Scientist Jacob Fredegaard Hansen as he walks you through how to use ColabFold on UCloud. Jacob can also demo our Proteomics Sandbox, which contains a suite of proteomics analysis tools that will support a future course in clinical proteomics but is already available on UCloud for interested users. Sign into UCloud and then click this [invite link](not active yet). + + +## Discussion and feedback +We hope you enjoyed the live demo. If you have broader questions, suggestions, or concerns, now is the time to raise them! 
If you are totally toast for the day, remember that you can check out longer versions of our tutorials as well as other topics and tools in each of the [Sandbox modules](https://hds-sandbox.github.io/modules/index.html) or join us for a multi-day in person [course](https://hds-sandbox.github.io/news/news.html). + +As data scientists, we also would be really happy for some quantifiable info and feedback - we want to build things that the Danish health data science community is excited to use. Please answer these [5 questions](survey link) for us before you head out for the day (*link activated on day of the workshop*). + + +

Nice meeting you and we hope to see you again!

+ + + + + + + +