Commit

Merge pull request #5 from ProjectPythia/main
[pull] main from ProjectPythia:main
jukent authored May 14, 2024
2 parents c23ec22 + ec613c6 commit b1a1ab1
Showing 26 changed files with 270 additions and 730 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/nightly-build.yaml
@@ -10,7 +10,7 @@ jobs:
if: ${{ github.repository_owner == 'ProjectPythia' }}
uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
with:
environment_name: dask-cookbook-dev
environment_name: dask-cookbook

link-check:
if: ${{ github.repository_owner == 'ProjectPythia' }}
2 changes: 1 addition & 1 deletion .github/workflows/publish-book.yaml
@@ -11,7 +11,7 @@ jobs:
build:
uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
with:
environment_name: dask-cookbook-dev
environment_name: dask-cookbook

deploy:
needs: build
2 changes: 1 addition & 1 deletion .github/workflows/trigger-book-build.yaml
@@ -6,6 +6,6 @@ jobs:
build:
uses: ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml@main
with:
environment_name: dask-cookbook-dev
environment_name: dask-cookbook
artifact_name: book-zip-${{ github.event.number }}
# Other input options are possible, see ProjectPythia/cookbook-actions/.github/workflows/build-book.yaml
3 changes: 3 additions & 0 deletions .gitignore
@@ -128,3 +128,6 @@ dmypy.json

# Pyre type checker
.pyre/

# data
/data
14 changes: 8 additions & 6 deletions README.md
@@ -1,6 +1,4 @@
<img src="notebooks/images/NCAR_CISL_NSF_banner.jpeg" alt="NCAR CISL logo" />

# NCAR Dask Tutorial
# Dask Cookbook

[![nightly-build](https://github.com/ProjectPythia/cookbook-template/actions/workflows/nightly-build.yaml/badge.svg)](https://github.com/ProjectPythia/cookbook-template/actions/workflows/nightly-build.yaml)
[![Binder](https://mybinder.org/badge_logo.svg)](http://binder.projectpythia.org/v2/gh/ProjectPythia/dask-cookbook/main?labpath=notebooks)
@@ -25,17 +23,21 @@ The motivation behind this repository is to provide a clear and concise resource
<a href="https://github.com/benkirk/demo_containers/graphs/contributors">
<img src="https://contrib.rocks/image?repo=benkirk/demo_containers" />
</a>
<a href="https://github.com/ProjectPythia/dask-cookbook/graphs/contributors">
<img src="https://contrib.rocks/image?repo=ProjectPythia/dask-cookbook" />
</a>

## Note on Content Origin

This cookbook is part of the extensive material used in our NCAR tutorial, ["Using Dask on HPC systems"](https://github.com/NCAR/dask-tutorial.git), which was held in February 2023. The complete tutorial series also includes an in-depth exploration and practical use cases of Dask on HPC systems and best practices for Dask on HPC . For the complete set of materials, including these additional insights on Dask on HPC, please refer to the main tutorial content available [here](https://ncar.github.io/dask-tutorial/README.html).
This cookbook is derived from the extensive material used in the NCAR tutorial, ["Using Dask on HPC systems"](https://github.com/NCAR/dask-tutorial.git), which was held in February 2023. The NCAR tutorial series also includes an in-depth exploration and practical use cases of Dask on HPC systems and best practices for Dask on HPC. For the complete set of NCAR tutorial materials, including these additional insights
on Dask on HPC, please refer to the main NCAR tutorial content available [here](https://ncar.github.io/dask-tutorial/README.html).

## Structure

In the first chapter of this cookbook, we provide step-by-step tutorials on the basic concepts of Dask, including Dask arrays and Dask dataframes, which are powerful tools for parallel computing and distributed data processing. We explain the key differences between these Dask data structures and their counterparts in NumPy and Pandas.

In the second chapter of the repository, we move on to more advanced topics, such as distributed computing and Dask+Xarray integration. We provide examples of how to use Dask+Xarray to efficiently work with large, labelled multi-dimensional datasets.
Finally, we will discuss some best practices regarding Dask + Xarray.
Finally, we will discuss some best practices regarding Dask+Xarray.

## Running the Notebooks

@@ -78,7 +80,7 @@ If you are interested in running this material locally on your computer, you wil

```bash
conda env create -f environment.yml
conda activate dask-cookbook-example
conda activate dask-cookbook
```

1. Move into the `notebooks` directory and start up Jupyterlab
27 changes: 17 additions & 10 deletions _config.yml
@@ -4,7 +4,7 @@
title: Dask Cookbook
author: Negin Sobhani, Brian Vanderwende, Deepak Cherian, and Ben Kirk
logo: notebooks/images/logos/pythia_logo-white-rtext.svg
copyright: "2023"
copyright: "2024"

execute:
# To execute notebooks via a binder instead, replace 'cache' with 'binder'
@@ -14,7 +14,7 @@ execute:

# Add a few extensions to help with parsing content
parse:
myst_enable_extensions: # default extensions to enable in the myst parser. See https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html
myst_enable_extensions: # default extensions to enable in the myst parser. See https://myst-parser.readthedocs.io/en/stable/using/syntax-optional.html
- amsmath
- colon_fence
- deflist
@@ -33,12 +33,14 @@ sphinx:
html_permalinks_icon: '<i class="fas fa-link"></i>'
html_theme_options:
home_page_in_toc: true
repository_url: https://github.com/negin513/dask-cookbook.git # Online location of your book
repository_url: https://github.com/ProjectPythia/dask-cookbook.git # Online location of your book
repository_branch: main # Which branch of the repository should be used when creating links (optional)
use_issues_button: true
use_repository_button: true
use_edit_page_button: true
google_analytics_id: G-T52X8HNYE8
use_fullscreen_button: true
analytics:
google_analytics_id: G-T52X8HNYE8
github_url: https://github.com/ProjectPythia
twitter_url: https://twitter.com/project_pythia
icon_links:
@@ -47,12 +49,14 @@ sphinx:
icon: fab fa-youtube-square
type: fontawesome
launch_buttons:
binderhub_url: http://binder.mypythia.org
binderhub_url: http://binder.projectpythia.org
notebook_interface: jupyterlab
extra_navbar: |
Theme by <a href="https://projectpythia.org">Project Pythia</a>.<br><br>
All code in Pythia Cookbooks is licensed under Apache 2.0. All other non-code content is licensed under <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons BY 4.0 (CC BY 4.0)</a>.<br><br>
logo_link: https://projectpythia.org
logo:
link: https://projectpythia.org
navbar_start:
- navbar-logo
navbar_end:
- navbar-icon-links
navbar_links:
- name: Home
url: https://projectpythia.org
@@ -65,6 +69,9 @@ sphinx:
- name: Community
url: https://projectpythia.org/index.html#join-us
footer_logos:
NCAR: notebooks/images/logos/NCAR-contemp-logo-blue.svg
NCAR: notebooks/images/logos/NSF-NCAR_Lockup-UCAR-Dark_102523.svg
Unidata: notebooks/images/logos/Unidata_logo_horizontal_1200x300.svg
UAlbany: notebooks/images/logos/UAlbany-A2-logo-purple-gold.svg
footer_start:
- footer-logos
- footer-info
6 changes: 6 additions & 0 deletions _static/custom.css
@@ -0,0 +1,6 @@
.bd-main .bd-content .bd-article-container {
max-width: 100%; /* default is 60em */
}
.bd-page-width {
max-width: 100%; /* default is 88rem */
}
12 changes: 4 additions & 8 deletions environment.yml
@@ -1,11 +1,11 @@
name: dask-cookbook-dev
name: dask-cookbook
channels:
- conda-forge
- nodefaults

dependencies:
- jupyter-book
- jupyterlab>=3
- jupyterlab >=3
- jupyter_server
- cfgrib
- cftime
@@ -22,13 +22,9 @@ dependencies:
- netcdf4
- nodejs
- pandas
- pip
- pre-commit
- pydap
- python-graphviz
- python=3.9
- scipy
- xarray>=2022.3.0
- pip
- pip:
- sphinx-pythia-theme
- sphinx-pythia-theme
- xarray
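
Note on the rename: this commit changes the conda environment name from `dask-cookbook-dev` to `dask-cookbook` in `environment.yml` and in the `environment_name` input of the three workflow files above, and these values presumably have to agree for the book build to find the environment. A minimal sketch of a consistency check follows. The function names are hypothetical, and it uses simple regular expressions instead of a YAML parser, so it assumes a single top-level `name:` key and plain `environment_name: value` lines without trailing comments.

```python
import re
from typing import Dict, List, Optional


def environment_name(environment_yml: str) -> Optional[str]:
    """Return the value of the top-level `name:` key, or None if absent."""
    match = re.search(r"^name:[ \t]*(\S+)[ \t]*$", environment_yml, flags=re.MULTILINE)
    return match.group(1) if match else None


def workflow_environment_names(workflow_yaml: str) -> List[str]:
    """Return every `environment_name:` value found in a workflow file."""
    pattern = r"^[ \t]*environment_name:[ \t]*(\S+)[ \t]*$"
    return re.findall(pattern, workflow_yaml, flags=re.MULTILINE)


def find_mismatches(environment_yml: str, workflows: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each workflow path to the names that differ from environment.yml.

    Workflows whose names all match are left out of the result.
    """
    expected = environment_name(environment_yml)
    mismatches: Dict[str, List[str]] = {}
    for path, text in workflows.items():
        stale = [name for name in workflow_environment_names(text) if name != expected]
        if stale:
            mismatches[path] = stale
    return mismatches
```

Passing the text of `environment.yml` and a dictionary of workflow file contents keyed by path returns an empty dictionary when every workflow uses the same name as the environment file.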
44 changes: 12 additions & 32 deletions notebooks/00-dask-overview.ipynb
@@ -2,33 +2,18 @@
"cells": [
{
"cell_type": "markdown",
"id": "4db5d14d-ee57-4791-9ab7-58cb2ff2cc3b",
"id": "a1ccac1c-5d4e-47dc-9d5e-4f417126df94",
"metadata": {},
"source": [
"<img src=\"https://raw.githubusercontent.com/NCAR/dask-tutorial/main/images/NCAR-contemp-logo-blue.png\"\n",
" width=\"750px\"\n",
" alt=\"NCAR logo\"\n",
" style=\"vertical-align:middle;margin:30px 0px\"/>\n",
"\n",
"<img src=\"https://docs.dask.org/en/stable/_images/dask_horizontal.svg\"\n",
" width=\"30%\"\n",
" alt=\"Dask logo\"\n",
" align=\"right\"\n",
"/>\n",
"\n",
"# Dask Overview\n",
"\n",
"**ESDS Dask Tutorial | 06 February, 2023** \n",
"\n",
"Negin Sobhani, Brian Vanderwende, Deepak Cherian, Ben Kirk \n",
"Computational & Information Systems Lab (CISL) \n",
"[negins@ucar.edu](mailto:negins@ucar.edu), [vanderwb@ucar.edu](mailto:vanderwb@ucar.edu)\n",
"\n",
"\n",
"---------"
]
},
{
"cell_type": "markdown",
"id": "5b6211a0-3762-41a2-8a45-6b19ce32f658",
"metadata": {},
"source": [
"**In this tutorial, you learn:**\n",
"### In this tutorial, you learn:\n",
"\n",
"* What is Dask?\n",
"* Why Dask in Geosciences?\n",
@@ -54,11 +39,6 @@
"\n",
"## What is Dask?\n",
"\n",
"<img src=\"https://raw.githubusercontent.com/NCAR/dask-tutorial/main/images/dask_horizontal.svg\"\n",
" width=\"500px\"\n",
" alt=\"NCAR logo\"\n",
" style=\"vertical-align:middle;margin:30px 0px\"/>\n",
"\n",
"* Dask is an open-source Python library for parallel and distributed computing that scales the existing Python ecosystem.\n",
"\n",
"* Dask was developed to scale Python packages such as Numpy, Pandas, and Xarray to multi-core machines and distributed clusters when datasets exceed memory.\n",
@@ -109,7 +89,7 @@
"\n",
"</ul>\n",
"And keep in mind - all of the above steps improve your code whether you end up using Dask or not!</br></br>\n",
"<img src=\"https://raw.githubusercontent.com/NCAR/dask-tutorial/main/images/dask_twitter.png\"\n",
"<img src=\"https://raw.githubusercontent.com/ProjectPythia/dask-cookbook/main/notebooks/images/dask_twitter.png\"\n",
" width=\"500px\"/>\n",
"\n",
"</div>\n",
@@ -166,11 +146,11 @@
"\n",
"These are very powerful tools, but it is easy to write something using a delayed function that could be executed faster and more simply using a high-level collection \n",
"\n",
"<img src=\"https://raw.githubusercontent.com/NCAR/dask-tutorial/main/images/high_vs_low_level_coll_analogy.png\"\n",
"<img src=\"https://raw.githubusercontent.com/ProjectPythia/dask-cookbook/main/notebooks/images/high_vs_low_level_coll_analogy.png\"\n",
" width=\"83%\"\n",
" alt=\"Dask Collections\"/>\n",
" \n",
"*Image credit: Anaconda, Inc. and contributors*\n",
"*Image credit: Dask Contributors*\n",
"\n",
"\n",
"### 2. Dynamic Task Scheduling\n",
@@ -190,7 +170,7 @@
" width=\"75%\"\n",
" alt=\"Dask Distributed Cluster\"/>\n",
" \n",
"*Image credit: Anaconda, Inc. and contributors*\n",
"*Image credit: Dask Contributors*\n",
"\n",
"\n",
"We will learn more about Dask Collections and Dynamic Task Scheduling in the next tutorials."
@@ -231,7 +211,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.9.18"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
28 changes: 10 additions & 18 deletions notebooks/01-dask-array.ipynb
@@ -7,21 +7,13 @@
"tags": []
},
"source": [
"<img src=\"https://raw.githubusercontent.com/NCAR/dask-tutorial/main/images/NCAR-contemp-logo-blue.png\"\n",
" width=\"750px\"\n",
" alt=\"NCAR logo\"\n",
" style=\"vertical-align:middle;margin:30px 0px\"/>\n",
"<img src=\"https://docs.dask.org/en/stable/_images/dask_horizontal.svg\"\n",
" width=\"30%\"\n",
" alt=\"Dask logo\"\n",
" align=\"right\"\n",
"/>\n",
"\n",
"# Dask Arrays\n",
"\n",
"**ESDS Dask Tutorial | 06 February, 2023** \n",
"\n",
"Negin Sobhani, Brian Vanderwende, Deepak Cherian, Ben Kirk \n",
"Computational & Information Systems Lab (CISL) \n",
"[negins@ucar.edu](mailto:negins@ucar.edu), [vanderwb@ucar.edu](mailto:vanderwb@ucar.edu)\n",
"\n",
"\n",
"---------\n",
"# Dask Array\n",
"\n",
"### In this tutorial, you learn:\n",
"\n",
@@ -31,8 +23,8 @@
"\n",
"**Related Dask Array Documentation**\n",
"\n",
"* [Dask Array documentation](https://docs.dask.org/en/latest/array.html)\n",
"* [Dask Array API](https://docs.dask.org/en/latest/array-api.html)\n",
"* [Dask Array documentation](https://docs.dask.org/en/stable/array.html)\n",
"* [Dask Array API](https://docs.dask.org/en/stable/array-api.html)\n",
"* [Dask Array examples](https://examples.dask.org/array.html)\n",
"\n",
"\n",
@@ -41,7 +33,7 @@
"\n",
"<img src=\"https://docs.dask.org/en/stable/_images/dask-array.svg\" width=\"500px\" style=\"horizontal-align:middle\"/>\n",
"\n",
"*Image credit: Anaconda, Inc. and contributors*\n",
"*Image credit: Dask Contributors*\n",
"\n",
"Dask Array can be used as a drop-in replacement for NumPy arrays, with a similar API and support for a subset of NumPy functions. \n",
"\n",
@@ -1114,7 +1106,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.9.18"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
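
Note on the image URLs: the notebook hunks above replace several `raw.githubusercontent.com/NCAR/dask-tutorial` image links with links into `ProjectPythia/dask-cookbook`. A rough sketch for spotting any links that still point at the old repository is given below. The function name and the default prefix constant are hypothetical, and the code assumes the notebooks are standard Jupyter JSON with a top-level `cells` list whose `source` is either a string or a list of strings.

```python
import json
from typing import List, Tuple

# Assumed marker for links that still point at the original tutorial repository.
OLD_PREFIX = "raw.githubusercontent.com/NCAR/dask-tutorial"


def stale_image_lines(notebook_json: str, old_prefix: str = OLD_PREFIX) -> List[Tuple[int, str]]:
    """Return (cell index, stripped source line) pairs that contain old_prefix."""
    notebook = json.loads(notebook_json)
    hits: List[Tuple[int, str]] = []
    for index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", [])
        # nbformat allows `source` to be one string or a list of lines.
        if isinstance(source, str):
            source = source.splitlines()
        for line in source:
            if old_prefix in line:
                hits.append((index, line.strip()))
    return hits
```

Running it over the text of each `.ipynb` file under `notebooks/` lists the cells that would still need updating; an empty list means no reference to the old prefix was found.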