Commit

Merge pull request #88 from nesi/episode_reorder
Discussed changes
MattBixley authored Sep 12, 2023
2 parents ddd74f3 + 9a0de65 commit 42b1e88
Showing 3 changed files with 58 additions and 62 deletions.
12 changes: 6 additions & 6 deletions _config.yml
Original file line number Diff line number Diff line change
@@ -49,14 +49,14 @@ sched:

episode_order:
- 01-cluster
# - 02-filedir
- 04-modules
- 02-filedir
- 03-break1
- 04-moduless
- 05-scheduler
# - 06-lunch
- 064-parallel
- 08-break2
- 06-lunch
- 07-resources
- 08-break2
- 064-parallel
- 09-scaling
# - 095-writing-good-code

@@ -91,7 +91,7 @@ kind: "lesson"
# Workshop working directory.
working_dir:
- /nesi/project/nesi99991
- resbaz23
- introhpc2309

# Start time in minutes (0 to be clock-independent, 540 to show a start at 09:00 am).
# 600 is 10am
56 changes: 1 addition & 55 deletions _episodes/064-parallel.md
@@ -38,7 +38,6 @@ Number of CPUs to use is specified by the Slurm option `--cpus-per-task`.

{% include {{ site.snippets }}/parallel/smp-example.snip %}
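The snippet above holds the site's actual example; as an illustrative sketch only (the job name, resource values, and program here are hypothetical), a shared-memory script requests its CPUs with `--cpus-per-task`:

```
#!/bin/bash -e
#SBATCH --job-name      smp-sketch      # hypothetical job name
#SBATCH --cpus-per-task 4               # 4 logical CPUs on one node
#SBATCH --mem           2G
#SBATCH --time          00:10:00

# Many shared-memory programs read their thread count from an
# environment variable; for OpenMP that variable is OMP_NUM_THREADS.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun ./my_threaded_program              # hypothetical executable
```
{: .language-bash}

Because all threads must share one node's memory, `--cpus-per-task` cannot usefully exceed the number of CPUs available on a single node.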


### Distributed-Memory (MPI)

Message Passing Interface (MPI) is a communication standard for distributed-memory multiprocessing.
@@ -72,7 +71,6 @@ A job array can be specified using `--array`

If you are writing your own code, then this is something you will probably have to specify yourself.


{% include {{ site.snippets }}/parallel/array-example.snip %}
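As an illustrative sketch (the job name and file names are hypothetical), each array task receives its own `SLURM_ARRAY_TASK_ID`, which the script can use to select different work:

```
#!/bin/bash -e
#SBATCH --job-name array-sketch   # hypothetical job name
#SBATCH --array    0-3            # four independent tasks with IDs 0,1,2,3
#SBATCH --time     00:05:00

# Each task of the array runs this same script, but with a different
# value of SLURM_ARRAY_TASK_ID, so each can pick its own input file.
echo "Task ${SLURM_ARRAY_TASK_ID} processing input-${SLURM_ARRAY_TASK_ID}.dat"
```
{: .language-bash}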

We can also compare how these jobs look by checking.
@@ -126,61 +124,9 @@ However, unless that function is where the majority of time is spent, this is un

{%- comment -%} (matlab, numpy?) {%- endcomment -%}

{%- comment -%}
Python [Multiprocessing](https://docs.python.org/3/library/multiprocessing.html)
MATLAB [Parpool](https://au.mathworks.com/help/parallel-computing/parpool.html) {%- endcomment -%}
MATLAB [Parpool](https://au.mathworks.com/help/parallel-computing/parpool.html)

Shared memory parallelism is what is used in our example script `array_sum.r`.

## Scaling Test

Last time we submitted a job, we did not specify a number of CPUs and therefore got the default of `2` logical CPUs (1 physical core).

As a reminder, our Slurm script `example-job.sl` should currently look like this:

```
{% include example_scripts/example-job.sl.1 %}
```
{: .language-bash}

Using the information we collected from the previous job (`nn_seff <job-id>`), we will submit the same job again with more CPUs and our best estimates of required resources.
We ask for more CPUs by adding `#SBATCH --cpus-per-task 4` to our script.

Your script should now look like this:

```
{% include example_scripts/example-job.sl.2 %}
```
{: .language-bash}

And then submit using `sbatch` as we did before.

> ## acctg-freq
>
> We will also add the argument `--acctg-freq 1`.
> By default Slurm records job data every 30 seconds. This means any job running for less than 30
> seconds will not have its memory use recorded.
> This is the same as specifying `#SBATCH --acctg-freq 1` inside the script.
{: .callout}

```
{{ site.remote.prompt }} sbatch --acctg-freq 1 example-job.sl
```
{: .language-bash}

{% include {{ site.snippets }}/scheduler/basic-job-script.snip %}

> ## Watch
>
> We can prepend any command with `watch` to run it periodically (every 2 seconds by default); e.g. `watch squeue --me`
> will give us up-to-date information on our running jobs.
> Use `watch` with care, as repeatedly running a command can have adverse effects.
{: .callout}

Checking on our job with `sacct`.
Oh no!
{% include {{ site.snippets }}/scaling/OOM.snip %}


{% include links.md %}
52 changes: 51 additions & 1 deletion _episodes/07-resources.md
@@ -17,7 +17,57 @@ keypoints:
<!--
- scaling testing involves running jobs with increasing resources and measuring efficiency in order to establish a pattern that informs decisions about future job submissions.-->

In previous episodes we covered *how* to request resources, but what you may not know is *what* resources you need to request. The solution to this problem is testing!
## What Resources?

Last time we submitted a job, we did not specify a number of CPUs and therefore got the default of `2` logical CPUs (1 physical core).

As a reminder, our Slurm script `example-job.sl` should currently look like this:

```
{% include example_scripts/example-job.sl.1 %}
```
{: .language-bash}

Using the information we collected from the previous job (`nn_seff <job-id>`), we will submit the same job again with more CPUs and our best estimates of required resources.
We ask for more CPUs by adding `#SBATCH --cpus-per-task 4` to our script.

Your script should now look like this:

```
{% include example_scripts/example-job.sl.2 %}
```
{: .language-bash}
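The included file above is the authoritative version; the change itself amounts to one extra directive in the script header (a sketch, with all other lines unchanged):

```
#SBATCH --cpus-per-task 4   # request 4 logical CPUs instead of the default 2
```
{: .language-bash}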

And then submit using `sbatch` as we did before.

> ## acctg-freq
>
> We will also add the argument `--acctg-freq 1`.
> By default Slurm records job data every 30 seconds. This means any job running for less than 30
> seconds will not have its memory use recorded.
> This is the same as specifying `#SBATCH --acctg-freq 1` inside the script.
{: .callout}

```
{{ site.remote.prompt }} sbatch --acctg-freq 1 example-job.sl
```
{: .language-bash}

{% include {{ site.snippets }}/scheduler/basic-job-script.snip %}

> ## Watch
>
> We can prepend any command with `watch` to run it periodically (every 2 seconds by default); e.g. `watch squeue --me`
> will give us up-to-date information on our running jobs.
> Use `watch` with care, as repeatedly running a command can have adverse effects.
{: .callout}

Checking on our job with `sacct`.
Oh no!
{% include {{ site.snippets }}/scaling/OOM.snip %}


Understanding the resources you have available and how to use them most efficiently is a vital skill in high performance computing.

Below is a table of common resources and issues you may face if you do not request the correct amount.
