Commit b0a7ccc: Added some figures, re-ordered some things.
CallumWalley committed Sep 19, 2023 (1 parent: b553ba4)
Showing 5 changed files with 595 additions and 96 deletions.
231 changes: 135 additions & 96 deletions _episodes/07-resources.md
---
title: "Using resources effectively"
teaching: 20
exercises: 15
questions:
- "How can I review past jobs?"
- "How can I use this knowledge to create a more accurate submission script?"
As a reminder, our slurm script `example-job.sl` should currently look like this:
```
{: .language-bash}

To request more CPUs, we can add the line `#SBATCH --cpus-per-task 4` to our script.

Your script should now look like this:


And then submit using `sbatch` as we did before.


```
{{ site.remote.prompt }} sbatch example-job.sl
```
{: .language-bash}

{% include {{ site.snippets }}/scheduler/basic-job-script.snip %}

> ## Watch
>
> We can prepend any command with `watch` to run it periodically (every 2 seconds by default).
> For example, `watch squeue --me` will give us up-to-date information on our running jobs.
> Use `watch` with care, as repeatedly running a command can have adverse effects.
> Exit `watch` with <kbd>ctrl</kbd> + <kbd>c</kbd>.
{: .callout}
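
As a concrete example, checking the queue every 10 seconds rather than the default 2 (a slightly gentler interval on a shared login node) might look like this:

```
{{ site.remote.prompt }} watch -n 10 squeue --me
```
{: .language-bash}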

Note that in the `squeue` output, the number under the CPUS column should now be '4'.

Let's check on our job with `sacct`.
Oh no!
{% include {{ site.snippets }}/scaling/OOM.snip %}
Below is a table of common resources and issues you may face if you do not request enough of them.
</tbody>
</table>


## Measuring Resource Usage of a Finished Job

We can check the status of our finished job using the `sacct` command we learned earlier.

With this information, we may determine a couple of things.
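
If your default `sacct` output does not include all of the columns discussed below, you can request them explicitly. This is just a sketch; replace `<jobid>` with your own job ID (`AllocCPUS` here corresponds to the `Alloc` column shown in this lesson's output):

```
{{ site.remote.prompt }} sacct -j <jobid> --format=JobID,Elapsed,TotalCPU,AllocCPUS,ReqMem,MaxRSS
```
{: .language-bash}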

Memory efficiency can be determined by comparing <strong style="color:#66cdaa">ReqMem</strong> (requested memory) with <strong style="color:#00e400">MaxRSS</strong> (maximum used memory). MaxRSS is given in KB, so a unit conversion is usually required.

{% include figure.html url="" max-width="75%" caption=""
file="/fig/mem_eff.svg"
alt="Memory Efficiency Formula" %}

So for the above example we see that <strong style="color:#00e400">0.1GB</strong> (102048K) of our requested <strong style="color:#66cdaa">1GB</strong> was used, meaning the memory efficiency was about <strong>10%</strong>.
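
If you would rather not do the unit conversion in your head, a quick one-liner gives the same ratio. This is only a sketch using the example's numbers (102048 KB used out of a 1 GB request); substitute your own job's values:

```
# MaxRSS (102048 KB) divided by ReqMem (1 GB = 1024*1024 KB), as a percentage
{{ site.remote.prompt }} awk 'BEGIN { printf "%.1f%%\n", 102048 / (1024 * 1024) * 100 }'
```
{: .language-bash}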

CPU efficiency can be determined by comparing <strong style="color:#ff8c00">TotalCPU</strong> (CPU time) with the maximum possible CPU time. The maximum possible CPU time is equal to <strong style="color:#ff1493">Alloc</strong> (number of allocated CPUs) multiplied by <strong style="color:#0000ff">Elapsed</strong> (Walltime, actual time passed).

{% include figure.html url="" max-width="75%" caption=""
file="/fig/cpu_eff.svg"
alt="CPU Efficiency Formula" %}

For the above example, <strong style="color:#ff8c00">33 seconds</strong> of computation was done, where the maximum possible computation time was <strong>96 seconds</strong> (<strong style="color:#ff1493">2 CPUs</strong> multiplied by <strong style="color:#0000ff">48 seconds</strong>), meaning the CPU efficiency was about <strong>35%</strong>.

Time Efficiency is simply the <strong style="color:#0000ff">Elapsed Time</strong> divided by <strong style="color:#1ebfff">Time Requested</strong>.

{% include figure.html url="" max-width="75%" caption=""
file="/fig/time_eff.svg"
alt="Time Efficiency Formula" %}

<strong style="color:#0000ff">48 seconcds</strong> out of <strong style="color:#1ebfff">15 minutes</strong> requested give a time efficiency of about <strong>5%</strong>

> ## Efficiency Exercise
>
> Calculate the efficiency of the job shown below.
>
> ```
> JobID JobName Alloc Elapsed TotalCPU ReqMem MaxRSS State
> --------------- ---------------- ----- ----------- ------------ ------- -------- ----------
> ```
> {: .solution}
{: .challenge}


For convenience, NeSI has provided the command `nn_seff <jobid>` to calculate **S**lurm **Eff**iciency (all NeSI commands start with `nn_`, for **N**eSI **N**IWA).
```
{{ site.remote.prompt }} nn_seff <jobid>
```
{: .language-bash}

{% include {{ site.snippets }}/resources/seff.snip %}

Knowing what we do now about job efficiency, let's submit the previous job again, but with more appropriate resources.

{% include example_scripts/example-job.sl.2 %}

```
{{ site.remote.prompt }} sbatch example-job.sl
```
{: .language-bash}

Hopefully we will have better luck with this one!

## Measuring the System Load From Currently Running Tasks

On Mahuika, we allow users to connect directly to compute nodes from the
login node. This is useful to check on a running job and see how it's doing; however, we
only allow you to connect to nodes on which you have running jobs.

The most reliable way to check current system stats is with `htop`.
`htop` is an interactive process viewer that can be launched from the command line.

### Finding the Job Node

Before we can check on our job, we need to find out where it is running.
We can do this by running `squeue --me` and looking under the 'NODELIST' column.

```
{{ site.remote.prompt }} squeue --me
```
{: .language-bash}

{% include {{ site.snippets }}/resources/get-job-node.snip %}
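
If you want just the node name on its own (for example, to paste into an `ssh` command), `squeue`'s format option can print only that column. A sketch; the output will depend on your own job:

```
{{ site.remote.prompt }} squeue --me --noheader --format=%N
```
{: .language-bash}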

Now that we know the location of the job (wbn189), we can use SSH to run `htop` there.

```
{{ site.remote.prompt }} ssh wbn189 -t htop -u $USER
```
{: .language-bash}

You may get a message:

```
ECDSA key fingerprint is SHA256:Se1WKeayCfi3lAxDzS7fBlS83kBaBEvBgxHoAz2HVkM.
ECDSA key fingerprint is MD5:9d:03:fc:43:07:ac:ac:9b:78:85:45:52:ac:7a:ed:cd.
Are you sure you want to continue connecting (yes/no)?
```
{: .language-bash}

If so, type `yes` and press <kbd>Enter</kbd>.

You may also need to enter your cluster password.

If you cannot connect, it may be that the job has finished and you have lost permission to `ssh` to that node.

### Reading Htop

You may see something like this:

{% include {{ site.snippets }}/resources/monitor-processes-top.snip %}

Overview of the most important fields:
accumulate time at twice the normal rate.
* `COMMAND`: What command was used to launch a process?

To exit, press <kbd>q</kbd>.

Running `htop` without `ssh`-ing to a compute node first will show information on tasks running on the login node (where we should not be running resource intensive jobs anyway).

## Running Test Jobs

As you may have to run several iterations before you get it right, you should choose your test job carefully.
A test job should not run for more than 15 mins. This could involve using a smaller input, coarser parameters or using a subset of the calculations.
As well as being quick to run, you want your test job to be quick to start (i.e. to get through the queue quickly); the best way to ensure this is to keep the resources requested (memory, CPUs, time) small.
Your test job should also be as similar as possible to your actual job, e.g. using the same functions and the same workflow.
Most problems are caused by small mistakes: typos, missing files and the like, and your test job is a good chance to sort these out.
Make sure outputs are going somewhere you can see them.

> ## Serial Test
>
> Often a good first test to run is to execute your job *serially*, i.e. using only 1 CPU.
> This not only saves you time by being fast to start, but serial jobs are often easier to debug.
> If you confirm your job works in its most simple state, you can identify problems caused by
> parallelisation much more easily.
{: .callout}
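
Putting those tips together, a serial test version of our job might look something like this sketch (here `array_sum_small.r` stands in for a hypothetical cut-down input; it is not one of this lesson's files):

```
#!/bin/bash -e

#SBATCH --job-name my_test_job
#SBATCH --account nesi99991
#SBATCH --mem 300M
#SBATCH --time 00:05:00
#SBATCH --cpus-per-task 1

module load R/4.3.1-gimkl-2022a
# Run a reduced version of the workload (hypothetical smaller input).
Rscript array_sum_small.r
echo "Done!"
```
{: .language-bash}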

You generally should ask for 20% to 30% more time and memory than you think the job will use.
Testing allows you to become more precise with your resource requests. We will cover a bit more on running tests in the last lesson.

> ## Efficient way to run tests jobs using debug QOS (Quality of Service)
>
> Before submitting a large job, first submit one as a test to make
> sure everything works as expected. Often, users discover typos in their submit
> scripts, incorrect module names or possibly an incorrect pathname after their job
> has queued for many hours. Be aware that your job is not fully scanned for
> correctness when you submit the job. While you may get an immediate error if your
> SBATCH directives are malformed, it is not until the job starts to run that the
> interpreter starts to process the batch script.
>
> NeSI has an easy way for you to test your job submission. One can employ the debug
> QOS to get a short, high priority test job. Debug jobs have to run within 15
> minutes and cannot use more than 2 nodes. To use debug QOS, add or change the
> following in your batch submit script:
>
> ```
> #SBATCH --qos=debug
> #SBATCH --time=15:00
> ```
> {: .language-bash}
>
> Adding these SBATCH directives will provide your job with the highest priority
> possible, meaning it should start to run within a few minutes, provided
> your resource request is not too large.
{: .callout}
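
If you would rather not edit the script, the same settings can be supplied on the command line at submission time (command-line options override the matching `#SBATCH` directives), for example:

```
{{ site.remote.prompt }} sbatch --qos=debug --time=15:00 example-job.sl
```
{: .language-bash}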

## Initial Resource Requirements

As we have just discussed, the best and most reliable method of determining resource requirements is testing,
but before running your first test there are a couple of things you can do to start yourself off in the right area.

### Read the Documentation

NeSI maintains documentation that has some guidance on using resources for some software.
However, as you noticed in the Modules lessons, we have a lot of software, so it is also advisable to search
the web for guidance others may have written on getting the most out of your specific software.

### Ask Other Users

If you know someone who has used the software before, they may be able to give you a ballpark figure.

<!-- Now that you know the efficiency of your small test job what next? Throw 100 more CPUs at the problem for 100x speedup? -->

> ## Next Steps
>
> You can use this knowledge to set up the
> next job with a closer estimate of its load on the system.
> A good general rule
> is to ask the scheduler for **30%** more time and memory than you expect the
> job to need.
11 changes: 11 additions & 0 deletions _includes/example_scripts/example-job.sl.3
#!/bin/bash -e

#SBATCH --job-name my_job
#SBATCH --account nesi99991
#SBATCH --mem 600M
#SBATCH --time 00:10:00
#SBATCH --cpus-per-task 4

module load R/4.3.1-gimkl-2022a
Rscript array_sum.r
echo "Done!"