fix up HPC page etc (#403)
* Arranging docs - move filesystem stuff to HPC page
philipmac authored Dec 8, 2023
1 parent badd379 commit 676c437
Showing 3 changed files with 72 additions and 41 deletions.
79 changes: 72 additions & 7 deletions docs/source/hpc.rst
RML HPC (Big Sky)
==================

*******************
Python environments
*******************

**NOTE: this section is only relevant for HPC; it is included for completeness.**

**NOTE: generate_venv.sh is not used as of 08/11/2023; setup is documented in** :doc:`hpc`.

Workflows are currently run on RML HPC ("BigSky").

There are currently three environments on BigSky: ``dev``, ``qa``, and ``prod``.
They were set up as follows.
(Note, the first step is only required once, and only to work around old versions of Python.)


Initial Setup:
--------------

Initialisation is typically done only once, at the start, per environment: use Miniconda to set up a venv, clone this repo, and ``pip install`` into the newly created venv.

.. code-block:: sh

   # get miniconda dist
   wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.3.1-0-Linux-x86_64.sh
E.g. for qa ``helper_scripts/hedwig_listener_qa.service`` would be copied into place.

It's a good idea to test that the ``ExecStart`` command can be run, e.g.:

``image_portal_workflows/helper_scripts/hedwig_listener_dev.service.sh``

By default, the daemon polls the Prefect server in the given work pool and pulls in the flow run details if something
has been submitted to the server.
Upon promotion into the HPC env, do:

.. code-block:: sh

   cd image_portal_workflows/
   git fetch
   git checkout <label>
   pip install -e . -r requirements.txt --upgrade
When there are changes in the workflows (e.g., a new task is added, or a task function signature has changed), you should
redeploy the workflows. See the Prefect section for details.

*********************
NFS Filesystem layout
*********************

Pipeline inputs and outputs are housed on an NFS accessible to Admin users, meaning that Admin-level users of the system can see directly into the same partitions from which we read our inputs and into which we ultimately place our outputs. Note, we do not do "work" in this filesystem; we only read the initial inputs and write out selected outputs.
Different projects have different filesystem layouts.

We never write to the inputs (``Projects``) directory, only to outputs (``Assets``).
For example, a workflow ``input_dir`` would be provided via the parameter ``input_dir`` as:

``/RTB/darrellh/nguyenm8-2022-0920-test/SEM-2022-0922-Neta_2D_Test/DM4_sample/``

This is used to define the full input directory (note the ``Projects`` substring):

``/mnt/ai-fas12/RMLEMHedwigDev/Projects/RTB/darrellh/nguyenm8-2022-0920-test/SEM-2022-0922-Neta_2D_Test/DM4_sample/``

And this would be the corresponding output directory:

``/mnt/ai-fas12/RMLEMHedwigDev/Assets/RTB/darrellh/nguyenm8-2022-0920-test/SEM-2022-0922-Neta_2D_Test/DM4_sample/``
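The Projects-to-Assets mapping above can be sketched in shell. This is an illustrative sketch, not code from the repo; it simply mirrors the layout described here using a bash substitution.

```shell
# Illustrative only (not repo code): derive the output (Assets) directory
# from the input (Projects) directory.
input_dir="/mnt/ai-fas12/RMLEMHedwigDev/Projects/RTB/darrellh/nguyenm8-2022-0920-test/SEM-2022-0922-Neta_2D_Test/DM4_sample/"
# Replace the first occurrence of "Projects" with "Assets" (bash substitution).
output_dir="${input_dir/Projects/Assets}"
echo "$output_dir"
```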


On HPC, each environment (``dev``, ``qa``, ``prod``) has its own mount point. These are::

   /mnt/ai-fas12/RMLEMHedwigDev/
   /mnt/ai-fas12/RMLEMHedwigQA/
   /mnt/ai-fas12/RMLEMHedwigProd/


Note, the partition is not mentioned in the ``input_dir`` parameter; by implication we know this is a Project.
So to create the input dir, we prepend ``<name_of_partition>/Projects/`` to ``<input_dir_param>``.
This logic is done in ``utils.get_input_dir()``.
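A hedged sketch of that prepend step (variable names are illustrative; the real logic lives in ``utils.get_input_dir()``):

```shell
# Illustrative only: prepend <name_of_partition>/Projects/ to the input_dir param.
partition="/mnt/ai-fas12/RMLEMHedwigQA"   # per-environment mount point
input_dir_param="/RTB/darrellh/nguyenm8-2022-0920-test/SEM-2022-0922-Neta_2D_Test/DM4_sample/"
input_dir="${partition}/Projects${input_dir_param}"
echo "$input_dir"
```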

To find out what we are expected to work on, we list this directory (see ``utils.list_files``). Note, FIBSEM uses directories to define stacks. We filter inputs by file extension.

The input directory is listed, and a temp working directory is created for each input (a file for the 2D and BRT pipelines, or a directory containing a stack of tiffs for FIBSEM).
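The listing-and-filtering step can be sketched as below. This is an assumption-laden illustration (the ``.dm4`` extension and directory names are made up for the example; the real logic is in ``utils.list_files``):

```shell
# Illustrative sketch (not the real utils.list_files): list an input
# directory and keep only files with an expected extension.
inputs_dir="$(mktemp -d)"
touch "${inputs_dir}/a.dm4" "${inputs_dir}/b.dm4" "${inputs_dir}/notes.txt"
# Count only the .dm4 inputs; notes.txt is filtered out.
dm4_count="$(find "$inputs_dir" -maxdepth 1 -name '*.dm4' | wc -l)"
echo "$dm4_count"
```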

Spatialomics file layout.
-------------------------
Normally the dir structure is: ``$lab/$pi/$project/$session/$sample``
For Spatialomics this is not the case: the ``$sample`` is not really a sample, it's a grouping of ROIs, PreROIs, etc., from each of the slides.

More details can be found in :ref:`ref-workflow-spatial-omics`.


Working directory / temporary dir.
----------------------------------

The ``Projects`` directory above is relatively slow. There is a faster partition, which we use for a temporary working directory. For each input (e.g. a file), one temporary directory is created. All work occurs in this directory. Upon the conclusion of the workflow, the contents of this directory are copied into the ``Assets`` directory (see Inputs/Outputs).
Objects of class ``FilePath`` are used to track these inputs.
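A minimal sketch of this work-then-copy pattern, using stand-in temp directories rather than the real scratch and ``Assets`` partitions:

```shell
# Minimal sketch: all work happens in a fast scratch dir, and only at the
# end are the results copied into the (stand-in) Assets directory.
assets_dir="$(mktemp -d)"   # stand-in for the real Assets output dir
work_dir="$(mktemp -d)"     # stand-in for the fast scratch space
echo "result" > "${work_dir}/out.txt"
# Copy the working dir's contents (not the dir itself) into Assets.
cp -r "${work_dir}/." "${assets_dir}/"
```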


*****
Spack
*****

- Note, although unused above, BigSky also has Spack available.

.. code-block:: sh

   $ source /gs1/apps/user/rmlspack/share/spack/setup-env.sh
   $ spack load -r python@3.8.6/eg2vaag
   $ python -V
   Python 3.8.6
   $ spack unload -a
30 changes: 0 additions & 30 deletions docs/source/workflows/brt.rst
A number of parameters can be passed to the "BRT" workflow. There are two types:
2) params that are used elsewhere within the workflow, providing metadata etc., e.g. ``input_dir``.



Note: objects passed between Prefect Tasks (e.g. ``FilePath`` objects) must be considered immutable. Updates to state made in one task will be lost and not available to the next task; create a new object instead.
The ``map`` function is used extensively, and preserves order.

4 changes: 0 additions & 4 deletions docs/source/workflows/index.rst
Workflows
=========


.. include:: brt.rst

