Merge pull request #3 from ukaea/support-csd3
Add support for CSD3
jameshod5 authored Aug 19, 2024
2 parents 5c3821d + b55d40b commit 8d514d4
Showing 14 changed files with 95 additions and 133,242 deletions.
45 changes: 41 additions & 4 deletions README.md
@@ -1,6 +1,7 @@
### FAIR MAST Data Ingestion

-## Installation on CSD3
+## Running on CSD3
+### Installation on CSD3

After logging into your CSD3 account (on Icelake node), first load the correct Python module:

@@ -25,7 +26,7 @@ source fair-mast-ingestion/bin/activate
Update pip and install required packages:

```sh
-python -m pip install --U pip
+python -m pip install -U pip
python -m pip install -e .
```

@@ -40,12 +41,47 @@ Edit `uda/python/setup.py` and change the "version" to 1.3.9.

```sh
python -m pip install uda/python
cd ..
source ~/rds/rds-ukaea-mast-sPGbyCAPsJI/uda-ssl.sh
```

#### S3 Support (Optional)

Finally, for uploading to S3 we need to install `s5cmd` and make sure it is on the path:

```sh
wget https://github.com/peak/s5cmd/releases/download/v2.2.2/s5cmd_2.2.2_Linux-64bit.tar.gz
tar -xvzf s5cmd_2.2.2_Linux-64bit.tar.gz
PATH=$PWD:$PATH
```

And add a config file for the bucket keys, by creating a file called `.s5cfg.stfc`:

```
[default]
aws_access_key_id=<access-key>
aws_secret_access_key=<secret-key>
```
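
The two requirements above (the `s5cmd` binary on `PATH` and a credentials file) can be checked before submitting a job. The following is a minimal sketch and is not part of this commit: the helper names `require_tool` and `require_file` are hypothetical, and the assumption that `.s5cfg.stfc` sits in the current directory is ours, since the README does not say where the file should live.

```shell
# Hypothetical pre-flight helpers; not part of the repository.

# Succeeds only if the named command can be found on PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing tool: $1" >&2
        return 1
    fi
}

# Succeeds only if the named file exists and is readable.
require_file() {
    if [ ! -r "$1" ]; then
        echo "missing file: $1" >&2
        return 1
    fi
}

# Example usage (assumes the config file is in the current directory):
#   require_tool s5cmd && require_file .s5cfg.stfc
```

Note that `PATH=$PWD:$PATH` only affects the current shell session, so a check like this is most useful at the top of a job script that runs in a fresh shell.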

You should now be able to run the following commands.

-## Local Ingestion
### Submitting runs on CSD3

1. First submit a job to collect all the metadata:

```sh
sbatch ./jobs/metadata.csd3.slurm.sh
```

2. Then submit an ingestion job

```sh
sbatch ./jobs/ingest.csd3.slurm.sh campaign_shots/tiny_campaign.csv s3://mast/test/shots/ amc
```

## Manually Running Ingestor

### Local Ingestion

The following section details how to ingest data into a local folder on freia with UDA.

@@ -61,7 +97,7 @@ mpirun -np 16 python3 -m src.main data/local campaign_shots/tiny_campaign.csv --

Files will be output in the NetCDF format to `data/local`.

-## Ingestion to S3
+### Ingestion to S3

The following section details how to ingest data into the s3 storage on freia with UDA.

@@ -80,3 +116,4 @@ mpirun -np 16 python3 -m src.main data/local campaign_shots/tiny_campaign.csv --
```

This will submit a job to the freia job queue that will ingest all of the shots in the tiny campaign and push them to the s3 bucket.

23 changes: 23 additions & 0 deletions jobs/ingest.csd3.slurm.sh
@@ -0,0 +1,23 @@
#!/bin/bash
#SBATCH -A UKAEA-AP002-CPU
#SBATCH -p icelake
#SBATCH --job-name=fair-mast-ingest
#SBATCH --output=fair-mast-ingest_%A.out
#SBATCH --time=5:00:00
#SBATCH --mem=250G
#SBATCH --ntasks=128
#SBATCH -N 2


summary_file=$1
bucket_path=$2
num_workers=$SLURM_NTASKS

random_string=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 16)
temp_dir="/rds/project/rds-sPGbyCAPsJI/local_cache/$random_string"
metadata_dir="/rds/project/rds-sPGbyCAPsJI/data/uda"

source /rds/project/rds-sPGbyCAPsJI/uda-ssl.sh

mpirun -np $num_workers \
python3 -m src.main $temp_dir $summary_file --metadata_dir $metadata_dir --bucket_path $bucket_path --upload --force --source_names ${@:3}
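
The script above takes the summary file and bucket path as its first two arguments and forwards everything from the third argument onward to `--source_names` via `${@:3}`. As an illustration only, the sketch below reproduces that split in a standalone function using `shift 2`, which leaves the same remaining arguments in `"$@"`. The function name `split_ingest_args`, the argument-count check, and the printed output format are all hypothetical additions for the example; the committed script performs no such validation.

```shell
# Hypothetical illustration of the argument handling in ingest.csd3.slurm.sh.
split_ingest_args() {
    # The original script does not validate its arguments; this check is an addition.
    if [ "$#" -lt 3 ]; then
        echo "usage: split_ingest_args <summary_file> <bucket_path> <source_name>..." >&2
        return 1
    fi
    summary_file=$1
    bucket_path=$2
    shift 2   # the remaining "$@" now holds what ${@:3} selects in the script
    echo "summary_file=$summary_file"
    echo "bucket_path=$bucket_path"
    echo "source_names=$*"
}

# Mirrors the README invocation, where "amc" is the only source name.
split_ingest_args campaign_shots/tiny_campaign.csv s3://mast/test/shots/ amc
```

Passing several source names after the bucket path (for example `amc xsx`) would forward all of them, since everything after the second argument is treated as a source name.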
22 changes: 22 additions & 0 deletions jobs/metadata.csd3.slurm.sh
@@ -0,0 +1,22 @@
#!/bin/bash
#SBATCH -A UKAEA-AP002-CPU
#SBATCH -p icelake
#SBATCH --job-name=fair-mast-ingest
#SBATCH --output=%A_%a.out
#SBATCH --time=0:20:00
#SBATCH --mem=60G
#SBATCH --ntasks=128
#SBATCH -N 2

num_workers=$SLURM_NTASKS

uda_path="/rds/project/rds-sPGbyCAPsJI/data/uda"
source /rds/project/rds-sPGbyCAPsJI/uda-ssl.sh

# Parse Signal and Source metadata from UDA
mpirun -n $num_workers python3 -m src.create_uda_metadata $uda_path campaign_shots/M9.csv
mpirun -n $num_workers python3 -m src.create_uda_metadata $uda_path campaign_shots/M8.csv
mpirun -n $num_workers python3 -m src.create_uda_metadata $uda_path campaign_shots/M7.csv
mpirun -n $num_workers python3 -m src.create_uda_metadata $uda_path campaign_shots/M6.csv
mpirun -n $num_workers python3 -m src.create_uda_metadata $uda_path campaign_shots/M5.csv
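
The five `mpirun` lines above differ only in the campaign name. As a sketch of how the same sequence could be expressed as a loop (this is not what the commit does), the snippet below prints the commands rather than running them, so it can be tried outside a Slurm job. The function name `metadata_commands` and the fallback of one worker when `SLURM_NTASKS` is unset are assumptions made for the example.

```shell
# Hypothetical refactoring sketch; prints the commands instead of running them.
uda_path="/rds/project/rds-sPGbyCAPsJI/data/uda"
num_workers=${SLURM_NTASKS:-1}   # assumption: fall back to 1 outside a Slurm job

metadata_commands() {
    # Same campaign order as the committed script: M9 down to M5.
    for campaign in M9 M8 M7 M6 M5; do
        echo "mpirun -n $num_workers python3 -m src.create_uda_metadata $uda_path campaign_shots/$campaign.csv"
    done
}

metadata_commands
```

To execute rather than print, the `echo` and its quotes would be dropped; keeping the campaign list in one place makes it easier to add or remove campaigns later.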
