Project Setup and Execution Guide

  1. Connect to the cluster via SSH

    ssh <username>@uc2.scc.kit.edu

    The very first action should be to create the shared workspaces:

    ws_allocate ASR 60 # 60 Days
    ws_allocate MT 60 # 60 Days

    Add users to the workspaces:

    module load system/ws_addon
    
    # Example: ws_share -t dir-w -u uxude ASR
    ws_share -t dir-w -u <user> <workspace>
    
    setfacl -Rm u:<user>:rwX,d:u:<user>:rwX $(ws_find ASR)
    setfacl -Rm u:<user>:rwX,d:u:<user>:rwX $(ws_find MT)

    To change into one of the workspace directories, run:

    cd $(ws_find ASR)
    cd $(ws_find MT)

    To check the remaining time of the workspaces, run:

    ws_list

    To extend the workspaces, run:

    ws_extend ASR 30 # 30 Days
    ws_extend MT 30 # 30 Days

    Note that the setup.sh script extends the workspaces automatically once they are about to expire.
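
    For illustration, such an automatic extension could look roughly like the sketch below. The parsing of the ws_list output is an assumption (its format can differ between sites), so treat this as an illustration rather than the actual setup.sh logic:

    # Sketch: extend each workspace once fewer than 7 days remain.
    # Assumes ws_list accepts a workspace name and prints a "remaining time" line.
    for ws in ASR MT; do
        days_left=$(ws_list "$ws" | grep -i "remaining time" | grep -oE '[0-9]+' | head -n 1)
        if [ -n "$days_left" ] && [ "$days_left" -lt 7 ]; then
            ws_extend "$ws" 30   # extend by another 30 days
        fi
    done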

  2. Download the project

    git clone https://github.com/BertilBraun/Advanced-Improvement-in-Speech-Translation.git PST

  3. Create a virtual environment

    First, install Miniconda (the commands below follow the official Linux installation instructions):

    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm -rf ~/miniconda3/miniconda.sh
    
    # Initialize conda in your bash shell
    ~/miniconda3/bin/conda init bash
    source ~/.bashrc

    Then, create a virtual environment and install the required packages:

    cd ~/PST
    conda create --name pst
    conda activate pst
    # Install required packages from environment.yml
    conda env update -f environment.yml
    # Ensure setup completed and install additional packages
    ./setup.sh
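
    As a quick sanity check of the environment, something along these lines can be used (this assumes PyTorch is among the packages installed from environment.yml; adjust the import to whatever the project actually depends on):

    conda activate pst
    # Prints the PyTorch version and whether a GPU is visible.
    # Expect False on the login node, since GPUs are only available on compute nodes.
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
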
  4. Running

    Before submitting a job, run the script on the login node and make sure it starts executing without errors; this avoids failures after the job has been submitted to the cluster.

    ./run_YOUR_SCRIPT.sh

    Ensure that the script is executable; if it is not, run:

    chmod +x run_YOUR_SCRIPT.sh
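
    Before submitting, it can also be worth checking the script for shell syntax errors without executing it:

    # Parse the script without running it; no output means the syntax is fine.
    bash -n run_YOUR_SCRIPT.sh
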
  5. Submitting to the cluster

    Once you are sure that the script is executable and runs without errors, you can submit it to the cluster.

    Make sure the correct SBATCH parameters are set in the script, such as the time limit, the required CPU cores and GPUs, and the job name. Also ensure that the required modules are loaded by calling the ~/PST/setup.sh script.

    #SBATCH --job-name=process_audio                # job name
    #SBATCH --partition=gpu_4                       # partition, e.g. single or gpu_4
    #SBATCH --time=02:00:00                         # wall-clock time limit
    #SBATCH --mem=200000                            # memory in MB; check the per-node limits
    #SBATCH --nodes=1                               # number of nodes to be used
    #SBATCH --cpus-per-task=1                       # number of CPUs required per MPI task
    #SBATCH --ntasks-per-node=1                     # maximum count of tasks per node
    #SBATCH --mail-type=ALL                         # notify the user by email when certain event types occur
    #SBATCH --gres=gpu:4                            # number of GPUs required per node
    #SBATCH --output=../../ASR/logs/output_%j.txt   # standard output log
    #SBATCH --error=../../ASR/logs/error_%j.txt     # standard error log; %j is the job id, so log files do not overwrite each other

    To then submit the script to the cluster, run:

    sbatch run_YOUR_SCRIPT.sh

    You can use the dev_gpu_4 partition for quick testing, but be aware that the maximum runtime is 30 minutes.
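
    Putting it together, a job script typically consists of the SBATCH header followed by the commands to run. The skeleton below is only an illustration: the exact header values, the way setup.sh is invoked, and the Python entry point are placeholders, not the project's actual scripts.

    #!/bin/bash
    #SBATCH --job-name=process_audio
    #SBATCH --partition=gpu_4
    #SBATCH --time=02:00:00
    #SBATCH --gres=gpu:4
    #SBATCH --output=../../ASR/logs/output_%j.txt
    #SBATCH --error=../../ASR/logs/error_%j.txt

    source ~/PST/setup.sh        # load the required modules (path as cloned in step 2)
    conda activate pst           # activate the environment created in step 3

    python your_script.py        # placeholder for the actual workload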

  6. Monitoring

    To monitor the status of your job, run:

    squeue -l            # add -i 2 to update the output every 2 seconds

    To cancel a job, run:

    scancel <job-id>

    Depending on the task, the job logs are usually stored in the directory of the script. The output and error logs are named output_<job-id>.txt and error_<job-id>.txt, respectively; the job id is the number returned when the job is submitted to the cluster.
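
    While a job is running, its logs can be followed live, for example (the path matches the SBATCH --output setting shown above; adjust it to your script):

    # Follow the output log of a running job; replace <job-id> with the number returned by sbatch.
    tail -f ../../ASR/logs/output_<job-id>.txt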

  7. Downloading the results

    To download the results from the cluster, run:

    scp [-r] <username>@uc2.scc.kit.edu:~/<PATH-ON-REMOTE> <LOCAL-PATH>

    The -r flag is only required if you want to download a directory.
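
    For larger result directories, rsync can be more convenient than scp, since interrupted transfers can be resumed; the remote path below is only an example and should be replaced with the actual results directory:

    # Resumable, compressed download of a results directory (illustrative path).
    rsync -avz --partial <username>@uc2.scc.kit.edu:~/PST/ASR/logs/ <LOCAL-PATH>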