Create 4 VMs in Google Cloud Platform with the following specifications:
| Instance Name | Role | Machine type | vCPUs | Memory (RAM) | Disk Storage | Task to be run |
|---|---|---|---|---|---|---|
| VM1 | Condor Host | e2-medium | 2 | 4 GB | 10 GB | Condor Host Program |
| VM2 | Condor Submit | e2-medium | 1 | 4 GB | 30 GB | Condor Submit Program; Jupyter notebook |
| VM3 | Condor Executor 01 | e2-medium | 2 | 4 GB | 10 GB | Condor Executor Program with Python environment |
| VM4 | Condor Executor 02 | e2-medium | 2 | 4 GB | 10 GB | Condor Executor Program with Python environment |
Note: I am using the default network that Google Cloud assigns to all 4 VM instances, without changing any of its settings.
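If you prefer the command line to the Cloud Console, the four VMs can be described with `gcloud compute instances create`. The following is only a sketch: the zone, the Ubuntu image family, and every instance name other than `condor-host` are my assumptions, not values from this setup. It prints the commands so you can review them before running anything.

```shell
#!/bin/sh
# Sketch: print the gcloud commands for the four VMs in the table above.
# ZONE, IMAGE_FAMILY and most instance names are assumptions; adjust them.
ZONE="us-central1-a"
IMAGE_FAMILY="ubuntu-2204-lts"
IMAGE_PROJECT="ubuntu-os-cloud"

# print_create_cmd NAME DISK_GB: emit one "gcloud compute instances create" line.
print_create_cmd() {
  echo "gcloud compute instances create $1" \
       "--zone=$ZONE --machine-type=e2-medium" \
       "--boot-disk-size=${2}GB" \
       "--image-family=$IMAGE_FAMILY --image-project=$IMAGE_PROJECT"
}

# Name and disk size per VM, taken from the table (the submit node gets 30 GB).
CREATE_CMDS=$(
  print_create_cmd condor-host 10
  print_create_cmd condor-submit 30
  print_create_cmd condor-executor-01 10
  print_create_cmd condor-executor-02 10
)

# Review the commands first; pipe them into `sh` to actually create the VMs.
echo "$CREATE_CMDS"
```

Because no network flags are passed, the instances land on the project's default network, which matches the note above.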
1. SSH into the condor-host VM.
2. Update Ubuntu's package index:
> sudo apt update
3. Edit hosts file:
> sudo nano /etc/hosts
- Add these lines to the /etc/hosts file, replacing the addresses with your own VMs' internal IP addresses:
# new ip addresses added
10.128.0.2 CondorHost
10.128.0.6 SubmissionHost
10.128.0.4 Executor01
10.128.0.5 Executor02
After adding the 4 machines' IP addresses to the /etc/hosts file, press Ctrl + X, type y, and then press Enter to save and exit.
4. Install HTCondor's Central manager node:
> curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="abc123" /bin/bash -s -- --no-dry-run --central-manager CondorHost
(Note: you can choose your own password in place of abc123, but make sure you use the same one when you install the HTCondor submit and executor nodes.)
Source: HTCondor's admin guide
5. Restart HTCondor:
> sudo systemctl restart condor
> sudo systemctl status condor
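The hosts-file edit in step 3 can also be done without opening nano. Here is a minimal sketch, assuming the same four addresses as above; `add_host_entry` and `HOSTS_FILE` are names I made up for illustration. It writes to a scratch file by default so it is safe to try, and a repeated call for the same name does nothing.

```shell
#!/bin/sh
# Sketch: append name-resolution entries without an interactive editor.
# HOSTS_FILE defaults to a scratch file; set HOSTS_FILE=/etc/hosts and
# run as root to apply the entries for real.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# add_host_entry IP NAME: append "IP NAME" unless NAME is already listed.
add_host_entry() {
  if ! grep -qw -- "$2" "$HOSTS_FILE"; then
    printf '%s %s\n' "$1" "$2" >> "$HOSTS_FILE"
  fi
}

# Replace these with your own VMs' internal IP addresses.
add_host_entry 10.128.0.2 CondorHost
add_host_entry 10.128.0.6 SubmissionHost
add_host_entry 10.128.0.4 Executor01
add_host_entry 10.128.0.5 Executor02
add_host_entry 10.128.0.2 CondorHost   # repeated call is a no-op

cat "$HOSTS_FILE"
```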
1. SSH into the HTCondor Submit VM as before, then run this command in the terminal to update Ubuntu's package index:
> sudo apt update
2. Edit hosts file:
> sudo nano /etc/hosts
- Add this line to the /etc/hosts file, replacing the address with your HTCondor Host VM's internal IP address:
# new ip address added
10.128.0.2 CondorHost
3. Install HTCondor's submit node:
> curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="abc123" /bin/bash -s -- --no-dry-run --submit CondorHost
(Note: you can choose your own password in place of abc123, but please make sure the password is the same across all the HTCondor VMs.)
Source: HTCondor's admin guide
4. Restart HTCondor:
> sudo systemctl restart condor
> sudo systemctl status condor
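The install commands for the central manager, submit, and executor nodes differ only in the role flag, while the password and the central manager name have to match everywhere. One way to keep them consistent is to build the command from a single place. This is only a sketch: `install_cmd`, `HTCONDOR_PASSWORD`, and `CENTRAL_MANAGER` are illustrative names of mine, not part of HTCondor, and the function prints the command without running it.

```shell
#!/bin/sh
# Sketch: build the get.htcondor.org install command for a given role.
# HTCONDOR_PASSWORD and install_cmd are illustrative names, not HTCondor's.
HTCONDOR_PASSWORD="${HTCONDOR_PASSWORD:-abc123}"   # must match on every VM
CENTRAL_MANAGER="CondorHost"                        # name from /etc/hosts

# install_cmd ROLE: ROLE is one of central-manager, submit, execute.
install_cmd() {
  case "$1" in
    central-manager|submit|execute) ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
  printf 'curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="%s" /bin/bash -s -- --no-dry-run --%s %s\n' \
    "$HTCONDOR_PASSWORD" "$1" "$CENTRAL_MANAGER"
}

# Print the command for this VM's role so it can be reviewed before running.
install_cmd submit
```

Calling `install_cmd execute` on an executor VM gives the matching command with the same password and central manager name.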
Repeat these steps for both executor VMs:
1. SSH into the executor VM
2. Update Ubuntu:
> sudo apt update
3. Edit hosts file:
> sudo nano /etc/hosts
- Add this line to the /etc/hosts file, replacing the address with your HTCondor Host VM's internal IP address:
# new ip address added
10.128.0.2 CondorHost
4. Install HTCondor Executor:
> curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="abc123" /bin/bash -s -- --no-dry-run --execute CondorHost
(Note: you can choose your own password in place of abc123, but please make sure the password is the same across all the VMs.)
Source: HTCondor's admin guide
5. Install the Python dependencies that you need for your machine learning task:
> sudo apt update
> sudo apt install python3-pip -y
> sudo pip install pandas scikit-learn
6. After running the install scripts on the respective VMs, run these commands (replace tanyongsheng_net with your own username):
> sudo usermod -aG condor tanyongsheng_net
> # sudo usermod -aG docker condor # only if you want to use the Docker runtime
> sudo systemctl restart condor
> sudo systemctl status condor
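Once all four VMs are set up, it is worth checking that the pool actually formed. The sketch below is meant for the submit VM and makes a few assumptions: `check_pool`, `write_test_job`, and the file name `hello.sub` are illustrative names of mine. `condor_status`, `condor_submit`, and `condor_q` are the standard HTCondor command-line tools. In a working pool I would expect both executors to show up in the `condor_status` output.

```shell
#!/bin/sh
# Sketch: sanity-check the pool and prepare a tiny test job on the submit VM.
# The file name hello.sub and the function names are illustrative.

# check_pool: list the machines the central manager knows about.
check_pool() {
  if ! command -v condor_status >/dev/null 2>&1; then
    echo "condor_status not found; is HTCondor installed on this VM?" >&2
    return 1
  fi
  condor_status
}

# write_test_job DIR: create a submit description that runs /bin/hostname twice.
write_test_job() {
  cat > "$1/hello.sub" <<'EOF'
executable = /bin/hostname
output     = hello.$(Cluster).$(Process).out
error      = hello.$(Cluster).$(Process).err
log        = hello.log
queue 2
EOF
}

JOB_DIR=$(mktemp -d)
write_test_job "$JOB_DIR"
echo "Wrote $JOB_DIR/hello.sub"
echo "Next: check_pool, then: cd $JOB_DIR && condor_submit hello.sub && condor_q"
```

Each job writes the hostname of the machine it ran on to its `.out` file, which makes it easy to see whether the work was placed on the executor VMs.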