Partitions
- General Purpose Partitions
- Approval Access Partitions
- Dedicated Access Partitions
- Using Constraints
- Retrieving a Node's Hardware Configuration
In the Discovery environment, collections of compute nodes are organized into “partitions”. Always submit jobs to a specific partition using sbatch or srun.
Information on Discovery Cluster partitions can be found here. Current usage and node availability can be displayed by typing sinfo: the NODELIST column in the sinfo output lists the names of the corresponding machines. The names of the public partitions are:
- debug
- express
- short
- long
- gpu
- multigpu
- large
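For a quick command-line view of these partitions, a few common sinfo invocations (the flags below are stock Slurm, not Discovery-specific):

```bash
sinfo                 # all partitions, node states, and NODELIST
sinfo -p short        # limit the listing to the short partition
sinfo -N -p gpu       # one line per node, with NODELIST expanded
```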
A summary of a few commonly used partitions is below:
Partition Name | Old Names
---|---
debug, express, short, long, large | general, infiniband, ser-par-10g-2, ser-par-10g-3, ser-par-10g-4, ht-10g, largemem-10g, interactive-10g
Timing, memory, and core limits for commonly used partitions are as follows:
Partition Name | Requires Approval? | Time Limit (default/max) | Core Limit | RAM Limit | Running Jobs (default/max) | Submitted Jobs (default/max)
---|---|---|---|---|---|---
debug | No | 20min/20min | 128 | 256GB | 10/25 | 25/100
express | No | 30min/1h | 2048 | 25TB | 50/250 | 250/1000
short | No | 4h/24h | 1024 | 25TB | 50/500 | 100/1000
long | Yes | 1 day/5 days | 1024 | 25TB | 25/250 | 50/500
large | Yes | 6h/6h | N/A | N/A | 100/100 | 100/1000
gpu | No | 4h/8h | N/A | N/A | 25/250 | 50/1000
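As an illustration of these limits, here is a minimal sketch of a batch script that stays within the short partition's caps (the job name, resource values, and payload command are placeholders, not site requirements):

```bash
#!/bin/bash
#SBATCH --partition=short
#SBATCH --time=04:00:00        # the default for short; the max is 24h
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=4G               # well under the partition RAM cap
#SBATCH --job-name=limits_demo # placeholder job name

srun hostname                  # report which machine the job ran on
```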
These partitions contain several different types of machines, accessible via appropriate --constraint flags (see Using Constraints):
Partition Name | CPU | Frequency | Core Number | Memory Available | Constraint
---|---|---|---|---|---
debug, express, short, long, large | Dual Intel Xeon E5-2650 | 2.00GHz | 16 | 128GB | E5-2650@2.00GHz
 | Dual Intel Xeon E5-2680 v2 | 2.80GHz | 20 | 128GB | E5-2680v2@2.80GHz
 | Dual Intel Xeon E5-2690 v3 | 2.60GHz | 24 | 128GB | E5-2690v3@2.60GHz
Constraints of the gpu partition are as follows:
Partition Name | CPU + GPU | Frequency | Core Number | Memory Available | Constraint
---|---|---|---|---|---
gpu | Dual Intel Xeon E5-2650 + one K20m NVIDIA GPU (23 nodes) | 2.00GHz | 16 | 128GB | E5-2650@2.00GHz
 | Dual Intel Xeon E5-2690v3 + one K40m NVIDIA GPU (16 nodes) | 2.60GHz | 24 | 128GB | E5-2690v3@2.60GHz
 | Dual Intel Xeon E5-2680v4 + 8 K80 NVIDIA GPUs (8 nodes) | 2.40GHz | 28 | 256GB | E5-2680v4@2.40GHz
 | Dual Intel Xeon E5-2680v4 + 4 P100 NVIDIA GPUs (12 nodes) | 2.40GHz | 28 | 256GB | E5-2680v4@2.40GHz
 | Intel Gold 6132 + 4 V100-SXM2 NVIDIA GPUs (24 nodes) | 2.60GHz | N/A | N/A | N/A
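To land on a particular GPU node type, combine the gpu partition with a constraint from the table above. A hedged sketch follows; the generic --gres=gpu:1 request is stock Slurm, but whether Discovery expects a typed gres (e.g., gpu:k40m) is an assumption:

```bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1                    # request one GPU
#SBATCH --constraint=E5-2690v3@2.60GHz  # per the table, these hosts carry K40m GPUs
#SBATCH --time=04:00:00                 # the default for gpu; the max is 8h
```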
To ensure fair distribution of resources, access to the long, large, and multigpu partitions is restricted to researchers who have demonstrated a need for them.
To request access to the long partition, open a general Research Computing ServiceNow ticket explaining why you need access to the long partition. IT will be in contact with you and will require that you meet with a member of our staff for a consultation. Be prepared to share your code and an example of previous jobs that you’ve attempted to run. Note that if your code is easily checkpointed, you are not a good candidate for using the long partition.
To request access to the multigpu partition, download and complete all parts of the multigpu access form located here. Attach this form to a general Research Computing ServiceNow ticket. Your request will be reviewed by two faculty members, and you will be notified of your application’s acceptance or rejection through the ServiceNow ticket that you submitted.
To request access to the large partition, download and complete all parts of the large partition access form located here. Attach this form to a general Research Computing ServiceNow ticket. Your request will be reviewed by a faculty member, and you will be notified of your application’s acceptance or rejection through the ServiceNow ticket that you submitted.
Several partitions are owned by ECE faculty and are restricted: they may be used only after obtaining explicit permission from the respective owners. Information on these partitions can be found below:
Partition Name | Machines | Name Range | Old Name | CPUs per Machine | RAM | CPU | Constraint
---|---|---|---|---|---|---|---
ioannidis | 8 | c[3096-3103] | ioannidis1 | 40 | 128GB | Intel Xeon CPU E5-2680v2 2.8GHz | E5-2680v2@2.80GHz
 | 8 | c[3120-3127] | ioannidis2 | 56 | 500GB | Intel Xeon CPU E5-2680v4 2.4GHz | E5-2680v4@2.40GHz
danabrooks | 1 | c[4021] | danabrooks | 96 | 256GB | Intel Xeon CPU E7-4830v3 2.8GHz | E7-4830v3@2.8GHz
As there are many different CPU configurations within a partition, additional arguments need to be passed to select specific machines under either sbatch (i.e., in batch mode) or srun (i.e., in interactive mode).
For example, to submit a job to a Dual Intel Xeon E5-2650 machine in the short partition, you should invoke sbatch with the following arguments:
```bash
#SBATCH --partition=short
#SBATCH --constraint=E5-2650@2.00GHz
```
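The same options can equivalently be passed on the sbatch command line instead of being embedded in the script (standard Slurm behavior; myjob.sh is a placeholder script name):

```bash
sbatch --partition=short --constraint=E5-2650@2.00GHz myjob.sh
```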
As another example, if you want to use nodes from the old ioannidis1 partition, you should add the following constraints:
```bash
#SBATCH --partition=ioannidis
#SBATCH --constraint=E5-2680v2@2.80GHz
```
Appropriate constraints are listed in the tables above. Note that an alternative way of getting access to specific nodes is through their names. For example, submitting a job to the specific node c3096 in the ioannidis1 partition (presuming c3096 is idle) can be done via:
```bash
#SBATCH --partition=ioannidis
#SBATCH -w c3096
```
This is useful if you are trying to ensure that your experiments run on the same machine.
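Before pinning a job to c3096 this way, you can check whether the node is actually idle using standard Slurm commands:

```bash
sinfo -n c3096              # short state summary for this node
scontrol show node c3096    # full details, including the State= field
```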
Tip. See also the --nodelist option for submitting jobs to multiple machines.
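For example, to spread a job across a fixed set of machines (the node names are taken from the ioannidis table above; the node count must cover the list):

```bash
#SBATCH --partition=ioannidis
#SBATCH --nodelist=c[3096-3098]  # pin the job to these three nodes
#SBATCH --nodes=3                # must be at least the size of the nodelist
```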
To get the CPU, memory, and other configuration details of all nodes with a single command, type

```bash
grep Feature /shared/centos7/etc/slurm/nodes.conf
```

from any node, including the gateway.
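To narrow the listing to one node type, the output can be piped through a second grep (plain text filtering; the feature string comes from the constraint tables above):

```bash
# Show only nodes advertising the E5-2650@2.00GHz feature
grep Feature /shared/centos7/etc/slurm/nodes.conf | grep "E5-2650@2.00GHz"
```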
Alternatively, log in to a node in interactive mode and type:

```bash
lscpu
```

This will show you information about that node specifically.
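If you need an interactive shell on a compute node to run lscpu, a minimal sketch using standard Slurm follows (the partition choice is illustrative):

```bash
srun --partition=short --pty /bin/bash   # request an interactive shell on a short node
lscpu                                    # CPU model, core count, and caches for this node
free -h                                  # memory available on this node
exit                                     # release the allocation
```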
Back to main page.