Quick Intro to Job Submission¶
Things to Keep in mind¶

Topic | Details
---|---
Types of Jobs | Jobs are categorized into four types based on their resource requirements: Small, Medium, Large and XLarge.
Quick Testing | Use the preempt partition for quick testing of your code. More details below.
Special Jobs | nvidia (GPU) and bigmem (large memory) jobs fall into this category and are limited in resources.
Default Walltime | The default walltime of all jobs is 5 hours (except for the preempt partition, where it is 2 hours).
Default Memory | By default, 3.75GB of memory is assigned to every CPU requested. So if 4 CPUs (`-n 4`) are requested, the job is allocated 15GB of memory.
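For illustration, the sketch below shows how these defaults interact with an explicit request; the program name is a placeholder and the values are only examples:

```bash
#!/bin/bash
# 4 CPUs: with the 3.75GB-per-CPU default this job would receive about 15GB of memory.
#SBATCH -n 4
# Override the default 5-hour walltime (format: days-hours:minutes:seconds).
#SBATCH -t 1-00:00:00
# Optionally request memory explicitly instead of relying on the per-CPU default.
#SBATCH --mem=20G

./my_program   # placeholder executable
```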
Partitions:

- `compute`: General purpose partition for all normal runs.
- `nvidia`: Partition for GPU jobs.
- `bigmem`: Partition for large memory jobs. Only jobs requesting more than 500GB will fall into this category.
- `preempt`: Supports all types of jobs with a grace period of 30 minutes. More on this here.
- `xxl`: Special partition for grand challenge applications. Requires approval from management.
- `visual`: A few nodes which give you the option of running applications with a GUI.
Partitions Summary¶
Partition | Job Type | (min,max) CPUs | Max Time | Max Jobs | Example | Remarks
---|---|---|---|---|---|---
1. compute | Small | (1,64) | 7 days | 200 | `#SBATCH -p compute`<br>`#SBATCH -n 25` | Small jobs will be forwarded, preferably to Dalma nodes.
1. compute | Medium | (28,512) | 7 days | 20 | `#SBATCH -p compute`<br>`#SBATCH -n 56` |
1. compute | Large | (256,2048) | 2 days | 6 | `#SBATCH -p compute`<br>`#SBATCH -n 1024` |
1. compute | XLarge | (1024,4096) | 2 days | 3 | `#SBATCH -p compute`<br>`#SBATCH -n 4000` |
2. nvidia | GPU | (1,160) | 4 days | 10 | `#SBATCH -p nvidia`<br>`#SBATCH --gres=gpu:1` | Max GPUs: 12
3. bigmem | Large Memory Jobs | (1,40) | 4 days | 2 | `#SBATCH -p bigmem`<br>`#SBATCH --mem=700G` | Jobs requesting more than 480GB will be forwarded to bigmem.
4. preempt | preempt-small | (1,28) | 7 days | 1200 | `#SBATCH -p preempt`<br>`#SBATCH -n 25`<br>`#SBATCH -t 12:00:00` | Grace period of 30 mins.
4. preempt | preempt-big | (28,8192) | 7 days | 100 | `#SBATCH -p preempt`<br>`#SBATCH -n 8100`<br>`#SBATCH -t 15:00:00` | Grace period of 30 mins.
Note
Kindly be advised that the resource and job limits mentioned above are indicative and subject to change based on resource utilization and availability.
Sample Job Script¶
A job script consists of 2 parts:

1. Resource requirements.
2. Commands to be executed.

Points to be noted

- Ask only for what you need. Serial jobs need only one CPU (`#SBATCH -n 1`).
- Make sure the walltime specified is not greater than the allowed time limit. More details can be found here.
- By default, 3.75GB of memory is assigned for each CPU allocated, so defining the memory requirement is optional.

Difference between CPUs, Cores and Tasks

- On Jubail HPC, one CPU is equivalent to one core. Jubail has 128 CPUs per node.
- In Slurm, resources (CPUs) are allocated in terms of tasks, denoted by `-n` or `--ntasks`.
- By default, the value of `-n` or `--ntasks` is one if left undefined.
- By default, each task is equivalent to one CPU.
- But if you have defined `-c` or `--cpus-per-task` in your job script, then the total number of CPUs allocated to you is the product of `-n` and `-c`.
```bash
#!/bin/bash
#Define the resource requirements here using #SBATCH

#For requesting 10 CPUs
#SBATCH -c 10
#Max walltime for the job
#SBATCH -t 24:00:00
#Resource requirement commands end here

#Add the lines for running your code/application
module purge
module load abc
#Activate any environments if required
conda activate myenv
#Execute the code
python abc.py
```
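Assuming the script above is saved as, say, `job.sh` (the filename is arbitrary), it can be submitted with `sbatch`; the job ID shown is only illustrative:

```bash
sbatch job.sh
# Submitted batch job 127445
```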
Common Job submission arguments:

- `-n` Select number of tasks to run (default 1 core per task)
- `-N` Select number of nodes on which to run
- `-t` Wallclock limit in days-hours:minutes:seconds (e.g. 4:00:00)
- `-p` Select partition (compute, nvidia, bigmem)
- `-o` Output file (with no `-e` option, stderr and stdout are merged into the output file)
- `-e` Keep a separate error file
- `-d` Dependency on a prior job (e.g. don't start this job before job XXX terminates)
- `-A` Select account (e.g. physics_ser, faculty_ser)
- `-c` Number of cores required per task (default 1)
- `--ntasks-per-node` Number of tasks on each node
- `--mail-type=type` Notify on state change: BEGIN, END, FAIL or ALL
- `--mail-user=user` Who to send email notifications to
- `--mem` Maximum amount of memory per job (default unit is MB, but a GB suffix can be used). Note: not all memory is available to jobs; 8GB is reserved on each node for the OS, so a 128GB node can allocate up to 120GB for jobs.
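As a sketch of how several of these arguments fit together in one script (the account name, module, email address and program are placeholders, not site-specific values):

```bash
#!/bin/bash
#SBATCH -p compute                  # partition
#SBATCH -n 8                        # 8 tasks
#SBATCH -t 2-00:00:00               # 2-day walltime
#SBATCH --mem=30G                   # total memory for the job
#SBATCH -o myjob.%j.out             # stdout file (%j expands to the job ID)
#SBATCH -e myjob.%j.err             # separate stderr file
#SBATCH -A physics_ser              # account (example from the list above)
#SBATCH --mail-type=END,FAIL        # email on completion or failure
#SBATCH --mail-user=user@example.org

module purge
module load abc                     # placeholder module
srun ./my_program                   # placeholder executable
```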
#SBATCH with -n, -c and -N¶

It may sometimes be confusing to choose between `-n`, `-c` and `-N`. The following section describes the difference between these parameters.

- `-n` refers to the number of tasks. Tasks can communicate across nodes. If the number of tasks is greater than one, they may be distributed across multiple nodes.
- `-c` refers to the number of CPUs per task. `-c` is always confined to a single node and is beneficial for multithreaded jobs.
- `-N` assigns the tasks to `N` nodes.
- Each task is by default assigned one CPU, and each task is by default assigned to a single node.
- The values of `-n`, `-c` and `-N` default to 1 if not specified.
Command | Behaviour
---|---
`#SBATCH -n 10` | Allocates 10 tasks with 1 CPU each (10 CPUs in total); the tasks may be distributed across multiple nodes.
`#SBATCH -c 10` | Allocates 1 task with 10 CPUs, all on a single node.
`#SBATCH -N 1`<br>`#SBATCH -n 10` | Allocates 10 tasks (10 CPUs) confined to a single node.
`#SBATCH -n 10`<br>`#SBATCH -c 20` | Allocates 10 tasks with 20 CPUs each (200 CPUs in total); each task's CPUs stay on one node, but the tasks may be spread over multiple nodes.
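The last combination above is the typical pattern for hybrid MPI + OpenMP jobs. The sketch below assumes a generic MPI-enabled executable and uses Slurm's `SLURM_CPUS_PER_TASK` variable to size the threads:

```bash
#!/bin/bash
#SBATCH -n 10              # 10 MPI tasks (may span nodes)
#SBATCH -c 20              # 20 CPUs per task, kept on the same node
#SBATCH -t 12:00:00

# One OpenMP thread per CPU allocated to each task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./hybrid_program      # placeholder MPI+OpenMP executable
```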
Basic SLURM Commands¶
SLURM is the resource manager we use to schedule jobs onto resources according to the requirements specified. Below are a few of the basic commands a user can use for their jobs:

Command | Description
---|---
`sbatch file1` | Submits the job script `file1` to the scheduler.
`squeue` | Shows the status of jobs currently in the queue.
`scancel 127445`<br>`scancel -u wz22` | Cancels the job with ID 127445; cancels all jobs belonging to user wz22.
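A typical interaction with these commands looks like the following; the job ID and username are illustrative only:

```bash
squeue -u $USER            # list only your own jobs
squeue -j 127445           # check a specific job by ID
scancel 127445             # cancel a single job by ID
scancel -u wz22            # cancel all jobs belonging to user wz22
```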
Requesting a GPU node¶
To request a GPU node you have two options:

Requesting one GPU card of any type:

```bash
#SBATCH -p nvidia
#SBATCH --gres=gpu:1
```

Requesting one GPU card of a specific type (available types are v100 and a100):

```bash
#SBATCH -p nvidia
#SBATCH --gres=gpu:a100:1
```

If you would like to analyze your GPU jobs, please refer to the following section: Analyzing GPU Usage
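Putting it together, a minimal GPU job script might look like the sketch below; the module name and Python script are placeholders, and the a100 type is used only as an example:

```bash
#!/bin/bash
#SBATCH -p nvidia
#SBATCH --gres=gpu:a100:1      # one a100 card (drop ":a100" to accept any type)
#SBATCH -n 1
#SBATCH -t 12:00:00

module purge
module load cuda               # placeholder module name
python train.py                # placeholder GPU workload
```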
Preempt Partition¶
The preempt partition is a limitless, high-priority queue, with the caveat that its jobs can be preempted (killed) to make space for other jobs demanding resources.

A grace period of 30 minutes is given to the job to allow time for a smooth termination or checkpointing, if needed.

The intention is to increase machine occupancy and reduce queue waiting times for jobs that have short runtimes or are meant for testing; in all other respects, preempt jobs are treated as regular jobs.
Default Walltime: 2 hours
Maximum Walltime: 7 days
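A preempt job can take advantage of the grace period by trapping the termination signal. The sketch below assumes the scheduler delivers SIGTERM at the start of the grace period and that `checkpoint.sh` is your own checkpointing script; both are assumptions, not site-confirmed behaviour:

```bash
#!/bin/bash
#SBATCH -p preempt
#SBATCH -n 25
#SBATCH -t 1-00:00:00          # request more than the 2-hour default if needed

# Assumption: SIGTERM arrives when the job is preempted, starting the grace period.
trap '/bin/bash ./checkpoint.sh; exit 0' TERM

./my_program &                 # placeholder executable, run in the background
wait                           # wait so the trap can fire while the program runs
```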