Difference in Job Submission between Jubail and Dalma
This section highlights some of the important changes you will need to make in your job submission scripts when moving from the Dalma HPC cluster to the Jubail HPC cluster.
We shall discuss each of the sections below:

- Access
- Partitions
- Number of Tasks
- Memory
- Bigmem nodes
- GPU nodes
- Timelimit
- Preempt jobs
- Default Quota
Summary
**Access**

| Dalma | Jubail |
| --- | --- |
| Dalma was accessed as follows:<br>`ssh <NetID>@dalma.abudhabi.nyu.edu` | Jubail can be accessed as follows:<br>`ssh <NetID>@jubail.abudhabi.nyu.edu` |
**Partitions**

| Dalma | Jubail |
| --- | --- |
| Dalma had multiple partitions (for example `parallel`). | Jubail does not have separate partitions; there is only one generalized partition called `compute`. |
| Dalma-based job scripts used:<br>`#SBATCH -p parallel` | On Jubail the partition directive becomes:<br>`#SBATCH -p compute` |

More info can be found here.
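For reference, a minimal Jubail batch script using the `compute` partition might look like the sketch below; the job name, walltime, and workload are placeholders, not site-specific defaults:

```bash
#!/bin/bash
#SBATCH -p compute          # Jubail's single generalized partition
#SBATCH -n 1                # number of tasks
#SBATCH -t 01:00:00         # placeholder walltime (hh:mm:ss)
#SBATCH -J example_job      # placeholder job name

# Placeholder workload; replace with your own module loads and program.
srun hostname
```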
**Maximum number of Tasks/CPUs**

| Dalma | Jubail |
| --- | --- |
| Maximum number of CPUs per node was 28 on Dalma. | Maximum number of CPUs per node is 128 on Jubail. |
| Requesting 280 CPUs:<br>`#SBATCH -p parallel`<br>`#SBATCH -n 280` | Request the nearest multiple of 128 cores compared to what was requested on Dalma, e.g. 256 CPUs:<br>`#SBATCH -p compute`<br>`#SBATCH -n 256` |

Notes:

- Small jobs (requiring fewer than 28 CPUs) will be sent to the old Dalma nodes by the scheduler.
- Medium and large jobs will be prioritized and sent to Jubail.
- Small jobs (fewer than 28 CPUs), mostly Python and R workloads, do not need any change and are supported on Jubail as well.
- Medium and large jobs (MPI jobs) need an adjustment, since the number of CPUs per node has changed from 28 to 128; see the example above and the sketch below.
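As a concrete sketch of that adjustment for an MPI job, a run that asked for 280 tasks on Dalma could be resubmitted on Jubail roughly as follows (the walltime, module, and executable names are placeholders):

```bash
#!/bin/bash
#SBATCH -p compute          # Jubail partition (was: -p parallel on Dalma)
#SBATCH -n 256              # nearest multiple of 128 to the 280 tasks used on Dalma
#SBATCH -t 12:00:00         # placeholder walltime

# Placeholder: load your own MPI environment and replace with your executable.
# module load openmpi
srun ./my_mpi_app
```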
**Memory**

| Dalma | Jubail |
| --- | --- |
| Total memory per node was 112GB on Dalma. | Total memory per node is 480GB on Jubail. |
| Default memory assigned to a job was 4GB per CPU. | Default memory assigned to a job is 3.75GB per CPU. |
| Maximum allowed memory per node was 112GB. | Maximum allowed memory per node is 480GB. |
| `#SBATCH -p parallel`<br>`#SBATCH --mem=80G` | `#SBATCH -p compute`<br>`#SBATCH --mem=200G` |
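A minimal sketch of a memory-heavy job that still fits on a regular Jubail compute node; the task count and program are placeholders:

```bash
#!/bin/bash
#SBATCH -p compute
#SBATCH -n 16               # placeholder task count
#SBATCH --mem=200G          # total memory per node; fits on a regular 480GB Jubail node

# Placeholder workload
./my_memory_heavy_app
```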
**BigMem nodes**

| Dalma | Jubail |
| --- | --- |
| Large memory nodes were requested using the `bigmem` partition. | Large memory nodes are requested using the `bigmem` partition. |
| Dalma had three large memory nodes. | Jubail has four large memory nodes. |
| Large memory nodes were requested when the required memory was greater than 112GB:<br>`#SBATCH -p bigmem`<br>`#SBATCH --mem=200G` | Large memory nodes are requested ONLY when the required memory is greater than 480GB:<br>`#SBATCH -p bigmem`<br>`#SBATCH --mem=700G` |
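A sketch of a Jubail job that genuinely needs a large-memory node, i.e. more than 480GB; the walltime and program are placeholders:

```bash
#!/bin/bash
#SBATCH -p bigmem
#SBATCH --mem=700G          # more than 480GB, so a large-memory node is required
#SBATCH -t 24:00:00         # placeholder walltime

# Placeholder workload
./my_huge_memory_app
```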
**GPU nodes**

| Dalma | Jubail |
| --- | --- |
| Dalma had 14 GPU nodes, each with 2 Nvidia V100 GPUs. | In addition to the Dalma GPU nodes, Jubail has 24 GPU nodes, each with one Nvidia A100 GPU. |
| Dalma had exclusive GPU nodes; hence, only GPU jobs were running on the GPU nodes. | Jubail has both exclusive and non-exclusive GPU nodes. |
| Only Nvidia V100 GPUs were available on Dalma. | On Jubail, users have the option to choose between Nvidia V100 and A100 GPUs. |
| When requesting a single GPU:<br>`#SBATCH -p nvidia`<br>`#SBATCH --gres=gpu:1` | The syntax on Jubail for requesting a single GPU is the same as on Dalma:<br>`#SBATCH -p nvidia`<br>`#SBATCH --gres=gpu:1` |
| | When explicitly requesting a single GPU on the new a100 nodes:<br>`#SBATCH -p nvidia`<br>`#SBATCH --gres=gpu:a100:1` |

Notes:

- By default, GPU jobs that do not specify a GPU type are sent to whichever GPU nodes the scheduler finds available.
- Users can test the performance differences between the V100 and A100 GPUs.
- You can also state explicitly in your job script that the job should be sent to the a100 nodes, as in the last row above and the sketch below.
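A minimal sketch of a Jubail job script that explicitly requests one of the new a100 GPUs; the walltime, environment setup, and `train.py` program are placeholders:

```bash
#!/bin/bash
#SBATCH -p nvidia                 # GPU partition
#SBATCH --gres=gpu:a100:1         # explicitly request one A100 GPU
#SBATCH -n 1
#SBATCH -t 04:00:00               # placeholder walltime

# Placeholder: load your own environment and run your GPU program.
python train.py
```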
**WallTime**

| Dalma | Jubail |
| --- | --- |
| The maximum walltime on Dalma was linked to the account users belong to (physics, students, engineering, etc.). | The maximum walltime on Jubail is linked to the type (size) of the job submitted by the user, irrespective of the account they belong to. |

The details of the job types and their respective limits can be found in the link here.
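Whichever limit applies to your job type, the requested walltime is set the same way in the script, for example to ask for 48 hours:

```bash
#SBATCH -t 48:00:00     # requested walltime; SLURM also accepts days-hours:minutes:seconds, e.g. 2-00:00:00
```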
**Preempt partition**

The preempt partition is used for quick testing with high job priority and is available to everyone.

| Dalma | Jubail |
| --- | --- |
| Max walltime for preempt jobs was 30 minutes on Dalma. | Max walltime for preempt jobs on Jubail is 7 days. |

More info on this can be found here.
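As a sketch, assuming the partition is literally named `preempt` (check `sinfo` on Jubail for the exact name), a quick test job could look like:

```bash
#!/bin/bash
#SBATCH -p preempt      # assumed partition name; verify with sinfo
#SBATCH -n 1
#SBATCH -t 00:30:00     # short test run; preempt jobs can be stopped by higher-priority work

# Placeholder test command
srun hostname
```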
**Default Quota**

| Dalma | Jubail |
| --- | --- |
| Dalma had 4 storage systems. | Jubail has the same storage systems as Dalma, with the same default quota on all of them except $HOME. |

Only the default per-user quota for $HOME differs between the two clusters.