Memory (RAM)

Access to the memory on each compute node is controlled by how many cores are allocated on each node for a particular job. For example, a job running on 4 cores on a single cobalt node will be able to access 32GB RAM (4 x 8GB). In other words, the RAM on each node is divided up equally among all the cores on the node. To calculate the total RAM available to a job on each node, use the following formula:

Mj = Mc * Nj

where Mj is the RAM accessible by the job on the node, Mc is the RAM/core on the node, and Nj is the number of cores allocated to the job on the node. You can use the “upnodes” command to see the RAM/core for the compute nodes in each cluster. Also note that simply using more than one node for a job will not allow a serial process running on one node to use the RAM available on another node. To make use of the memory on several nodes, the program must distribute its work across them with a message-passing library such as OpenMPI.
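As a worked example of the formula, the short sketch below computes Mj for a few core counts. It assumes 8GB RAM/core, as in the cobalt example above; the function name job_mem_gb is made up for illustration, and the real RAM/core for a cluster should be taken from “upnodes”.

```bash
#!/usr/bin/env bash
# Mj = Mc * Nj
# $1 = Mc (RAM/core in GB), $2 = Nj (cores allocated on the node)
job_mem_gb() {
    echo $(( $1 * $2 ))
}

mem_per_core_gb=8   # assumed value; check "upnodes" for the actual RAM/core

for cores in 1 4 8; do
    echo "${cores} core(s): $(job_mem_gb "$mem_per_core_gb" "$cores")GB"
done
```

With these values, 4 cores give 32GB, matching the example above.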

IMPORTANT: Jobs that attempt to access more memory than allowed will be suspended by the scheduling system. A warning message will appear in the error file. In this case, you should increase the number of cores/node allocated to the job.
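One way to pick the new core count is to invert the formula and round up. The sketch below does this for a job assumed to need about 50GB on a node with 8GB RAM/core. The “cobalt” node property and the script name job.sh are hypothetical placeholders, modeled on the nodes=1:red:ppn=16 request used in the Swap section of this page; the command is only printed, not submitted.

```bash
#!/usr/bin/env bash
mem_per_core_gb=8   # assumed RAM/core; check "upnodes"
required_gb=50      # assumed memory needed by the job on one node

# Round up so that ppn * RAM/core covers the requirement.
ppn=$(( (required_gb + mem_per_core_gb - 1) / mem_per_core_gb ))

# "cobalt" and "job.sh" are hypothetical placeholders.
resource_request="nodes=1:cobalt:ppn=${ppn}"
echo "qsub -l ${resource_request} job.sh"
```

Here 7 cores are requested, which gives the job 56GB on the node.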

 

Swap

By default, swap is not allocated. Swap can be useful for high-memory jobs that may intermittently use more RAM than allocated, since it lets them keep running instead of crashing.

Swap can be enabled on a per-job basis. A single whole node is required. Only integer units of GB are allowed. The swap space is deleted when the job finishes.

16:01 demo@sci ~]$ qsub -I -l nodes=1:red:ppn=16 -l other=swap:3G
qsub: waiting for job 1020992.jman to start
qsub: job 1020992.jman ready

Setting up swapspace version 1, size = 3145724 KiB
no label, UUID=dd927b68-25bf-4644-8253-e7f389a10f22
16:09 demo@n944024 ~]$ free --mega
              total        used        free      shared  buff/cache   available
Mem:         257675        2372      251623           2        3679      254393
Swap:          3071           0        3071
16:09 demo@n944024 ~]$ exit
logout

qsub: job 1020992.jman completed
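
In a non-interactive job, the same check can be scripted. The sketch below is one possible approach, not a site-provided tool: the helper name swap_total is made up, and it simply reads the second column of the "Swap:" row from the output of free, as shown in the session above.

```bash
#!/usr/bin/env bash
# Print the total swap (second column of the "Swap:" row)
# from `free` output supplied on stdin.
swap_total() {
    awk '/^Swap:/ { print $2 }'
}

# Inside a job: report swap in megabytes, as in the session above.
if command -v free >/dev/null 2>&1; then
    echo "Swap total (MB): $(free --mega | swap_total)"
fi
```

A value of 0 would indicate that no swap was enabled for the job.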