A “scratch drive” is a temporary storage area on a compute node that users can access. Every node has a local scratch drive. If your application does intensive file I/O, we recommend using the local scratch drive or the Dynamic Storage Aggregation (DSA) parallel scratch file system.

The Scratch Directory

The scratch directory (/scratch/$PBS_JOBID) is created automatically on each node of the job before the job starts and is destroyed after the job finishes. Any data left in the scratch directory is therefore lost when the job finishes, so copy the results you want to keep to another location before the job ends.

The location of the scratch drive is given by the $SCRATCH and $TMPDIR environment variables. Any data you want to use on the local scratch drive should be copied to $SCRATCH.
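
As a minimal sketch of this workflow for a single-node job, the script below copies an input file to $SCRATCH, runs the application there, and copies the result back before the job ends. The run_in_scratch function, the nodes=1:ppn=12 request, and the names input.dat, output.dat and ~/work/a.out are illustrative assumptions, not part of Sabalcore's tooling. The sketch also assumes the application reads and writes files in its current directory.

```shell
#PBS -l nodes=1:ppn=12

# run_in_scratch INPUT OUTPUT COMMAND [ARGS...]   (hypothetical helper)
# Copies INPUT to $SCRATCH, runs COMMAND there, then copies OUTPUT
# back to the directory the function was called from.
run_in_scratch() {
    local input output origin
    input=$1
    output=$2
    origin=$PWD
    shift 2
    cp "$input" "$SCRATCH/" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$SCRATCH" && "$@" ) || return 1
    # Copy the result back before the job ends and $SCRATCH is destroyed.
    cp "$SCRATCH/$output" "$origin/"
}

# Only run the workflow inside a PBS job, where $SCRATCH is set.
if [ -n "$PBS_JOBID" ]; then
    cd ~/work || exit 1
    run_in_scratch input.dat output.dat "$HOME/work/a.out"
fi
```

Running the application from $SCRATCH keeps its file I/O on the local drive, and only the final copy touches your working directory.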

If your application requires a static location for the scratch drive, you can also use /scratch/<USERID>, where <USERID> is your username. Note, however, that you may run into resource contention if more than one of your jobs runs on the same compute node.

Dynamic Storage Aggregation – Parallel Scratch Drive

Sabalcore’s Dynamic Storage Aggregation (DSA) aggregates the performance and capacity of all the local scratch drives of a job and presents them as an independent, shared parallel file system for the duration of the compute job. Most software applications can use the local drive in the head node, but their I/O performance is then limited to the speed of a single drive. For workloads with special I/O requirements, DSA can significantly boost performance without requiring code changes by the user. DSA uses RDMA over InfiniBand and combines the storage capacity of all the local scratch drives of the job into a parallel file system with a single namespace. When the job completes, DSA wipes the drives before exiting.

How to Use DSA

To use DSA, add “-l other=scratch:shared” to the PBS directive and use $SCRATCH_SHARED as the directory. For example:

Interactive Job

[user@sci ~]$ qsub -I -l nodes=4:ppn=12:red -l other=scratch:shared
qsub: waiting for job 566450.jman to start
qsub: job 566450.jman ready
[user@n824002 ~]$ echo $SCRATCH_SHARED
/mnt/scratch/566450.jman
[user@n824002 ~]$

The command above creates a parallel file system from the scratch drives on the four nodes.

Using DSA in a .pbs Script

#PBS -l nodes=4:red:ppn=12 -l other=scratch:shared
cd ~/work
# copy the input.dat file
cp input.dat $SCRATCH_SHARED
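
Because DSA wipes the drives when the job completes, results written to $SCRATCH_SHARED have to be copied out before the script exits. The following is a minimal sketch of that step. The stage_out function and the file name output.dat are hypothetical, and ~/work is assumed to be on permanent storage.

```shell
#PBS -l nodes=4:red:ppn=12 -l other=scratch:shared

# stage_out DEST FILE...   (hypothetical helper)
# Copies each FILE from the shared scratch file system to DEST.
stage_out() {
    local dest f
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        cp "$SCRATCH_SHARED/$f" "$dest/" || return 1
    done
}

# Only run the workflow inside a PBS job, where $SCRATCH_SHARED is set.
if [ -n "$PBS_JOBID" ]; then
    cd ~/work || exit 1
    # Every node sees the same $SCRATCH_SHARED, so one copy is enough.
    cp input.dat "$SCRATCH_SHARED/"
    # ... run your application here, reading and writing under $SCRATCH_SHARED ...
    stage_out "$HOME/work" output.dat
fi
```

The function stops at the first file it cannot copy and returns a non-zero status, so a missing result is easy to detect in the job script.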

Manually Copying Files to Each Scratch Drive of a Parallel Job

If your application requires the data on each node of your job, you can manually copy it to each node’s scratch drive. The following PBS script shows how to copy data to and from the scratch drives on every node of a job. It copies one file (input.dat) to each node’s scratch drive before the main application starts. After the application finishes, one output.dat file from each node is copied to the working directory. The MPIEXEC_RANK variable is the ID number of the node in the job. Note that the “\\” before $MPIEXEC_RANK is required.

#PBS -l nodes=4:red:ppn=12
cd ~/work
# copy the input.dat file to each node
mpiexec -comm none -pernode cp input.dat $SCRATCH
# run your application
mpiexec a.out
# copy the output.dat file from each node back to
# the working directory and give it a unique name.
mpiexec -comm none -pernode cp $SCRATCH/output.dat \
output.dat.\\$MPIEXEC_RANK
# put all the output.dat files into a single file.
cat output.dat.* > output.dat
# remove the individual output.dat files.
rm output.dat.*
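
The shell expands output.dat.* in lexicographic order, so on jobs with more than ten nodes output.dat.10 is concatenated before output.dat.2. If the order of the merged data matters, the files can be merged by numeric rank instead. The sketch below assumes ranks start at 0 and are contiguous; merge_by_rank is a hypothetical helper, not a Sabalcore command.

```shell
# merge_by_rank PREFIX   (hypothetical helper)
# Concatenates PREFIX.0, PREFIX.1, ... in numeric rank order into
# PREFIX, removing each per-node file once it has been appended.
merge_by_rank() {
    local prefix rank
    prefix=$1
    rank=0
    # Refuse to truncate PREFIX if there is nothing to merge.
    [ -f "$prefix.0" ] || return 1
    : > "$prefix"
    while [ -f "$prefix.$rank" ]; do
        cat "$prefix.$rank" >> "$prefix" || return 1
        rm "$prefix.$rank"
        rank=$((rank + 1))
    done
}

# Only run inside a PBS job, after the per-node copies have finished.
if [ -n "$PBS_JOBID" ]; then
    cd ~/work || exit 1
    merge_by_rank output.dat
fi
```

In a job script, this call would take the place of the final cat and rm commands.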