4) Setting Up Your Job – Sabalcore Knowledge Base

In this section

Types of jobs – batch and interactive
Understand what a pbs script is
Specifying job parameters
An example pbs script
Accessing specialized nodes such as GPUs and High-memory nodes

Batch vs Interactive Jobs

Interactive Jobs – Jobs that require user input while the job is running. Most commonly, these are jobs launched by one of the job launchers in the remote desktop. Examples include STAR-CCM+, FDS, Ansys, etc. If you intend to run an interactive job, jump to section 8, “Remote Desktop – Workstation in the Cloud”.

Batch Jobs – Jobs that run without user interaction or a GUI and require a “pbs script” which defines the job’s requirements, cores, and application. If you intend to run a batch job, read the rest of this article.

pbs script (define the job’s parameters)

If you use one of the job launchers, it configures everything for you and you do not need to create a pbs script. It is still helpful to understand this section. If you need to create your own job, use the following information.

The pbs script file is a simple text file that contains all the information needed to run your job such as location of your application, job parameters, number of processors to use, etc. The file is also considered a “script” file. A script file in Linux is similar to a batch file in MS Windows. There are example pbs scripts below that can be copied and modified for your needs. Typically, the name of the pbs file ends with .pbs. For example: myfile.pbs.

The PBS script contains directives that the system uses to determine on which node(s) the job should run. The directives should be at the beginning of your script file and begin with “#PBS”. The directives can also be supplied as command-line options to the qsub command when submitting jobs. The command-line options will override any directives in the PBS script. Any other line in your script which begins with ‘#’ is considered a comment line.
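To make the distinction between directives and ordinary comments concrete, here is a small sketch that prints only the “#PBS” lines of a script. The helper name list_directives and the demo file contents are illustrative assumptions, not part of any Sabalcore tooling.

```shell
# list_directives FILE: print only the lines that begin with "#PBS".
# Any other line beginning with "#" is an ordinary comment and is skipped.
list_directives() {
    grep '^#PBS' "$1"
}

# Demo on a throwaway script with hypothetical contents.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#PBS -l nodes=1:red:ppn=12
# an ordinary comment
#PBS -N my-test1
cd $PBS_O_WORKDIR
EOF
list_directives "$demo"
rm -f "$demo"
```

Because command-line options take precedence, submitting the same file with, say, “qsub -N other-name myfile.pbs” would be expected to replace the job name given by the “#PBS -N” line.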

The PBS script at a minimum should contain:

The #PBS Nodes Directive or Multiple Nodes Directives
The location (path) of your working directory
one or more “module” commands to define the environment and select pre-installed software packages
The application’s start command(s)
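The checklist above can be turned into a rough pre-submission check. The sketch below is only a heuristic: the function name check_pbs_script and the grep patterns are assumptions made for illustration, and it does not try to verify the application's start command.

```shell
# check_pbs_script FILE: report which of the minimum items appear to be missing.
# Returns 0 when all three checks pass, 1 otherwise.
check_pbs_script() {
    local file="$1" status=0
    grep -q '^#PBS -l nodes=' "$file" || { echo "missing: #PBS nodes directive"; status=1; }
    grep -q '^cd '            "$file" || { echo "missing: cd to the working directory"; status=1; }
    grep -q '^module load '   "$file" || { echo "missing: module load command"; status=1; }
    return $status
}

# Demo on a throwaway script with hypothetical contents.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#PBS -l nodes=1:red:ppn=12
cd $PBS_O_WORKDIR
module load gcc
./program -s 0 -e 100
EOF
check_pbs_script "$demo" && echo "basic checks passed"
rm -f "$demo"
```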

Nodes Directive

The “nodes” directive specifies which type of compute node(s) you want your job to use. Run the “upnodes” command at the terminal prompt for a description of each system and its compute nodes.

The “nodes” directive is always at the top of the .pbs file and has the following syntax:

#PBS -l nodes=X:cluster:ppn=Y

Where “X” is the number of nodes on which the job should run, “cluster” is the name of the cluster on which the job should run (see the “upnodes” command), and “Y” is the number of processor cores per node (ppn) that the job should use. For example, to run a parallel job on 3 nodes of the copper cluster using 24 cores per node (72 cores in total), the first line of your PBS script should look like this:

#PBS -l nodes=3:copper:ppn=24
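The total core count is simply X multiplied by Y. As a quick sanity check before submitting, a small helper along these lines could compute it from the nodes specification; the function name total_cores is a hypothetical choice for this sketch.

```shell
# total_cores SPEC: SPEC looks like "nodes=3:copper:ppn=24"; prints nodes * ppn.
total_cores() {
    local spec="$1" nodes ppn
    nodes=${spec#nodes=}    # drop the leading "nodes="      -> 3:copper:ppn=24
    nodes=${nodes%%:*}      # keep the text before first ":" -> 3
    ppn=${spec##*ppn=}      # keep the text after "ppn="     -> 24
    echo $(( nodes * ppn ))
}

total_cores "nodes=3:copper:ppn=24"   # prints 72
```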

Multiple Nodes Directive

To help ensure your job starts quickly, you should specify nodes from several clusters using the multiple nodes directive. In the example script below, the system will first try to start the job using the first “nodes=” directive (i.e. nodes of the green cluster). If those nodes are unavailable, it will next try the second “nodes+=” directive (blue cluster) and so on. Note that each successive directive adds one more “+” symbol. For example:

#PBS -l nodes=4:green:ppn=16
#PBS -l nodes+=3:blue:ppn=24
#PBS -l nodes++=5:red:ppn=12
#PBS -l nodes+++=1:blue:ppn=36

In this example, the job will start immediately on 4 green nodes if they are available. If 4 green nodes are not available, it will try to start on 3 blue nodes. If those are not available, the job will then try to start on 5 red nodes, then on 1 blue node using 36 cores. Up to 6 directives can be specified. If none of the nodes are available, the job will wait in the queue until one of the directives can be fulfilled. The benefit of using multiple nodes directives is that your job is more likely to start sooner.
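Since each fallback adds one more “+”, the directive lines can be generated rather than typed by hand. The following is a minimal sketch; the function name nodes_directives is an assumption for illustration, and the limit of 6 comes from the rule stated above.

```shell
# nodes_directives SPEC...: print one "#PBS -l" line per spec, adding one
# more "+" for each successive fallback (nodes=, nodes+=, nodes++=, ...).
nodes_directives() {
    local plus="" spec
    if [ "$#" -gt 6 ]; then
        echo "at most 6 nodes directives can be specified" >&2
        return 1
    fi
    for spec in "$@"; do
        echo "#PBS -l nodes${plus}=${spec}"
        plus="${plus}+"
    done
}

# Prints three directive lines: nodes=, nodes+=, nodes++=
nodes_directives 4:green:ppn=16 3:blue:ppn=24 5:red:ppn=12
```

The output can be pasted at the top of a .pbs file, ahead of the rest of the script.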

The nodes directives can also be specified on the command line. For example:

qsub -I -l nodes=4:onyx:ppn=16,nodes+=4:red:ppn=16,nodes++=1:blue:ppn=36

The following sections provide more information on different types of jobs and some specific examples.

Path and Location of your Working Directory

$PBS_O_WORKDIR is set automatically to the directory from which the PBS script was submitted to the resource manager, so it is not necessary to spell out that path explicitly. However, you can change to any other directory within your user account here instead.

cd $PBS_O_WORKDIR
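A slightly more defensive variant quotes the variable (so paths containing spaces still work) and falls back to the current directory when PBS_O_WORKDIR is not set, for example when dry-running the script outside the batch system. The function name enter_workdir is hypothetical and the fallback behaviour is a design choice of this sketch, not a Sabalcore requirement.

```shell
# enter_workdir: change to the submission directory, or stay in the current
# directory when PBS_O_WORKDIR is unset or empty.
enter_workdir() {
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
}

enter_workdir && echo "working in: $PWD"
```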

Specify the software environment – “module” command

The “module load [package]” command loads a pre-installed software application and defines its environment. For example:

module load fds

Listing the Available Module Packages

There are hundreds of software packages and libraries available at Sabalcore, and we are continually adding more.

[user@sci02 ~]$ module avail
------------- /uls/7/Modules/modulefiles--------------
adios2/2.6.0         gromacs/4.0.7_s         OpenFOAM/9
abinit/6.10.2        namd/2.13/ibverbs       OpenFOAM/v2306
abinit/7.0.5         hdf5/intel/1.12.2       OpenFOAM/v2206 
ansys/v202           rstudio/1.4.1106      openmpi/1.3.2
---

Listing Available versions of a specific Software Module

You can list the available versions of a specific package using the ‘avail’ subcommand. For example:

[user@sci ~]$ module avail fds
------------------- /uls/6/Modules/modulefiles ---------------------
fds/5.5.3 fds/6.1.0 fds/6.1.2 fds/6.2.0 fds/6.3.0 fds/6.7.5  fds/6.7.7  fds/6.7.9

Getting Additional Details About a Package

The ‘help’ and ‘display’ subcommands will display additional information about a given package. The ‘help’ command will show basic information. Some packages also have examples on how to use them. The ‘help’ subcommand will show you how to run the examples. The ‘display’ subcommand shows you some more detailed information about how your environment is affected by loading the package. For example:

[user@sci ~]$ module help CALPUFF
----------- Module Specific Help for 'CALPUFF/6.42' ---------------
Name: CALPUFF
Version: 6.42
Installed: 01-Mar-2012
CALPUFF is an advanced non-steady-state meteorological and air quality
modeling system developed by ASG scientists. It is maintained by the
model developers and distributed by TRC. 
An example is available for CALPUFF. To use the example,
run the following commands:
cp -a /usr/local/CALPUFF-6.42/examples/CALPUFF .
cd CALPUFF
qsub CALPUFF.pbs

Specify the software start command and options

Lastly, specify the start command and options for the application.

fds_smp myfds.fds

Example PBS Scripts

Single Node Job in Batch Mode

Single node jobs run on a single node and use at least 1 core. Below is an example PBS script which defines the job parameters for the fictitious “program” application. To submit this job, first create a text file (for example “script.pbs”) containing the following lines, then use the “qsub script.pbs” command to submit the job to the queue for processing. Lines that begin with “#” are comments and are not required; the exception is lines that begin with “#PBS”, which are directives (the nodes directive is required). Do not use the “#” character in the job name because this will cause issues with the automated resource manager. Example PBS script for a single node job:

example-1.pbs:

#PBS -l nodes=1:red:ppn=12
#PBS -l nodes+=1:onyx:ppn=16
#set a job name, this is optional
#PBS -N my-test1
# This is the current working directory
cd $PBS_O_WORKDIR
# load any required modules to setup your environment
module load gcc
# Start the program with any required options
./program -s 0 -e 100

Note: For most pre-installed software at Sabalcore, 'module load ...' automatically loads the required libraries (such as the MPI library). You do not need to specify any additional modules in your pbs script for pre-installed software. For troubleshooting, use "module list" to list the modules loaded in your environment.

Parallel Job using Multiple Nodes

Parallel jobs run on one or more nodes, use multiple cores, and use some form of inter-process communication (MPI). The software must be written to use MPI; otherwise it will not run on multiple nodes. Contact Support if you have any questions.

Running parallel jobs at Sabalcore will probably differ slightly from what you are used to. In particular, you must start all MPI-type applications with our custom mpiexec tool rather than the normal mpirun or mpd. You must also include the “ppn=” portion of the nodes directive for parallel jobs.

All pre-installed distributed parallel software at Sabalcore uses Infiniband (IB) with OpenMPI. When using OpenMPI and mpiexec, Infiniband will be used by default. Use the upnodes command to see which clusters have IB support.

Below is an example PBS script which runs the fictitious “prog-mpi” application (which was installed by the customer in the customer’s account) in parallel mode on a total of 48 cores of the “red” or “green” cluster (whichever is available first).

example-2.pbs

#PBS -l nodes=3:red:ppn=16
#PBS -l nodes+=3:green:ppn=16
#PBS -N parallel-3X16
# Load the appropriate MPI package for your program
module load openmpi
# Change to the working directory
cd ~/work
# Start your program
mpiexec ~/bin/prog-mpi --input=input.txt

Note: ‘mpiexec’ is used to launch the program. This is a custom tool that works in conjunction with Torque (PBS). You do not have to specify a “machinefile” or the number of processors. mpiexec will automatically run on all the processors associated with the job. If you require some special handling of parallel jobs please ask support@sabalcore.com for more help.
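If you want to confirm how many cores a job was actually given, one option is to inspect the node file. This sketch assumes the standard Torque convention that $PBS_NODEFILE names a file with one line per allocated core; whether Sabalcore's setup follows that convention exactly is an assumption to verify. The helper names count_cores and count_nodes, and the node names in the demo, are hypothetical.

```shell
# count_cores FILE: number of lines (one per allocated core, by assumption).
count_cores() { wc -l < "$1" | tr -d ' '; }
# count_nodes FILE: number of distinct node names in the file.
count_nodes() { sort -u "$1" | wc -l | tr -d ' '; }

# Demo with a stand-in node file: 2 hypothetical nodes, 2 cores each.
nodefile=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$nodefile"
echo "cores: $(count_cores "$nodefile"), nodes: $(count_nodes "$nodefile")"
rm -f "$nodefile"
```

Inside a running job, the same helpers would be called on "$PBS_NODEFILE" instead of the stand-in file.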

Note: For all pre-installed software at Sabalcore, ‘module load …’ automatically loads the required libraries (such as the MPI library). You do not need to specify any additional modules in your pbs script for pre-installed software at Sabalcore.

Other types of Nodes

GPU Nodes

A select number of nodes also have one Nvidia Tesla GPU. The GPUs are currently Tesla P4s. Reservation of the whole node is required to access the GPU.

qsub -I -l nodes=1:cuda:ppn=20

Note: there are several versions of CUDA available. Use “module avail cuda” to see the current listing.

High Memory Nodes

A select number of “blue” nodes have 384GB of RAM, or 16GB per core. To utilize these nodes, use the following examples:

qsub -I -l nodes=1:hmem:ppn=36

or

qsub -I -l nodes=1:hmem:ppn=8

Related Articles

1) Introduction to Sabalcore – Start here

2) All about Cores, Compute Nodes, and Clusters

3) Usage and Account Balance

5) Starting, Stopping and Monitoring Jobs

6) Transferring Files

7) Global File Transfer