In this section
- Types of jobs – batch and interactive
- Understand what a pbs script is
- Specifying job parameters
- An example pbs script
- Accessing specialized nodes such as GPUs and High-memory nodes
Batch vs Interactive Jobs
Interactive Jobs – jobs that require user input while the job is running. Most commonly, these are jobs started by one of the job launchers in the remote desktop. Examples include STAR-CCM+, FDS, and Ansys. If you intend to run an interactive job, jump to section 8, “Remote Desktop – Workstation in the Cloud”.
Batch Jobs – jobs that run without user interaction or a GUI and require a “pbs script” which defines the job’s requirements, cores, and application. Read this article.
pbs script (define the job’s parameters)
If you use one of the job launchers which configures everything for you, it is not necessary to create a pbs script. But it is helpful to understand this section. If you need to create your own job, use the following information.
The pbs script file is a simple text file that contains all the information needed to run your job such as location of your application, job parameters, number of processors to use, etc. The file is also considered a “script” file. A script file in Linux is similar to a batch file in MS Windows. There are example pbs scripts below that can be copied and modified for your needs. Typically, the name of the pbs file ends with .pbs. For example: myfile.pbs.
The PBS script contains directives that the system uses to determine on which node(s) the job should run. The directives should be at the beginning of your script file and begin with “#PBS”. The directives can also be supplied as command-line options to the qsub command when submitting jobs. The command-line options will override any directives in the PBS script. Any other line in your script which begins with ‘#’ is considered a comment line.
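As a rough illustration of how directive lines differ from ordinary comments, the sketch below writes a small script to a temporary file and counts the lines that begin with “#PBS”. The file name demo.pbs and the job name demo-job are made up for this example; the directive syntax is the one described in this article.

```shell
# Write a small PBS script to a temporary directory (demo.pbs is a made-up name).
workdir=$(mktemp -d)
cat > "$workdir/demo.pbs" <<'EOF'
#PBS -l nodes=1:red:ppn=12
#PBS -N demo-job
# An ordinary comment: not a directive
cd $PBS_O_WORKDIR
EOF

# Lines starting with "#PBS" are directives; other "#" lines are comments.
directives=$(grep -c '^#PBS' "$workdir/demo.pbs")
all_hash=$(grep -c '^#' "$workdir/demo.pbs")
comments=$((all_hash - directives))
echo "directives: $directives, plain comments: $comments"
```

Following the override rule above, submitting this file with “qsub -N other-name demo.pbs” would be expected to replace the job name set inside the file.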
The PBS script at a minimum should contain:
- The #PBS Nodes Directive or Multiple Nodes Directives
- The location (path) of your working directory
- one or more “module” commands to define the environment and select pre-installed software packages
- The application’s start command(s)
Nodes Directive
The “nodes” directive specifies which type of compute node(s) you want your job to use. Use the “upnodes” command at the terminal prompt for a description of each cluster and its compute nodes.
The “nodes” directive is always at the top of the .pbs file and has the following syntax:
#PBS -l nodes=X:cluster:ppn=Y
Where “X” is the number of nodes on which the job should run, “cluster” is the name of the cluster on which the job should run (see the “upnodes” command), and “Y” is the number of processor cores per node (ppn) that the job should use. For example, to run a parallel job on 3 nodes of the copper cluster using 24 cores per node (72 cores in total), the first line of your PBS script should look like this:
#PBS -l nodes=3:copper:ppn=24
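The arithmetic behind the directive can be checked with a short sketch like the one below. The helper names nodes_directive and total_cores are hypothetical and exist only for this illustration; they are not part of PBS or of any Sabalcore tool.

```shell
# Build a nodes directive from: number of nodes, cluster name, cores per node.
# (nodes_directive and total_cores are hypothetical helper names.)
nodes_directive() {
    echo "#PBS -l nodes=$1:$2:ppn=$3"
}

# Total cores requested = number of nodes x cores per node.
total_cores() {
    echo $(( $1 * $2 ))
}

directive=$(nodes_directive 3 copper 24)
cores=$(total_cores 3 24)
echo "$directive"
echo "total cores: $cores"
```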
Multiple Nodes Directive
To help ensure your job starts quickly, you should specify nodes from several clusters using the multiple nodes directive. In the example below, the system will first try to start the job using the first “nodes=” directive (nodes of the green cluster). If those nodes are unavailable, it will next try the second “nodes+=” directive (blue cluster) and so on. Note the additional “+” symbol on each successive directive. For example:
#PBS -l nodes=4:green:ppn=16
#PBS -l nodes+=3:blue:ppn=24
#PBS -l nodes++=5:red:ppn=12
#PBS -l nodes+++=1:blue:ppn=36
In this example, the job will start immediately on 4 green nodes if they are available. If 4 green nodes are not available, it will try to start on 3 blue nodes (24 cores each). If those are not available, the job will then try to start on 5 red nodes, then on 1 blue node (36 cores). Up to 6 directives can be specified. If none of the nodes are available, the job will wait in the queue until one of the directives can be fulfilled. The benefit of using multiple nodes directives is to increase the chance of your job starting sooner.
The nodes directives can also be specified on the command line. For example:
qsub -I -l nodes=4:onyx:ppn=16,nodes+=4:red:ppn=16,nodes++=1:blue:ppn=36
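The comma-separated form is easy to mistype because each alternative needs one more “+” than the one before it. A minimal sketch of building the string, assuming a hypothetical helper named join_nodes (not a PBS or Sabalcore command), might look like this:

```shell
# Join node specs into the comma-separated form used on the command line.
# Each argument is one "count:cluster:ppn=N" spec, in order of preference.
# (join_nodes is a hypothetical helper name.)
join_nodes() {
    # The article states that up to 6 directives can be specified.
    if [ "$#" -gt 6 ]; then
        echo "at most 6 nodes directives are allowed" >&2
        return 1
    fi
    result=""
    plus=""
    for spec in "$@"; do
        if [ -z "$result" ]; then
            result="nodes=$spec"
        else
            plus="$plus+"                      # one more "+" per alternative
            result="$result,nodes$plus=$spec"
        fi
    done
    echo "$result"
}

resources=$(join_nodes 4:onyx:ppn=16 4:red:ppn=16 1:blue:ppn=36)
echo "qsub -I -l $resources"
```

The sketch only prints the command so it can be inspected before being run.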
The following sections provide more information on different types of jobs and some specific examples.
Path and Location of your Working Directory
$PBS_O_WORKDIR is set automatically to the directory from which the PBS script was submitted to the resource manager, so you do not need to spell out the path when your job should run there. If your job should run elsewhere, you can change to any other directory within your user account instead.
cd $PBS_O_WORKDIR
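PBS sets PBS_O_WORKDIR only inside a job, so a script tried by hand at the prompt will find it empty. The sketch below is one possible defensive variant, not a required form: it falls back to the current directory, quotes the path, and stops if the directory change fails. The variable name job_dir is made up for this example.

```shell
# Inside a job, PBS sets PBS_O_WORKDIR to the submission directory.
# Outside a job (for example, a dry run at the prompt) it is unset,
# so this sketch falls back to the current directory.
job_dir="${PBS_O_WORKDIR:-$PWD}"

# Quote the path so directories containing spaces work, and stop if cd fails.
cd "$job_dir" || exit 1
echo "running in: $job_dir"
```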
Specify the software environment – “module” command
The “module load [package]” command loads and defines the environment for a pre-installed software application. For example:
module load fds
Listing the Available Module Packages
There are hundreds of software packages and libraries available at Sabalcore, and we are continually adding more.
[user@sci02 ~]$ module avail
------------- /uls/7/Modules/modulefiles --------------
adios2/2.6.0   gromacs/4.0.7_s    OpenFOAM/9
abinit/6.10.2  namd/2.13/ibverbs  OpenFOAM/v2306
abinit/7.0.5   hdf5/intel/1.12.2  OpenFOAM/v2206
ansys/v202     rstudio/1.4.1106   openmpi/1.3.2
---
Listing Available versions of a specific Software Module
You can list the available versions of a specific package using the ‘avail’ subcommand. For example:
[user@sci ~]$ module avail fds
------------------- /uls/6/Modules/modulefiles ---------------------
fds/5.5.3 fds/6.1.0 fds/6.1.2 fds/6.2.0 fds/6.3.0 fds/6.7.5 fds/6.7.7 fds/6.7.9
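A specific version can be requested by giving the full name/version string to “module load”. The sketch below picks the highest version from the listing above and loads it only when the module command is present in the current shell. It assumes GNU “sort -V” is available, and the version list is copied from this article rather than queried live, so it may not match what is installed today.

```shell
# Version names copied from the "module avail fds" listing above.
versions="fds/5.5.3 fds/6.1.0 fds/6.1.2 fds/6.2.0 fds/6.3.0 fds/6.7.5 fds/6.7.7 fds/6.7.9"

# Pick the highest version; "sort -V" (version sort) is a GNU coreutils option.
latest=$(printf '%s\n' $versions | sort -V | tail -n 1)
echo "latest: $latest"

# Load that exact version when the module command exists in this shell.
if command -v module >/dev/null 2>&1; then
    module load "$latest"
fi
```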
Getting Additional Details About a Package
The ‘help’ and ‘display’ subcommands show additional information about a given package. The ‘help’ subcommand shows basic information and, for packages that come with examples, how to run them. The ‘display’ subcommand shows in more detail how loading the package affects your environment. For example:
[user@sci ~]$ module help CALPUFF
----------- Module Specific Help for 'CALPUFF/6.42' ---------------
Name: CALPUFF
Version: 6.42
Installed: 01-Mar-2012
CALPUFF is an advanced non-steady-state meteorological and air quality modeling system developed by ASG scientists. It is maintained by the model developers and distributed by TRC.
An example is available for CALPUFF. To use the example, run the following commands:
cp -a /usr/local/CALPUFF-6.42/examples/CALPUFF .
cd CALPUFF
qsub CALPUFF.pbs
Specify the software start command and options
Lastly, specify the start command and options for the application.
fds_smp myfds.fds
Example PBS Scripts
Single Node Job in Batch Mode
Single node jobs run on a single node and use at least 1 core. Below is an example PBS script which defines the job parameters for the fictitious “program” application. To submit this job, first create a text file (for example “script.pbs”) containing the following lines, then use the “qsub script.pbs” command to submit the job to the queue for processing. Lines beginning with “#” are comments and are optional, except for lines beginning with “#PBS”, which are directives and are required. Do not use the “#” character in the job name because this will cause issues with the automated resource manager. Example PBS script for a single node job:
example-1.pbs:
#PBS -l nodes=1:red:ppn=12
#PBS -l nodes+=1:onyx:ppn=16
# Set a job name, this is optional
#PBS -N my-test1
# This is the current working directory
cd $PBS_O_WORKDIR
# Load any required modules to set up your environment
module load gcc
# Start the program with any required options
./program -s 0 -e 100
Note: For most pre-installed software at Sabalcore, 'module load ...' automatically loads the required libraries (such as the MPI library). You do not need to specify any additional modules in your pbs script for pre-installed software. For troubleshooting, use "module list" to list the modules loaded in your environment.
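Because a “#” in the job name causes problems, it can be worth checking a name before writing it into a “#PBS -N” line. The function below is a minimal sketch; valid_job_name is a hypothetical helper, and it checks only for an empty name and for the “#” character, not any other naming rules the scheduler may enforce.

```shell
# Return success when a job name is non-empty and contains no "#".
# (valid_job_name is a hypothetical helper name.)
valid_job_name() {
    case "$1" in
        "")    return 1 ;;   # empty name
        *"#"*) return 1 ;;   # "#" is not allowed in job names
        *)     return 0 ;;
    esac
}

for name in my-test1 "test#1"; do
    if valid_job_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```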
Parallel Job using Multiple Nodes
Parallel jobs run on one or more nodes, use multiple cores, and use some form of inter-process communication (MPI). The software must be written to use MPI, otherwise it will not run on multiple nodes. Contact Support if you have any questions.
Running parallel jobs at Sabalcore will probably be slightly different from what you are familiar with. In particular, you must start all MPI-type applications with our custom mpiexec tool rather than the normal mpirun or mpd. You must also include the “ppn=” portion of the nodes directive for parallel jobs.
Note: All pre-installed distributed parallel software at Sabalcore uses Infiniband (IB) with OpenMPI. When using OpenMPI and mpiexec, Infiniband will be used by default. Use the upnodes command to see which clusters have IB support.
Below is an example PBS script which runs the fictitious “prog-mpi” application (which was installed by the customer in the customer’s account) in parallel mode on a total of 48 cores of the “red” or “green” cluster (whichever is available first).
example-2.pbs
#PBS -l nodes=3:red:ppn=16
#PBS -l nodes+=3:green:ppn=16
#PBS -N parallel-3X16
# Load the appropriate MPI package for your program
module load openmpi
# Change to the working directory
cd ~/work
# Start your program
mpiexec ~/bin/prog-mpi --input=input.txt
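When a script lists alternative clusters, it helps to confirm that every alternative requests the core count you expect. The sketch below copies the directive lines of example-2.pbs into a temporary file and prints nodes × ppn for each one. It is an illustration that uses only standard sed and shell arithmetic, and it assumes every directive follows the “nodes=X:cluster:ppn=Y” form described earlier.

```shell
# Recreate the directive lines of example-2.pbs in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/example-2.pbs" <<'EOF'
#PBS -l nodes=3:red:ppn=16
#PBS -l nodes+=3:green:ppn=16
#PBS -N parallel-3X16
EOF

# For every nodes directive, print the cluster name and nodes x ppn.
summary=$(sed -n 's/^#PBS -l nodes+*=\([0-9]*\):\([^:]*\):ppn=\([0-9]*\).*$/\1 \2 \3/p' \
    "$tmp/example-2.pbs" |
    while read -r count cluster ppn; do
        echo "$cluster: $((count * ppn)) cores"
    done)
echo "$summary"
```

Both alternatives in this example come to 48 cores, matching the total given above.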
Other types of Nodes
GPU Nodes
A select number of nodes also have one Nvidia Tesla GPU. The GPUs are currently Tesla P4s. Reservation of the whole node is required to access the GPU.
qsub -I -l nodes=1:cuda:ppn=20
Note: there are several versions of CUDA available. Use “module avail cuda” to see the current listing.
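The same GPU request can presumably be written as a batch script, since nodes directives are accepted both on the command line and in “#PBS” lines. The sketch below makes several assumptions: the job name gpu-check is made up, the unversioned module name “cuda” may not exist (check “module avail cuda”), and NVIDIA's nvidia-smi tool may not be on the PATH of the GPU nodes. The guards let the script run harmlessly outside the scheduler.

```shell
#PBS -l nodes=1:cuda:ppn=20
#PBS -N gpu-check
# Change to the submission directory (falls back to the current directory
# when run by hand, outside the scheduler).
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Load a CUDA toolkit if the module command is available; the unversioned
# name "cuda" is an assumption, see "module avail cuda" for real names.
if command -v module >/dev/null 2>&1; then
    module load cuda
fi

# Report the GPU if the NVIDIA tool is on the PATH, otherwise say so.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_status=$(nvidia-smi -L 2>&1)
else
    gpu_status="nvidia-smi not found on this node"
fi
echo "$gpu_status"
```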
High Memory Nodes
A select number of “blue” nodes have 384GB of RAM, or 16GB per core. To use these nodes, see the following examples:
qsub -I -l nodes=1:hmem:ppn=36
or
qsub -I -l nodes=1:hmem:ppn=8