Batch cluster usage

When you log in to the cluster, you are connecting to the cluster’s head node.  Users can only access the compute nodes by using the batch scheduling system.

The batch scheduling system allows users to submit job requests using the qsub command.  The jobs will run when resources become available.  All output is captured to file by the batch job system, and users can request e-mail notifications when jobs start or end.

To submit a batch job, a user must create a short text file called a batch job script.  The batch job script contains instructions for running the batch job and the specific commands the batch job should execute.  Batch jobs run as a new user session, so the batch job script should include any commands needed to set up the user session, navigate to the location of data files, etc.

Below is a sample batch job script for a job that will use a single CPU core.

 #!/bin/bash -l
 #PBS -N mb-serial
 #PBS -l nodes=1:ppn=1
 #PBS -l walltime=10:00:00
 #PBS -q batch
 #PBS -m abe
 #Comment - batch job setup complete
 cd mrbayes
 module load mrbayes
 mb batch.nex

To submit a batch job script, execute:
qsub script.job

This will output a unique job identifier in the form of <job-number>.<hostname>.
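
Because qsub prints the identifier on standard output, a submission script can capture it for later status queries. The sketch below is a minimal illustration: the helper name job_number and the identifier 12345.headnode are hypothetical, and the submission step only runs where qsub and script.job exist.

```shell
# Return the numeric portion of an identifier of the form <job-number>.<hostname>.
job_number() {
    # ${1%%.*} removes everything from the first "." onwards
    printf '%s\n' "${1%%.*}"
}

# Submit only when qsub is available (i.e. on the cluster head node)
# and the job script exists in the current directory.
if command -v qsub >/dev/null 2>&1 && [ -f script.job ]; then
    job_id=$(qsub script.job)
    echo "Submitted $job_id (job number $(job_number "$job_id"))"
fi
```

The numeric portion is what commands such as “qstat -f” expect as the job number.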

The batch job script is a Linux shell script, so the first line specifies the shell interpreter to use when running the script.  Note that for the bash shell, the “-l” option must be used.  Lines starting with “#PBS” are instructions to the batch scheduling system.  Note that the “#PBS” options can be overridden on the qsub command line.  For example, “qsub -N bar foo.job” will override any “#PBS -N” directives in the foo.job job script and name the job “bar”.
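
Before overriding options on the command line, it can help to see which directives a job script already sets. The following is a small sketch, not part of the scheduler itself: pbs_directives is a hypothetical helper, and the two-hour walltime is an arbitrary illustrative value.

```shell
# Print the "#PBS" directive lines found in a job script.
pbs_directives() {
    grep '^#PBS' "$1"
}

# Show the directives in foo.job, then submit it with a different name
# and walltime. Guarded so it only runs where foo.job and qsub exist.
if [ -f foo.job ] && command -v qsub >/dev/null 2>&1; then
    pbs_directives foo.job
    qsub -N bar -l walltime=02:00:00 foo.job
fi
```

Options given on the qsub command line take precedence, so the job script itself does not need to be edited for a one-off change.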

Common batch job options are:

#PBS or qsub option
    Usage

-N
    Name of the batch job.
    The job name is used to name output files and is also displayed when using qstat to query the job status.

-j
    Join output and error output into one file:
     #PBS -j oe
    Use this option to collect all job output in a single file rather than having separate files for standard output and error.

-m
    Mail options:
     #PBS -m abe
    Send e-mail to the user when the job aborts (a), begins (b) or ends (e). Any combination of these three is allowed. Without this option, e-mail will only be sent when a job aborts.
     #PBS -m n
    No e-mail will be sent.

-M
    E-mail user list:
     #PBS -M <address1>,<address2>
    List of additional e-mail addresses for messages. Note that e-mail is always sent to your address, so it does not need to be specified.

-l (dash lower case L)
    Resource list. There are two main types of resources, CPUs and time. Multiple #PBS -l lines can be used to request these separately.
     #PBS -l walltime=100:00:00
    This requests 100 hours of run time for the job.
     #PBS -l nodes=2:ppn=8
    This requests 2 physical nodes and all 8 processors on each node (ppn = processors per node).
    Note that if you do not specify the number of processors, it will default to one processor core. The default walltime is 1 hour.

-q
    Destination - which batch queue to use:
     #PBS -q batch
    This sends the job to the default batch queue on the Redhawk cluster.

-V
    Inherit environment settings.
    This will cause all environment variables in the Linux session that the job is submitted from to be inherited by the batch job.

-I (dash upper case i)
    Interactive.
    Use "qsub -I" to request an interactive batch job.
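
Several of these options can be combined in one job script. The sketch below is an illustration under assumptions, not a site-approved template: the job name example-parallel is hypothetical, and PBS_O_WORKDIR and PBS_NODEFILE are environment variables that Torque-style PBS systems set inside a job; the script falls back to safe defaults if they are missing.

```shell
#!/bin/bash -l
#PBS -N example-parallel
#PBS -l nodes=2:ppn=8
#PBS -l walltime=100:00:00
#PBS -q batch
#PBS -j oe
#PBS -m ae

# Start in the directory the job was submitted from; fall back to the
# current directory when the variable is not set (e.g. outside a job).
cd "${PBS_O_WORKDIR:-.}"

# Assumption: the node file lists one line per allocated processor core.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    ncores=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
else
    ncores=1
fi

echo "Job has $ncores core(s) available"
```

With nodes=2:ppn=8 the request is for 16 processor cores in total; the commands that actually use them would follow the final echo line.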

Checking job status:

There are several ways to check the status of a job.  The “qstat” command will show the status of all of the jobs currently running or queued.  The displayed information includes the status of the job: “Q” = queued, “R” = running, “E” = exiting, and “C” = complete.  The qstat command also shows the length of time the job has been running in the format hhh:mm:ss.
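
The status letters can be translated into words in a small wrapper script. This is a minimal sketch: describe_state is a hypothetical helper covering only the four codes listed above, and the qstat call is guarded so that it only runs where the command is installed.

```shell
# Translate a qstat status letter into a description.
describe_state() {
    case "$1" in
        Q) echo "queued" ;;
        R) echo "running" ;;
        E) echo "exiting" ;;
        C) echo "complete" ;;
        *) echo "unknown" ;;
    esac
}

# On the cluster, list only the current user's jobs.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "${USER:-$(id -un)}"
fi
```

For example, “describe_state R” prints “running”.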

Details on a specific job can be seen using the “qstat -f <job-number>” command, where <job-number> is the numeric portion of the identifier returned by the qsub command.
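
The full listing can be long, so a script may want to pull out a single field. The sketch below assumes the Torque-style “name = value” layout of “qstat -f” output, including a job_state line; job_state is a hypothetical helper name, and the job number is taken from the script’s first argument.

```shell
# Read "qstat -f" output on standard input and print the job state letter.
job_state() {
    # Fields are separated by " = "; stop at the first job_state line.
    awk -F' = ' '$1 ~ /job_state/ { print $2; exit }'
}

# On the cluster: pass the numeric job number as the first argument.
if command -v qstat >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    qstat -f "$1" | job_state
fi
```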

A different view of the job queues can be seen using the “showq” command.  This shows separate blocks for active (i.e. running), eligible, and blocked jobs.  Waiting jobs are divided into eligible and blocked jobs based on the queue parameters.  For example, the queues on Redhawk limit the number of running jobs per user, so queued jobs for users with the maximum number of running jobs will be listed as “blocked”.