Batch cluster usage
When you log in to the cluster, you are connecting to the cluster’s head node. Users can access the compute nodes only through the batch scheduling system.
The batch scheduling system allows users to submit job requests using the qsub command. The jobs will run when resources become available. All output is captured to file by the batch job system, and users can request e-mail notifications when jobs start or end.
To submit a batch job, a user must create a short text file called a batch job script. The batch job script contains instructions for running the batch job and the specific commands the batch job should execute. Batch jobs run as a new user session, so the batch job script should include any commands needed to set up the user session, navigate to the location of data files, and so on.
Below is a sample batch job script for a job that will use a single CPU core.
#!/bin/bash -l
#PBS -N mb-serial
#PBS -l nodes=1:ppn=1
#PBS -l walltime=10:00:00
#PBS -q batch
#PBS -m abe
#Comment - batch job setup complete
cd mrbayes
module load mrbayes
mb batch.nex
To submit a batch job script, pass the name of the script file to the qsub command:
qsub <batch-job-script>
This will output a unique job identifier in the form of <job-number>.<hostname>.
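Because the identifier has the form <job-number>.<hostname>, a script that submits jobs can keep just the numeric part for later use. The following is a minimal sketch rather than part of the cluster's own tooling: the function name submit_job is hypothetical, and it assumes that qsub prints only the job identifier on standard output.

```shell
#!/bin/bash

# Hypothetical helper: submit a job script and print only the job number.
submit_job() {
    local script="$1"
    local full_id

    # qsub prints <job-number>.<hostname>; give up if the submission fails.
    full_id="$(qsub "$script")" || return 1

    # Strip everything from the first "." onward, leaving the job number.
    printf '%s\n' "${full_id%%.*}"
}
```

It could be used as job_number="$(submit_job foo.job)", where foo.job is the name of your batch job script.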
The batch job script is a Linux shell script, so the first line specifies the shell interpreter to use when running the script. Note that for the bash shell, the “-l” option must be used. Lines starting with “#PBS” are instructions to the batch scheduling system. Note that the “#PBS” options can be overridden on the qsub command line. For example, “qsub -N bar foo.job” will override any “#PBS -N” directives in the foo.job job script and name the job “bar”.
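Command-line overrides make it possible to reuse one job script for several runs. The sketch below is illustrative only: the function name submit_named is hypothetical, and it simply passes a job name and a walltime to qsub using the same -N and -l options that the “#PBS” lines accept.

```shell
#!/bin/bash

# Hypothetical helper: submit a script with a different job name and
# walltime, overriding any "#PBS -N" and "#PBS -l walltime=" directives
# inside the script.
submit_named() {
    local name="$1"       # job name, e.g. bar
    local walltime="$2"   # hh:mm:ss, e.g. 02:00:00
    local script="$3"     # job script, e.g. foo.job

    qsub -N "$name" -l "walltime=${walltime}" "$script"
}
```

For example, submit_named bar 02:00:00 foo.job submits foo.job under the name “bar” with a two-hour walltime, leaving the script itself unchanged.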
Common batch job options are:
|#PBS or qsub option||Usage|
|-N||Name of the batch job. The job name is used to name output files and is also displayed when using qstat to query the job status.|
|-j||Join output and error output into one file. Use this option to collect all job output in a single file rather than having separate files for standard output and standard error.|
|-m||E-mail notifications. “#PBS -m abe” sends e-mail to the user when the job aborts (a), begins (b), or ends (e); any combination of these three is allowed. Without this option, e-mail is sent only when a job aborts. “#PBS -m n” sends no e-mail.|
|-M||E-mail user list. “#PBS -M firstname.lastname@example.org, email@example.com” lists additional e-mail addresses for messages. E-mail is always sent to your uniqueID@miamioh.edu address, so it does not need to be specified.|
|-l (dash lowercase L)||Resource list. There are two main types of resources, CPUs and time; multiple #PBS -l lines can be used to request them separately. “#PBS -l walltime=100:00:00” requests 100 hours of run time for the job. “#PBS -l nodes=2:ppn=8” requests 2 physical nodes and all 8 processors on each node (ppn = processors per node). If you do not specify the number of processors, the job defaults to one processor core. The default walltime is 1 hour.|
|-q||Destination: which batch queue to use. “#PBS -q batch” sends the job to the default batch queue on the Redhawk cluster.|
|-V||Inherit environment settings. All environment variables in the Linux session from which the job is submitted are inherited by the batch job.|
|-I (dash uppercase i)||Interactive. Use “qsub -I” to request an interactive batch job.|
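As an illustration of how these options combine, the sketch below writes a job script that uses several of them at once. It rests on assumptions that do not come from the table above: the file name mb-long.job and the address someone@example.com are placeholders, and the value “oe” for -j (merge standard error into standard output) is taken from the Torque qsub manual. The commands at the end of the script are the ones from the sample job earlier on this page.

```shell
#!/bin/bash

# Write an example job script to mb-long.job (hypothetical file name).
# The quoted 'EOF' keeps the shell from expanding anything in the script.
cat > mb-long.job <<'EOF'
#!/bin/bash -l
#PBS -N mb-long
#PBS -j oe
#PBS -m ae
#PBS -M someone@example.com
#PBS -l nodes=1:ppn=1
#PBS -l walltime=100:00:00
#PBS -q batch
#PBS -V
# Batch job setup complete; the commands below run on the compute node.
cd mrbayes
module load mrbayes
mb batch.nex
EOF
```

The resulting file can then be submitted with qsub mb-long.job.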
Checking job status:
There are several ways to check the status of a job. The “qstat” command shows the status of all jobs currently running or queued. The displayed information includes the status of each job: “Q” = queued, “R” = running, “E” = exiting, and “C” = complete. The qstat command also shows the length of time the job has been running, in the format hhh:mm:ss.
Details on a specific job can be seen using “qstat -f <job-number>”, where <job-number> is the numeric portion of the name returned by the qsub command.
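The status letter can also be read from a script, for example to wait until a job has finished before post-processing its output. The following is a rough sketch under stated assumptions: the function names job_state and wait_for_job are hypothetical, and it assumes that the “qstat -f” output contains a line of the form “job_state = R”, as Torque's full-format output does.

```shell
#!/bin/bash

# Hypothetical helper: print the one-letter state (Q, R, E, C) of a job.
# Prints nothing if qstat no longer knows the job.
job_state() {
    qstat -f "$1" 2>/dev/null |
        awk -F' = ' '$1 ~ /^[ \t]*job_state$/ { print $2 }'
}

# Hypothetical helper: block until the job is complete or has left the
# queue, polling every <interval> seconds (default 60).
wait_for_job() {
    local job_number="$1"
    local interval="${2:-60}"
    local state

    while true; do
        state="$(job_state "$job_number")"
        # "C" means complete; an empty state means the job is gone.
        if [ -z "$state" ] || [ "$state" = "C" ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```

A long polling interval keeps the load on the head node low; there is rarely a reason to poll more often than once a minute.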
A different view of the job queues can be seen using the “showq” command. This shows separate blocks for active (i.e. running), eligible, and blocked jobs. Waiting jobs are divided into eligible and blocked jobs based on the queue parameters. For example, the queues on Redhawk limit the number of running jobs per user, so queued jobs for users with the maximum number of running jobs will be listed as “blocked”.