Submitting a parallel job

Your cluster is also configured with a parallel queue suitable for running MPI jobs across multiple nodes. By default, this queue provides parallel environments for the MPI implementations available on your cluster. To submit a parallel job via the cluster scheduler, create a job script that requests a parallel environment with the -pe <name> <slots> grid-scheduler directive, then submit it with the qsub command:

Simple MPI job script

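#!/bin/bash
# Export the current environment to the job and run it from the submission directory
#$ -V -cwd
# Request 4 slots in the mpinodes parallel environment
#$ -pe mpinodes 4
# Load the HPL benchmark application
module load apps/hpl
# Start 8 MPI processes (2 per node across the 4 node slots)
mpirun -np 8 ./benchmark18.bin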

The above job script runs the HPL performance benchmark across multiple nodes. To view the status of the job and the nodes chosen to run it, use the qstat -f command:

[alces-cluster@login1(awscluster) ~]$ qstat -f 
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
[email protected] BIP   0/0/1          0.00     linux-x64     S
---------------------------------------------------------------------------------
[email protected] BIP   0/0/1          0.05     linux-x64     S
---------------------------------------------------------------------------------
[email protected] BIP   0/0/1          0.01     linux-x64     S
---------------------------------------------------------------------------------
[email protected] BIP   0/0/1          0.00     linux-x64     S
---------------------------------------------------------------------------------
[email protected] IP    0/1/1          0.00     linux-x64     
     10 11.15234 mpi_job.sh alces-cluste r     12/09/2015 11:35:55     1        
---------------------------------------------------------------------------------
[email protected] IP    0/1/1          0.05     linux-x64     
     10 11.15234 mpi_job.sh alces-cluste r     12/09/2015 11:35:55     1        
---------------------------------------------------------------------------------
[email protected] IP    0/1/1          0.01     linux-x64     
     10 11.15234 mpi_job.sh alces-cluste r     12/09/2015 11:35:55     1        
---------------------------------------------------------------------------------
[email protected] IP    0/1/1          0.00     linux-x64     
     10 11.15234 mpi_job.sh alces-cluste r     12/09/2015 11:35:55     1
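
On a busy cluster the full listing can be long, so it may help to filter it down to the queue instances hosting one job. The following is a minimal sketch, assuming the qstat -f layout shown above (a queue instance line followed by indented job lines); the function name nodes_for_job is hypothetical and not part of the scheduler.

```shell
# List the queue instances (queue@host) that are running a given job,
# reading `qstat -f` output on standard input.
# Usage (hypothetical helper): qstat -f | nodes_for_job <job-id>
nodes_for_job() {
    awk -v id="$1" '
        /^queuename/ || /^-+$/ { next }           # header and separator lines
        /^#/ { queue = ""; next }                 # pending-jobs banner: no host
        /^[^[:space:]]/ { queue = $1; next }      # queue instance line
        queue != "" && $1 == id { print queue }   # indented line for our job
    '
}
```

For the job shown above, this would be run as: qstat -f | nodes_for_job 10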

The mpirun command in the job script above launches an OpenMPI job with 8 processes; the MPI machinefile is generated automatically by grid-scheduler and passed to MPI without needing further parameters. The -pe mpinodes 4 directive instructs the scheduler to run the job script in the mpinodes parallel environment using 4 node slots (2 processes per node).
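
The arithmetic behind the process count (node slots multiplied by processes per node) can be sketched as below, so the value passed to -np follows the slot request instead of being hard-coded. This is an illustration under assumptions: NSLOTS is the variable Grid Engine style schedulers normally set to the number of granted slots, PROCS_PER_NODE is a hypothetical variable introduced here, and the default of 2 comes from the description above. The command is printed rather than executed.

```shell
# Number of MPI processes = granted slots x processes per node.
mpi_process_count() {
    echo $(( $1 * $2 ))
}

# NSLOTS is normally set by the scheduler inside a running job; fall back to 4
# so the sketch can be tried outside a job. PROCS_PER_NODE is hypothetical.
NP=$(mpi_process_count "${NSLOTS:-4}" "${PROCS_PER_NODE:-2}")

# Print the command instead of running it, so nothing is launched by accident.
echo "mpirun -np ${NP} ./benchmark18.bin"
```

Inside a real job script, the echo line would be replaced by the mpirun command itself.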

There are 4 parallel environments set up on your cluster by default:

  • mpinodes - default MPI parallel environment
  • mpinodes-verbose - submits the job wrapped in job submission information such as date submitted and queue information
  • mpislots - allocates individual slots across multiple nodes instead of entire nodes
  • mpislots-verbose - submits the job wrapped in job submission information such as date submitted and queue information
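
To try a different parallel environment without editing the job script, the -pe option can usually be given on the qsub command line as well. The sketch below only builds and prints such a command after checking the name against the four environments listed above. The function name build_qsub_command is hypothetical, and it assumes your scheduler's qsub accepts -pe on the command line, as Grid Engine style schedulers generally do.

```shell
# Build (but do not run) a qsub command for one of the default parallel
# environments. Arguments: <pe-name> <slots> <job-script>
build_qsub_command() {
    case "$1" in
        mpinodes|mpinodes-verbose|mpislots|mpislots-verbose) ;;
        *)
            echo "unknown parallel environment: $1" >&2
            return 1
            ;;
    esac
    echo "qsub -pe $1 $2 $3"
}
```

For example, build_qsub_command mpislots 8 mpi_job.sh prints the command that would request 8 slots in the mpislots environment.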