Viewing the status of a job

The grid-scheduler software allows users to view the status of the jobs they have submitted. Run with no options, the qstat command displays the status of all jobs submitted by the current user:

job-ID  prior    name         user        state  submit/start at      queue              slots
----------------------------------------------------------------------------------------------
   321  0.55500  openmpi.sh   alces-user  r      01/06/2011 10:04:43  [email protected]     16
   331  0.22000  sleepjob.sh  alces-user  r      01/06/2011 11:44:20  [email protected]      1
   332  0.12000  sleepjob.sh  alces-user  qw     01/06/2011 11:45:10                         1
   334  0.12000  sleepjob.sh  alces-user  qw     01/06/2011 11:48:44                         1
   335  0.12000  sleepjob.sh  alces-user  qw     01/06/2011 11:48:52                         1

The state column contains one or more of the following status codes. For example, qw means the job is queued and waiting:

Status code  Job state     Description
---------------------------------------------------------------------------------------
d            deleted       A user or administrator has requested that the job be deleted from the queueing system
E            error         The job is in an error state. Use the -explain E option to qstat for more information
h            hold          The job has been placed on hold by a user or administrator
r            running       The job is running
R            restarted     The job has been restarted
s            suspended     The job has been suspended and is not currently running
S            suspended     The job is suspended because the queue it was running in has been suspended
t            transferring  The job is being transferred to an execution host to be run
q            queued        The job is queued for execution
w            waiting       The job is waiting for resources to become available
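
Because a state such as qw is simply several status codes written together, it can be decoded one character at a time. The following is a minimal sketch of that idea; describe_job_state is a hypothetical helper name, not a command supplied by the scheduler, and it only knows the codes listed in the table above.

```shell
# Hypothetical helper (not part of the scheduler): expand a qstat job state
# string such as "qw" into the state names from the table above.
# The body runs in a subshell "( ... )" so its variables stay local.
describe_job_state() (
    rest="$1"
    out=""
    while [ -n "$rest" ]; do
        ch="${rest%"${rest#?}"}"   # first character of $rest
        rest="${rest#?}"           # remainder after the first character
        case "$ch" in
            d) name="deleted" ;;
            E) name="error" ;;
            h) name="hold" ;;
            r) name="running" ;;
            R) name="restarted" ;;
            s|S) name="suspended" ;;
            t) name="transferring" ;;
            q) name="queued" ;;
            w) name="waiting" ;;
            *) name="unrecognised ($ch)" ;;
        esac
        out="${out:+$out, }$name"  # join names with ", "
    done
    printf '%s\n' "$out"
)
```

For example, describe_job_state qw prints "queued, waiting".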

By default, the qstat command only shows jobs belonging to the user executing the command. Use the qstat -u '*' command to see the status of jobs submitted by all users.
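
When many users have jobs in the system, a per-state summary can be easier to read than the full listing. The sketch below assumes the default qstat layout shown earlier (two header lines, then one job per line with the state in the fifth column); count_job_states is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: read qstat output on standard input and print how many
# jobs are in each state. Assumes two header lines followed by one job per
# line, with the job state in column 5.
count_job_states() {
    awk 'NR > 2 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}
```

It could be used as qstat -u '*' | count_job_states. Fed the example listing at the top of this section, it would report three jobs in state qw and two in state r.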

The qstat -f command provides more detail about the scheduler, also listing the status of each queue instance on every execution host in your cluster. Each queue instance may be marked with one or more of the following status codes:

Status code  Queue state          Description
---------------------------------------------------------------------------------------
a            alarm (load)         The queue instance has exceeded its pre-configured maximum load threshold
c            configuration error  The queue instance has a configuration error. Contact your system administrator for assistance
d            disabled             The queue instance has been temporarily disabled by a system administrator
o            orphaned             The queue instance has been de-configured, but jobs are still running using queue resources
s            suspended            The queue instance has been suspended
u            unknown              The scheduler has lost contact with the machine hosting the queue instance
A            alarm (suspend)      The queue instance has exceeded its suspension threshold
C            calendar suspended   The queue has been automatically suspended via the built-in calendar facility. Contact your system administrator for information on the configured calendar policies for your site
D            calendar disabled    The queue has been automatically disabled via the built-in calendar facility. Contact your system administrator for information on the configured calendar policies for your site
E            error                The scheduler was unable to contact the shepherd process on the machine hosting this queue instance. Contact your system administrator for assistance
S            subordinated         The queue instance has been suspended via subordination to another queue
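
On a large cluster, it can help to pick out only the queue instances that carry a status code. The sketch below is one possible filter, under two assumptions that may not hold for every scheduler version: queue instance lines in qstat -f output begin with a name of the form queue@host, and the status codes, when present, are the last column on that line. flagged_queues is a hypothetical helper name.

```shell
# Hypothetical helper: read "qstat -f" output on standard input and print each
# queue instance that has at least one status code set, followed by its codes.
# Assumes queue instance lines start with "queue@host" and that the status
# codes, when present, are the last field and use only the letters listed in
# the table above.
flagged_queues() {
    awk '$1 ~ /@/ && $NF ~ /^[acdosuACDES]+$/ { print $1, $NF }'
}
```

It could be used as qstat -f | flagged_queues. Queue instances with no status code, job lines, and separator lines are skipped.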