Viewing the status of a job
The grid-scheduler allows users to view the status of the jobs they have submitted. The qstat command displays the status of all jobs submitted by the current user:
job-ID prior name user state submit/start at queue slots
---------------------------------------------------------------------------------------
321 0.55500 openmpi.sh alces-user r 01/06/2011 10:04:43 [email protected] 16
331 0.22000 sleepjob.sh alces-user r 01/06/2011 11:44:20 [email protected] 1
332 0.12000 sleepjob.sh alces-user qw 01/06/2011 11:45:10 1
334 0.12000 sleepjob.sh alces-user qw 01/06/2011 11:48:44 1
335 0.12000 sleepjob.sh alces-user qw 01/06/2011 11:48:52 1
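A listing like the one above can be turned into records for scripting. The following is a minimal Python sketch, not part of the scheduler itself: the `JobRow` class, the `parse_qstat` function, and the queue name `example.q@node01` are hypothetical names chosen for illustration. It assumes the default column layout shown above, with no array-task column and no spaces in job names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRow:
    job_id: int
    priority: float
    name: str
    user: str
    state: str
    submit_start: str      # date and time, as printed by qstat
    queue: Optional[str]   # None while the job has no queue instance
    slots: int


def parse_qstat(text: str) -> List[JobRow]:
    """Parse a default qstat listing into JobRow records (illustrative only)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        # Jobs without a queue instance yield 8 fields instead of 9.
        queue = fields[7] if len(fields) == 9 else None
        rows.append(JobRow(
            job_id=int(fields[0]),
            priority=float(fields[1]),
            name=fields[2],
            user=fields[3],
            state=fields[4],
            submit_start=f"{fields[5]} {fields[6]}",
            queue=queue,
            slots=int(fields[-1]),
        ))
    return rows


# Hypothetical sample input; the queue name is made up for this example.
SAMPLE = """\
job-ID prior name user state submit/start at queue slots
---------------------------------------------------------
321 0.55500 openmpi.sh alces-user r 01/06/2011 10:04:43 example.q@node01 16
332 0.12000 sleepjob.sh alces-user qw 01/06/2011 11:45:10 1
"""

for job in parse_qstat(SAMPLE):
    print(job.job_id, job.state, job.queue, job.slots)
```

Splitting on whitespace is fragile if the site changes the output format, so treat this as a starting point rather than a robust parser.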
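The job state can be marked as one or more of the following. Codes are combined in the state column; for example, qw in the listing above means the job is queued and waiting: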
| Status code | Job state | Description |
|---|---|---|
| d | deleted | A user or administrator has requested that the job should be deleted from the queueing system |
| E | error | The job is in error status. Use qstat -explain E for more information |
| h | hold | The job has been set to hold by a user or administrator |
| r | running | The job is running |
| R | restarted | The job has been restarted |
| s | suspended | The job has been suspended and is not currently running |
| S | suspended | The job is suspended because the queue containing it has been suspended |
| t | transferring | The job is being transferred to an execution host to be run |
| q | queued | The job is queued for execution |
| w | waiting | The job is waiting for resources to be available |
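Because a job's state column can hold several codes at once, it can help to expand it one character at a time. The sketch below is a hypothetical helper (the names `JOB_STATES` and `describe_job_state` are not part of the scheduler) that maps each code to the job state listed in the table above and marks anything not in the table as unknown.

```python
from typing import List

# Status codes and job states taken from the table above.
JOB_STATES = {
    "d": "deleted",
    "E": "error",
    "h": "hold",
    "r": "running",
    "R": "restarted",
    "s": "suspended",
    "S": "suspended",
    "t": "transferring",
    "q": "queued",
    "w": "waiting",
}


def describe_job_state(state: str) -> List[str]:
    """Expand a combined state string such as 'qw' into state names."""
    # Codes not listed in the table are reported rather than dropped.
    return [JOB_STATES.get(code, f"unknown ({code})") for code in state]


print(describe_job_state("qw"))
print(describe_job_state("Eqw"))
```

The lookup is case-sensitive on purpose, since lowercase and uppercase codes such as s and S are distinct entries in the table.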
By default, the qstat command only shows jobs belonging to the user executing the command. Use the qstat -u '*' command to see the status of jobs submitted by all users.
The qstat -f command gives a fuller view of the scheduler, listing the status of each queue instance on every execution host in your cluster. Each queue instance is shown with one or more of the following status codes:
| Status code | Queue state | Description |
|---|---|---|
| a | alarm (load) | A queue instance has exceeded its pre-configured maximum load threshold |
| c | configuration error | A queue instance has a configuration error. Contact your system administrator for assistance |
| d | disabled | A queue instance has been temporarily disabled by a system administrator |
| o | orphaned | The indicated queue instance has been de-configured, but jobs are still running using queue resources |
| s | suspended | The queue instance has been suspended |
| u | unknown | The scheduler has lost contact with the machine hosting the queue instance |
| A | alarm (suspend) | The queue instance has exceeded its suspension threshold |
| C | calendar suspended | The queue has been automatically suspended via the built-in calendar facility. Contact your system administrator for information on the configured calendar policies for your site |
| D | calendar disabled | The queue has been automatically disabled via the built-in calendar facility. Contact your system administrator for information on the configured calendar policies for your site |
| E | error | The scheduler was unable to contact the shepherd process on the machine hosting this queue instance. Contact your system administrator for assistance |
| S | subordinated | This queue instance has been suspended via subordination to another queue |
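Queue state codes can be expanded in the same way when checking the health of a cluster from a script. The following Python sketch is illustrative only: `QUEUE_STATES`, `ADMIN_CODES`, `describe_queue_state`, and `needs_admin` are hypothetical names, and the set of codes that call for the administrator is simply read off the descriptions in the table above. It assumes the state string has already been extracted from the qstat -f output and that an empty string means no state is flagged.

```python
from typing import List

# Status codes and queue states taken from the table above.
QUEUE_STATES = {
    "a": "alarm (load)",
    "c": "configuration error",
    "d": "disabled",
    "o": "orphaned",
    "s": "suspended",
    "u": "unknown",
    "A": "alarm (suspend)",
    "C": "calendar suspended",
    "D": "calendar disabled",
    "E": "error",
    "S": "subordinated",
}

# Codes whose description suggests contacting the system administrator.
ADMIN_CODES = frozenset("cCDE")


def describe_queue_state(state: str) -> List[str]:
    """Expand a combined queue state string such as 'au' into state names."""
    return [QUEUE_STATES.get(code, f"unrecognised ({code})") for code in state]


def needs_admin(state: str) -> bool:
    """Return True if any code in the state string is in ADMIN_CODES."""
    return any(code in ADMIN_CODES for code in state)


print(describe_queue_state("au"), needs_admin("au"))
print(describe_queue_state("E"), needs_admin("E"))
```

Keeping the administrator-related codes in a separate set makes it easy to adjust the policy for a particular site without touching the lookup table.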