Schedulers

A cluster scheduler lets one or more users share the available compute resources and queue jobs to run when resources become available. A wide variety of commercial and open-source schedulers are available for compute clusters, each providing features suited to particular types of workload. All schedulers are designed to perform the following functions:

Allow users to submit new jobs to the cluster
Allow users to monitor the state of their queued and running jobs
Allow users and environment administrators to control running jobs
Monitor the status of compute resources, including system load and memory usage

More advanced schedulers can be configured to implement policies that control how jobs are executed on the cluster, to ensure fair sharing and optimal loading of the available resources. Most schedulers are extensible through plug-ins for monitoring different metrics, reporting system usage and allowing job submission via different interfaces. Which scheduler is available on your compute cluster depends on how your system administrator has configured it; they will be able to advise you on how your HPC cluster is set up.

When a user submits a new job, the scheduler assigns compute cores and memory to satisfy the job's requirements. If suitable resources are not available, the scheduler holds the job in a queue until enough resources become free. Your system administrator can configure the scheduler to control how jobs are selected from the queue and executed on cluster nodes. Once a job has finished running, the scheduler returns the resources it used to the pool of free resources, ready to run another job.
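
The job lifecycle described above can be illustrated with a small Python sketch. It is a toy model only, not the behaviour of any particular scheduler: the names Job and ToyScheduler are hypothetical, the whole cluster is treated as a single pool of cores and memory, and strict first-in, first-out ordering is assumed. Real schedulers usually apply site-configured policies such as priorities or fair sharing instead.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical job description: a name plus the resources it requests.
    name: str
    cores: int
    memory_gb: int
    state: str = "new"


@dataclass
class ToyScheduler:
    # Assumption: the cluster is modelled as one pool of cores and memory.
    free_cores: int
    free_memory_gb: int
    queue: deque = field(default_factory=deque)

    def _fits(self, job):
        return (job.cores <= self.free_cores
                and job.memory_gb <= self.free_memory_gb)

    def _start(self, job):
        # Assign cores and memory from the free pool to the job.
        self.free_cores -= job.cores
        self.free_memory_gb -= job.memory_gb
        job.state = "running"

    def submit(self, job):
        # Start at once only if nothing is already waiting and the job fits;
        # otherwise hold it in the queue (strict FIFO, an assumed policy).
        if not self.queue and self._fits(job):
            self._start(job)
        else:
            job.state = "queued"
            self.queue.append(job)

    def finish(self, job):
        if job.state != "running":
            raise ValueError(f"job {job.name} is not running")
        # Return the job's resources to the free pool.
        self.free_cores += job.cores
        self.free_memory_gb += job.memory_gb
        job.state = "finished"
        # Start waiting jobs, in order, for as long as the next one fits.
        while self.queue and self._fits(self.queue[0]):
            self._start(self.queue.popleft())


sched = ToyScheduler(free_cores=4, free_memory_gb=16)
job_a = Job("a", cores=3, memory_gb=8)
job_b = Job("b", cores=2, memory_gb=4)

sched.submit(job_a)   # enough free resources, so job_a starts immediately
sched.submit(job_b)   # only 1 core is free, so job_b waits in the queue
print(job_a.state, job_b.state)

sched.finish(job_a)   # job_a's resources are released and job_b can start
print(job_a.state, job_b.state)
```

In this sketch the second job waits only because too few cores are free; as soon as the first job finishes and its resources return to the pool, the queued job starts without any further action from the user.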