Notice! This document is currently in Archived status.
The content of this document may be incorrect or outdated.


Configuring and setting up TORQUE/MAUI clusters

TORQUE - Terascale Open-Source Resource and QUEue Manager

TORQUE is an open source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project.  The source for TORQUE can be found at http://www.clusterresources.com.

Installing TORQUE

TORQUE is provided in RPM format and is available from the ECN repository as an optional package.  The package is added to a host via igor.  Setting up the installation requires defining two igor variables, PKGS_TORQUE and TORQUE_SERVER_NAME.  If the host is to be the TORQUE server for the cluster, the additional variable TORQUE_SERVER must also be defined.

Defining PKGS_TORQUE causes the following RPMs to be installed via the local_pkgs file in igor:

torque
torque-mom

Defining TORQUE_SERVER causes the following additional RPMs to be installed via the local_pkgs file in igor:

torque-server
torque-client
maui
maui-local

The TORQUE_SERVER_NAME definition denotes which of the hosts in the cluster will be the TORQUE server and is defined for each host in the cluster.  This value should be the fully qualified name for the server host and will be the same for each host in a cluster.

Sample TORQUE setup

In the igor hosts file for each host in the cluster:

@define PKGS_TORQUE
@define TORQUE_SERVER_NAME "clusterserver.ecn.purdue.edu"

For the host clusterserver, the following definition will be added:

@define TORQUE_SERVER

Setting up a node list

The nodes in a cluster are defined in the file /var/torque/server_priv/nodes on the server node.  This file is maintained by igor and lists each node's host name, followed on the same line by any properties assigned to that node.
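Because igor generates this file, it is normally read rather than edited by hand.  In TORQUE's nodes file format, each line starts with a node name and may be followed by np=N (the number of processors TORQUE can schedule on that node) and by property names.  The sketch below totals the schedulable processors listed in a nodes file; the function name count_slots is hypothetical, and it assumes that a node with no np= entry counts as one processor.

```shell
# count_slots FILE: print the total of the np= values in a TORQUE nodes file.
# Hypothetical helper. Assumes one node per line, "#" comment lines, and
# that a node without an np= entry provides a single processor.
count_slots() {
  awk '
    NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
    {
      np = 1                           # assumed default when np= is absent
      for (i = 2; i <= NF; i++)
        if ($i ~ /^np=[0-9]+$/) np = substr($i, 4) + 0
      total += np
    }
    END { print total + 0 }
  ' "$1"
}
```

For example, count_slots /var/torque/server_priv/nodes prints the total for the cluster described by the server's nodes file.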

Configuring TORQUE queues

In order for TORQUE to accept jobs and manage their resources, one or more queues must be created and configured to accept jobs and allocate resources to them.  Queues are managed with the qmgr command.  The following commands perform the minimum configuration: they create a queue named 'batch', make it an execution queue, enable it, and start it:

qmgr -c 'create queue batch'
qmgr -c 'set queue batch queue_type=execution'
qmgr -c 'set queue batch enabled=true'
qmgr -c 'set queue batch started=true'
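When several queues are needed (for example, the batchhigh and batchlow queues used in the MAUI section), the four qmgr commands above can be wrapped in a small shell function.  This is a sketch: the function name create_exec_queue and the QMGR override variable are hypothetical conveniences, not part of TORQUE.

```shell
# create_exec_queue NAME: run the minimum qmgr configuration for one queue.
# QMGR defaults to qmgr; point it at another command for a dry run.
# Stops at the first qmgr command that fails (e.g. the queue already exists).
create_exec_queue() {
  q="$1"
  qmgr_cmd="${QMGR:-qmgr}"
  $qmgr_cmd -c "create queue $q" &&
    $qmgr_cmd -c "set queue $q queue_type=execution" &&
    $qmgr_cmd -c "set queue $q enabled=true" &&
    $qmgr_cmd -c "set queue $q started=true"
}
```

Running create_exec_queue batch as root on the server reproduces the commands above.  Afterwards, qstat -Q lists each queue, and its Ena and Str columns show whether the queue is enabled and started.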

Setting a default TORQUE queue

TORQUE does not automatically assign a default queue.  Users must specify the queue to be used unless a default queue is explicitly configured.  The following command will configure TORQUE to use the 'batch' queue that was just created as the default queue.  This queue will be used if the user does not specify a queue.

qmgr -c 'set server default_queue=batch'
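The setting can be confirmed with qmgr -c 'print server', which prints the server configuration as qmgr commands.  The sketch below extracts the default queue from that output; it assumes the setting appears as a line of the form set server default_queue = NAME, and the function name default_queue and the QMGR override are hypothetical.

```shell
# default_queue: print the server's default_queue setting, if one is set.
# Parses "set server default_queue = NAME" lines from qmgr's "print server"
# output. QMGR defaults to qmgr and can be overridden for testing.
default_queue() {
  ${QMGR:-qmgr} -c 'print server' |
    awk '$1 == "set" && $2 == "server" && $3 == "default_queue" { print $5 }'
}
```

A job submitted without -q, for example with echo 'sleep 30' | qsub, should then appear under batch in the Queue column of qstat.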

Restricting nodes available to TORQUE queues

A TORQUE queue can be restricted to run jobs only on specific nodes by adding resource attributes to the node entries.  The attributes are added in igor and are appended to the end of the line describing each node.  The requirement is then added to the queue with qmgr.  The following command restricts jobs submitted to the 'batch' queue created above so that they run only on nodes with the 'general' attribute:

qmgr -c 'set queue batch resources_default.neednodes=general'

Jobs submitted to the batch queue will now only be executed on nodes that have the 'general' resource available.
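To see which nodes a queue restricted this way can use, the nodes file can be searched for the attribute.  The sketch below assumes attributes appear as bare words after the node name, as igor appends them; nodes_with_property is a hypothetical helper name.

```shell
# nodes_with_property FILE PROPERTY: list the nodes whose entry carries PROPERTY.
# Hypothetical helper. Assumes one node per line, "#" comment lines, and
# properties written as bare words after the node name.
nodes_with_property() {
  awk -v prop="$2" '
    NF == 0 || $1 ~ /^#/ { next }      # skip blank lines and comments
    {
      for (i = 2; i <= NF; i++)
        if ($i == prop) { print $1; break }
    }
  ' "$1"
}
```

For example, nodes_with_property /var/torque/server_priv/nodes general lists the nodes eligible for jobs in the batch queue.  On a running server, pbsnodes -a also reports each node's properties as TORQUE sees them.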

MAUI - Cluster Scheduler

The Maui Scheduler is a policy engine which allows control over when, where, and how resources such as processors, memory, and disk are allocated to jobs.

Installing MAUI

MAUI is provided in RPM format and is automatically installed on TORQUE server hosts.  The maui-local RPM installs a set of symbolic links to the MAUI executables in /usr/local/bin.
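A quick way to confirm that the maui-local links are in place is to check that MAUI's client commands resolve on the PATH.  The sketch below assumes showq, checkjob, and diagnose are among the linked executables; the exact set that maui-local installs may differ, and check_maui_commands is a hypothetical name.

```shell
# check_maui_commands: report where each MAUI client command resolves on PATH.
# The command list is an assumption; adjust it to what maui-local installs.
# Returns 1 if any command is missing, 0 otherwise.
check_maui_commands() {
  missing=0
  for cmd in showq checkjob diagnose; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s: %s\n' "$cmd" "$(command -v "$cmd")"
    else
      printf '%s: not found\n' "$cmd"
      missing=1
    fi
  done
  return "$missing"
}
```

On a correctly installed server, each line should point into /usr/local/bin.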

Configuring MAUI for TORQUE queues

The default MAUI configuration should be sufficient for most installations.  The scheduler should be started by the installation process; if it is not, run the following command:

/etc/init.d/maui start

MAUI allows multiple queues to be managed with different attributes.  The settings are configured in the file /var/local/maui/maui.cfg.

One possible configuration for multiple queues is to give preference to a set of users by allowing them to submit batch jobs to a queue that has a higher priority.  First, the queues need to be created and enabled, as described above in the TORQUE configuration section.  Next, MAUI's quality of service (QOS) settings need to be configured to provide prioritized scheduling.  The following additions to maui.cfg define a high-priority QOS (hi) and a low-priority QOS (low), and assign them to the queues batchhigh and batchlow, respectively.

QOSWEIGHT             3
QOSCFG[hi] PRIORITY=100000
QOSCFG[low] PRIORITY=1
CLASSCFG[batchhigh] QDEF=hi
CLASSCFG[batchlow] QDEF=low

The QOSWEIGHT line increases the importance of QOS relative to other weighting factors (the default weight is 1).  The two QOSCFG lines define two QOS levels, hi and low, and set their priorities.  The CLASSCFG lines make hi and low the default QOS for the queues batchhigh and batchlow, respectively.  With this configuration, jobs submitted to the batchhigh queue are scheduled before jobs waiting in the batchlow queue.
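As a worked example, assume Maui's credential-based priority term, in which a job's QOS contribution is CREDWEIGHT × QOSWEIGHT × the PRIORITY of its QOS, with CREDWEIGHT left at its default of 1 (an assumption; check the Maui priority documentation for the installed version).  The QOS terms for jobs in the two queues are then:

```latex
\begin{aligned}
P_{\text{hi}}  &= \text{CREDWEIGHT} \times \text{QOSWEIGHT} \times \text{PRIORITY}_{\text{hi}}  = 1 \times 3 \times 100000 = 300000 \\
P_{\text{low}} &= \text{CREDWEIGHT} \times \text{QOSWEIGHT} \times \text{PRIORITY}_{\text{low}} = 1 \times 3 \times 1 = 3
\end{aligned}
```

The gap of 299997 points is what places batchhigh jobs ahead of batchlow jobs; other factors, such as time spent waiting in the queue, would have to outweigh it before a batchlow job could move ahead.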

The QOS settings can also allow jobs in the high-priority queue to preempt jobs already running in the low-priority queue.  To do this, change the QOSCFG lines to:

QOSCFG[hi] PRIORITY=100000 FLAGS=PREEMPTOR
QOSCFG[low] PRIORITY=1 FLAGS=PREEMPTEE

This change allows jobs submitted to the batchhigh queue to preempt jobs running in the batchlow queue.  Preempted batchlow jobs are returned to the queue and run again as resources permit.  So that a job is not stranded in the low-priority queue, its priority increases the longer it waits in the queue.

After making changes to the maui.cfg file, run the following command to make the changes active:

/etc/init.d/maui restart
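To check the queue priorities after the restart, one short job can be submitted to each queue while the cluster is busy and the waiting jobs compared.  The sketch below pipes a one-line script to qsub for each queue named on the command line; the function name submit_probe_jobs and the QSUB override are hypothetical.

```shell
# submit_probe_jobs QUEUE...: submit one short job to each named queue and
# print the job IDs that qsub returns. QSUB defaults to qsub and can be
# overridden for a dry run. Stops at the first failed submission.
submit_probe_jobs() {
  for q in "$@"; do
    echo 'sleep 60' | ${QSUB:-qsub} -q "$q" || return 1
  done
}
```

For example, submit_probe_jobs batchlow batchhigh followed by showq should list the batchhigh job ahead of the batchlow job among the idle jobs, and diagnose -p shows each idle job's priority broken down by component, including QOS.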

 

Last Modified: Jan 6, 2023 11:00 am US/Eastern
Created: Nov 20, 2007 5:06 pm US/Eastern by admin