Information Technology
RTC Updates Summary for August, 2007

15-May-2008


Introduction

The RTC was relocated to Rice's new Primary Data Center between September 17 and 29. During this time, a few changes and updates were applied to the system and are summarized below.

Addition of new, faster large memory and quad-processor nodes

Four new nodes have been added to the RTC: three quad-processor nodes and one dual-processor node. One of the new quad-processor nodes has 32GB of RAM. The other two quad-processor nodes and the dual-processor node each have 16GB of RAM. All four new nodes have a 1.3GHz clock speed.

All of the new hardware is accessed by submitting jobs to the super queue. To request the node with 32GB of RAM specifically, ask for the "largemem" feature using a PBS option in your batch script similar to the following:

#PBS -l nodes=1:ppn=4:largemem,walltime=04:00:00
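
For context, here is a minimal sketch of a complete batch script built around that option. The job name and the commented-out program are placeholders, and the use of "-q super" assumes the queue is selected by name in the usual Torque way; adjust all of these to your own job.

```shell
#!/bin/sh
#PBS -N largemem_example
#PBS -q super
#PBS -l nodes=1:ppn=4:largemem,walltime=04:00:00

# Torque sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to the current directory when run outside the batch system.
WORKDIR="${PBS_O_WORKDIR:-$PWD}"
cd "$WORKDIR" || exit 1

echo "Job running on $(uname -n) in $WORKDIR"

# Replace with your own program (placeholder, left commented out on purpose).
# ./my_program
```

If the script were saved as largemem.pbs (a hypothetical file name), it would be submitted with "qsub largemem.pbs".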

To summarize, the super queue now has seven quad-processor nodes and one dual-processor node for a total of 30 processors. Four of the quad-processor nodes have 16GB of RAM and a 900MHz clock speed. Two of the quad-processor nodes have 16GB of RAM and a 1.3GHz clock speed. One of the quad-processor nodes has 32GB of RAM and a 1.3GHz clock speed. Lastly, the dual-processor node has 16GB of RAM and a 1.3GHz clock speed.
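
As a quick check of the processor total for the super queue, the node counts can be tallied in the shell. The variable names are purely illustrative.

```shell
#!/bin/sh
# Quad-processor nodes: 4 at 900MHz, 2 at 1.3GHz with 16GB, 1 at 1.3GHz with 32GB.
quad_nodes=$(( 4 + 2 + 1 ))
# One dual-processor node at 1.3GHz with 16GB.
dual_nodes=1

total_cpus=$(( quad_nodes * 4 + dual_nodes * 2 ))
echo "super queue processors: $total_cpus"
```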

Recovery of three Myrinet nodes

Faulty Myrinet cards have been replaced on three nodes, returning those three nodes to the Myrinet-capable node pool.

Increased bandwidth for /shared.scratch

The hardware connectivity for the Parallel Virtual File System (PVFS), /shared.scratch, has been upgraded to provide increased bandwidth capacity.

New network addresses cause SSH warning messages

The RTC was assigned new network addresses (IP addresses). As a result, SSH will display messages warning you that the RTC IP addresses have changed. Please follow the instructions provided by your SSH client to remove the old RTC IP addresses from your list of known hosts. Once the old entries are removed, the warning will no longer appear on future logins. On a Linux/Unix system, for example, this involves removing the RTC entries from the $HOME/.ssh/known_hosts file.
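
On a Linux/Unix system, the manual cleanup described above can be sketched as a small shell function. This is only an illustration: the function name and the host name in the usage note are hypothetical, and it matches plain-text entries only, so hashed known_hosts entries (lines beginning with "|1|") are left untouched. OpenSSH's "ssh-keygen -R hostname" is the usual tool for the same job and also handles hashed entries.

```shell
#!/bin/sh
# remove_known_host FILE HOST
# Removes every line of FILE whose first field (a comma-separated list of
# host names and addresses) contains HOST exactly. The original file is
# saved as FILE.old before anything is changed.
remove_known_host() {
    file=$1
    host=$2
    cp -p "$file" "$file.old" || return 1
    awk -v h="$host" '{
        n = split($1, names, ",")
        keep = 1
        for (i = 1; i <= n; i++) {
            if (names[i] == h) keep = 0
        }
        if (keep) print
    }' "$file.old" > "$file"
}
```

It could then be called as, for example, remove_known_host "$HOME/.ssh/known_hosts" rtc.example.edu, substituting the name or address you actually use to reach the RTC.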

New node allocation policy

Prior to the relocation of the RTC, the job scheduler would automatically grant your job exclusive access to a node regardless of how many processors you asked for. For example, if you requested one processor you would have been granted one processor, but no one else would have been allowed to use the second processor. This policy has now changed. If you request only one processor, someone else's job might be assigned to the second processor. Keep this in mind when submitting jobs, because your job might be sharing node resources (e.g. disk, memory, and so on) with another job, which could reduce your job's performance. If you need exclusive access to a node, please see our FAQ on this topic.

New queue policy

A new queue policy has been put into place. It is summarized as follows:

Queue Name    # of CPUs           Minimum Walltime    Maximum Walltime
short         <=262               N/A                 24:00:00
long          <=196               24:00:01            48:00:00
verylong      <=134               48:00:01            168:00:00
super         <=30                N/A                 336:00:00
interactive   <=8                 N/A                 01:00:00
dedicated     Requires Approval   N/A                 Requires Approval
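
To illustrate how the CPU and walltime limits combine for the three general-purpose queues (short, long, verylong), the following sketch maps a CPU count and a requested walltime to a queue name. The function names are hypothetical and this is not how the scheduler itself is configured. The super, interactive, and dedicated queues are left out because they are chosen for reasons other than walltime.

```shell
#!/bin/sh
# to_seconds HH:MM:SS
# Converts a walltime string to a number of seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# suggest_queue CPUS WALLTIME
# Prints short, long, or verylong according to the limits in the table,
# or "none" (with a non-zero status) if no general-purpose queue fits.
suggest_queue() {
    cpus=$1
    secs=$(to_seconds "$2")
    if [ "$secs" -le 86400 ] && [ "$cpus" -le 262 ]; then
        echo short        # up to 24:00:00
    elif [ "$secs" -gt 86400 ] && [ "$secs" -le 172800 ] && [ "$cpus" -le 196 ]; then
        echo long         # 24:00:01 to 48:00:00
    elif [ "$secs" -gt 172800 ] && [ "$secs" -le 604800 ] && [ "$cpus" -le 134 ]; then
        echo verylong     # 48:00:01 to 168:00:00
    else
        echo none
        return 1
    fi
}
```

For example, suggest_queue 16 12:00:00 selects the short queue, while a 200-processor request for 30:00:00 fits none of the three because the long queue is limited to 196 CPUs.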

Upgrade of Torque and Maui

The job manager (Torque) and the job scheduler (Maui) have been upgraded. This will allow new features to be enabled in the future. Please report any problems you experience with job scheduling to the Help Desk.

Upgrade of Operating System and Kernel

The operating system has been upgraded to Red Hat Enterprise Linux 4 release 5. The kernel has also been upgraded to version 2.6.9-55. No recompilation of software was required for these upgrades.
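
If you want to confirm which versions a node is running, a check along these lines may help. It is a sketch using standard commands; the exact strings printed on the RTC are not reproduced here.

```shell
#!/bin/sh
# Kernel release of the node you are logged in to.
kernel_release=$(uname -r)
echo "Kernel: $kernel_release"

# Red Hat systems record the OS release in /etc/redhat-release;
# the file may be absent on other distributions, hence the check.
if [ -r /etc/redhat-release ]; then
    cat /etc/redhat-release
fi
```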


Getting Help

If you find anything that does not appear to be working correctly following this maintenance period, please inform the Help Desk.

 

Division of Information Technology
MS-119, P.O. Box 1892, Rice University, Houston, Texas 77251-1892
713-348-HELP(4357)