RTC Updates Summary for August, 2007
15-May-2008

Introduction

The RTC was relocated to Rice's new Primary Data Center between September 17 and 29. During this time, a few changes and updates were applied to the system; they are summarized below.

Addition of new, faster large memory and quad-processor nodes

Four new nodes have been added to the RTC: three quad-processor nodes and one dual-processor node. One of the new quad-processor nodes has 32GB of RAM; the other two have 16GB each. The new dual-processor node also has 16GB of RAM. All four new nodes have a 1.3GHz clock speed.

All of the new hardware is accessed by submitting jobs to the super queue. To specifically access the node with 32GB of RAM, request the node with the "largemem" feature using a PBS option in your batch script similar to the following:
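
A minimal sketch of such a request, assuming standard Torque resource syntax in which a node feature is appended to the node specification. The queue name `super` is inferred from the text above, and the node and processor counts are illustrative assumptions; adjust them to your job.

```shell
#!/bin/sh
# Submit to the super queue (queue name assumed from the text above).
#PBS -q super
# Request one node with four processors that carries the "largemem"
# feature. The counts are illustrative, not site-mandated values.
#PBS -l nodes=1:ppn=4:largemem
```

Because the feature name is part of the node specification, the scheduler should only place the job on a node tagged "largemem".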

To summarize, the super queue now has seven quad-processor nodes and one dual-processor node, for a total of 30 processors. Four of the quad-processor nodes have 16GB of RAM and a 900MHz clock speed. Two of the quad-processor nodes have 16GB of RAM and a 1.3GHz clock speed. One of the quad-processor nodes has 32GB of RAM and a 1.3GHz clock speed. Lastly, the dual-processor node has 16GB of RAM and a 1.3GHz clock speed.

Recovery of three Myrinet nodes

Faulty Myrinet cards have been replaced on three nodes, adding three nodes to the Myrinet-capable node pool.

Increased bandwidth for /shared.scratch

The hardware connectivity for the Parallel Virtual File System (PVFS), /shared.scratch, has been upgraded to provide increased bandwidth capacity.

New network addresses cause SSH warning messages

The RTC was assigned new network addresses (IP addresses). As a result, your SSH client will display error messages warning you that the RTC IP addresses have changed. Please follow the instructions provided by your SSH client to remove the old RTC IP addresses from your list of known hosts. This will prevent the error on future logins. On a Linux/Unix system, for example, this involves removing the RTC entries from the $HOME/.ssh/known_hosts file.

New node allocation policy

Prior to the relocation of the RTC, the job scheduler automatically granted your job exclusive access to a node regardless of how many processors you requested. For example, if you requested one processor, you were granted one processor, but no one else was allowed to use the second processor. This policy has now changed. If you request only one processor, someone else's job might be assigned to the second processor. Keep this in mind when submitting jobs: your job might share node resources (disk, memory, and so on) with another job, which could reduce your job's performance.

If you need exclusive access to a node, please see our FAQ on this topic.

New queue policy

A new queue policy has been put into place. It is summarized as follows:

Upgrade of Torque and Maui

The job manager (Torque) and the job scheduler (Maui) have been upgraded. This will allow new features to be enabled in the future. Please report any problems you experience with job scheduling to the Help Desk.

Upgrade of Operating System and Kernel

The operating system has been upgraded to Red Hat Enterprise Linux 4 release 5. The kernel has also been upgraded to version 2.6.9-55. No recompilation of software was required for these upgrades.

Getting Help

If you find anything that does not appear to be working correctly following this maintenance period, please inform the Help Desk.
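
For the SSH warnings described under "New network addresses cause SSH warning messages", one possible way to remove stale entries on a Linux/Unix client is sketched here. It assumes an OpenSSH client recent enough to support `ssh-keygen -R`. The function name `forget_host` and the hostname `rtc.example.edu` are hypothetical placeholders, not the RTC's actual names or addresses.

```shell
#!/bin/sh
# forget_host HOST [KNOWN_HOSTS_FILE]
# Remove every key recorded for HOST from a known_hosts file.
# Defaults to the current user's $HOME/.ssh/known_hosts.
# (forget_host is a hypothetical helper name used for illustration.)
forget_host() {
    host="$1"
    file="${2:-$HOME/.ssh/known_hosts}"
    # ssh-keygen -R deletes all keys belonging to HOST and keeps the
    # previous contents in a backup file with a .old suffix.
    ssh-keygen -R "$host" -f "$file"
}

# Example (hypothetical hostname; substitute the name or IP address
# that your SSH client reports in its warning):
# forget_host rtc.example.edu
```

After the stale entry is removed, the next login should prompt you to accept the new host key instead of reporting a mismatch.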