Introduction to RTC - Rice's HP Itanium 2 Cluster
07-Aug-2008

Introduction
The RTC is Rice's Itanium 2 cluster. It has a total of 286 Intel Itanium 2 processors: 900 MHz processors on 124 dual-processor nodes and 4 quad-processor nodes, and 1.3 GHz processors on 6 nodes. Each processor can access up to 4 GB of RAM (16 GB or 32 GB on the quad-processor nodes). Sixty-five nodes are connected by a Myrinet interconnect, and Gigabit Ethernet is available on all nodes.
The system also has the following filesystems: a 5 TB PVFS (parallel) filesystem (/shared.scratch) that provides fast I/O for running user applications, 700 GB for user home directories (/users), 2.5 TB for group-based allocations (/projects), and 150 GB for software (/opt/apps). A complete hardware overview is online.
RTC runs Red Hat Enterprise Linux 4 with the 2.6.9 kernel. Most installed software is in /opt/apps. See the section on the module command for how to use these applications. If you need software that is not installed, please let us know. The remainder of this document covers the module command, PBS, compilers, MPI, and contact information.
A final note: be careful about changing your Unix shell configuration files (.profile, .cshrc, .bashrc, etc.) until you have things working. The system and the required shell environment differ somewhat from Ada, so use caution when trying to duplicate your Ada environment on RTC.

Logging in to RTC
RTC can be accessed with SSH from any machine on the Rice campus. If you need off-campus access, you must install VPN on your computer and then log in to RTC via SSH. For more information about off-campus access, please see our Off-Campus Access FAQ.
To log in to RTC from a Linux or Unix machine, type:
To transfer files to RTC from a Linux or Unix machine, use scp:
For more information about using SSH, please see our SSH FAQ.

Login Nodes
When you log in to RTC, you are placed on one of three login nodes. These nodes are intended for compiling software, preparing data files, and submitting jobs to the job queues. They are not intended for running compute jobs. Please run all compute jobs in one of the job queues described later in this document.

Filesystems and Disk Quotas
RTC enforces disk quotas for all users. There is a 4 GB quota for home directories (/users) and a 50 GB quota for the projects allocation (/projects). There are no quotas on /shared.scratch. However, /shared.scratch is intended for applications that need fast I/O and is not for permanent storage. Any file on /shared.scratch that has not been modified for more than two weeks will be deleted automatically! Permanent storage is in /users and /projects only.
NOTE: Do not use /users or /projects for job I/O. Please see our FAQ for more details on job I/O.
To see your current quota and disk usage, run this command:
To see the quota and usage for all groups that you belong to, run this command:
For information on how to use /projects, please see our FAQ.
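Because files on /shared.scratch are purged after two weeks without modification, it can help to check which of your scratch files are getting close to that limit. The following is a minimal sketch built on the standard find command. The function name files_near_purge, its 10-day default, and the directory you pass to it are hypothetical choices made for illustration. They are not an RTC-provided tool.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR whose age, counted in
# whole days, exceeds DAYS (this is what find's -mtime +DAYS tests).
# The default DAYS=10 leaves a few days' margin before the two-week purge
# on /shared.scratch.
files_near_purge() {
    dir=$1
    days=${2:-10}
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, `files_near_purge /shared.scratch/$USER` lists candidates, assuming your scratch files live in a directory of that (hypothetical) name. Copy anything you need to keep to /users or /projects.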
Customizing Your Environment with the module Command
Each user can customize their environment with the module command. This command lets you select software and sets the appropriate paths and libraries for it. All user applications are installed under the /opt/apps directory.
To list the available applications, type:
To load the module for the Intel compiler, use:
For help with module, type man module. For more information on using module in a PBS batch script, please see our FAQ.

Job Scheduling
| Queue Name | Number of CPUs | Minimum Walltime | Maximum Walltime |
|---|---|---|---|
| short | <=262 | N/A | 24:00:00 |
| long | <=196 | 24:00:01 | 48:00:00 |
| verylong | <=134 | 48:00:01 | 168:00:00 |
| super | <=38 | N/A | 336:00:00 |
| interactive | <=8 | N/A | 01:00:00 |
| dedicated | Requires Approval | N/A | Requires Approval |
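The walltime ranges in the table determine which of the short, long, and verylong queues can hold a job. The following shell sketch maps a requested walltime to the queue whose range contains it. It is for illustration only. The helpers queue_for_walltime and walltime_to_seconds are hypothetical and are not part of PBS or Maui. The sketch ignores the CPU limits and the interactive, super, and dedicated queues.

```shell
#!/bin/sh
# Hypothetical helper: convert an HH:MM:SS walltime string to seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: print the queue (short, long or verylong) whose
# walltime range in the table contains the request, or "none" if it exceeds
# 168:00:00. CPU limits are not checked; the batch system does the real routing.
queue_for_walltime() {
    secs=$(walltime_to_seconds "$1")
    if [ "$secs" -le $((24 * 3600)) ]; then
        echo short
    elif [ "$secs" -le $((48 * 3600)) ]; then
        echo long
    elif [ "$secs" -le $((168 * 3600)) ]; then
        echo verylong
    else
        echo none
    fi
}

queue_for_walltime 30:00:00   # prints: long
```

The boundaries follow the table exactly. For example, 24:00:00 is still short, while 24:00:01 is long.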
The interactive queue is a higher-priority queue for debugging sessions and interactive jobs. At most 4 CPUs can be requested through this queue, and the maximum job walltime is 60 minutes. The queue is available only from 8:00 a.m. to 8:00 p.m.
The super queue is a special queue for accessing the quad-processor and large-memory nodes. It contains nine quad-processor nodes and one dual-processor node, for a total of 38 processors. Clock speeds range from 900 MHz to 1.3 GHz, and RAM ranges from 16 GB to 32 GB. More details about the super queue can be found in our FAQ.
NOTE: Do not run CPU-intensive processes on RTC's login nodes. Use one of the queues listed above. Any CPU-intensive process running on RTC's login nodes is subject to termination without notice.
There may be other queues on the system. These are normally dedicated to special projects or allocations.
A good way to obtain the status of all queues and their current usage is to run the following PBS command:
Once you have an executable, you need to create a job script containing the following PBS options:
Table 1. PBS Submission Options

| Option | Description |
|---|---|
| #PBS -N jobname | Assigns a job name. The default is the name of the PBS job script. |
| #PBS -l nodes=2:ppn=2:myrinet | The number of nodes, the number of processors per node, and the Myrinet MPI network (parallel jobs only). Do not specify ppn greater than 2 (or 4 on the quad-processor nodes), or the job will not run. |
| #PBS -l nodes=1:ppn=1 | Using both of these options gives your job exclusive access to a node, so that no other jobs can share the node. This combination of arguments assigns one processor to your job and gives it exclusive access to all of the resources (i.e., memory) of the entire node without interference from other jobs. Please see our FAQ for more details on exclusive access. |
| #PBS -l walltime=01:00:00 | The maximum wall-clock time needed for this job to run. |
| #PBS -l pmem=1000m | The maximum amount of physical memory used by any single process of the job (in megabytes). See our FAQ for more details. |
| #PBS -q queuename | Specifies the queue to use. Required only for the interactive and super queues. For any other queue, specifying a queue name will prevent the job from running. |
| #PBS -o mypath | The full path for the standard output (stdout) .OU file. |
| #PBS -e mypath | The full path for the standard error (stderr) .ER file. |
| #PBS -j oe | Join option that merges the job's standard error stream into its standard output stream. |
| #PBS -V | Exports all environment variables to the job. |
| #PBS -M username@rice.edu | Email address for job status messages. |
| #PBS -m bae | PBS will email the user when the job begins, aborts, or ends. |
| #PBS -m n | Turns off all email from the job. |
The job launcher's purpose is to spawn copies of your executable across the resources allocated to your job. We currently recommend and support mpiexec for this task. It is a cleaner, safer, and faster alternative to mpirun. By default, mpiexec needs only your executable and extracts the rest of the information from PBS.
Examples:
Run “myprogram” as a parallel MPI code on each of the processors allocated by PBS, using Myrinet:

```shell
## include the myrinet option on the -l line in your PBS batch script
#PBS -l nodes=2:ppn=2:myrinet
mpiexec -comm mpich-gm ./myprogram
```

Run “myprogram” using Ethernet:

```shell
mpiexec -comm mpich-p4 ./myprogram
```
For more information on using mpiexec to launch your job with Ethernet or Myrinet, please see our FAQ.
We still provide mpirun for applications that cannot use anything else. Note that rsh is the default communication protocol for mpirun, but RTC requires ssh. The following example is the job presented above, launched using mpirun with ssh configured as the default protocol:
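Here is a sketch of how such an mpirun job script might look. The commands below write it to a file named myjob-mpirun.pbs, which is a hypothetical name. The sketch makes two assumptions. First, the job uses the Ethernet (p4) MPICH build from the module table near the end of this document, loaded before qsub and exported with -V. Second, that MPICH-1 build honors the P4_RSHCOMMAND environment variable for choosing its remote shell. Check the FAQ for the exact setting RTC uses.

```shell
#!/bin/sh
# Write a job script that launches myprogram with mpirun over Ethernet (p4).
# Assumption: the mpich (p4) module is loaded before qsub and exported by -V.
cat > myjob-mpirun.pbs <<'EOF'
#!/bin/bash
#PBS -N myjob-mpirun
#PBS -l nodes=2:ppn=2
#PBS -l walltime=01:00:00
#PBS -j oe
#PBS -V

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# mpirun defaults to rsh; RTC requires ssh (assumes MPICH-1 reads P4_RSHCOMMAND).
export P4_RSHCOMMAND=ssh

# PBS_NODEFILE has one line per allocated processor (here 2 nodes x 2 = 4).
NP=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./myprogram
EOF
```

Unlike mpiexec, mpirun does not read the allocation from PBS by itself. That is why the sketch passes the process count and host list explicitly from $PBS_NODEFILE.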
A job script may consist of PBS directives, comments, and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command-line options. For example, we could create a myjob.pbs script this way:
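As a sketch of such a script, the shell commands below write a myjob.pbs that combines several options from Table 1 with the Myrinet mpiexec launch shown earlier. The job name, the resource values, and the program name myprogram are placeholders, not recommendations.

```shell
#!/bin/sh
# Write myjob.pbs. Job name, resources and "myprogram" are placeholders.
cat > myjob.pbs <<'EOF'
#!/bin/bash
#PBS -N myjob
#PBS -l nodes=2:ppn=2:myrinet
#PBS -l walltime=01:00:00
#PBS -l pmem=1000m
#PBS -j oe
#PBS -V
#PBS -m n

# PBS starts the job in your home directory by default; change to the
# directory qsub was run from (ideally on /shared.scratch).
cd "$PBS_O_WORKDIR"

# Start one copy of myprogram on each allocated processor over Myrinet.
mpiexec -comm mpich-gm ./myprogram
EOF
```

Submit the script from a login node with `qsub myjob.pbs`. qsub prints the job ID that the Maui and PBS job commands in this document take as an argument. Do not use /users or /projects for job I/O. Instead, submit from a directory on /shared.scratch so that $PBS_O_WORKDIR points there.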
If you need to debug your program and want to run in interactive mode, the same request could be constructed like this:
Table 2. Maui commands

| Command | Description |
|---|---|
| showq | Shows a detailed list of all submitted jobs. |
| checkjob job.ID | Shows a detailed description of the job given by job.ID. |
| showstart job.ID | Gives an estimate of the expected start time of the job given by job.ID. |
After submission, a job can be in one of four states: active, idle, blocked, or deferred. The showq command with no arguments lists all jobs in their current state.
Active (Running): These jobs have been started.
Idle: These jobs are eligible to run, but there are not enough resources to allocate to them at this time.
Blocked: These jobs are not being considered for running, most likely because of a policy violation. For instance, a queue may have reached the maximum number of active processors assigned to it, and it blocks all further jobs until active jobs release resources. Blocked jobs eventually leave this state and become idle.
Deferred: Jobs in this state normally have a batch hold, which means that they requested resources of a type or amount that does not exist on the system (walltime, number of nodes, etc.). If your job is deferred, please review the resource requirements in your submission script and make sure that the destination queue can satisfy them.
Job attributes can be modified after a job has been submitted, as long as it is not yet running. The PBS command qalter supports all of the parameters available on qsub. This example reduces the walltime originally requested for the job:
A job can also be moved to a different queue using the qmove command:
A job can be deleted using the qdel command:
| module command | Description |
|---|---|
| for p4 (Ethernet) MPI | |
| module load mpich/1.2.7-gcc3 | For the gcc-compiled version |
| module load mpich/1.2.7-intel9 | For the Intel-compiled version |
| for gm (Myrinet) MPI | |
| module load mpich-gm/1.2.7-gcc3 | For the gcc-compiled version |
| module load mpich-gm/1.2.7-intel9 | For the Intel-compiled version |

| use command | module command |
|---|---|
| use intel80-mpichp4 | module load mpich/1.2.7-intel9 |
| use intel80-mpichgm | module load mpich-gm/1.2.7-intel9 |

```shell
mpicc -o foo mympifoo.c
```
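To check that the compilers and MPI libraries work, you could build a small test program. The sketch below creates mympifoo.c, the source file name used in the mpicc command above. The program itself is an illustrative assumption that uses only standard MPI-1 calls.

```shell
#!/bin/sh
# Write a minimal MPI "hello" program to mympifoo.c (illustrative contents).
cat > mympifoo.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's index */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(host, &len);     /* node this process runs on */
    printf("rank %d of %d on %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}
EOF
```

Load one of the MPI modules from the table and compile with `mpicc -o foo mympifoo.c`. Then run ./foo through mpiexec inside a PBS job rather than on a login node. Each process should print its rank and the node it runs on.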
If you have any further questions, please see our FAQ. If you still have questions, please let us know:
http://helpdesk.rice.edu
helpdesk@rice.edu
713-348-4357
Please follow our guidelines when contacting the Help Desk for faster problem resolution.