Introduction to Ada - Rice's Cray XD1 Cluster

07-Aug-2008


Introduction

Ada is Rice's newest and largest computing cluster.  It has 632 AMD64 CPU cores, provided by dual-core 2.2 GHz AMD Opteron 275 CPUs with 1 MB of L2 cache.  Each node has two CPUs (four cores) and 8 GB of RAM, or 2 GB per core.  All 8 GB are visible to every core on the node, although 4 GB is local to each dual-core CPU and is therefore slightly faster to access.  The system has three filesystems of 5 TB each: a Lustre filesystem (/lustre) that provides fast I/O for running user applications, a filesystem for home directories (/home), and a filesystem for group-based allocations (/projects).  The interconnect is Cray's proprietary "RapidArray", which is based on Infiniband.
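The node count and aggregate memory are not stated above, but they follow from the quoted figures if one assumes that all 632 cores sit in identical four-core, 8 GB nodes. A quick sketch of that arithmetic in the shell:

```shell
# Figures quoted above; assumes every core belongs to an identical node.
cores=632
cores_per_node=4
mem_per_node_gb=8

# Derived values (integer arithmetic).
nodes=$((cores / cores_per_node))
total_mem_gb=$((nodes * mem_per_node_gb))

echo "nodes: $nodes"
echo "total memory: $total_mem_gb GB"
```

Under that assumption the cluster has 158 nodes and 1264 GB of RAM in total.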

Ada is running SuSE 9.0 Linux and the 2.6.5 kernel, with small changes made by Cray, particularly for their RapidArray interconnect.

Most installed software is in /opt/apps.  See the module command for information on how to use these applications.  If you need any software that is not present, please let us know.

For information on the unix shell configuration program called module, PBS, compilers, MPI, and contact information, see the remainder of this document.

A final note.  Be careful about changing your unix shell's configuration (.profile, .cshrc, .bash, etc) until you get things working.  The system and the necessary shell environment are a little different from the RTC.


Logging in to Ada

Ada can be accessed from any machine on the Rice campus with SSH. If you need off-campus access, you will have to install VPN on your computer and then log in to Ada via SSH. For more information regarding off-campus access, please visit our Off-Campus Access FAQ.

To log in to Ada from a Linux or Unix machine, type:


ssh -Y (your_login_name)@ada.rice.edu

To transfer files into Ada from a Linux or Unix machine, use scp:

 
scp some_file.dat *.incl *.txt (your_login_name)@ada.rice.edu:

For more information about using SSH, please see our SSH FAQ.

Login Nodes

Once you are logged in to Ada, you are logged into one of four login nodes. These nodes are intended for users to compile software, prepare data files, and submit jobs to the job queue. They are not intended for running compute jobs. Please run all compute jobs in one of the job queues described later in this document.


Filesystems, Quotas and Job Output

Ada currently enforces disk quotas for all users.  There is a 10 GB quota for home directories (/home, also called /users) and a 50 GB quota per group (/projects).  There are no quotas on /lustre.  However, /lustre is for applications that need fast I/O and is not for permanent storage.  Any files on /lustre that are not modified for more than two weeks will be deleted automatically!  The /home (also called /users) and /projects filesystems are intended for permanent storage only. They are not intended for job I/O. For more details about filesystems for job I/O, please see our FAQ.

NOTE: Do not use /users and /projects for job I/O. Please see our FAQ for more details on job I/O.
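Because files on /lustre are purged after two weeks without modification, it can help to check which files are getting close to that limit. The following is a minimal sketch using find; the function name, the 10-day warning threshold and the example path are illustrative assumptions, not part of Ada's configuration.

```shell
# List regular files under a directory that have not been modified recently.
# find's "-mtime +N" counts whole 24-hour periods, so "+10" matches files
# that are at least 11 days old. The default of 10 days is a hypothetical
# warning threshold, a few days short of the two-week purge.
stale_files() {
    dir="$1"
    days="${2:-10}"
    find "$dir" -type f -mtime +"$days" -print
}

# Example (the path is an assumption; substitute your own /lustre directory):
#   stale_files /lustre/$USER 10
```

Any file listed should be copied to /home or /projects if it needs to be kept.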

To see your current quota and your disk usage, run this command:

 
quota -s

To see the quota and usage for all groups that you belong to, run this command:

 
quota -s -g

For information on how to use /projects, please see our FAQ.

Customizing Your Environment with the module Command

Each user can customize their environment using the module command.  This command lets you select software and will set up the appropriate paths and libraries. All the requested user applications are located under the /opt/apps/ directory.

To list what applications are available, type:

 
adauser@ada751-6:~> module avail

----------------------------------------- /usr/local/modules/versions ---------------------------------
3.1.6

------------------------------------- /usr/local/modules/3.1.6/modulefiles ----------------------------
dot module-cvs module-info modules null use.own

------------------------------------------ /usr/local/modulefiles -------------------------------------
MetaOCaml/309_alpha_030 apprentice2/2.5.0(default) grace/5.1.19 papi/3.1.0(default)
R/2.2.1 apptools jdk/1.5.0_06 pgi/6.0.2(default)
acml/2.7.0 atlas/3.7.11 libsci/1.3/pgi-6.0.1 pgi/6.1.2
acml-gnu/2.7.0 craypat/1.2.0(default) libsci/1.3/pgi32-6.0.1 sac/100.0
acml-mp/2.7.0 dwarf/4.3.0(default) matlab/2006a trilinos/6.0.15
acml32/2.7.0 eclipse/3.1.2 mpich/mpich-1.2.6-gnu vmd/1.8.3
acml32-gnu/2.7.0 elf/0.8.5(default) mpich/mpich-pgi602(default) xd-tools
acml32-mp/2.7.0 gaussian/g03-d1 mpich/mpich-pgi612
afni/AFNI_2005_11_18_2040 gcc/4.1.0 ocaml/3.09.1
adauser@ada751-6:~>



To load the module for PGI v6.1.2, type:

 
module load pgi/6.1.2

For assistance with module, type man module

For more information on using module with a PBS batch script, please see our FAQ.


Job Scheduling

The batch job scheduling system implemented on Ada consists of two packages: Torque and Moab.  Torque is in charge of resource management and monitoring while the Moab scheduler decides when and where jobs should run. Torque is an enhanced, commercial version of OpenPBS and implements all of the usual PBS commands as described later in this document.

Fairshare Scheduling Policy

We implement the Moab fairshare feature to provide a fair utilization of the available resources.  This is accomplished by allowing historical resource utilization information to be incorporated into job feasibility and priority decisions. This is normally the most significant component of a job's priority, which ultimately defines the position of the job on a queue. We do not use a FIFO (First-In-First-Out) scheduler on Ada.

Backfill Scheduling Policy

This is a scheduling optimization which allows Moab to make better use of available resources by running jobs out of order. Using job data such as walltime and resources requested, the scheduler can start other, lower-priority jobs so long as they do not delay the highest priority jobs.  Because of the way it works, essentially filling in holes in node space, backfill tends to favor smaller and shorter running jobs more than larger and longer running ones.

NOTE:  It is important to specify an accurate walltime for your job in your PBS submission script.  Selecting the default of 4 hours for jobs that are known to run for less time may result in the job being delayed by the scheduler due to an overestimation of the time the job needs to run.
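One way to arrive at a realistic walltime is to time a representative run and add a safety margin. The sketch below converts a measured runtime in seconds into the HH:MM:SS form used by PBS; the function name and the 20% default margin are arbitrary illustrative choices, and the cap reflects the 4-hour limit mentioned in this document.

```shell
# Turn a measured runtime (seconds) into a PBS walltime string (HH:MM:SS).
# Adds a percentage margin (default 20, an illustrative choice) and caps
# the result at 4 hours.
walltime_request() {
    seconds="$1"
    margin_pct="${2:-20}"
    max=$((4 * 3600))
    padded=$((seconds + seconds * margin_pct / 100))
    if [ "$padded" -gt "$max" ]; then
        padded=$max
    fi
    printf '%02d:%02d:%02d\n' \
        $((padded / 3600)) $((padded % 3600 / 60)) $((padded % 60))
}

# A run measured at 50 minutes (3000 s) with the default margin:
walltime_request 3000
```

The printed value can be used directly in a "#PBS -l walltime=..." line.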

Available Queues and System Load

We currently provide two queues for general accessibility, compute and interactive:

Compute is a standard priority queue that can allocate all of the available resources (maximum of 544 processors) and has a maximum job walltime of 4 hours.

Interactive is a higher priority queue with the purpose of serving debugging sessions and interactive jobs.  The maximum number of CPUs that can be accessed through this queue is 16 with a maximum job walltime of 30 minutes.  This queue is available 24 hours per day. To use this queue, you must use the -I option on the qsub command line (see Batch Scheduling with PBS for qsub options). This will give you an interactive command line prompt on a compute node.

NOTE: Do not run CPU intensive processes on Ada's login nodes. Use one of the queues listed above. Any CPU intensive process running on Ada's login nodes is subject to termination without notice.

There may be other queues present on the system.  These are normally dedicated to special projects/allocations.

A good way to obtain the status of all queues and their current usage is to run the following PBS command:

 
adauser@adahost:~> qstat -q

server: ada759-6

Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- ----- ----- ---- -----
compute -- -- 04:00:00 -- 4 0 -- E R
interactive -- -- 00:30:00 -- 0 0 -- E R
----- -----
4 0

Here is a brief description of the relevant fields:

Walltime:  Maximum walltime a job can request
Run:  Number of jobs in running state
Que:  Number of jobs in queued state
State:  The queue is enabled "E" and running (started) "R"
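If you only need the figures for one queue, the qstat -q listing can be filtered with awk. This is a small sketch that assumes the whitespace-separated column layout shown in the sample output above (queue name in column 1, walltime in column 4, Run in column 6, Que in column 7); the function name is hypothetical.

```shell
# Read `qstat -q` output on stdin and summarize one queue.
# Column positions are taken from the sample listing in this document.
queue_summary() {
    awk -v q="$1" '$1 == q {
        print "walltime=" $4 " running=" $6 " queued=" $7
    }'
}

# Usage on Ada:
#   qstat -q | queue_summary compute
```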

Determining Why a Job is not Running

There may be several reasons why a job is not running and appears to be stuck in the queue.  Please see our PBS Job Scheduling FAQ for more information.

Batch Processing with PBS

Once you have an executable, you need to create a job script that does the following:

  • Requests the resources that will be needed (e.g. number of processors, wall-clock time), and
  • Runs the commands that prepare for and start the executable (e.g. cd to the working directory, source shell environment files).

See Table 1 below for PBS submission options.

Table 1. PBS Submission Options

Option

Description

#PBS -N jobname

Assigns a job name. The default is the name of PBS job script.

#PBS -l nodes=2:ppn=2

The number of nodes and processors per node. Do not specify ppn greater than 4 or the job will not run.  Selecting fewer than 4 processors per node means that other jobs might share the node.

Example: nodes=10:ppn=4 or nodes=20:ppn=2 would both result in 40 processors for the job. The latter option might result in other users sharing the node.

#PBS -l nodes=1:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB

Use both of these options to give your job exclusive access to a node such that no other jobs can share the node.  This combination of arguments will assign one processor to your job and will give it exclusive access to all of the resources (i.e. memory) of the entire node without interference from other jobs.

Please see our FAQ for more details on requesting exclusive access.

#PBS -l walltime=01:00:00

The maximum wall-clock time needed for this job to run.

#PBS -l pmem=1000m

The maximum amount of physical memory used by any single process of the job (in megabytes). See our FAQ for more details.

#PBS -q queuename

Specify the name of the queue to use.

#PBS -o mypath

The full path for the standard output (stdout) .OU file.

#PBS -e mypath

The full path for the standard error (stderr) .ER file.

#PBS -j oe

Join option that merges the standard error stream with the standard output stream of the job.

#PBS -I

Interactive jobs. This will give you an interactive prompt on a compute node. Primarily used for the Interactive queue.

#PBS -V

Exports all environment variables to the job.

#PBS -M username@rice.edu

Email address for job status messages.

#PBS -m bae

PBS will notify the user via email when the job begins (b), aborts (a) or ends (e).

#PBS -m n

Turn off all email notification from the job.
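The nodes and ppn values in Table 1 multiply to give the total processor count, and ppn must stay within the four cores of a node. A minimal sketch of a helper that checks a request before it goes into a job script (the function name is hypothetical):

```shell
# Build the "nodes=N:ppn=P" resource string and report the processor total.
# Rejects ppn outside 1..4, since each Ada node has four cores.
pbs_nodes_spec() {
    nodes="$1"
    ppn="$2"
    if [ "$ppn" -lt 1 ] || [ "$ppn" -gt 4 ]; then
        echo "ppn must be between 1 and 4" >&2
        return 1
    fi
    echo "nodes=${nodes}:ppn=${ppn} ($((nodes * ppn)) processors)"
}

# The two equivalent 40-processor requests from Table 1:
pbs_nodes_spec 10 4
pbs_nodes_spec 20 2
```

Both calls report 40 processors; only the first fills whole nodes.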

Job Launchers (mpiexec, mpirun)

The job launcher's purpose is to spawn copies of your executable across the resources allocated to your job. We currently recommend and support mpiexec for this task. It is a cleaner, safer and faster alternative to mpirun. By default mpiexec only needs your executable, the rest of the information will be extracted from PBS.

Cray also provides a special application launcher that works in conjunction with mpiexec. The xd1launcher ensures that your application takes advantage of XD1 software features such as LSS (Linux Synchronized Scheduler) and CPU affinity. This is an easy way to increase the performance of your application on the XD1 without much effort.

Examples:

Run “myprogram” as a parallel MPI code on each of the processors allocated by PBS:

 
mpiexec $XD1LAUNCHER myprogram

Run “myprogram” on only 8 processors:

 

mpiexec -n 8 $XD1LAUNCHER myprogram

We still provide mpirun for applications that cannot use any other launcher.  Note that rsh is the default communication protocol for mpirun, whereas Ada requires ssh. The following example launches the job presented above using mpirun with ssh configured as the communication protocol:

 

 
export AM_RSH_CMD=/usr/bin/ssh
mpirun -np 8 -hostfile $PBS_NODEFILE $XD1LAUNCHER myprogram

Make sure you have configured passwordless ssh in your account prior to running mpirun, or communication between the nodes assigned to your job will fail.
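This document does not describe how passwordless ssh is set up on Ada, so the following is only a sketch of the usual OpenSSH approach. It assumes that your home directory (and therefore ~/.ssh) is visible on every node, and it creates a key with an empty passphrase, which is a convenience trade-off you should weigh against your site's security guidance. The function name is hypothetical.

```shell
# Create an RSA key pair with an empty passphrase (if none exists) and
# authorize it for logins to this same account.
# Assumption: the home directory is shared across the cluster's nodes.
setup_passwordless_ssh() {
    ssh_dir="${1:-$HOME/.ssh}"
    key="$ssh_dir/id_rsa"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -N "" -f "$key" || return 1
    fi
    pub=$(cat "$key.pub") || return 1
    touch "$ssh_dir/authorized_keys"
    # Append the public key only if it is not already authorized.
    grep -qxF "$pub" "$ssh_dir/authorized_keys" ||
        echo "$pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage:
#   setup_passwordless_ssh
```

Running the function a second time leaves the existing key and authorized_keys entry unchanged.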

Job Scripts

A job script may consist of PBS directives, comments and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command line options. For example, we could create a myjob.pbs script this way:

 

 
#PBS -N JOBNAME
#PBS -q compute
#PBS -l nodes=2:ppn=2,pmem=2000m,walltime=00:30:00
#PBS -M username@rice.edu
#PBS -m abe
#PBS -V

echo "My job ran on: "
cat $PBS_NODEFILE
cd $PBS_O_WORKDIR
mpiexec $XD1LAUNCHER ./myprogram

If you need to debug your program and want to run in interactive mode, the same request could be constructed like this:

 

 
qsub -I -N JOBNAME -q interactive -l nodes=2:ppn=2,pmem=2000m,walltime=00:30:00

NOTE:  It is important to specify an accurate walltime for your job in your PBS submission script.  Selecting the default of 4 hours for jobs that are known to run for less time may result in the job being delayed by the scheduler due to an overestimation of the time the job needs to run.
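When many similar jobs are needed, for example in a parametric sweep, a job script like the one above can be generated rather than written by hand. The following sketch prints such a script to standard output; the function name and its argument order are hypothetical, and the directives are the ones used in the example above.

```shell
# Print a PBS job script for the compute queue.
# Arguments: job name, nodes, ppn, walltime (HH:MM:SS), program to run.
make_job_script() {
    name="$1"; nodes="$2"; ppn="$3"; walltime="$4"; program="$5"
    # Escaped variables (\$...) are left for the job to expand at run time.
    cat <<EOF
#PBS -N ${name}
#PBS -q compute
#PBS -l nodes=${nodes}:ppn=${ppn},walltime=${walltime}
#PBS -V

cd \$PBS_O_WORKDIR
mpiexec \$XD1LAUNCHER ${program}
EOF
}

# Usage:
#   make_job_script sweep1 2 4 00:30:00 ./myprogram > sweep1.pbs
```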

Submitting and Monitoring Jobs

Once your job script is ready, use qsub to submit it:

 
qsub ./myjob.pbs

This will return a jobid while the output and error stream of the job will be saved to two files inside the directory where the job was submitted. 
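The jobid printed by qsub can be captured in a shell variable for use with later commands. Torque job identifiers normally take the form number.servername; the sketch below assumes that form and strips the server part. The function name and the sample identifier are illustrative.

```shell
# Strip the server suffix from a job identifier such as "12345.ada759-6",
# leaving only the numeric part.
job_number() {
    echo "${1%%.*}"
}

# On Ada (not run here):
#   jobid=$(qsub ./myjob.pbs)
#   checkjob "$(job_number "$jobid")"
```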

The status of the job can be obtained using Moab commands.  See Table 2 for a list of Moab commands.

Table 2. Moab commands

Command

Description

showq

Show a detailed list of all submitted jobs.

checkjob job.ID

Show a detailed description of the job given by job.ID.

showstart job.ID

Gives an estimate of the expected start time of the job given by job.ID.

There are four different states that a job can be in after submission: active, idle, blocked or deferred. The showq command with no arguments will list all jobs in their current state.

Active: These are jobs that have been started.

Idle: These jobs are eligible to run, but there are not enough resources to allocate to them at this time.

Blocked: These jobs are not being considered for running, probably due to a policy violation. For instance, a queue may have reached the maximum number of active processes assigned to it, in which case it blocks all further jobs until resources are released by active jobs. Blocked jobs will eventually leave this state and move to the idle queue.

Deferred: Jobs in this state normally have a batch hold, which means that they requested resources (walltime, number of nodes, etc.) of a type or amount that does not exist on the system. If your job is deferred, please review the resource requirements in your submission script and make sure that the destination queue can satisfy them.

Modifying and Deleting Jobs

It is possible to modify the attributes of a job after it has been submitted, as long as it is not yet in the running state. The PBS command qalter supports all of the parameters available on qsub.  This example reduces the walltime originally requested for the job:

 

 
qalter -l walltime=00:03:00 <jobid>

A job can also be relocated to a different queue using the qmove command:

 
qmove <interactive-queue> <jobid>

A job can be deleted by using the qdel command:

 
qdel <jobid>

Compilers and Programming

Several programming models are supported on Ada. Sequential, parallel and distributed programs can all be run. Sequential programs require one processor to run. Parallel and distributed programs utilize multiple processors concurrently. Parallel programs are a subset of distributed programs. Generally speaking, distributed computing involves parametric sweeps, task farming, etc., while message-passing and threaded applications fall under parallel computing.

SPMD (Single Program, Multiple Data) is one of the most popular methods of parallelism: every processor runs a copy of the same executable, and each copy works on its own data.

The supported compilers on Ada are PGI, GCC, and J2EE SDK. MPICH builds for PGI and for GCC are available and can be loaded on demand using the module command.

Compiling Serial Code

First of all you will have to load the appropriate compiler environment. To do so you will have to type:

 
module load pgi/6.1.2


Once the environment is set, you can compile your program with one of the following:

 
pgcc -o foo foo.c

pgCC -o foo foo.cc

pgf77 -o foo foo.f77

pgf90 -o foo foo.f90

pghpf -o foo foo.hpf



Compiling Parallel Code

To compile a parallel version of your code that has MPI calls, use the appropriate mpich library. Again, use module to load the appropriate compiler environment.

To compile your code you will have to use the MPICH scripts that are currently in your default path. The MPICH scripts are responsible for invoking the compiler, linking your program with the MPI library and setting the MPI include files (mpi.h and mpif.h).

Once the environment is set, you can compile your program with one of the following (assuming the PGI compiler as above):


 
mpicc -o foo mympifoo.c

mpiCC -o foo foo.cc

mpif77 -o foo foo.f77

mpif90 -o foo foo.f90
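The serial and MPI compile commands listed in this section follow one pattern: the source file's extension selects the compiler or wrapper, and the output takes the file's base name. As a sketch, that mapping can be written as a shell function that prints the command instead of running it; the function name and the "serial"/"mpi" keywords are hypothetical.

```shell
# Print the compile command for a source file.
# Mode is "serial" (PGI compilers) or "mpi" (MPICH wrapper scripts).
compile_command() {
    mode="$1"
    src="$2"
    out="${src%.*}"          # output name: source name without extension
    case "$mode:$src" in
        serial:*.c)   cc=pgcc ;;
        serial:*.cc)  cc=pgCC ;;
        serial:*.f77) cc=pgf77 ;;
        serial:*.f90) cc=pgf90 ;;
        serial:*.hpf) cc=pghpf ;;
        mpi:*.c)      cc=mpicc ;;
        mpi:*.cc)     cc=mpiCC ;;
        mpi:*.f77)    cc=mpif77 ;;
        mpi:*.f90)    cc=mpif90 ;;
        *) echo "no compiler known for $mode $src" >&2; return 1 ;;
    esac
    echo "$cc -o $out $src"
}

# Usage:
#   compile_command serial foo.c
#   compile_command mpi foo.f90
```

The printed command can be reviewed and then run once the matching module has been loaded.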



Getting Help

If you have any further questions please see our FAQ.  If you still have questions, please let us know:

    http://helpdesk.rice.edu
    helpdesk@rice.edu
    713-348-4357

Please follow our guidelines when contacting the Help Desk for faster problem resolution.

