Information Technology
Using SAS on Ada

29-Nov-2007


Introduction

SAS is a business intelligence application for analytics, data manipulation, and reporting.  There are currently two ways to run SAS on the Ada cluster: interactively through the Graphical User Interface on a login node, or as a batch job on a compute node.  The examples below present a very simple configuration that runs a minimal SAS job on Ada.  Adjust your configuration according to the complexity of your jobs.


Running SAS with the Graphical User Interface

In order to run SAS with the Graphical User Interface (GUI), follow the steps below:

1.  Log in to Ada using SSH

To log in to Ada, you must use SSH software configured to tunnel (or redirect) X displays:
  • From Unix/Linux:


    ssh -X adauser@ada.rice.edu

  • From Windows:

    • You must use SSH and X-Win32 for Windows.  Both are available on the IT Software Distribution Page.  SSH will provide the mechanism to login to Ada, while X-Win32 will allow graphical applications running on Ada to be displayed on your desktop. For instructions on logging in to Ada using SSH and X-Win32, please see our FAQ on this subject.
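
Once you are logged in, one quick way to confirm that X tunneling is active is to check the DISPLAY environment variable, which SSH sets when it forwards X displays.  The following is a minimal sketch only; the function name check_x_display is hypothetical and is not part of Ada's configuration.

```shell
# Hypothetical helper: report whether SSH has set up an X display.
# With X tunneling active, SSH sets DISPLAY to a value such as localhost:10.0.
check_x_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X display is set to $DISPLAY"
        return 0
    fi
    echo "DISPLAY is not set; log in again with X tunneling enabled" >&2
    return 1
}
```

If check_x_display reports that DISPLAY is not set, graphical programs such as the SAS GUI will not be able to open a window on your desktop.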

2.  Configure your Ada environment to run SAS

After you have logged into Ada, use the module command to configure your Ada environment to run SAS.


adauser@adahost:~> module load sas

The module command will establish all of the environment variables needed to run the SAS program.

3.  Launch SAS

Run the sas command to launch the GUI.  This will run SAS on one of Ada's login nodes.  From this point you can write and test your SAS programs using the GUI.  Note that any SAS program you run from the GUI runs on the login node, not on a compute node.  This is acceptable only for debugging short programs (less than 30 minutes in duration).  Programs that use a lot of CPU time on the login nodes affect all users of the system.  Therefore, any program found running for more than 30 minutes on the login nodes is subject to being killed at the system administrators' discretion.


Running SAS on Compute Nodes

It is preferable to run SAS jobs on the compute nodes of Ada, not the login nodes.  This is accomplished by running your SAS program from inside a PBS batch script as follows:

1.  Log in to Ada using SSH

Log in to Ada using SSH.  X11 tunneling as described in the previous section is not required.

2.  Configure your Ada environment to run SAS

After you have logged into Ada, use the module command to configure your Ada environment to run SAS.


adauser@adahost:~> module load sas

The module command will establish all of the environment variables needed to run the SAS program.

3.  Write a SAS Program

Write your SAS program and save it in a file.  For our example we will use a sample SAS program called print.sas that is provided with the SAS distribution and can be found at /usr/local/sas/sas-9.1.3/samples/base on Ada.   Save your program in a directory, such as /home/adauser/sas.   You may want to create your files using the nano, vi or emacs text editors or use the SAS GUI.

4.  Write a PBS Batch Script

Call your SAS program (print.sas in our example) from a PBS batch script, named sas.job in our example, as follows:


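# Queue, wall-clock time limit, and job name
#PBS -q compute
#PBS -l walltime=4:00:00
#PBS -N sas
# Export your current environment (including the loaded sas module) to the job
#PBS -V
# Send e-mail when the job aborts (a), begins (b), and ends (e)
#PBS -M emailaddress@rice.edu
#PBS -m abe
# Location for the job's standard output and standard error files
#PBS -o /home/adauser/sas
#PBS -e /home/adauser/sas
# Record which compute node ran the job
echo "I ran on: "
cat "$PBS_NODEFILE"
# Run SAS from the directory that holds the program
cd /home/adauser/sas
sas print.sas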

NOTE:  This example will submit the print.sas program to a single compute node on the cluster.  SAS writes its log, including any error messages from the program, to a file called print.log.  Output from the program will be found in a file called print.lst.  This is the default behavior of SAS for batch jobs.  Output and errors from the PBS batch script itself will be found in files named JobID.OU and JobID.ER, respectively, where JobID is the PBS Job ID number assigned to your job.  Each job you submit receives a different number.
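
After the job finishes, it can help to scan the SAS log for problems before looking at the output.  SAS begins error and warning messages in the log with the words ERROR and WARNING, so a short sketch like the following can summarize them.  The function name check_sas_log is hypothetical and is not provided by SAS or by Ada.

```shell
# Hypothetical helper: summarize errors and warnings in a SAS log file.
# Usage: check_sas_log print.log
check_sas_log() {
    log_file=$1
    if [ ! -r "$log_file" ]; then
        echo "Cannot read $log_file" >&2
        return 2
    fi
    # grep -c prints the number of matching lines (0 when there are none);
    # "|| true" ignores grep's non-zero exit status when nothing matches.
    errors=$(grep -c '^ERROR' "$log_file" || true)
    warnings=$(grep -c '^WARNING' "$log_file" || true)
    echo "$log_file: $errors error(s), $warnings warning(s)"
    # Return non-zero when the log contains at least one error.
    [ "$errors" -eq 0 ]
}
```

For the example job above, you could run check_sas_log print.log from the /home/adauser/sas directory once the job has completed.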

5.  Submit the PBS Job

Once you have written the PBS batch script sas.job above, you must submit the job to the scheduler as follows:


adauser@adahost:~> cd /home/adauser/sas
adauser@adahost:~/sas> qsub ./sas.job

NOTE:  This will submit a single SAS job to a single processor on Ada.  Use the showq command to determine if your job is running or queued.
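
The submit-and-check step can also be wrapped in a small shell function.  This is a sketch under stated assumptions: qsub prints the identifier of the new job on standard output, and the sketch assumes that showq lists each job by the numeric part of that identifier at the start of a line.  The function name submit_and_show is hypothetical.

```shell
# Hypothetical helper: submit a PBS batch script and show its queue entry.
# Usage: submit_and_show ./sas.job
submit_and_show() {
    job_script=$1
    # qsub prints the identifier of the new job on standard output.
    job_id=$(qsub "$job_script") || return 1
    echo "Submitted $job_script as job $job_id"
    # Assumption: showq starts each job line with the numeric job ID.
    showq | grep "^${job_id%%.*}" \
        || echo "Job ${job_id%%.*} is not listed by showq yet"
}
```

If the job does not appear immediately, running showq again after a few moments should show whether it is running or queued.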


Writing SAS Work Files/Reassigning the Work Library

By default, SAS writes scratch files (temporary work files) to the /var/tmp partition on each compute node.  This partition is small and is shared by all jobs on the node, so there is considerable risk that it will fill up and crash all of the jobs on the node.  SAS may also report the error "FATAL: Unable to initialize work library".  To redirect these scratch files, use the -work option on the SAS command line.  The call to SAS in the PBS batch script presented above can be written as follows:


sas -work $TMPDIR print.sas

This will redirect all of the SAS scratch files to a temporary directory on /scratch that PBS creates automatically and that is unique to each job.  PBS stores the path of this directory in the $TMPDIR environment variable, so you do not need to know the real directory path that is being used.  NOTE:  The /scratch partition is also shared by all jobs on the compute node, though it is much larger than /var/tmp.  Please try to restrict your scratch files to about 10 GB per processor per node to ensure that the partition does not fill up.
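
Because the -work option depends on $TMPDIR being set, a batch script may want to confirm that the directory exists before starting SAS.  The following is a minimal sketch; the function name sas_work_dir is hypothetical, and the choice to stop rather than fall back to another directory is an assumption, not Ada policy.

```shell
# Hypothetical helper: print the directory to pass to the SAS -work option.
# Fails if $TMPDIR is unset or is not a writable directory.
sas_work_dir() {
    if [ -n "${TMPDIR:-}" ] && [ -d "$TMPDIR" ] && [ -w "$TMPDIR" ]; then
        echo "$TMPDIR"
        return 0
    fi
    echo "TMPDIR is not a writable directory; not starting SAS" >&2
    return 1
}

# Inside the PBS batch script, the call to SAS could then be written as:
#   work_dir=$(sas_work_dir) && sas -work "$work_dir" print.sas
```

Stopping early in this way avoids having SAS fall back to its default work location on /var/tmp when $TMPDIR is not available.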


More Information (SAS User's Guide)

The examples above show the minimum needed to run a single SAS job on a compute node.  There are a variety of ways to run multiple jobs.  One way is to repeat the process above for each job that you need to run, submitting your jobs one at a time, one per PBS batch script.  You can also write a Unix/Linux shell script that submits multiple jobs using only one batch script; that topic is beyond the scope of this document.  SAS also supports parallel programming, which is likewise beyond the scope of this document.
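
As an illustration of repeating the process above for several programs, the following sketch writes one PBS batch script per SAS program, modeled on the sas.job example.  The function name write_sas_job and the .job file naming are hypothetical, and the queue and walltime values are simply copied from the example above; adjust them for your own jobs.

```shell
# Hypothetical helper: write a PBS batch script for one SAS program.
# Usage: write_sas_job /home/adauser/sas print.sas   (creates print.job)
write_sas_job() {
    job_dir=$1
    program=$2
    name=$(basename "$program" .sas)
    job_file="$job_dir/$name.job"
    # \$TMPDIR is escaped so that it is expanded when the job runs,
    # not when the batch script is written.
    cat > "$job_file" <<EOF
#PBS -q compute
#PBS -l walltime=4:00:00
#PBS -N $name
#PBS -V
#PBS -o $job_dir
#PBS -e $job_dir
cd $job_dir
sas -work \$TMPDIR $program
EOF
    echo "$job_file"
}

# Example loop (commented out so that nothing is submitted by accident):
#   for program in /home/adauser/sas/*.sas; do
#       qsub "$(write_sas_job /home/adauser/sas "$(basename "$program")")"
#   done
```

Each generated batch script is submitted as its own job, so each SAS program gets its own PBS Job ID and its own .log and .lst files.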

For more information on PBS batch scripts, monitoring your jobs, and general information for using Ada, please see our FAQ.

To get more help with SAS, use the SAS User's Guide available on the SAS web site.



Getting Help

If you need assistance running SAS on Ada, please contact the Help Desk at 713-348-4357.

 

Division of Information Technology
MS-119, P.O. Box 1892, Rice University, Houston, Texas 77251-1892
713-348-HELP(4357)