Using SAS on Ada
29-Nov-2007

Introduction

SAS is a business intelligence application for analytics, data manipulation, and reporting. There are currently two ways to run SAS on the Ada cluster, as described below. These examples present a very simple configuration for running a minimal SAS job on Ada. Adjust your configuration according to the complexity of your jobs.

Running SAS with the Graphical User Interface

To run SAS with the Graphical User Interface (GUI), follow the steps below:

1. Log in to Ada using SSH

To log in to Ada, you must use SSH software configured to tunnel (or redirect) X displays.
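
A minimal sketch of such a login, assuming an OpenSSH command-line client on your workstation. The -X option asks ssh to forward X11 connections. The host name ada.example.edu is a hypothetical placeholder (this page does not give the real address), and adauser is the sample account name used elsewhere on this page.

```shell
# Assumed values: replace both with your own account and the real Ada login host.
ADA_USER="adauser"
ADA_HOST="ada.example.edu"   # hypothetical placeholder, not the real host name

# -X enables X11 forwarding in OpenSSH so that GUI windows opened on Ada
# are displayed on your workstation.
SSH_CMD="ssh -X ${ADA_USER}@${ADA_HOST}"

# Print the command for review; copy and run it once the values are correct.
echo "$SSH_CMD"
```

Once you are logged in, running echo $DISPLAY on Ada should print a non-empty value if X forwarding is active.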

2. Configure your Ada environment to run SAS

After you have logged in to Ada, use the module command to configure your Ada environment to run SAS. The module command will establish all of the environment variables needed to run the SAS program.

3. Launch SAS

Run the sas command to launch the GUI. This will run SAS on one of Ada's login nodes. From this point you will be able to write and test your SAS programs using the GUI. It is important to note that if you run a SAS program from this point, you will be running on the login node, not on a compute node. This is acceptable only for debugging short programs (< 30 minutes in duration). Programs using a lot of CPU time on the login nodes will impact all users on the system. Therefore, any programs found running for more than 30 minutes on the login nodes are subject to being killed at the system administrators' discretion!
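
Steps 2 and 3 above might look like the following sketch. The module name sas is an assumption, since this page does not state it; module avail lists the names that are actually installed. The check for the module command only keeps the snippet harmless when it is run somewhere other than Ada.

```shell
# Assumption: the SAS module is named "sas". Run `module avail` on Ada to
# see the names that are actually installed.
SAS_MODULE="sas"

if command -v module >/dev/null 2>&1; then
    # Set up the environment variables SAS needs.
    module load "$SAS_MODULE"
    # Start the SAS GUI in the background (requires X forwarding from step 1).
    sas &
else
    echo "module command not found; run these steps on an Ada login node" >&2
fi
```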

Running SAS on Compute Nodes

It is preferable to run SAS jobs on the compute nodes of Ada, not the login nodes. This is accomplished by running your SAS program from inside a PBS batch script as follows:

1. Log in to Ada using SSH

Log in to Ada using SSH. X11 tunneling as described in the previous section is not required.

2. Configure your Ada environment to run SAS

After you have logged in to Ada, use the module command to configure your Ada environment to run SAS. The module command will establish all of the environment variables needed to run the SAS program.

3. Write a SAS Program

Write your SAS program and save it in a file. For our example we will use a sample SAS program called print.sas that is provided with the SAS distribution and can be found at /usr/local/sas/sas-9.1.3/samples/base on Ada. Save your program in a directory, such as /home/adauser/sas. You can create your files with the nano, vi, or emacs text editors, or with the SAS GUI.
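
To follow along with the sample program, a sketch like the one below copies print.sas into a working directory. It assumes the working directory is $HOME/sas (which is /home/adauser/sas for the sample account adauser); the file check keeps the snippet harmless on machines where the SAS samples are not installed.

```shell
# Location of the SAS samples on Ada, as given in step 3.
SAMPLE_DIR="/usr/local/sas/sas-9.1.3/samples/base"
# Assumed working directory; equivalent to /home/adauser/sas for user adauser.
WORK_DIR="${HOME}/sas"

# Create the working directory if it does not exist yet.
mkdir -p "$WORK_DIR"

if [ -f "${SAMPLE_DIR}/print.sas" ]; then
    cp "${SAMPLE_DIR}/print.sas" "${WORK_DIR}/"
else
    echo "print.sas not found in ${SAMPLE_DIR}; are you logged in to Ada?" >&2
fi
```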

4. Write a PBS Batch Script

Include a call to your SAS program (print.sas in our example) in a PBS batch script named, for example, sas.job.
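
The original script for this step is not reproduced here, so the following is only a sketch of what a minimal sas.job could contain. The job name sas_print, the 30-minute walltime, and the use of -V (which exports your current environment, including the settings made by the module command, to the job) are assumptions for illustration. The snippet writes the file with a here-document; you can equally type the lines between the EOF markers into a text editor.

```shell
# Write a minimal PBS batch script to sas.job in the current directory.
# The quoted 'EOF' keeps $PBS_O_WORKDIR from being expanded until the job runs.
cat > sas.job <<'EOF'
#!/bin/bash
# Job name (hypothetical).
#PBS -N sas_print
# One processor on one compute node.
#PBS -l nodes=1:ppn=1
# Assumed run-time limit; adjust to your job.
#PBS -l walltime=00:30:00
# Export the submitting shell's environment to the job.
#PBS -V

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Run SAS in batch mode on the sample program.
sas print.sas
EOF
```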

NOTE: This example will submit the print.sas program to a single compute node on the cluster. Any error messages from this program will be in a file called print.log. Output from the program will be found in a file called print.lst. This is the default behavior of SAS for batch jobs. Output and errors from the PBS batch script will be found in files named JobID.OU and JobID.ER, respectively, where JobID is the PBS Job ID number assigned to your job. There will be a different number for each job you submit.

5. Submit the PBS Job

Once you have written the PBS batch script sas.job, you must submit it to the scheduler with the PBS qsub command.
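
A sketch of the submission, assuming sas.job is in the current directory. qsub is the standard PBS submission command and prints the ID assigned to the job; showq then lists running and queued jobs. The check for qsub only keeps the snippet harmless when it is run somewhere other than Ada.

```shell
# Batch script written in step 4.
JOB_SCRIPT="sas.job"

if command -v qsub >/dev/null 2>&1; then
    # qsub prints the ID that PBS assigns to the job.
    JOB_ID=$(qsub "$JOB_SCRIPT")
    echo "Submitted ${JOB_SCRIPT} as job ${JOB_ID}"
    # List running and queued jobs to check on it.
    showq
else
    echo "qsub not found; run this on an Ada login node" >&2
fi
```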

NOTE: This will submit a single SAS job to a single processor on Ada. Use the showq command to determine if your job is running or queued.

Writing SAS Work Files/Reassigning the Work Library

By default SAS will write scratch files (temporary work files) to the /var/tmp partition on each compute node. This partition is small and is shared by all jobs on the node. There is considerable risk that this partition will fill up and crash all of the jobs on the node. SAS may also report the error "FATAL: Unable to initialize work library". To redirect these scratch files, use the -work option on the SAS command line. In the PBS batch script, the call to SAS can pass $TMPDIR as the argument to -work.
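
As a sketch, the batch script could be rewritten as shown here, with only the sas line changed to add -work "$TMPDIR". The job name, walltime, and -V directive are the same illustrative assumptions as before, not settings taken from this page.

```shell
# Rewrite sas.job so that SAS scratch files go to the per-job $TMPDIR.
# The quoted 'EOF' keeps $TMPDIR from being expanded until the job runs.
cat > sas.job <<'EOF'
#!/bin/bash
# Job name (hypothetical).
#PBS -N sas_print
# One processor on one compute node.
#PBS -l nodes=1:ppn=1
# Assumed run-time limit; adjust to your job.
#PBS -l walltime=00:30:00
# Export the submitting shell's environment to the job.
#PBS -V

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# -work points the SAS work library at the job's temporary directory.
sas -work "$TMPDIR" print.sas
EOF
```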

This will redirect all of the SAS scratch files to a temporary directory on /scratch that is created automatically by PBS and is unique for each job. PBS exposes this directory through the $TMPDIR environment variable, so you do not need to know the real directory path that is being used.

NOTE: The /scratch partition is also shared by all jobs on the compute node, though it is much larger than /var/tmp. Please try to restrict your scratch files to about 10GB per processor per node to ensure that the partition will not fill up.

More Information (SAS User's Guide)

The examples above are intended to be a very minimal illustration of how to get a single SAS job to run on a compute node. There are a variety of ways to run multiple jobs. One way is to repeat the process above for each job that you need to run. Thus, you would submit your jobs serially, one per PBS batch script. You can also write Unix/Linux shell scripts to submit multiple jobs from within the shell script utilizing only one batch script. This topic is beyond the scope of this document. SAS also supports the concept of parallel programming, which is also beyond the scope of this document. To get more help with SAS, use the SAS User's Guide available on the SAS web site.
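
The first approach mentioned above, one PBS batch script per job, can be sketched as a small loop. This is only an illustration under stated assumptions: report.sas is a hypothetical second program, the #PBS directives are the same illustrative ones a single-job script might use, and the loop falls back to a message when qsub is not available.

```shell
# Programs to run; report.sas is a hypothetical second program.
PROGRAMS="print.sas report.sas"

for prog in $PROGRAMS; do
    name="${prog%.sas}"      # strip the .sas suffix, e.g. print
    job="${name}.job"        # one batch script per program

    # Unquoted EOF: ${name} and ${prog} expand now, while the escaped
    # \$ variables are left for the job to expand when it runs.
    cat > "$job" <<EOF
#!/bin/bash
#PBS -N ${name}
#PBS -l nodes=1:ppn=1
#PBS -V
cd "\$PBS_O_WORKDIR"
sas -work "\$TMPDIR" ${prog}
EOF

    if command -v qsub >/dev/null 2>&1; then
        qsub "$job"
    else
        echo "qsub not found; would submit ${job}" >&2
    fi
done
```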

Getting Help

If you need assistance running SAS on Ada, please contact the Help Desk at 713-348-4357.