Guidelines for disk storage on RTC
13-Mar-2008

Introduction

There are several disk storage options on RTC for storing job output. The correct location depends on the size of the output and on the job's performance characteristics. Two temporary storage areas are available for job output: /scratch, which is local storage on each node, and /shared.scratch, which is a high-performance filesystem shared by all nodes. The remainder of this document describes the appropriate ways to use the temporary storage areas, where to store job output permanently, and how to perform I/O redirection when running a job with mpiexec.

PBS job output (.OU) and error (.ER) files

PBS writes the standard output (stdout) and standard error (stderr) of your jobs to files with .OU and .ER extensions, respectively. Anything your job writes to stdout is automatically written to a .OU file in your working directory. While the job is running, however, these files are stored locally on each node in /var/spool/PBS. The .OU and .ER files are not moved to your home directory until the job exits.

Excessively large .OU files can fill up /var/spool/PBS and cause your job to crash. Furthermore, this directory is shared among all processors on each node, so if it becomes full, all of the jobs on that node will crash. Therefore, if your stdout is going to be larger than approximately 50MB, redirect it to an output file on /shared.scratch or /scratch.

NOTE: It is not enough to specify the -o argument in your PBS batch script. This argument only sets the final location of your .OU file after your job exits; the file is still created in /var/spool/PBS while the job is running. Instead, use Linux I/O redirection in the job script to send the output directly to a file on /shared.scratch or /scratch, which keeps it out of /var/spool/PBS.
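
A minimal sketch of such a redirection follows. The directory layout, the SCRATCH_ROOT variable, and the my_program function are illustrative assumptions, not RTC conventions; SCRATCH_ROOT defaults to a demo directory so the sketch can be tried anywhere.

```shell
#!/bin/sh
# Sketch of a job-script body; every name below is illustrative.
set -eu

# On RTC, set SCRATCH_ROOT=/shared.scratch (or /scratch). The default is a
# demo directory so the sketch can be run off-cluster.
SCRATCH_ROOT="${SCRATCH_ROOT:-${TMPDIR:-/tmp}/shared.scratch.demo}"
OUTDIR="$SCRATCH_ROOT/${USER:-demo}/redirect_demo"
mkdir -p "$OUTDIR"

# Hypothetical stand-in for your real executable.
my_program() {
    echo "result line"
    echo "warning line" >&2
}

# '>' sends stdout to the file and '2>&1' sends stderr to the same file,
# so the output does not accumulate in the PBS .OU/.ER spool files.
my_program > "$OUTDIR/run.log" 2>&1
```

Replace my_program with the path to your executable; the redirection operators stay the same.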
If you are running your job with mpiexec and you need to perform I/O redirection, the proper format of the command will look like this:
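
As a hedged sketch, the redirection operators can be applied to the mpiexec command itself. This assumes that your mpiexec forwards the standard output and error of the launched processes to its own stdout and stderr and passes its stdin on to the program; check the manual page to confirm this for the installed version. The mpiexec and my_program functions below are stand-ins so the sketch runs off-cluster, and the paths are hypothetical.

```shell
#!/bin/sh
set -eu

# Stand-ins so the sketch runs off-cluster. On RTC, delete the mpiexec
# function (so the real launcher is used) and replace my_program with the
# path to your executable.
mpiexec() { "$@"; }
my_program() { cat; echo "done"; }   # hypothetical: echoes stdin, then "done"

# On RTC, set SCRATCH_ROOT=/shared.scratch; the default is a demo directory.
SCRATCH_ROOT="${SCRATCH_ROOT:-${TMPDIR:-/tmp}/shared.scratch.demo}"
WORKDIR="$SCRATCH_ROOT/${USER:-demo}/mpiexec_demo"
mkdir -p "$WORKDIR"
echo "input record" > "$WORKDIR/input.dat"

# '<' feeds the input file to the program, '>' captures stdout in a file on
# scratch, and '2>&1' sends stderr to the same file.
mpiexec my_program < "$WORKDIR/input.dat" > "$WORKDIR/output.log" 2>&1
```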
Use the man mpiexec command to view an online manual page for more information on mpiexec I/O redirection options. See our tutorials for more information on Linux I/O redirection operators.

Using /shared.scratch

All nodes on RTC have access to a 5 TB /shared.scratch storage space. This storage is available to all users and is tuned for high throughput I/O. This storage area is visible on all compute nodes and the login nodes. To use /shared.scratch, simply create a directory under /shared.scratch, copy your input files to this location, and redirect your output files here as well. When your job is finished, copy the final results to your permanent storage space on /users or /projects.

/shared.scratch is not for permanent storage. Data files not accessed in two weeks will be deleted automatically by the system. Recovery of these files is not possible.
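
The workflow above can be sketched as follows. The directory names, the SCRATCH_ROOT and PERMANENT_ROOT variables, and the my_program function are assumptions made for illustration; both roots default to demo directories so the sketch runs anywhere. The final find command is an optional check that lists files last accessed more than about two weeks ago.

```shell
#!/bin/sh
set -eu

# On RTC, set SCRATCH_ROOT=/shared.scratch and PERMANENT_ROOT to /users or
# /projects. The defaults are demo directories.
SCRATCH_ROOT="${SCRATCH_ROOT:-${TMPDIR:-/tmp}/shared.scratch.demo}"
PERMANENT_ROOT="${PERMANENT_ROOT:-${TMPDIR:-/tmp}/projects.demo}"
SRC="$PERMANENT_ROOT/${USER:-demo}/case1"   # hypothetical project directory
RUN="$SCRATCH_ROOT/${USER:-demo}/case1"     # working directory on scratch
DEST="$SRC/results"

# Demo input so the sketch is self-contained.
mkdir -p "$SRC"
echo "42" > "$SRC/input.dat"

# Hypothetical stand-in for your real executable.
my_program() { sed 's/^/value=/'; }

# 1. Create a working directory on the shared scratch and stage the input.
mkdir -p "$RUN"
cp "$SRC/input.dat" "$RUN/"

# 2. Run from the scratch directory, redirecting all output there.
cd "$RUN"
my_program < input.dat > output.dat 2> run.err

# 3. Copy the final results back to permanent storage.
mkdir -p "$DEST"
cp output.dat "$DEST/"

# Optional: list files not accessed for more than about two weeks.
find "$RUN" -type f -atime +14
```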

Using /scratch

Each node on RTC has between 25 GB and 65 GB of storage space available to all users on /scratch. It is most appropriate to use this space for the output of your running job when your output will not exceed 10GB per job, your job performs I/O infrequently, and the block size per read/write request is small (less than 1MB). Jobs of this nature will perform better writing to /scratch than to /shared.scratch, since /shared.scratch is tuned for large data sets with frequent read/write requests of large amounts of data. If your output exceeds 10GB, then using /shared.scratch is the only remaining option.

Notice to Gaussian users: please use /shared.scratch for Gaussian scratch files. Do not use /scratch.

NOTE: /scratch is shared among all processors on each node. Therefore, it might be used by 2-4 different jobs at once. If this partition becomes full, it is likely that all of the jobs on that node will crash. It is important not to exceed the 10GB recommendation to prevent jobs from crashing. Any data left on /scratch when a job exits will be deleted without notice and cannot be recovered.

To use /scratch, copy your datasets from your source directory to the desired scratch directory at the start of your job, and copy the results back in the reverse direction at the end of your job. Here are examples of how to copy your data at the beginning and the end of each job:
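
A minimal sketch of this pattern, with one mpiexec call per stage. It assumes that mpiexec is set up to start the copy commands once on each node of the job (consult man mpiexec for the option that does this on your installation). The mpiexec and my_program functions are stand-ins so the sketch runs off-cluster, and all directory names and variables are hypothetical.

```shell
#!/bin/sh
set -eu

# Stand-ins so the sketch runs off-cluster. On RTC, delete the mpiexec
# function and replace my_program with the path to your executable.
mpiexec() { "$@"; }
my_program() { tr '[:lower:]' '[:upper:]' < "$1" > "$2"; }   # hypothetical

# On RTC, set LOCAL_ROOT=/scratch and PERMANENT_ROOT to /users or /projects.
# The defaults are demo directories.
LOCAL_ROOT="${LOCAL_ROOT:-${TMPDIR:-/tmp}/scratch.demo}"
PERMANENT_ROOT="${PERMANENT_ROOT:-${TMPDIR:-/tmp}/users.demo}"
DATA="$PERMANENT_ROOT/${USER:-demo}/dataset"
RESULTS="$PERMANENT_ROOT/${USER:-demo}/results"
JOBDIR="$LOCAL_ROOT/${USER:-demo}.job1"

# Demo setup only: create a small dataset and the directory roots.
mkdir -p "$DATA" "$RESULTS" "$LOCAL_ROOT"
echo "sample" > "$DATA/input.dat"

# Call 1: copy the dataset to node-local scratch.
mpiexec cp -r "$DATA/." "$JOBDIR"

# Call 2: run the program against the local copy.
mpiexec my_program "$JOBDIR/input.dat" "$JOBDIR/output.dat"

# Call 3: copy the results back before the job ends. In a multi-node job,
# give each node its own destination file name to avoid overwriting.
mpiexec cp "$JOBDIR/output.dat" "$RESULTS/output.dat"
```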

NOTE: It is necessary to call mpiexec three times in this example. The first call copies your data from your dataset to /scratch. The second call runs your program. When your program terminates, it is necessary to go back to the nodes to retrieve your data; this is accomplished by the third call to mpiexec. The data will be deleted automatically when the PBS job terminates.

It is very important to note that the third call to mpiexec can only work once the second call has completely finished. If your job runs out of walltime before the third mpiexec call is finished, your PBS job will exit and the data stored on /scratch on all nodes will be deleted automatically. Request enough walltime to cover all three calls.

Using /users and /projects

The /users and /projects directories are intended for permanent data storage, not job I/O. These are NFS filesystems and are not designed to handle high-performance applications. Using them for job I/O might result in severely degraded performance across the entire cluster, especially if the I/O of your job is heavy. Use these filesystems for job I/O only at the direction of the system administrators.

Questions