Information Technology
How Do I Ensure My Job Has Enough Memory to Run?

04-Sep-2008


Introduction

Because every node on the cluster is shared among all users, a single process owned by one user can consume all of the memory on a node. When that happens, new jobs assigned to that node will likely not have enough memory to run. If your job is memory intensive, request a node with enough available memory to run it.


Determining Memory Requirements

To request a node with an appropriate amount of available memory, you may first need to determine how much memory your job actually uses, if this is not known ahead of time. To do this, we recommend that you submit a test job that requests exclusive access to a node. Your job will then be the only one running on the node, so it will neither interfere with nor be disrupted by other jobs. Exclusive access gives you the best opportunity to measure how much memory your job consumes without interference.
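As a sketch of such a test job, the batch script below assumes the cluster schedules jobs with Moab, whose naccesspolicy=singlejob resource request asks for a node that runs no other jobs. Other schedulers use different mechanisms, so confirm the exact option for this cluster before relying on it. The job name, walltime, email address and ./my_program are hypothetical placeholders. The -m and -M lines are standard qsub options that turn on the end-of-job email report described below.

```shell
#!/bin/bash
# Hypothetical test job for measuring memory use on an otherwise idle node.
#PBS -N Test_Job
#PBS -l nodes=1:ppn=1
# Assumption: Moab resource extension asking that no other jobs share the node
#PBS -l naccesspolicy=singlejob
#PBS -l walltime=01:00:00
# Mail a report when the job aborts (a) or ends (e); the address is a placeholder
#PBS -m ae
#PBS -M your_netid@rice.edu

cd "$PBS_O_WORKDIR"   # PBS sets this to the directory qsub was run from
./my_program          # hypothetical: the program whose memory use you want to measure
```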

There are a variety of methods for determining the memory usage of a job. Two relatively straightforward ways to do this are as follows:

Enable email notification

You can enable email notification in your test job so that it reports, among other things, the memory utilization of your job. When the job exits, you will receive an email report similar to the following:

PBS Job Id: 74127.sugarman.rcsg.rice.edu
Job Name: Test_Job
Exec host: compute51.local/7+compute51.local/6
Execution terminated
Exit_status=0
resources_used.cput=00:01:11
resources_used.mem=87036kb
resources_used.vmem=266028kb
resources_used.walltime=00:00:58

In this example, resources_used.mem represents the maximum amount of physical memory used by all processes in the job. This number can be inaccurate because it is sampled from the operating system at regular intervals rather than measured continuously, so memory spikes that occur between samples will be missed. For the best possible accuracy, request exclusive access to the node when your test job runs so that the job has the maximum amount of memory available to it.
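If you collect several of these reports, the memory lines can be extracted and converted to megabytes with a short shell function. This is a sketch: the name mem_report is hypothetical, and it assumes values carry a kb, mb or gb suffix as in the sample report above. It accepts both the key=value form used in the email and the space-padded key = value form printed by qstat -f.

```shell
# mem_report (hypothetical helper): read a PBS job report on stdin and print
# resources_used.mem and resources_used.vmem in megabytes (1 MB = 1024 kb).
mem_report() {
  awk -F'=' '
    $1 ~ /resources_used\.v?mem/ {
      key = $1; val = $2
      gsub(/[ \t]/, "", key)           # qstat -f pads entries as "key = value"
      gsub(/[ \t]/, "", val)
      unit = tolower(val)
      gsub(/[0-9.]/, "", unit)         # keep only the suffix, e.g. "kb"
      num = val + 0                    # awk reads the leading number, e.g. 87036
      if (unit == "kb")      mb = num / 1024
      else if (unit == "mb") mb = num
      else if (unit == "gb") mb = num * 1024
      else next                        # unrecognised unit: skip the line
      printf "%s %.1f MB\n", key, mb
    }'
}

# Example using the values from the sample report above
printf 'resources_used.mem=87036kb\nresources_used.vmem=266028kb\n' | mem_report
# resources_used.mem 85.0 MB
# resources_used.vmem 259.8 MB
```

To read a saved email, run mem_report < report.txt; to read a running job, pipe qstat -f output into it.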

Use qstat on a running job

You can also view these values while a job is running with the following command, where jobID is the ID of your job (for example, 74127):


qstat -f jobID

As with the email report, fluctuations in memory utilization that occur between polls will be missed.
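To follow a job over time, the qstat command above can be run in a loop. The sketch below prints a timestamped resources_used.mem value until the job finishes; the name watch_mem and the default 60-second interval are assumptions. Keep in mind that qstat only reports the value the batch system last collected, so polling more often than the system's own polling interval will not reveal the fluctuations it misses.

```shell
# watch_mem (hypothetical helper): print a timestamped resources_used.mem
# sample for a job every INTERVAL seconds until the job completes.
# Usage: watch_mem JOBID [INTERVAL]
watch_mem() {
  local jobid=$1 interval=${2:-60} out mem
  # qstat -f exits non-zero once the server no longer knows the job
  while out=$(qstat -f "$jobid" 2>/dev/null); do
    # Stop if the server still lists the job but it has completed
    case $out in
      *"job_state = C"*) break ;;
    esac
    mem=$(printf '%s\n' "$out" | grep 'resources_used\.mem' | tr -d ' ')
    printf '%s %s\n' "$(date '+%H:%M:%S')" "${mem:-resources_used.mem not reported yet}"
    sleep "$interval"
  done
}

# Example: watch_mem 74127 30 > mem.log &
```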


Requesting A Specific Amount of Memory

Once you have determined how much memory your job requires, request a node with enough available memory for all future jobs of that type. PBS provides an option that finds a node with a specific amount of memory available:

  • Use the following line in your PBS batch script:

#PBS -l nodes=4:ppn=1,pmem=2000m

This combination of options gives the job four nodes with one processor per node (ppn=1), and assigns it only to nodes with at least 2GB of physical memory available. The pmem option specifies the amount of physical memory needed by each process in the job. Here the unit is megabytes, so 2000m is roughly 2GB. The example therefore requires 8GB in total: 2GB per process, with one process on each of the four nodes. If your job runs more than one process per node, each node must have ppn × pmem of free memory. For example, if each process needs 2GB (pmem) and you use two processors per node (ppn=2), each node must have 4GB of free memory for the job to run.
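The arithmetic in this paragraph can be checked before writing the directive. The helper below, whose name pmem_request is hypothetical, takes the node count, processes per node and per-process memory in megabytes, prints the matching #PBS -l line, and reports the free memory each node needs (ppn × pmem) along with the total for the job.

```shell
# pmem_request (hypothetical helper): NODES PPN PMEM_MB
# Prints the PBS directive plus the memory each node and the whole job need.
pmem_request() {
  local nodes=$1 ppn=$2 pmem_mb=$3 per_node total
  per_node=$((ppn * pmem_mb))        # free memory required on each node
  total=$((nodes * per_node))        # memory required across all nodes
  echo "#PBS -l nodes=${nodes}:ppn=${ppn},pmem=${pmem_mb}m"
  echo "per node: ${per_node} MB, job total: ${total} MB"
}

pmem_request 4 1 2000   # the example above: 2000 MB per node, 8000 MB in total
pmem_request 1 2 2000   # two processes per node: 4000 MB free memory needed
```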

NOTE: This memory limit is not enforced, so your job can still use more than 2GB if it needs to; however, it then risks running out of memory. The same is true of other jobs that share a node with your job: if they exceed their memory requests, the performance of the node and of every job on it will suffer.

NOTE: It is important to be accurate with your request. If you request more memory than you need, the scheduler may have difficulty finding nodes with that much memory available, which can delay your job unnecessarily.
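One way to keep the request close to what the job actually needs is to derive pmem from the measured resources_used.mem. Because that value covers all processes in the job while pmem is per process, the sketch below divides by the number of processes and rounds up to a whole megabyte. The helper name suggest_pmem is hypothetical, and it assumes memory is spread evenly across processes; since the measured value can miss short spikes, you may still want to add some margin.

```shell
# suggest_pmem (hypothetical helper): MEM_KB NPROCS
# MEM_KB is resources_used.mem from a test run (all processes combined).
# Prints a per-process pmem value in megabytes, rounded up.
# Assumes memory is shared evenly among the NPROCS processes.
suggest_pmem() {
  local mem_kb=$1 nprocs=$2 divisor
  divisor=$((1024 * nprocs))        # kb per MB, times the process count
  # Integer ceiling division: (a + b - 1) / b
  echo "pmem=$(( (mem_kb + divisor - 1) / divisor ))m"
}

# The sample report lists two processor slots on compute51
suggest_pmem 87036 2   # prints pmem=43m
```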

NOTE: If your job uses memory close to the total physical memory installed on the node, its performance may suffer and it may crash. Keep in mind that the operating system also resides in physical memory alongside your job. A good rule of thumb is to leave 1GB of memory for the operating system.

WARNING: If you choose a pmem value so high that ppn × pmem exceeds the memory installed on a single node, your job will never run, because no node can provide that much memory.
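The 1GB operating-system NOTE and this WARNING can be combined into a quick check before submitting. In the sketch below, the helper name check_fit is hypothetical, the memory installed on a node must be supplied by you (it is not given here), and the 1000 MB reserve follows the 1GB rule of thumb using the same megabyte convention as pmem=2000m.

```shell
# check_fit (hypothetical helper): PPN PMEM_MB NODE_MB [OS_RESERVE_MB]
# NODE_MB is the physical memory installed on one node (look this up for
# your cluster); OS_RESERVE_MB defaults to the ~1GB rule of thumb.
check_fit() {
  local ppn=$1 pmem_mb=$2 node_mb=$3 os_mb=${4:-1000} need
  need=$((ppn * pmem_mb))
  if [ "$need" -gt "$node_mb" ]; then
    echo "never runs: needs ${need} MB per node, node has ${node_mb} MB"
  elif [ $((need + os_mb)) -gt "$node_mb" ]; then
    echo "risky: leaves less than ${os_mb} MB for the operating system"
  else
    echo "ok: ${need} MB requested, $((node_mb - need)) MB left on the node"
  fi
}

check_fit 2 2000 8000   # hypothetical 8000 MB node: ok
check_fit 4 2000 8000   # fits, but leaves less than 1000 MB for the OS
check_fit 5 2000 8000   # ppn x pmem exceeds the node: the job would never run
```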

IT
Division of Information Technology
MS-119, P.O. Box 1892, Rice University, Houston, Texas 77251-1892
713-348-HELP(4357)