Information Technology
File Transfer Method for Large Data Sets

03-Jul-2007


Introduction

People with accounts on the SSH Gateway, which provides off-campus access to our shared computing clusters, can also use it to transfer files to the clusters. This is convenient for small data sets. For large data sets, however, two things can become a problem: disk space on the SSH Gateway, and the need to copy the data twice (once to the SSH Gateway and then again to the cluster). This document describes an advanced file transfer method that avoids both problems.

Note: This method works only from Unix/Linux desktops. It is not available on Windows systems.


Create a FIFO

The advanced method for file transfers relies on a FIFO, also called a named pipe. A FIFO is a special file type that permits independent processes
to communicate. One process opens the FIFO file for writing, and another opens it for reading, after which data flows as it does through the usual
anonymous pipe in shells or elsewhere. See man mkfifo for more details. To set up and use a FIFO, follow these steps:

1. Log in to the SSH Gateway

Log in to the SSH Gateway with SSH as you normally would.

2. Create and open the FIFO

Create and open the FIFO on the SSH Gateway by running these commands:


# change to your home directory
cd

# create the ~/fifo directory (-p: no error if it already exists)
mkdir -p fifo

# change to the ~/fifo directory
cd fifo

# make a FIFO named "afifo"
mkfifo afifo

# Read from afifo and pipe the data into an ssh connection to the cluster.
# The ssh connection remotely runs "tar -xvf -" on Ada or RTC.
# Replace host.rice.edu with ada.rice.edu or rtc.rice.edu.
cat afifo | ssh username@host.rice.edu "tar -xvf -"
# You will be prompted for your cluster password.

The last command above blocks until data is written to the FIFO. When data arrives, it is copied via SSH to the cluster and untarred there.
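
The blocking behavior can be observed on a single machine before involving the SSH Gateway. The following is a minimal sketch, assuming a Unix/Linux shell with mkfifo and mktemp available; the names demo_dir, demo_fifo and received.txt are made up for this illustration and are not part of the procedure.

```shell
# Local demonstration only: reader and writer run on the same machine.
# demo_dir, demo_fifo and received.txt are hypothetical names.
demo_dir=$(mktemp -d)
mkfifo "$demo_dir/demo_fifo"

# Reader: blocks until some process opens the FIFO for writing.
cat "$demo_dir/demo_fifo" > "$demo_dir/received.txt" &
reader_pid=$!

# Give the reader a moment, then check that it is still waiting.
sleep 1
if kill -0 "$reader_pid" 2>/dev/null; then
    reader_state="still waiting"
else
    reader_state="exited"
fi
echo "reader is $reader_state"

# Writer: opening the FIFO for writing unblocks the reader.
echo "hello through the fifo" > "$demo_dir/demo_fifo"

# The reader sees end-of-file once the writer closes the FIFO, then exits.
wait "$reader_pid"
received=$(cat "$demo_dir/received.txt")
echo "received: $received"

# Remove the scratch directory and the FIFO inside it.
rm -rf "$demo_dir"
```

In the real procedure, the cat afifo command on the SSH Gateway plays the role of the reader, and the ssh connection from your desktop plays the role of the writer.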

3. Write data to the FIFO

On your Unix/Linux desktop, tar your file and pipe the archive through ssh into the FIFO on the SSH Gateway with the following command:


tar cvf - filename | ssh username@gw.rcsg.rice.edu "cat > fifo/afifo"
# You will be prompted for your gw.rcsg.rice.edu password.

This command pipes the output of the tar command into an SSH connection, which writes the data to the FIFO on the SSH Gateway. The cat command started in step 2 reads the data from the FIFO and forwards it over its own SSH connection to the cluster, where it is untarred. Thus, data is never written to disk on the SSH Gateway.
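
The whole data path can be rehearsed locally, without ssh, to confirm that a tar archive survives a trip through a FIFO. This is only a sketch under stated assumptions: both ends run on one machine, your tar supports the -C option (GNU tar and BSD tar both do), and every name used (work, src, dest, sample.txt) is hypothetical.

```shell
# Local rehearsal: no ssh involved, everything lives in a scratch directory.
# work, src, dest and sample.txt are hypothetical names.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "sample payload" > "$work/src/sample.txt"
mkfifo "$work/afifo"

# Receiving side (stands in for step 2): read the FIFO, untar into dest.
cat "$work/afifo" | tar -xf - -C "$work/dest" &
receiver_pid=$!

# Sending side (stands in for step 3): tar the file, write it to the FIFO.
tar -cf - -C "$work/src" sample.txt | cat > "$work/afifo"

# Wait for the receiving tar to finish extracting.
wait "$receiver_pid"

# Compare the original file with the extracted copy.
if cmp -s "$work/src/sample.txt" "$work/dest/sample.txt"; then
    result="match"
else
    result="differ"
fi
echo "$result"

# Remove the scratch directory and the FIFO inside it.
rm -rf "$work"
```

The archive only ever passes through the FIFO and the two pipes; the only files written to disk are the source file and the extracted copy, which mirrors the way the real transfer avoids the SSH Gateway's disk.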

Division of Information Technology
MS-119, P.O. Box 1892, Rice University, Houston, Texas 77251-1892
713-348-HELP(4357)