File Transfer using rsync
For replicating datasets between the HPC clusters and your workstation, or between two filesystems on an HPC cluster, rsync offers functionality beyond that of the cp or scp commands. With rsync you can copy directories between your workstation and the HPC clusters, or between different filesystems, in such a way that permissions and file modification timestamps are preserved and only files that have changed are transferred.
Basic Usage of rsync
rsync [options] source [source] destination
Here source is a list of one or more source files or directories to copy, and destination is the directory into which they are copied.
Commonly useful options are:
-a
“Archive” mode: permissions and timestamps of the source are replicated at the destination.
-v
“Verbose”.
-n
“Dry run”: don’t actually do anything, just indicate what would be done.
-C
“Follow CVS ignore conventions”.
For detailed information about rsync, type man rsync at the command line.
Some examples of rsync usage
To replicate in your /scratch area a directory tree you have saved in /home:
$ cd /home/$USER/
$ ls -F
my_input_data/
$ rsync -nav my_input_data /scratch/$USER/my_run_dir
building file list ... done
my_input_data/file1
my_input_data/file2
$ rsync -av my_input_data /scratch/$USER/my_run_dir
building file list ... done
my_input_data/file1
my_input_data/file2
$ ls -F /scratch/$USER/my_run_dir
my_input_data/
There is now a copy of the my_input_data directory under /scratch/$USER/my_run_dir.
If you append / to source, rsync will copy the contents of source rather than the source directory itself:
$ cd /home/$USER/
$ ls -F
my_input_data/
$ rsync -nav my_input_data/ /scratch/$USER/my_run_dir
building file list ... done
file1
file2
$ rsync -av my_input_data/ /scratch/$USER/my_run_dir
building file list ... done
file1
file2
$ ls -F /scratch/$USER/my_run_dir
file1
file2
The host name followed by a colon tells rsync that source or destination is on another host. If your username on the other host is different from the username on the current host, you can specify the remote username with username@remotehost: as follows:
$ ls -F
my_input_data/
$ rsync -av my_input_data <your-net-id>@jubail.abudhabi.nyu.edu:/scratch/<your-net-id>/my_run_dir
### For Example ###
$ rsync -av my_input_data wz22@jubail.abudhabi.nyu.edu:/scratch/wz22/my_run_dir
To copy in the other direction, from /scratch on HPC to your workstation:
$ hostname
my_workstation
$ rsync -av <your-net-id>@jubail.abudhabi.nyu.edu:/scratch/<your-net-id>/my_run_dir my_results
### For Example ###
$ rsync -av wz22@jubail.abudhabi.nyu.edu:/scratch/wz22/my_run_dir my_results
$ ls my_results
Only files that have changed will be copied from the HPC to your workstation.
Ignoring certain files
The -C option tells rsync to follow CVS conventions about ignoring certain files. The conventions are described fully in the man page (man rsync), but in summary, when -C is used, any file or directory whose name matches one of the following patterns is ignored:
RCS SCCS CVS CVS.adm RCSLOG cvslog.* tags TAGS .make.state .nse_depinfo *~ #* .#* ,* _$* *$ *.old *.bak *.BAK *.orig *.rej .del-* *.a *.olb *.o *.obj *.so *.exe *.Z *.elc *.ln core .svn/
For example, when copying a tree of source code, you probably want the .c, .f and .h files but not the .o files. To define your own ignore patterns, use either of the following methods:
- Any file whose name matches a pattern listed in the environment variable CVSIGNORE. This environment variable takes a list of patterns separated by spaces, such as the default list above.
When defining CVSIGNORE you will need to enclose the definition in quotation marks. For example, to skip Fortran output to unnamed unit numbers (whose files have names like fort.99) and netCDF files whose names end in intermediate.nc, set CVSIGNORE as follows (note that this syntax is for Bash):
$ export CVSIGNORE="fort.?? *.intermediate.nc"
- Or any file whose name matches a pattern listed in the file $HOME/.cvsignore, or in a file named .cvsignore within a directory being copied.
This file has contents as per $CVSIGNORE, but with one pattern per line, for example:
$ cat .cvsignore
fort.??
*.intermediate.nc