$ARCHIVE

To simplify the storage mount points and the experience of storing data on the Jubail HPC, we have merged the functionality of two storage areas, $WORK (/work/<NetID>) and $ARCHIVE (/archive/<NetID>), into a single storage area, $ARCHIVE (/archive/<NetID>).

$ARCHIVE can be accessed on login nodes as follows:

cd $ARCHIVE
# is equivalent to
cd /archive/<your-NetID>

Caution

/work/<NetID> was discontinued on 2nd Feb 2023. The data previously in /work/<NetID> has been moved to /archive/work/<NetID>, a temporary staging area from which users can move their existing $WORK data into the archive (/archive/<NetID>/…). /archive/work/<NetID> has been available to users since 3rd Feb 2023 and will be deprecated after a period of 180 days.

When storage is shared among many users, rules must be set to prevent any user from consuming disk space excessively at the expense of others. For this purpose, HPC systems enforce disk quotas on users and groups of users. Regular cleanup of the /scratch/<NetID> storage is also essential to keep it efficient and its capacity manageable.

Warning

The current policy is to remove any files (data) that have not been accessed (viewed, created or modified) for more than 90 days.

On the HPC, disk quotas are enforced on /home and /scratch (not /archive).

There are two quota constraints:

- The total amount of disk space
- The total number of files

Once you reach a quota limit, your jobs may be killed, so it is good practice to check your quota before submitting a job that will generate a lot of data.

We urge our users to clean up their $SCRATCH storage regularly.
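As a rough aid for such cleanups, the sketch below lists files that look like candidates under the 90-day policy. It is only an approximation: `stale_files` is a hypothetical helper name, and it assumes that the access and modification timestamps reported by `find` are a reasonable stand-in for the "viewed, created or modified" criterion, which is ultimately decided by the system.

```shell
# List regular files under a directory that have been neither accessed
# nor modified for more than N days (default 90).
# Usage: stale_files <directory> [days]
# Note: stale_files is an illustrative helper, not a command provided on the HPC.
stale_files() {
    dir="$1"
    days="${2:-90}"
    # find counts age in whole 24-hour periods, so "+90" matches files
    # whose timestamp is at least 91 days old.
    find "$dir" -type f -atime +"$days" -mtime +"$days" -print
}

# Example (commented out): review candidates in your scratch space.
# stale_files "$SCRATCH" 90
```

The function only prints paths; review the list before moving anything to `/archive` or deleting it.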

Run the myquota command in a terminal on the HPC to check your current usage and quota.

Example output:

                    DISK SPACE                # FILES (1000's)
filesystem       size      quota            number      quota
            --------------------------   --------------------------
/home         131KB       20GB ( 0%)           0        100 ( 0%)
/scratch      220GB     5242GB ( 4%)           4        500 ( 1%)
/archive      418GB     5242GB ( 8%)           3        125 ( 3%)
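To check usage before submitting a large job, the output can be filtered for filesystems that are close to a limit. The following is a minimal sketch that assumes `myquota` prints exactly the layout shown in the example output; `quota_over` is a hypothetical helper name, and the default threshold of 80% is an arbitrary choice.

```shell
# Print filesystems whose disk-space or file-count usage is at or above
# a percentage threshold. Reads myquota-style text on standard input.
# Usage: myquota | quota_over [threshold]
# Note: quota_over is an illustrative helper; the parsing assumes the
# column layout of the example output.
quota_over() {
    threshold="${1:-80}"
    awk -v t="$threshold" '
        /^\// {
            line = $0
            n = 0
            # Collect each "(NN%)" figure: the first is disk space,
            # the second is the file count.
            while (match(line, /\([ ]*[0-9]+%\)/)) {
                pct = substr(line, RSTART + 1, RLENGTH - 3)  # drop "(" and "%)"
                gsub(/ /, "", pct)
                n++
                if (n == 1) space = pct + 0
                if (n == 2) files = pct + 0
                line = substr(line, RSTART + RLENGTH)
            }
            if (n >= 2 && (space >= t || files >= t))
                printf "%s space=%d%% files=%d%%\n", $1, space, files
        }'
}
```

With the example output above, `myquota | quota_over 80` would print nothing, since no filesystem is near its limit.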

What’s inside `$ARCHIVE`

The archive processes are described below in the form of a flowchart.

The archive comprises two types of storage:
- Normal Storage (fast)
- Long Term Storage - Tape drive (slow)

Process 1: The normal storage is synced to the tape drive every 12 hours, so that a copy of each file is also available on the tape drive.
Process 2: Once total storage usage hits 80%, the system automatically frees up space by keeping only the tape-drive copy of some files (which can be retrieved later). Files are released in order of access timestamp, oldest first.

State of the file in `$ARCHIVE`

Since $ARCHIVE acts as both a regular storage solution and long-term storage (tape drive), the state of a file matters when working with it. There are essentially two states for a file in /archive.

archived state: The file has a copy on both the storage and the tape drive.
In the above figure, file1, file2, file3, file4 and file5 are in the archived state, as they are available on both the storage and the tape drive.
released archived state: The file is available only on the tape drive; its copy has been removed (released) from the storage to free up space.
In the above figure, file6, file7, file8, file9 and file10 are in the released archived state, as they have been released from the storage and remain only on the tape drive.

How to identify the state of a file

In order to identify the state of files in a directory:

#dmfls -d <path to archive directory>

#for example:
dmfls -d /archive/wz22/abc

In order to identify the state of a single file:

#dmfls <path to archive file>

#for example:
dmfls /archive/wz22/abc/input.txt

A sample output of the above directory command is shown below:

[wz22@login4 ~]$ dmfls -d /archive/wz22/abc
./hwloc/gnu/build.txt:  exists archived,
./hwloc/src/hwloc-1.7.1.tar.bz2:  exists archived,
./blat/src/bedGraphToBigWig:  exists archived,
./blat/src/liftOver: released exists archived,
./blat/src/blat:  exists archived,
./cufflinks/src/cufflinks-2.0.2.Linux_x86_64.tar.gz: released exists archived,
./cufflinks/src/cufflinks-2.1.1.Linux_x86_64.tar.gz:  released exists archived,
./crystal-analysis/gnu/crystal_analysis-0.9.12.tbz2:  exists archived,

In the sample output above, files marked released exists archived are in the released archived state, while files marked only exists archived are in the archived state.
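To pick out only the files that would need to be recalled from tape, the listing can be filtered on the word released. This is a sketch that assumes `dmfls` output keeps the `path: state,` layout of the sample above; `released_files` is a hypothetical helper name.

```shell
# Print the paths of files whose dmfls state list starts with "released".
# Reads dmfls-style lines ("path:  released exists archived,") on stdin.
# Usage: dmfls -d /archive/<NetID>/<directory> | released_files
# Note: released_files is an illustrative helper, not part of the DMF tools.
released_files() {
    # Keep only lines where "released" follows the colon, and print the
    # text before that colon (the file path).
    sed -n 's/^\(.*\):[[:space:]]*released[[:space:]].*$/\1/p'
}
```

For the sample listing, this would print the `liftOver` path and the two `cufflinks` tarballs.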

How to Archive and Dearchive

The above figure shows the following:

The data from /scratch or /home can be moved/copied to /archive using the usual unix commands (rsync,cp,mv)
The commands to copy out data from /archive depend on the state of the file.

archived state

Since a file in the archived state has a copy on both storages, the usual unix commands (cp, rsync) can be used to copy it from /archive to your /scratch.

released archived state

Since the released archived state means that the file has been released from the storage and is now available only on the tape library, dearchiving the file is a two-step process.
It first has to be moved from the tape to the normal storage using the dmfget <filename> command; it can then be copied to the required directory in your /scratch using the usual unix commands (rsync, cp).

A simple dearchiving would have the following steps:

Go to the required /archive directory which you would like to copy out to your /scratch.

(base) [wz22@login1 ~]$ cd /archive/wz22/abc

Check the state of the file using the dmfls <filename> command.

(base) [wz22@login1 abc]$ ls
xyz.txt file2.txt
(base) [wz22@login1 abc]$ dmfls *
xyz.txt:  exists archived,
file2.txt:  released exists archived,

(Optional) If any files are in the released archived state, use the dmfget <filename> command to copy them from the tape library back to the storage, returning them to the archived state. This runs in the background, and progress can be tracked using the dmfmonitor command.
```
(base) [wz22@login1 abc]$ dmfget *
Execute watch dmfmonitor <directory/file_name> to see progress
(base) [wz22@login1 abc]$ dmfmonitor *
xyz.txt: NOOP
file2.txt: RESTORE running (0 bytes moved)
(base) [wz22@login1 abc]$ dmfmonitor *
xyz.txt: NOOP
file2.txt: NOOP
(base) [wz22@login1 abc]$ dmfls *
xyz.txt:  exists archived,
file2.txt:  exists archived,
```
Note that when dmfmonitor reports NOOP for a file, the file is back on the storage and in the archived state.

Copy the file from /archive to the desired location in /scratch using the usual unix commands (cp, rsync).

(base) [wz22@login1 abc]$ cp -r /archive/wz22/abc /scratch/wz22/.
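The steps above can be combined into a small script. The sketch below is one possible arrangement and not an official tool: `dearchive_copy` is a hypothetical function name, the polling interval is an arbitrary default, and the checks assume that `dmfls` and `dmfmonitor` print output in the form shown in the sample session.

```shell
# Recall one file from tape if needed, wait for the recall to finish,
# then copy the file to a destination directory.
# Usage: dearchive_copy <file-in-archive> <destination-dir> [poll-seconds]
# Note: dearchive_copy is an illustrative helper; only dmfls, dmfget and
# dmfmonitor come from the system.
dearchive_copy() {
    src="$1"
    dest="$2"
    interval="${3:-30}"   # seconds between progress checks (assumed default)

    # "released" after the colon means the data is only on tape.
    if dmfls "$src" | grep -q ':[[:space:]]*released[[:space:]]'; then
        dmfget "$src"
        # Poll until dmfmonitor reports NOOP for the file.
        until dmfmonitor "$src" | grep -q 'NOOP'; do
            sleep "$interval"
        done
    fi

    cp -- "$src" "$dest"/
}

# Example (commented out):
# dearchive_copy /archive/<NetID>/abc/file2.txt /scratch/<NetID>
```

Waiting for the recall explicitly makes the slow tape step visible, instead of leaving it hidden inside a long-running copy.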

Tip

You can simply use standard unix utilities such as rsync and cp to copy data into or out of /archive. When copying out of the archive, the dmfget step is handled automatically in the background, so the time taken depends on the state of the file (released/exists).

Any operation performed on a file in the archive first automatically restores it to the exists archived state before the operation is carried out.

Quick Glance into the archive commands

| Action | Command | Remarks |
|---|---|---|
| Navigating to archive | `cd /archive/<NetID>` | Usual unix commands (`rsync`, `cd`, `cp`, `mv`) can be used |
| List the state of the files | `dmfls <filename>` | Check for `archived state` and `released archived state` |
| Retrieve from Tape drive to Storage | `dmfget <filename>` | Use when the file is in the `released archived state` |
| Monitor the state of a file | `dmfmonitor <filename>` | Can be used to track whether the migration from tape drive to storage is done |

Summary

| Action | Command | Remarks |
|---|---|---|
| Copy in to Archive | `rsync <source-dir> <dest-dir>` | Usual commands like rsync and cp can be used to copy in to `/archive` |
| Copy out from archive | `rsync <source-dir> <dest-dir>` | Usual commands like rsync and cp can be used to copy out from `/archive`. The dmfget step is handled in the background. |

Best Practices

| Dos | Remarks |
|---|---|
| Periodically clean your `/scratch` | Files in `/scratch` which have not been accessed for 90 days are deleted. |
| Once a project is completed, move the data over to `/archive` | Moving data to `/archive` frees up space on `/scratch` and prevents files older than 90 days from being deleted. |
| Use tar files to archive directories with a large file count | The fewer the files, the faster the archiving and dearchiving process. |
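As an illustration of the tar recommendation, the sketch below bundles a directory into a single compressed tarball before it is moved to `/archive`. `bundle_dir` is a hypothetical helper name, and gzip compression is an assumption; any compression supported by your `tar` would do.

```shell
# Bundle a directory into one gzip-compressed tarball.
# Usage: bundle_dir <directory> <output.tar.gz>
# Note: bundle_dir is an illustrative helper, not a command provided on the HPC.
bundle_dir() {
    src="${1%/}"   # strip a trailing slash, if any
    out="$2"
    # -C switches to the parent directory so the tarball stores paths
    # relative to it (e.g. "project/file") rather than absolute paths.
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}

# Example (commented out): pack a finished project directly into the archive.
# bundle_dir "$SCRATCH/project" "$ARCHIVE/project.tar.gz"
```

The contents can be checked afterwards with `tar -tzf <output.tar.gz>` without extracting anything.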

Note

$ARCHIVE can also be mounted on your workstation (Linux, Mac and Windows). Instructions are on this page: Mount $ARCHIVE with SSHFS