The EOS namespace is the master

DISCLAIMER: This section is as much a proposal as it is an explanation, so please do not take any conclusions or statements as final. As of Thursday 23 April, Georgios of the EOS development team is exploring the possibility of adding a `sys.archive.file_id` index to the EOS namespace in order to solve the problem of EOS converters changing the EOS file IDs of the files they convert. End of disclaimer.

The EOS namespace is the master with respect to the CTA tape catalogue. This means the tape catalogue has to play catch-up whenever an EOS file is created or deleted, or has its EOS file ID changed.

EOS changes the file ID of an EOS file whenever the file is processed by an EOS converter. The high-level algorithm of an EOS file converter is as follows:

1. Use Third Party Copy (TPC) to make a new physical copy of the file in its new location and disk layout, and temporarily store its metadata in `/proc/converters`.
2. Merge the original file's metadata into the copy's metadata in `/proc/converters`.
3. Rename the new file copy to be on top of the existing file.
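
The three steps above can be illustrated with a toy in-memory model. This is only a sketch meant to show why the file ID changes: `ToyNamespace`, its methods, the temporary path naming and the layout names are all hypothetical and are not the EOS API. The sketch assumes that each newly created file receives a fresh file ID, and that the `sys.archive.file_id` extended attribute is part of the metadata merged in step 2.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyNamespace:
    """Hypothetical in-memory stand-in for the EOS namespace (not the EOS API)."""

    # path -> {"fid": int, "layout": str, "xattrs": dict}
    files: dict = field(default_factory=dict)
    _next_fid: count = field(default_factory=lambda: count(1), repr=False)

    def create(self, path, layout, xattrs=None):
        """Create a file entry; every new entry gets a fresh file ID."""
        fid = next(self._next_fid)
        self.files[path] = {"fid": fid, "layout": layout, "xattrs": dict(xattrs or {})}
        return fid

    def convert(self, path, new_layout):
        """Mimic the three converter steps and return the file's new ID."""
        original = self.files[path]
        # Step 1: the new copy is a new file, so it gets a new file ID.
        # The temporary name under /proc/converters is an assumption.
        tmp_path = f"/proc/converters/{original['fid']}"
        self.create(tmp_path, new_layout)
        # Step 2: merge the original file's metadata into the copy's metadata.
        self.files[tmp_path]["xattrs"].update(original["xattrs"])
        # Step 3: rename the copy on top of the existing file.
        self.files[path] = self.files.pop(tmp_path)
        return self.files[path]["fid"]


ns = ToyNamespace()
old_fid = ns.create("/eos/demo/file", "replica", {"sys.archive.file_id": "4711"})
new_fid = ns.convert("/eos/demo/file", "raid6")
print(f"file ID before conversion: {old_fid}, after conversion: {new_fid}")
```

In this model the path and the extended attributes survive the conversion, but the file ID stored under the path is that of the new copy, which is the behaviour the tape catalogue has to cope with.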

EOS file converters are used for two reasons:

1. To explicitly convert one or more files from one disk layout to another.
2. To perform inter-scheduler-group rebalancing.

The latter, rebalancing, may be required whenever performance problems need to be addressed in production. It should therefore be considered a possible day-to-day chore in the running of an EOSCTA instance.

The `ARCHIVE_FILE.DISK_FILE_ID` database column of the CTA file catalogue is used to store the potentially changing EOS file ID of an EOS file. The two main reasons for the existence of this database column are as follows:

1. Allow tape operators to list the entire contents of a tape, including the EOS path of each file, using the `cta-admin tapefile ls --vid TAPE_VOLUME_IDENTIFIER` command, which internally queries the EOS namespace by EOS file ID in order to retrieve the EOS metadata of the file.
2. Search for dark tape data, in other words find tape files referenced in the CTA catalogue that have no corresponding file in the EOS namespace.

The first reason, making the `cta-admin tapefile ls --vid TAPE_VOLUME_IDENTIFIER` command work, is very important. Tape operators will need to list the entire contents of tapes, including EOS metadata about each file. The second reason, searching for dark data, has a very low priority at this stage in the project. We are currently much more worried about losing data than about accidentally keeping some around on tape.

Given that EOS may change the EOS file ID of a file, the `ARCHIVE_FILE.DISK_FILE_ID` database column is to be considered a performance cache which can have cache misses. Something similar to the following algorithm should be executed when using the `ARCHIVE_FILE.DISK_FILE_ID` database column:

 1. Look up the `ARCHIVE_FILE.DISK_FILE_ID` based on the `ARCHIVE_FILE.ARCHIVE_FILE_ID`
 2. Query the EOS namespace for the metadata of the EOS file with the EOS file ID equal to `ARCHIVE_FILE.DISK_FILE_ID`
 3. If there is a valid result then
 4.   Return the EOS file metadata
 5. Else we have a “cache miss”
 6.   Find the new EOS file ID using the EOS namespace and the `sys.archive.file_id` extended attribute
 7.   Update the `ARCHIVE_FILE.DISK_FILE_ID` column
 8.   Query the EOS namespace for file metadata using the new EOS file ID
 9.   Return the EOS file metadata
10. End if

Step 6 is not yet possible and may turn out not to be feasible at all. The EOS team are currently investigating the addition of a `sys.archive.file_id` index to the EOS namespace.
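
The lookup algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation against the real CTA catalogue or EOS namespace: the catalogue is modelled as a plain dictionary mapping `ARCHIVE_FILE_ID` to `DISK_FILE_ID`, the EOS namespace as a dictionary mapping EOS file ID to metadata, and the names `lookup_eos_metadata` and `find_fid_by_archive_id` are hypothetical. In particular, step 6 is emulated here with a linear scan over the toy namespace, standing in for the `sys.archive.file_id` index that does not yet exist.

```python
def find_fid_by_archive_id(eos_by_fid, archive_file_id):
    """Hypothetical stand-in for step 6: scan for a matching sys.archive.file_id."""
    wanted = str(archive_file_id)
    for fid, metadata in eos_by_fid.items():
        if metadata.get("xattrs", {}).get("sys.archive.file_id") == wanted:
            return fid
    return None


def lookup_eos_metadata(archive_file_id, catalogue, eos_by_fid):
    """Return EOS metadata for an archive file, refreshing the cached disk file ID.

    catalogue:  dict mapping ARCHIVE_FILE_ID -> DISK_FILE_ID (toy model of the column)
    eos_by_fid: dict mapping EOS file ID -> metadata dict (toy model of the namespace)
    """
    # Steps 1 and 2: read the cached EOS file ID and query the namespace with it.
    disk_file_id = catalogue[archive_file_id]
    metadata = eos_by_fid.get(disk_file_id)

    # Steps 3 and 4: the cached ID is still valid.
    if metadata is not None:
        return metadata

    # Step 5: cache miss, for example because a converter replaced the file.
    # Step 6: find the new EOS file ID via the sys.archive.file_id attribute.
    new_disk_file_id = find_fid_by_archive_id(eos_by_fid, archive_file_id)
    if new_disk_file_id is None:
        raise LookupError(f"no EOS file found for archive file ID {archive_file_id}")

    # Step 7: update the cached value in the catalogue.
    catalogue[archive_file_id] = new_disk_file_id

    # Remaining steps: query with the new ID and return the metadata.
    return eos_by_fid[new_disk_file_id]
```

In this sketch a failed step 6 raises `LookupError`. How the real system should treat that case (for example, as a candidate for dark tape data) is not settled by the text above and is left open here.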