File lifecycle on CTA
CTA is CERN's software solution for the archival and retrieval of data stored on tape.
At CERN, CTA serves as the tape back-end to the EOS disk system; together, EOS and CTA form a complete data archival system.
Info
While EOS+CTA is the solution used at CERN and covered by this documentation, CTA is also compatible with the dCache distributed storage system. For more information, see [1] [2] [3].
In EOSCTA, the EOS disk system fulfils several key roles:
- Managing the namespace for data stored on tape.
- Providing a storage buffer for files during archival and retrieval operations.
- Serving as the interface between external clients and CTA.
CTA, on the other hand, manages all tape-related operations:
- Managing the tape file catalogue.
- Providing scheduling functions for (data) archival and retrieval requests.
- Controlling data transfers to and from the tape hardware.
The following diagrams illustrate the use of EOSCTA for the archival and retrieval of physics data:
Archival
Retrieval
There is always a third component: the client software. It communicates with EOSCTA using the XRootD or HTTP protocol. The client is responsible for transferring files into and out of EOS and for managing failures and retries.
The most commonly used client is CERN's File Transfer Service (FTS) but, in principle, any client that can communicate using the XRootD or HTTP protocol can be used.
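Because failure handling and retries are the client's responsibility, a custom client needs some retry logic around each transfer. The following is a minimal sketch of that idea, not the behaviour of FTS or of any existing client. The function `transfer_with_retries`, its parameters, and the `transfer` callable are hypothetical names introduced for illustration; the callable stands in for whatever XRootD or HTTP copy operation the client actually performs and is assumed to raise `OSError` on failure.

```python
import time
from typing import Callable


def transfer_with_retries(
    transfer: Callable[[str, str], None],
    source: str,
    destination: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call transfer(source, destination), retrying on OSError.

    Returns the number of attempts that were needed. The delay between
    attempts doubles each time (exponential backoff). All names here are
    illustrative assumptions, not part of EOS, CTA or FTS.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            transfer(source, destination)
            return attempt
        except OSError:
            # Give up once the last permitted attempt has failed.
            if attempt == max_attempts:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))

    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Stand-in transfer that fails twice before succeeding.
    state = {"calls": 0}

    def flaky_transfer(src: str, dst: str) -> None:
        state["calls"] += 1
        if state["calls"] < 3:
            raise OSError("temporary failure")

    attempts = transfer_with_retries(
        flaky_transfer, "local.dat", "remote.dat", sleep=lambda s: None
    )
    print(f"succeeded after {attempts} attempts")
```

Passing the transfer operation in as a callable keeps the retry policy independent of the protocol, so the same wrapper could be used for both archival (copying into EOS) and retrieval (copying out of EOS).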
This document describes the EOSCTA workflows (archival, retrieval and deletion of files) and the interfaces and protocols used between the different system components. It also covers repack.
Main workflows: