Archival¶
On the EOS side of EOSCTA, files are created in the namespace by a CREATE workflow event and then archived to tape following a CLOSEW (CLOSE Write) workflow event.
It is important to note that the EOS instance in EOSCTA is a temporary staging area for files on their way to/from tape. When the file is successfully archived, the disk buffer copy will be automatically deleted.
Likewise, if archiving fails before the archive request is queued, the file will be deleted from the EOS disk buffer, and an error will be reported to the client. It is expected that the client will attempt to re-send the file in this case. See below for more details on how different errors are handled.
Archive workflow¶
The figure below shows the sequence of events when a client writes a file to EOS. The storage class is checked on CREATE, and an archive request is queued synchronously on CLOSEW.
sequenceDiagram
participant Client as Client
participant FST as EOS FST
participant MGM as EOS MGM
participant FE as CTA Frontend
participant TS as CTA Tape Server
rect rgba(255,255,255,0.1)
activate Client
Client ->> MGM: open (CREATE)
MGM ->> FE: check storageClass (CREATE)
FE -->> MGM: ack
MGM -->> Client: redirect to FST
end
rect rgba(255,255,255,0.1)
Client ->> FST: open
activate FST
FST -->> Client: ack
Note right of FST: Write file to<br/>disk buffer<br/>on FST
Client ->> FST: write
Client ->> FST: ...
Client ->> FST: close (CLOSEW)
deactivate FST
FST ->> MGM: commit
MGM ->> FE: notification (CLOSEW)
Note right of FE: Create<br/>and enqueue<br/>archive request
FE -->> MGM: reply
MGM -->> FST: ack
FST -->> Client: ack
deactivate Client
end
rect rgba(255,255,255,0.1)
activate TS
Note right of TS: Tape session pops<br/>archive request(s)<br/>from queue and<br/>writes file(s)<br/>to tape
TS ->> MGM: open
MGM -->> TS: redirect
TS ->> FST: open
activate FST
TS ->> FST: read
TS ->> FST: ...
TS ->> FST: close
deactivate FST
TS ->> MGM: tapereplica
MGM ->> MGM: Evict disk copy
MGM -->> TS: ack
deactivate TS
end
EOS events handled by the CTA Frontend¶
- CREATE: Validate Storage Class, allocate Archive ID
- CLOSEW: Archive the file to tape
The EOS-CTA events are synchronous. If CTA fails during either event, no archive request is queued: the file is deleted from the EOS disk buffer and an error is reported to the client.
For more details on error handling, see How failures are handled before and after the archive request is queued.
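The division of work between the two events can be illustrated with a minimal sketch. It is not the CTA Frontend implementation: the `Frontend` class, the `WorkflowError` exception, the in-memory queue, and the dictionary used to represent a file are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


class WorkflowError(Exception):
    """Hypothetical synchronous error; EOS would report it to the client."""


@dataclass
class Frontend:
    # Hypothetical in-memory stand-ins for the catalogue and the archive queue.
    storage_classes: set
    queue: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def handle(self, event: str, file: dict) -> dict:
        if event == "CREATE":
            # Validate the storage class, then allocate an archive ID.
            if file.get("storage_class") not in self.storage_classes:
                raise WorkflowError("unknown storage class")
            file["archive_id"] = next(self._ids)
            return file
        if event == "CLOSEW":
            # Queue the archive request; the reply follows only after queueing.
            if "archive_id" not in file:
                raise WorkflowError("no archive ID allocated")
            self.queue.append(file["archive_id"])
            return file
        # OPENW and any other event are not handled in this sketch.
        raise WorkflowError(f"unsupported event: {event}")
```

Because both handlers either return or raise before control goes back to the caller, a failure in this model always leaves the queue unchanged, which mirrors the rule that no archive request is queued when an event fails.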
EOS events not handled by the CTA Frontend¶
- OPENW: We do not handle OPENW events, because files on tape are immutable
Although EOS generates an OPENW event when an existing file is opened for writing, CTA does not allow file modifications, so the OPENW workflow is not supported by CTA. System administrators should enforce this by adding an immutable flag (!u) to the ACL of tape-backed directories in EOS, or as a rule.
How failures are handled before and after the archive request is queued¶
1. Before queueing the archive request¶
Up until the point where the archive request is queued, failures are synchronous and are reported immediately to the client. The file is deleted from the EOS disk buffer, both to ensure that the buffer does not fill up with failed transfers and to allow the client to retry.
If there is a failure during CREATE, no file is written and the MGM will delete the file metadata from the EOS namespace. If there is a failure while writing the file to the EOS disk buffer, the FST will delete the file and the CLOSEW event will not be executed. The FST will also delete the file if there is a failure during the processing of the CLOSEW event in the CTA Frontend.
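Since synchronous failures leave nothing behind in the disk buffer, a client can simply re-send the whole file. The following is a minimal client-side sketch under stated assumptions: `send` is a hypothetical callable that writes the file to EOS and raises the hypothetical `ArchiveSubmitError` when EOS reports a failure; neither name comes from EOS or CTA, and the back-off policy is an arbitrary choice.

```python
import time


class ArchiveSubmitError(Exception):
    """Hypothetical error type for a synchronous failure reported by EOS."""


def send_with_retry(send, max_attempts=3, delay_s=0.0, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    Every attempt re-sends the complete file, because the failed copy
    has already been deleted from the disk buffer.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except ArchiveSubmitError as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(delay_s * attempt)  # simple linear back-off
    raise last_error
```

The `sleep` parameter is injectable only so that the function can be exercised without real delays.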
2. After queueing the archive request¶
After the request has been queued, failures in the archival process are asynchronous. The client must poll the status of the file to determine if an error has occurred.
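A polling loop for this asynchronous phase might look like the sketch below. It assumes a hypothetical `get_status` callable that returns one of the strings `"pending"`, `"on_tape"`, or `"failed"`; how a real client derives that status from the file metadata is not shown here, and the default timeout and interval are placeholders.

```python
import time


def wait_for_archive(get_status, timeout_s=3600.0, interval_s=30.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the file is on tape, has failed, or time runs out.

    Returns "on_tape", "failed", or "timeout".
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("on_tape", "failed"):
            return status
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)
```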
If there is a failure on the CTA side (e.g. it cannot authenticate to EOS or read the file, cannot mount a tape, or encounters a tape write error), the CTA Tape Server will retry writing the file to tape several times (usually three times per mount session, over two separate mounts). If all these attempts fail, the CTA Tape Server will call back EOS with the archive failed workflow event to report the failure. When EOS receives the archive failed event, the MGM records the error message in the extended attributes of the file metadata. The file is left in the disk buffer to allow an operator to investigate and possibly resubmit the archive request manually.
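The retry behaviour described above can be modelled roughly as two nested loops. This is only an illustrative sketch, not the Tape Server's scheduling logic: `write_to_tape` and `report_failure` are hypothetical callables, and the defaults of three attempts per mount and two mounts simply reflect the "usual" figures quoted in the text.

```python
def archive_with_retries(write_to_tape, report_failure,
                         attempts_per_mount=3, max_mounts=2):
    """Try to write a file to tape; report failure if every attempt fails.

    Returns True on success, False after the failure has been reported.
    """
    last_error = "unknown error"
    for mount in range(1, max_mounts + 1):
        for attempt in range(1, attempts_per_mount + 1):
            try:
                write_to_tape(mount, attempt)
                return True
            except Exception as exc:
                # Remember the most recent error for the failure report.
                last_error = str(exc)
    # All attempts exhausted: stands in for the "archive failed" callback.
    report_failure(last_error)
    return False
```

In this model the failure callback fires exactly once, after the last attempt, which matches the description of EOS receiving a single archive failed event carrying the error message.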