
Failed Requests

Archive and Retrieve Requests are created as objects in CTA's distributed object store. Archive and Retrieve Queues also exist as objects in the object store. See the Object Store chapter in the Overview and Design document for more detail.

Sometimes a request cannot be fulfilled, for example because of a hardware problem like a stuck or broken tape. When this happens, CTA behaves as follows:

  1. The tape is mounted for archive or retrieve. If the tape cannot be mounted, the job fails and is requeued for retry.
  2. If the tape is mounted but the job fails (for example, because of a disk or network problem), the job is retried up to three times within the same mount. After the third failed attempt, the tape is unmounted and the job is requeued for retry.
  3. When the requeued job reaches the front of the queue, a second tape mount will be attempted, following steps (1) and (2) above.
  4. If the job still cannot be fulfilled, the tape is unmounted, the job is put into a Failed Requests queue in the Object Store, and an error report is generated.
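
The steps above can be read as a small retry loop. The following Python sketch is only an illustration of that reading and is not CTA code: the names run_job, JobOutcome, mount_tape and transfer are hypothetical, and it assumes that "three times" means three attempts per mount, that a job gets two mounts in total, and that a failed mount counts as one of them.

```python
from dataclasses import dataclass, field
from typing import Callable, List

ATTEMPTS_PER_MOUNT = 3  # assumption: three attempts within one mount
MAX_MOUNTS = 2          # assumption: the first mount plus one requeued mount


@dataclass
class JobOutcome:
    succeeded: bool
    mounts_used: int
    log: List[str] = field(default_factory=list)


def run_job(mount_tape: Callable[[int], bool],
            transfer: Callable[[int, int], bool]) -> JobOutcome:
    """Model the retry steps for one job (hypothetical, not CTA code)."""
    log: List[str] = []
    for mount in range(1, MAX_MOUNTS + 1):
        # Step 1: try to mount the tape; a failed mount uses up this mount.
        if not mount_tape(mount):
            log.append(f"mount {mount}: tape could not be mounted")
            continue
        # Step 2: retry the transfer within the same mount.
        for attempt in range(1, ATTEMPTS_PER_MOUNT + 1):
            if transfer(mount, attempt):
                log.append(f"mount {mount}: attempt {attempt} succeeded")
                return JobOutcome(True, mount, log)
            log.append(f"mount {mount}: attempt {attempt} failed")
        log.append(f"mount {mount}: tape unmounted")
        # Step 3: the next loop iteration stands in for the requeued job.
    # Step 4: no mount succeeded, so the job goes to the failed requests queue.
    log.append("job moved to the Failed Requests queue")
    return JobOutcome(False, MAX_MOUNTS, log)


# Example: every attempt fails on the first mount, the second mount works.
outcome = run_job(lambda mount: True, lambda mount, attempt: mount == 2)
print(outcome.succeeded, outcome.mounts_used)  # prints: True 2
```

In this model, a job that never succeeds makes at most six transfer attempts (three per mount) before it is handed over to the Failed Requests queue.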

The Failed Requests Queue

Failed Requests queues use the same data structures as the normal Archive and Retrieve queues, but requests in these queues are not handled automatically. A failed request remains in its queue until an operator deals with it.

Operators can get a summary of all failed requests using cta-admin failedrequest ls --summary.

To get the detail of which requests failed, use cta-admin failedrequest ls with options --justarchive, --justretrieve, --tapepool or --vid.

The JSON output (cta-admin --json failedrequest) gives more detail, including the address of each request in the object store. This address can be used to delete the object. The --log option adds the complete failure log messages, with a timestamp for each failure.
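
For scripted use, the options described above can be assembled into a command line. The Python helper below is a hypothetical sketch, not part of CTA: failedrequest_ls_cmd is an invented name, and it assumes that --tapepool and --vid each take a value, that --json is placed before the subcommand as in the text above, and that the JSON form is used with the ls subcommand. It only builds the argument list and does not run cta-admin.

```python
from typing import List, Optional


def failedrequest_ls_cmd(summary: bool = False,
                         just_archive: bool = False,
                         just_retrieve: bool = False,
                         tapepool: Optional[str] = None,
                         vid: Optional[str] = None,
                         log: bool = False,
                         json_output: bool = False) -> List[str]:
    """Build a 'cta-admin failedrequest ls' argument list (hypothetical helper)."""
    cmd = ["cta-admin"]
    if json_output:
        cmd.append("--json")  # placed before the subcommand, as in the text
    cmd += ["failedrequest", "ls"]
    if summary:
        cmd.append("--summary")
    if just_archive:
        cmd.append("--justarchive")
    if just_retrieve:
        cmd.append("--justretrieve")
    if tapepool is not None:
        cmd += ["--tapepool", tapepool]  # assumption: takes a tape pool name
    if vid is not None:
        cmd += ["--vid", vid]            # assumption: takes a tape VID
    if log:
        cmd.append("--log")
    return cmd


print(failedrequest_ls_cmd(summary=True))
# prints: ['cta-admin', 'failedrequest', 'ls', '--summary']
```

The resulting list can be passed to subprocess.run on a host where cta-admin is installed. Check the available options against the cta-admin help output before relying on any particular combination.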