The CTA tape lifecycle

Introduction

In CTA, tape information is stored in the Catalogue and managed by means of cta-admin commands.

$ cta-admin tape help
cta-admin ta/tape add/ch/rm/reclaim/ls/label:
    add     --vid/-v <vid> --mediatype/--mt <media_type_name>
            --vendor/--ve <vendor> --logicallibrary/-l <logical_library_name>
            --tapepool/-t <tapepool_name> --full/-f <"true" or "false">
            [--state/-s <"ACTIVE" or "DISABLED" or "BROKEN" or "EXPORTED" or "REPACKING" or "REPACKING_DISABLED">]
            [--reason/-r <reason_status_change>] [--comment/-m <"comment">]
    ch      --vid/-v <vid> [--mediatype/--mt <media_type_name>]
            [--vendor/--ve <vendor>]
            [--logicallibrary/-l <logical_library_name>]
            [--tapepool/-t <tapepool_name>]
            [--encryptionkeyname/-k <encryption_key_name>]
            [--full/-f <"true" or "false">]
            [--state/-s <"ACTIVE" or "DISABLED" or "BROKEN" or "EXPORTED" or "REPACKING" or "REPACKING_DISABLED">]
            [--reason/-r <reason_status_change>] [--comment/-m <"comment">]
    rm      --vid/-v <vid>
    reclaim --vid/-v <vid>
    ls      [--vid/-v <vid>] [--mediatype/--mt <media_type_name>]
            [--vendor/--ve <vendor>]
            [--logicallibrary/-l <logical_library_name>]
            [--tapepool/-t <tapepool_name>] [--vo/--vo <vo>]
            [--capacity/-c <capacity_in_bytes>] [--full/-f <"true" or "false">]
            [--fxidfile/-F <filename>] [--all/-a]
            [--state/-s <"ACTIVE" or "DISABLED" or "BROKEN" or "EXPORTED" or "REPACKING" or "REPACKING_DISABLED">]
    label   --vid/-v <vid> [--force/-f <"true" or "false">]

Different tape states and supported state transitions

Tape state diagram

What can be done on each final state

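| State              | User read request | User write request | Repack request | Repack read sub-request (*) | Repack write sub-request (*) | Mountable | Reclaimable |
|--------------------|-------------------|--------------------|----------------|-----------------------------|------------------------------|-----------|-------------|
| ACTIVE             | YES               | YES                | NO             | YES                         | YES(**)                      | YES       | YES         |
| DISABLED           | YES               | YES                | NO             | YES                         | YES(**)                      | NO        | YES         |
| REPACKING          | NO                | NO                 | YES            | NO                          | YES                          | YES       | NO          |
| REPACKING_DISABLED | NO                | NO                 | YES            | NO                          | YES                          | NO        | NO          |
| BROKEN             | NO                | NO                 | NO             | NO                          | NO                           | NO        | NO          |
| EXPORTED           | NO                | NO                 | NO             | NO                          | NO                           | NO        | NO          |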

(*) Repack sub-request queueing is handled internally by the maintenance process.
(**) Repack read sub-requests may be re-queued on ACTIVE/DISABLED tape replicas if the original REPACKING read sub-request could not be enqueued.
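
The table above can be turned into a small lookup for scripts or pre-flight checks. The following is a minimal sketch, not part of CTA: the operation names and the `is_allowed` helper are made up for illustration, the values are copied in the column order of the table as printed, and the (**) re-queueing nuance is not modelled.

```python
# Hypothetical operation names, in the same order as the table columns.
OPERATIONS = (
    "user_read_request",
    "user_write_request",
    "repack_request",
    "repack_read_subrequest",
    "repack_write_subrequest",
    "mountable",
    "reclaimable",
)

Y, N = True, False

# One row per final tape state, transcribed from the table.
CAPABILITIES = {
    "ACTIVE":             (Y, Y, N, Y, Y, Y, Y),
    "DISABLED":           (Y, Y, N, Y, Y, N, Y),
    "REPACKING":          (N, N, Y, N, Y, Y, N),
    "REPACKING_DISABLED": (N, N, Y, N, Y, N, N),
    "BROKEN":             (N, N, N, N, N, N, N),
    "EXPORTED":           (N, N, N, N, N, N, N),
}


def is_allowed(state: str, operation: str) -> bool:
    """Return the table entry for a tape state and an operation name."""
    row = CAPABILITIES[state.upper()]
    return row[OPERATIONS.index(operation)]


if __name__ == "__main__":
    for state in CAPABILITIES:
        allowed = [op for op in OPERATIONS if is_allowed(state, op)]
        print(state, "->", ", ".join(allowed) or "nothing")
```

Keeping the matrix as plain data makes it easy to compare against the table whenever the documentation changes.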

> ACTIVE state

An ACTIVE tape is in good condition: users can read data from it and write data to it. A tape newly added to CTA is ACTIVE by default.

To set a tape to ACTIVE, run:

$ cta-admin tape ch --vid V01001 --state active

If the tape was DISABLED before running this command, the reason why the tape has been disabled will be deleted.

> DISABLED state

A tape should not stay in this state for more than one week. The person on rota follows up on tapes that have been disabled for longer than that.
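
The one-week rule lends itself to a simple check. The sketch below is only an illustration: it assumes the rota person already has a mapping from VID to the time the tape was disabled, and how that mapping is obtained is outside the scope of this page. The function name `overdue_tapes` and the sample VIDs and dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Follow-up threshold taken from the rule above.
MAX_DISABLED_AGE = timedelta(weeks=1)


def overdue_tapes(disabled_since: dict, now: datetime) -> list:
    """Return the VIDs that have been disabled for longer than one week.

    disabled_since maps a VID to the (timezone-aware) time at which the
    tape was disabled. This input format is an assumption of the sketch.
    """
    return sorted(
        vid
        for vid, since in disabled_since.items()
        if now - since > MAX_DISABLED_AGE
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 15, tzinfo=timezone.utc)
    sample = {
        "V01001": datetime(2024, 1, 1, tzinfo=timezone.utc),   # 14 days ago
        "V01002": datetime(2024, 1, 10, tzinfo=timezone.utc),  # 5 days ago
    }
    print(overdue_tapes(sample, now))
```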

New user requests may be queued on a DISABLED tape, if there are no replicas on other ACTIVE tapes. However, no tape will be mounted while it's DISABLED.

To disable a tape, run:

$ cta-admin tape ch --vid V01001 --state disabled --reason "Failed to read data from this tape with 3 different drives"
A reason why the tape is set to DISABLED has to be provided.
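
When state changes are scripted, the mandatory reason can be enforced before the command is ever run. The following is a minimal sketch under a few assumptions: the helper `build_state_change` is hypothetical, it only uses the `--vid`, `--state` and `--reason` options listed in the help output above, and it only builds the argument list without executing anything. Whether states other than DISABLED also require a reason is not stated here, so the sketch enforces it for DISABLED only.

```python
import shlex

# State names as listed in the cta-admin help output.
KNOWN_STATES = {
    "ACTIVE", "DISABLED", "BROKEN", "EXPORTED",
    "REPACKING", "REPACKING_DISABLED",
}


def build_state_change(vid: str, state: str, reason: str = None) -> list:
    """Build the argument list for 'cta-admin tape ch' without running it."""
    if state.upper() not in KNOWN_STATES:
        raise ValueError(f"unknown tape state: {state}")
    if state.upper() == "DISABLED" and not reason:
        raise ValueError("a reason is required when disabling a tape")
    argv = ["cta-admin", "tape", "ch", "--vid", vid, "--state", state]
    if reason:
        argv += ["--reason", reason]
    return argv


if __name__ == "__main__":
    argv = build_state_change(
        "V01001",
        "disabled",
        "Failed to read data from this tape with 3 different drives",
    )
    # Print a shell-quoted version for review before running it by hand.
    print(shlex.join(argv))
```

Returning a list rather than a string avoids quoting problems if the list is later passed to `subprocess.run`.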

Why does a tape enter this state?

  • cta-taped disabled it following a tape server issue
  • Monitoring probes decided to disable it (too many errors over the past XX hours, ...)
  • An operator thinks the tape is in bad shape and must be kept away from users while it is investigated

A small fraction of DISABLED tapes are due to failed dismount operations and as such do not require an immediate repack. These tapes are also quickly re-enabled.

> REPACKING state

A tape should be moved to REPACKING state before the operator submits a repack request. Otherwise the repack request won't be accepted. Likewise, it's not possible to move out of REPACKING while a repack request is ongoing (except for the REPACKING_DISABLED state).
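
The two rules above can be written down as guard functions. This is a minimal sketch for illustration only: the function names are hypothetical, and the other transition constraints from the state diagram are not modelled.

```python
def can_submit_repack(tape_state: str) -> bool:
    """A repack request is only accepted for a tape in REPACKING state."""
    return tape_state == "REPACKING"


def can_change_state(current: str, target: str, repack_ongoing: bool) -> bool:
    """Check the 'no leaving REPACKING during a repack' rule.

    While a repack request is ongoing, the only state a REPACKING tape
    may move to is REPACKING_DISABLED. All other cases are left to the
    state diagram and are treated as allowed by this sketch.
    """
    if current == "REPACKING" and repack_ongoing:
        return target == "REPACKING_DISABLED"
    return True


if __name__ == "__main__":
    print(can_submit_repack("ACTIVE"))
    print(can_change_state("REPACKING", "ACTIVE", repack_ongoing=True))
    print(can_change_state("REPACKING", "REPACKING_DISABLED", repack_ongoing=True))
```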

When a change to REPACKING is requested, the tape will first move to the temporary state REPACKING_PENDING. Then, it waits for the maintenance process to clean all user requests on the tape queue, before finally moving it to REPACKING state.

For more details on repacking, check the repacking documentation.

> REPACKING_DISABLED state

The state REPACKING_DISABLED is similar to DISABLED, but for repacking tapes (we can't move out of repacking states while a repack is ongoing).

A tape should not stay in this state for more than one week. The person on rota follows up on tapes that have been disabled for longer than that.

New repack requests can be queued on a REPACKING_DISABLED tape. However, the tape will not be mounted while it is in this state.

> BROKEN state

A tape can stay in this state for a long time, and it is very likely to be its last state.

When a change to BROKEN is requested, the tape will first move to the temporary state BROKEN_PENDING. Then, it waits for the maintenance process to clean all user requests on the tape queue, before finally moving it to BROKEN state. This guarantees that all requests are properly disposed of.
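
The two-step transition through a temporary state, used for both REPACKING_PENDING and BROKEN_PENDING, can be pictured with a toy model. The sketch below is not CTA code: the `Tape` class, `request_state` and `maintenance_pass` are hypothetical names, and the real maintenance process does considerably more than empty a list.

```python
from dataclasses import dataclass, field

# Temporary states and the final states they lead to.
PENDING_TO_FINAL = {
    "REPACKING_PENDING": "REPACKING",
    "BROKEN_PENDING": "BROKEN",
}
FINAL_TO_PENDING = {final: pending for pending, final in PENDING_TO_FINAL.items()}


@dataclass
class Tape:
    vid: str
    state: str = "ACTIVE"
    user_queue: list = field(default_factory=list)  # queued user requests


def request_state(tape: Tape, target: str) -> None:
    """Operator side: REPACKING and BROKEN are entered via a pending state."""
    tape.state = FINAL_TO_PENDING.get(target, target)


def maintenance_pass(tape: Tape) -> list:
    """Maintenance side: clean the user queue, then finalise the state.

    Returns the requests that were removed from the queue.
    """
    final = PENDING_TO_FINAL.get(tape.state)
    if final is None:
        return []  # nothing pending for this tape
    removed, tape.user_queue = tape.user_queue, []
    tape.state = final
    return removed


if __name__ == "__main__":
    tape = Tape("V01001", user_queue=["retrieve-1", "retrieve-2"])
    request_state(tape, "BROKEN")
    print(tape.state)
    print(maintenance_pass(tape), tape.state)
```

The point of the intermediate state is that the tape never reaches its final state while user requests are still sitting in its queue.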

Why does a tape enter this state?

A problematic (earlier DISABLED) tape that requires non-trivial effort for its data to be recovered:

  • a slow, low-level tape extract is needed
  • the tape has been sent to the vendor for data recovery, so it is not present in the library
  • all recovery attempts are exhausted and the tape is permanently broken, but experiment action is still needed (deleting the lost files from the catalogs)

In rare cases, an ACTIVE tape that falls on the floor because of a gripper incident can move from ACTIVE straight to BROKEN, as it must be physically put back in a slot.

Normally, very few tapes are in the BROKEN state (there are ~20 such tapes in CASTOR out of ~30 000 as of December 2020).

No operations are allowed for a BROKEN tape.

> EXPORTED state

A tape can stay in this state for a long time. It means that the tape has been removed from the tape library.

While we at CERN do not remove tape cartridges from tape libraries, other sites do. The EXPORTED state should therefore help to make that distinction.

This state would behave very similarly to the BROKEN state. The difference is in the error messages reported to the user.