Deprecated
This page is deprecated and may contain information that is no longer up to date.
CTA Tape Verify
The CTA Tape Verify framework monitors tapes in the CTA system and alerts in case of potential data loss/corruption.
The framework is composed of a set of scripts (namely cta-verify-file, cta-verify-tape and cta-verification-feeder), a Flume monitoring agent that parses the CTA logs and extracts the verification-related entries, and the associated periodic Rundeck jobs.
cta-verify-file
This command submits a verification request to the CTA frontend. A verification request is a retrieve request with the isVerifyOnly flag set to true. CTA handles these requests specially:
- At queueing time, the mount policy attributed to the request is the one hard-coded in the /etc/cta/cta-frontend-xrootd.conf file (configuration option cta.verification.mount_policy). This ensures that verification jobs run with the lowest possible priority in the CTA scheduler. The verification mount policy should be associated with a dedicated verification virtual organization, so that ongoing verification jobs can be tracked in cta-admin sq.
- When a verification retrieve job is executed, the tape file is streamed to /dev/null.
- When a recall tape session finishes, verified jobs and bytes are accounted separately from user and repack jobs in the "Tape session finished" log message.
The options for this command are:
- request.user: The user submitting the request. This will be "verification"
- request.group: The group of the user submitting the request. This will be "it"
- instance: The disk instance of the retrieve request. This can be any disk instance since verification jobs don't write the file anywhere.
- vid: Optional. The vid of the tape the file belongs to.
cta-verify-tape
The cta-verify-tape command submits a tape for verification by selecting some of its files and issuing a cta-verify-file command for each of them.
When a tape is submitted for verification, its verification_status is updated to {date: <current_date>, status: ongoing, files_submitted: <number of files>, files_verified: 0, files_failed: 0}.
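A minimal sketch of the initial verification_status value, assuming it is stored as a JSON string. The helper name initial_verification_status and the ISO 8601 date format are assumptions for illustration, not taken from the CTA scripts.

```python
import json
from datetime import date


def initial_verification_status(files_submitted: int, today: date) -> str:
    """Build the verification_status JSON written when a tape is submitted.

    Hypothetical helper: the field names follow the structure documented above,
    while the date format (ISO 8601) is an assumption.
    """
    status = {
        "date": today.isoformat(),
        "status": "ongoing",
        "files_submitted": files_submitted,
        "files_verified": 0,
        "files_failed": 0,
    }
    return json.dumps(status)


print(initial_verification_status(42, date(2022, 5, 11)))
```

The counters start at zero; files_verified and files_failed are only meaningful once verification results come back.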
The options for this command are:
- vid: The vid of the tape to be verified.
- read_time: The minimum amount of time to spend reading the tape, assuming a read speed of 300 MB/s.
- data_size: The minimum amount of data to read from the tape, given either as a plain number of bytes or as a number followed by a unit (K, M, G, T, P, E).
- first: The number of files to read from the beginning of the tape (fseq order).
- last: The number of files to read from the end of the tape (fseq order).
- random: The number of files to read from the middle of the tape.
- all: If present, all files on the tape are verified.
- options: Options to pass to the cta-verify-file commands.
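Both data_size and read_time express a minimum amount of data, so they can be reduced to a single byte count. The sketch below is hypothetical: it assumes decimal units (1 K = 1000 bytes), that read_time is given in seconds, and that 300 MB/s means 300 × 10^6 bytes per second; the real script may differ on all three points.

```python
import re
from typing import Optional

# Assumption: decimal multiples (1 K = 1000 bytes); the real script may use 1024.
UNITS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}
# Read speed assumed by read_time (300 MB/s), taken here as decimal megabytes.
READ_SPEED_BYTES_PER_S = 300 * 10**6


def parse_size(text: str) -> int:
    """Parse a data_size value such as '1234', '500G' or '1.5T' into bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGTPE]?)\s*", text.upper())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    if "." in number:
        return int(float(number) * UNITS[unit])
    return int(number) * UNITS[unit]  # exact integer arithmetic for whole numbers


def bytes_to_read(data_size: Optional[str], read_time_s: Optional[float]) -> int:
    """Smallest byte count that satisfies both data_size and read_time."""
    target = 0
    if data_size is not None:
        target = max(target, parse_size(data_size))
    if read_time_s is not None:
        # read_time is assumed to be in seconds.
        target = max(target, int(read_time_s * READ_SPEED_BYTES_PER_S))
    return target
```

For example, bytes_to_read("500G", 3600) returns 1,080,000,000,000: one hour at 300 MB/s asks for more data than the 500 GB requested, so the larger value wins.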
The script chooses files so that all of these options are satisfied. To satisfy read_time and data_size, it takes files from the middle of the tape. If the tape does not hold enough files, the whole tape is verified.
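The selection rules above can be sketched as follows. This is not the actual cta-verify-tape implementation: TapeFile and select_files are hypothetical names, min_bytes stands for the larger of data_size and the read_time equivalent, and sampling the middle of the tape uniformly at random is an assumption.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TapeFile:
    fseq: int
    size: int  # bytes


def select_files(files: List[TapeFile], first: int = 0, last: int = 0,
                 n_random: int = 0, min_bytes: int = 0, verify_all: bool = False,
                 rng: Optional[random.Random] = None) -> List[TapeFile]:
    """Pick files from the head, tail and middle of a tape (fseq order)."""
    rng = rng or random.Random()
    ordered = sorted(files, key=lambda f: f.fseq)
    total_bytes = sum(f.size for f in ordered)
    # Not enough files or data on the tape for the request: verify everything.
    if verify_all or first + last + n_random >= len(ordered) or min_bytes >= total_bytes:
        return ordered
    head = ordered[:first]
    tail = ordered[len(ordered) - last:]  # empty when last == 0
    middle = ordered[first:len(ordered) - last]
    rng.shuffle(middle)  # random order for picks from the middle
    picked = middle[:n_random]
    remaining = middle[n_random:]
    selected_bytes = sum(f.size for f in head + tail + picked)
    # Top up from the middle until min_bytes (data_size / read_time) is reached.
    while selected_bytes < min_bytes and remaining:
        extra = remaining.pop()
        picked.append(extra)
        selected_bytes += extra.size
    return sorted(head + picked + tail, key=lambda f: f.fseq)
```

Passing an explicit random.Random instance makes a selection reproducible, which is convenient when checking which files a run would submit.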
The cta-verify-tape command should be run from the CTA frontend. Its logs are written to /var/log/cta/verification/cta-verify-tape.log.
Verification monitoring
Verification job submission on a tape
File submission is caught in the cta-frontend log by MSG="Queued retrieve request" with verifyOnly="1":
May 11 14:57:36.567479 ctafrontend cta-frontend: LVL="INFO" PID="210" TID="800" MSG="Queued retrieve request" user="ctaadmin1@ctafrontend" fileId="10026" instanceName="ctaeos" diskFilePath="dummy" diskFileOwnerUid="0" diskFileGid="0" dstURL="file://dummy" errorReportURL="" creationHost="ctafrontend" creationTime="1652273856" creationUser="ctaeos" requesterName="verification" requesterGroup="it" criteriaArchiveFileId="10026" criteriaCreationTime="1652273277" criteriaDiskFileId="10016" criteriaDiskFileOwnerUid="0" criteriaDiskInstance="ctaeos" criteriaFileSize="407" reconciliationTime="1652273277" storageClass="ctaStorageClass" checksumType="ADLER32" checksumValue="20ce7e60" tapeTapefile0="(vid=V01007 fSeq=10006 blockId=100051 fileSize=407 copyNb=1 creationTime=1652273277)" selectedVid="V01007" verifyOnly="1" catalogueTime="0.004542" schedulerDbTime="0.008317" policyName="verification" policyMinAge="1" policyPriority="1" retrieveRequestId="RetrieveRequest-Frontend-ctafrontend-210-20220511-14:35:58-0-30326"
Verification job finished
The end of a verification job is caught by the cta-taped message MSG="File successfully read from tape" with isVerifyOnly="1":
Jun 9 15:12:31.923036 tpsrv01 cta-taped: LVL="INFO" PID="6494" TID="6528" MSG="File successfully read from tape" thread="TapeRead" tapeDrive="VDSTK11" tapeVid="V01007" mountId="37" vo="vo" mediaType="T10K500G" tapePool="ctasystest" logicalLibrary="VDSTK11" mountType="Retrieve" labelFormat="0000" vendor="vendor" capacityInBytes="500000000000" fileId="10066" BlockId="100051" fSeq="10006" dstURL="file://dummy" isRepack="0" isVerifyOnly="1" positionTime="0.007167" readWriteTime="0.004118" waitFreeMemoryTime="0.000008" waitReportingTime="0.000098" transferTime="0.004224" totalTime="0.005537" dataVolume="407" headerVolume="480" driveTransferSpeedMBps="0.160195" payloadTransferSpeedMBps="0.073506" LBPMode="LBP_Off" repackFilesCount="0" repackBytesCount="0" userFilesCount="0" userBytesCount="0" verifiedFilesCount="1" verifiedBytesCount="407" checksumType="ADLER32" checksumValue="20ce7e60" status="success"
Verification job failed
A failed verification is logged by cta-taped as an ERROR in the TapeRead thread, with MSG="Error reading a file in TapeReadFileTask" and isVerifyOnly="1":
[1626269255.260238000] Jul 14 15:27:35.260238 tpsrv017.cern.ch cta-taped: LVL="ERROR" PID="21203" TID="21550" MSG="Error reading a file in TapeReadFileTask" thread="TapeRead" tapeDrive="I4550832" tapeVid="I62382" mountId="14492" vo="IT-ST-TAB" mediaType="3592JD15T" tapePool="validation" logicalLibrary="IBM455" mountType="Retrieve" vendor="FUJIFILM" capacityInBytes="15000000000000" fileId="1339264257" BlockId="75" fSeq="2" dstURL="file://dummy" isRepack="0" isVerifyOnly="1" fileBlock="0" ErrorMessage="[ReadFile::position] - Reading HDR1: Failed ST read with crc32c in DriveGeneric::readExactBlock Errno=5: Input/output error"
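The three events above can be told apart from the key="value" fields of each log line. The sketch below is a hypothetical Python equivalent of that filtering, not the Flume agent's actual configuration; it assumes field values never contain escaped double quotes.

```python
import re
from typing import Dict, Optional

# key="value" pairs as they appear in the CTA log lines shown above.
KV_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_cta_log(line: str) -> Dict[str, str]:
    """Extract all key="value" fields from a CTA log line."""
    return dict(KV_RE.findall(line))


def classify_verification_event(line: str) -> Optional[str]:
    """Return 'submitted', 'verified' or 'failed' for verification events, else None."""
    fields = parse_cta_log(line)
    msg = fields.get("MSG")
    # cta-frontend uses verifyOnly, cta-taped uses isVerifyOnly.
    if msg == "Queued retrieve request" and fields.get("verifyOnly") == "1":
        return "submitted"
    if fields.get("isVerifyOnly") != "1":
        return None
    if msg == "File successfully read from tape":
        return "verified"
    if msg == "Error reading a file in TapeReadFileTask" and fields.get("LVL") == "ERROR":
        return "failed"
    return None
```

Fields such as tapeVid, verifiedBytesCount or ErrorMessage can then be read from parse_cta_log to feed per-tape counters.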
cta-verification-feeder
This command selects which tapes are to be verified according to user-supplied parameters. It is run periodically from Rundeck and handles the following:
- Check the tapes that have an ongoing verification status. If the verification has finished (cta-admin --json sq no longer shows a retrieve queue for the tape), change the verification status to finished.
- Compare the number of ongoing verifications with a user-supplied maximum. If it is below the maximum, submit new tapes for verification until the limit is reached.
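A minimal sketch of these two steps, assuming the verification_status values have already been read per VID and that the set of VIDs with a retrieve queue has already been extracted from cta-admin --json sq. The function names are hypothetical, and treating the user-supplied maximum as a plain integer limit is an assumption.

```python
import json
from typing import Dict, List, Set


def finished_verifications(statuses: Dict[str, str], queued_vids: Set[str]) -> List[str]:
    """VIDs whose verification is ongoing but which no longer have a retrieve queue."""
    done = []
    for vid, raw_status in statuses.items():
        if json.loads(raw_status).get("status") == "ongoing" and vid not in queued_vids:
            done.append(vid)
    return sorted(done)


def mark_finished(raw_status: str) -> str:
    """Rewrite the whole verification_status JSON with status set to finished."""
    status = json.loads(raw_status)
    status["status"] = "finished"
    return json.dumps(status)


def free_slots(n_ongoing: int, maximum: int) -> int:
    """How many new tapes can be submitted without exceeding the maximum."""
    return max(0, maximum - n_ongoing)
```

The number of ongoing verifications passed to free_slots should be counted after the finished tapes have been updated, otherwise those tapes would still occupy slots.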
The feeder can use one of three policies, selected with the verify_policy option, to choose which tape to submit for verification:
- last_read: Choose the tape that has not been read in the longest time.
- last_write: Choose the tape that has not been written in the longest time.
- last_verified: Choose the tape that has not been verified in the longest time.
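The three policies amount to sorting candidate tapes by a timestamp, oldest first. In this sketch the Tape record and order_candidates are hypothetical, the policy names follow the verify_policy values listed below, and placing never-read, never-written or never-verified tapes first is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

POLICIES = ("last_read", "last_written", "last_verified")


@dataclass
class Tape:
    vid: str
    last_read: Optional[int] = None      # epoch seconds, None if never read
    last_written: Optional[int] = None   # epoch seconds, None if never written
    last_verified: Optional[int] = None  # epoch seconds, None if never verified


def order_candidates(tapes: List[Tape], policy: str) -> List[Tape]:
    """Sort tapes so that the one untouched for the longest time comes first."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    # (False, 0) for a missing timestamp sorts before any (True, timestamp).
    return sorted(tapes, key=lambda t: (getattr(t, policy) is not None,
                                        getattr(t, policy) or 0))
```

sorted is stable, so tapes with equal timestamps keep their input order.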
The options for cta-verification-feeder are as follows:
- filter: Filter out tapes in the given states (multiple states are separated by ",").
- tapepool: Only verify tapes in the specified tape pools (multiple tape pools are separated by ",").
- min_data_on_tape: Minimum number of bytes on a tape for it to be verified.
- min_relative_capacity: Minimum relative filled capacity of a tape for it to be verified.
- verify_path: Location of the cta-verify-tape executable.
- verify_policy: One of last_read, last_written, last_verified.
- full_tapes: Verify only tapes that are full.
- minage: Verify only tapes that have not been verified for at least this many days.
- maxverify: Maximum number of verify processes to run concurrently.
- verify_options: Options to pass to cta-verify-tape.
- logfile: Log file path.
- list_current_verifications: Just list the current ongoing verifications and exit.
- noaction: Dry run; do not submit tapes for verification.
The feeder tries to satisfy all the options passed. If the options passed cause no tape to be eligible for verification, the script aborts with an exit code of 1.
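The eligibility rules can be sketched as a chain of filters. TapeInfo, eligible_tapes and run are hypothetical names, and the field names on TapeInfo are assumptions standing in for whatever the feeder reads from the catalogue; only the options and the exit code 1 come from the description above.

```python
import time
from dataclasses import dataclass
from typing import List, Optional, Sequence

SECONDS_PER_DAY = 86400


@dataclass
class TapeInfo:
    """Hypothetical per-tape record; a real feeder would fill it from CTA."""
    vid: str
    state: str
    tapepool: str
    data_bytes: int
    capacity_bytes: int
    full: bool
    last_verified: Optional[float] = None  # epoch seconds, None if never verified


def split_list(option: str) -> List[str]:
    """filter and tapepool take comma-separated values."""
    return [item.strip() for item in option.split(",") if item.strip()]


def eligible_tapes(tapes: Sequence[TapeInfo], excluded_states: Sequence[str] = (),
                   tapepools: Sequence[str] = (), min_data_on_tape: int = 0,
                   min_relative_capacity: float = 0.0, full_tapes: bool = False,
                   minage_days: float = 0, now: Optional[float] = None) -> List[TapeInfo]:
    """Keep only the tapes that satisfy every selection option."""
    now = time.time() if now is None else now
    result = []
    for tape in tapes:
        relative = tape.data_bytes / tape.capacity_bytes if tape.capacity_bytes else 0.0
        if tape.state in excluded_states:
            continue  # filter: drop tapes in the given states
        if tapepools and tape.tapepool not in tapepools:
            continue
        if tape.data_bytes < min_data_on_tape or relative < min_relative_capacity:
            continue
        if full_tapes and not tape.full:
            continue
        if (tape.last_verified is not None
                and now - tape.last_verified < minage_days * SECONDS_PER_DAY):
            continue  # verified too recently for minage
        result.append(tape)
    return result


def run(tapes: Sequence[TapeInfo], free_slots: int, noaction: bool = False,
        **criteria) -> int:
    """Return the exit code: 1 when no tape is eligible, 0 otherwise."""
    candidates = eligible_tapes(tapes, **criteria)
    if not candidates:
        return 1
    for tape in candidates[:free_slots]:
        action = "would submit" if noaction else "submit"
        print(f"{action} {tape.vid}")  # a real feeder would call cta-verify-tape here
    return 0
```

A command-line wrapper would pass the result to sys.exit, and would first sort the candidates according to verify_policy.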
The feeder calls the cta-verify-tape script for each tape it decides to verify.
Its log file is /var/log/cta/verification/cta-verification-feeder.log.
Listing current verifications
Current verifications can be listed with cta-verification-feeder --list_current_verifications, or with cta-admin --json sq by checking which retrieve queues have jobs queued with the verification mount policy.
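A hypothetical sketch of the second approach. The field names used for the mount type, VID and mount policy in the cta-admin --json sq output are assumptions and must be checked against the output of the CTA version in use; the policy name "verification" matches the policyName seen in the frontend log above.

```python
import json
from typing import List

# Assumed field names in the `cta-admin --json sq` output; verify them first.
MOUNT_TYPE_KEY = "mountType"
VID_KEY = "vid"
POLICY_KEY = "mountPolicy"


def verification_vids(sq_json: str, policy: str = "verification") -> List[str]:
    """VIDs of retrieve queues holding jobs queued with the verification policy."""
    vids = set()
    for queue in json.loads(sq_json):
        if "RETRIEVE" not in str(queue.get(MOUNT_TYPE_KEY, "")).upper():
            continue  # archive queues cannot hold verification jobs
        if queue.get(POLICY_KEY) == policy and queue.get(VID_KEY):
            vids.add(queue[VID_KEY])
    return sorted(vids)
```

In practice sq_json would be the standard output of cta-admin --json sq, for example captured with subprocess.run and capture_output=True.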
Canceling verification
Once verification jobs have been submitted, they cannot be canceled, just like normal retrieve jobs. You can, however, manually change the verification_status of the tape with cta-admin tape ch --verificationstatus. Note that you must override the whole JSON value. Changing the status from ongoing to done or cancelled makes the verification framework stop listing the tape as being verified.
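Because the whole JSON has to be overridden, it is safer to start from the current value and change only the status field. A minimal sketch, where override_status_command is a hypothetical helper and the --vid flag is an assumption to check against cta-admin tape ch help; only --verificationstatus comes from the text above.

```python
import json
import shlex
from typing import List


def override_status_command(vid: str, current_status: str,
                            new_status: str = "cancelled") -> List[str]:
    """Build a cta-admin call that rewrites the whole verification_status JSON."""
    if new_status not in ("done", "cancelled"):
        raise ValueError("new_status should be 'done' or 'cancelled'")
    status = json.loads(current_status)
    status["status"] = new_status  # all other fields are kept and re-sent
    # "--vid" is an assumed flag name; "--verificationstatus" is the documented one.
    return ["cta-admin", "tape", "ch", "--vid", vid,
            "--verificationstatus", json.dumps(status)]


current = ('{"date": "2022-05-11", "status": "ongoing", '
           '"files_submitted": 42, "files_verified": 10, "files_failed": 0}')
print(shlex.join(override_status_command("V01007", current)))
```

The returned list can be passed to subprocess.run as-is; shlex.join only serves to display it as a correctly quoted shell command.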
Monitoring
Tape verification monitoring is in this dashboard.
Automation
Verification is automated through Rundeck, with a job running every four hours.