The cta-statistics-update tool

Danger

This tool only works with PostgreSQL and Oracle databases.

Introduction

The cta-statistics-update tool is a C++ command line tool that allows the user to perform a per-Tape statistics update in the CTA Catalogue database.

This page describes how to configure the tool and how it works.

Configuration of the cta-statistics-update tool

Because this tool only queries and updates the CTA Catalogue database, the only configuration it needs is the Catalogue database configuration file. The content of this file must follow the format shown in the catalogue/cta-catalogue.conf.example file.

Running the cta-statistics-update tool

To run the tool, type:

cta-statistics-update path_to_cta_catalogue_conf.conf

What this tool does

Checks that the Catalogue contains the tables and the columns needed for updating

To compute the per-tape statistics, the tool needs the following tables and columns to exist:

Table           Columns
TAPE            VID
                NB_MASTER_FILES
                MASTER_DATA_IN_BYTES
                DIRTY
                NB_COPY_NB_1
                COPY_NB_1_IN_BYTES
                NB_COPY_NB_GT_1
                COPY_NB_GT_1_IN_BYTES
TAPE_FILE       VID
                FSEQ
                ARCHIVE_FILE_ID
                COPY_NB
ARCHIVE_FILE    ARCHIVE_FILE_ID
                SIZE_IN_BYTES
                IS_DELETED
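
The check described above can be illustrated with a minimal sketch. It is not the tool's actual C++ implementation: it uses Python with an SQLite connection purely so that the example is self-contained, whereas the real tool targets PostgreSQL and Oracle. The function name missing_columns and the REQUIRED dictionary are hypothetical; only the table and column names come from the list above.

```python
import sqlite3

# Tables and columns listed in this page as required for the update.
REQUIRED = {
    "TAPE": ["VID", "NB_MASTER_FILES", "MASTER_DATA_IN_BYTES", "DIRTY",
             "NB_COPY_NB_1", "COPY_NB_1_IN_BYTES",
             "NB_COPY_NB_GT_1", "COPY_NB_GT_1_IN_BYTES"],
    "TAPE_FILE": ["VID", "FSEQ", "ARCHIVE_FILE_ID", "COPY_NB"],
    "ARCHIVE_FILE": ["ARCHIVE_FILE_ID", "SIZE_IN_BYTES", "IS_DELETED"],
}


def missing_columns(conn, required=REQUIRED):
    """Return {table: [missing columns]} for every incomplete table.

    Hypothetical helper: SQLite-specific, for illustration only.
    """
    missing = {}
    for table, columns in required.items():
        # PRAGMA table_info yields one row per column (row[1] is the name)
        # and no rows at all when the table does not exist.
        rows = conn.execute(f"PRAGMA table_info({table})")
        present = {row[1].upper() for row in rows}
        absent = [c for c in columns if c not in present]
        if absent:
            missing[table] = absent
    return missing
```

An empty dictionary means that every required table and column is present. On PostgreSQL or Oracle the same idea would rely on the database's own metadata views rather than on PRAGMA.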

Executes a SQL UPDATE query on the TAPE table

The query updates every tape whose DIRTY flag is set to 1. CTA sets this flag to 1 automatically when files are added or removed.

The following fields will be updated on the TAPE table:

  • DIRTY is set to 0
  • NB_MASTER_FILES: The number of tape files present which have not been deleted. Counts 1st, 2nd, ... copies as one file each.
  • MASTER_DATA_IN_BYTES: The amount of "useful data" on the tape, i.e. the sum of the size in bytes of all the MASTER files that the tape contains. This term corresponds to the WLCG term "Amount of Data Stored"/"usedSize".
  • NB_COPY_NB_1: The number of unique tape files on the tape. This is the count one gets when counting only first copies. The sum of the NB_COPY_NB_1 values of all tapes should equal the number of archive files in the system.
  • COPY_NB_1_IN_BYTES: The amount of data on the tape when counting only unique tape files, in other words when counting only the first copy of each archive file.
  • NB_COPY_NB_GT_1: The number of replicated tape files on the tape. This is the count one gets when only counting 2nd, 3rd, ... copies.
  • COPY_NB_GT_1_IN_BYTES: The amount of data on the tape when counting only tape files which are 2nd, 3rd, ... copies.
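
To make the field definitions above concrete, here is a minimal sketch that recomputes them on a tiny in-memory SQLite catalogue. It is an illustration under stated assumptions, not the query the tool actually runs: the real tool issues a single UPDATE against PostgreSQL or Oracle, while this sketch loops over dirty tapes for readability. The assumptions are that DIRTY and IS_DELETED hold the integers 0 and 1, and that a tape file counts as "not deleted" when its ARCHIVE_FILE row has IS_DELETED set to 0. The function names and the sample data are hypothetical.

```python
import sqlite3


def make_demo_catalogue():
    """Build a tiny in-memory catalogue (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TAPE (
            VID TEXT PRIMARY KEY, DIRTY INTEGER,
            NB_MASTER_FILES INTEGER, MASTER_DATA_IN_BYTES INTEGER,
            NB_COPY_NB_1 INTEGER, COPY_NB_1_IN_BYTES INTEGER,
            NB_COPY_NB_GT_1 INTEGER, COPY_NB_GT_1_IN_BYTES INTEGER);
        CREATE TABLE ARCHIVE_FILE (
            ARCHIVE_FILE_ID INTEGER PRIMARY KEY,
            SIZE_IN_BYTES INTEGER, IS_DELETED INTEGER);
        CREATE TABLE TAPE_FILE (
            VID TEXT, FSEQ INTEGER, ARCHIVE_FILE_ID INTEGER, COPY_NB INTEGER);
        INSERT INTO TAPE (VID, DIRTY) VALUES ('V00001', 1), ('V00002', 0);
        INSERT INTO ARCHIVE_FILE VALUES (1, 100, 0), (2, 250, 0), (3, 40, 1);
        INSERT INTO TAPE_FILE VALUES
            ('V00001', 1, 1, 1), ('V00001', 2, 2, 2), ('V00001', 3, 3, 1);
    """)
    return conn


# Per-tape statistics over tape files whose archive file is not deleted
# (assumption: IS_DELETED = 0 means "not deleted").
STATS_SQL = """
    SELECT COUNT(*),
           COALESCE(SUM(AF.SIZE_IN_BYTES), 0),
           COALESCE(SUM(CASE WHEN TF.COPY_NB = 1 THEN 1 ELSE 0 END), 0),
           COALESCE(SUM(CASE WHEN TF.COPY_NB = 1 THEN AF.SIZE_IN_BYTES ELSE 0 END), 0),
           COALESCE(SUM(CASE WHEN TF.COPY_NB > 1 THEN 1 ELSE 0 END), 0),
           COALESCE(SUM(CASE WHEN TF.COPY_NB > 1 THEN AF.SIZE_IN_BYTES ELSE 0 END), 0)
    FROM TAPE_FILE TF
    JOIN ARCHIVE_FILE AF ON AF.ARCHIVE_FILE_ID = TF.ARCHIVE_FILE_ID
    WHERE TF.VID = ? AND AF.IS_DELETED = 0
"""

UPDATE_SQL = """
    UPDATE TAPE SET DIRTY = 0,
        NB_MASTER_FILES = ?, MASTER_DATA_IN_BYTES = ?,
        NB_COPY_NB_1 = ?, COPY_NB_1_IN_BYTES = ?,
        NB_COPY_NB_GT_1 = ?, COPY_NB_GT_1_IN_BYTES = ?
    WHERE VID = ?
"""


def update_dirty_tapes(conn):
    """Recompute statistics for every dirty tape; return how many were updated."""
    dirty = [row[0] for row in conn.execute("SELECT VID FROM TAPE WHERE DIRTY = 1")]
    for vid in dirty:
        stats = conn.execute(STATS_SQL, (vid,)).fetchone()
        conn.execute(UPDATE_SQL, (*stats, vid))
    conn.commit()
    return len(dirty)
```

With the sample data, only tape V00001 is dirty. It holds one first copy of 100 bytes and one second copy of 250 bytes that are not deleted, so it ends up with 2 master files totalling 350 bytes, and the tape whose DIRTY flag is 0 is left untouched.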

Once the query has been executed, the tool prints the number of updated tapes to stdout.

Example of the output of the tool:

$ cta-statistics-update /etc/cta/cta-catalogue-oracle.conf
Updating tape statistics in the catalogue...
Updated catalogue tape statistics in 0.0989, 5 tape(s) have been updated
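
For scripts that wrap the tool, the summary line can be parsed as in the following minimal sketch. The pattern is derived only from the single example output above, so it is an assumption that other versions of the tool print the same wording. The unit of the elapsed value is not stated in the output, so the sketch returns it as a plain number. The names SUMMARY_RE and parse_summary are hypothetical.

```python
import re

# Pattern inferred from the example output on this page (assumption:
# the wording is stable across versions of the tool).
SUMMARY_RE = re.compile(
    r"Updated catalogue tape statistics in (?P<elapsed>[0-9.]+), "
    r"(?P<tapes>\d+) tape\(s\) have been updated"
)


def parse_summary(output):
    """Return (elapsed, updated_tapes) from the tool output, or None if absent."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return float(match.group("elapsed")), int(match.group("tapes"))
```

Returning None when the line is missing lets a caller distinguish a run that printed no summary from a run that updated zero tapes.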