CTA Maintenance Daemon#
The maintenance daemon (maintd) periodically runs a number of routines.
At the moment, the main thread runs the routines one after another, sleeps, and then starts over:
graph LR
ra1["RoutineA"]
rb1["RoutineB"]
rc1["RoutineC"]
ra2["RoutineA"]
rb2["RoutineB"]
rc2["RoutineC"]
subgraph main Thread
direction LR
START:::hidden --> ra1 --> rb1 --> rc1
rc1 -- sleep --> ra2
ra2 --> rb2 --> rc2 --> END:::hidden
end
classDef hidden display: none;
The sleep interval can be configured in the config file.
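As a rough illustration of the loop in the diagram above, the following Python sketch runs a list of routines in order, sleeps, and repeats. It is not the maintd implementation: `run_maintenance_loop`, `sleep_interval_s`, `stop_event` and `max_cycles` are hypothetical names chosen for this example.

```python
import threading
from typing import Callable, Optional, Sequence


def run_maintenance_loop(
    routines: Sequence[Callable[[], None]],
    sleep_interval_s: float,
    stop_event: threading.Event,
    max_cycles: Optional[int] = None,
) -> int:
    """Run every routine in order, sleep, and repeat until asked to stop.

    All names here are hypothetical. Returns the number of completed cycles.
    """
    cycles = 0
    while not stop_event.is_set():
        for routine in routines:  # RoutineA --> RoutineB --> RoutineC
            routine()
        cycles += 1
        if max_cycles is not None and cycles >= max_cycles:
            break
        # Interruptible sleep: returns early if stop_event is set meanwhile.
        stop_event.wait(sleep_interval_s)
    return cycles
```

Waiting on an event instead of calling a plain sleep lets another thread (for example one that reacts to SIGTERM) interrupt the pause between cycles.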
In addition to the main thread, the maintd process spawns a dedicated SignalReactor thread that captures incoming signals (e.g. SIGTERM, SIGHUP) and executes the function associated with each signal. This keeps the signal-handling logic in one place instead of spreading it throughout the code.
graph LR
wfs["Wait for signal"]
esf["Execute Function<br/>for said signal"]
subgraph SignalReactor Thread
direction LR
START:::hidden --> wfs
wfs -- Signal Received --> esf
esf -- Done --> wfs
end
classDef hidden display: none;
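The wait/execute cycle of the SignalReactor thread can be sketched in Python as follows. This is a minimal illustration and not the actual CTA code: the class layout, the `handlers` mapping and the `poll_s` parameter are assumptions made for this example. It relies on `signal.pthread_sigmask` and `signal.sigtimedwait`, which are available on Linux.

```python
import signal
import threading
from typing import Callable, Dict


class SignalReactor:
    """Hypothetical sketch: one thread waits for signals and runs their callbacks."""

    def __init__(self, handlers: Dict[int, Callable[[], None]], poll_s: float = 0.1):
        self._handlers = dict(handlers)  # signal number -> function to execute
        self._poll_s = poll_s            # how often the stop flag is checked
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="SignalReactor")

    def start(self) -> None:
        # Block the handled signals in the calling thread before spawning the
        # reactor thread. New threads inherit the mask, so these signals are
        # only ever consumed synchronously by sigtimedwait() below.
        signal.pthread_sigmask(signal.SIG_BLOCK, self._handlers.keys())
        self._thread.start()

    def _run(self) -> None:
        sigset = set(self._handlers)
        while not self._stop.is_set():
            info = signal.sigtimedwait(sigset, self._poll_s)  # wait for signal
            if info is None:
                continue  # timed out; check the stop flag again
            self._handlers[info.si_signo]()  # execute function for said signal

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

In this sketch, `start()` is meant to be called from the main thread before any other threads are created, so that every thread inherits the blocked mask and the signals reach only the reactor.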
Routines#
The routines are defined in routines/. Which routines are run depends on whether the Objectstore or Postgres scheduler is used. Routines can be enabled or disabled in the config as desired. However, for CTA to function correctly, at least one instance of each routine must be running somewhere.
- Disk Reporting. All buffer <-> tape data transfers are divided into a two-stage workflow: first the data movement takes place, and then, whether or not the operation was successful, a report job is queued in the scheduler. The disk report archive/retrieve routines are responsible for executing these report jobs. For example, after a file has been archived, a report job is queued and later picked up by a DiskReportArchiveRoutine, which informs the disk buffer that the file was successfully written to tape.
- Repack. The maintenance daemon performs two tasks when it comes to repack:
  - Request Expansion: expanding a repack request means converting the request to repack a tape into the retrieve and archive jobs needed to move the data to a new tape. This is done by the RepackExpandRoutine. For a more detailed explanation of tape repacking at CERN see Repack Workflows.
  - Reporting: as with disk reporting, all reporting related to repack requests is handled by a separate routine, the RepackReportRoutine.
- Objectstore Garbage Collection. Any process that interacts with the object store scheduler database registers itself as a unique agent in the object store and periodically updates the agent's heartbeat. An agent interacts with the scheduler by taking ownership of a batch of jobs and removing them from their queue. The role of the garbage collector is to move the jobs owned by a dead agent back into the scheduler. An agent is considered dead if its heartbeat has not been updated for a certain period of time. This can happen if the process has crashed or got stuck processing some jobs.
- Objectstore Queue Cleanup. When a tape state change is requested, the tape is set to a PENDING state in the Catalogue and the queues in the Objectstore are flagged for cleanup. This routine looks for queues to clean up. For each pending job, it either transfers the job to another queue, if the file can be retrieved from another tape (that is, the file has a copy on a different tape), or reports the job as failed. After this, it changes the tape state from PENDING to the desired one, either REPACKING or BROKEN. For more details on the possible tape state changes see Tape Lifecycle.
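The queue cleanup steps described above can be sketched as follows. This is an illustrative Python model only: the function name and the plain dictionaries standing in for the Catalogue (`tape_states`), the Objectstore queues (`queues`) and the file copy locations (`copies`) are assumptions for this example, not CTA data structures.

```python
from typing import Dict, List


def cleanup_tape_queue(
    tape: str,
    target_state: str,
    tape_states: Dict[str, str],   # tape -> state (stand-in for the Catalogue)
    queues: Dict[str, List[str]],  # tape -> queued file ids (stand-in for the Objectstore)
    copies: Dict[str, List[str]],  # file id -> tapes holding a copy of that file
) -> List[str]:
    """Re-queue or fail the pending jobs of `tape`, then finish the state change.

    Returns the file ids that had to be reported as failed.
    """
    if tape_states.get(tape) != "PENDING":
        raise ValueError(f"tape {tape} is not in a PENDING state")
    if target_state not in ("REPACKING", "BROKEN"):
        raise ValueError(f"unsupported target state {target_state}")

    failed: List[str] = []
    for file_id in queues.pop(tape, []):
        other_tapes = [t for t in copies.get(file_id, []) if t != tape]
        if other_tapes:
            # The file has a copy elsewhere: move the job to that tape's queue.
            queues.setdefault(other_tapes[0], []).append(file_id)
        else:
            # No other copy exists: the job can only be reported as failed.
            failed.append(file_id)

    tape_states[tape] = target_state  # PENDING -> REPACKING or BROKEN
    return failed
```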
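The heartbeat-based garbage collection described in the list above can be illustrated with a small Python model. The `Agent` class, `requeue_jobs_of_dead_agents` and the `timeout_s` parameter are hypothetical names for this sketch. What else the real routine does with a dead agent is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    """Hypothetical model of an agent registered in the object store."""
    name: str
    last_heartbeat: float  # timestamp of the last heartbeat update, in seconds
    owned_jobs: List[str] = field(default_factory=list)


def requeue_jobs_of_dead_agents(
    agents: Dict[str, Agent],
    queue: List[str],
    now: float,
    timeout_s: float,
) -> List[str]:
    """Move the jobs owned by dead agents back into `queue`.

    An agent counts as dead when its heartbeat is older than `timeout_s`.
    Returns the names of the agents that were found dead.
    """
    dead = [a for a in agents.values() if now - a.last_heartbeat > timeout_s]
    for agent in dead:
        queue.extend(agent.owned_jobs)  # give the jobs back to the scheduler
        agent.owned_jobs.clear()        # the dead agent no longer owns them
    return [a.name for a in dead]
```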
Supported Routines#
Objectstore#
- DiskReportArchiveRoutine
- DiskReportRetrieveRoutine
- RepackExpandRoutine
- RepackReportRoutine
- GarbageCollectRoutine
- QueueCleanupRoutine
Postgres#
- DiskReportArchiveRoutine
- DiskReportRetrieveRoutine
- RepackExpandRoutine
- RepackReportRoutine
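To show how the scheduler backend and the enable/disable settings combine, here is a small Python sketch that picks the routines to run. The routine names come from the lists above. The `select_routines` function, the backend keys and the shape of the `enabled` mapping are assumptions for this example and do not reflect the actual maintd config format.

```python
from typing import Dict, List

# Routine names as listed above, keyed by a hypothetical backend identifier.
SUPPORTED_ROUTINES: Dict[str, List[str]] = {
    "objectstore": [
        "DiskReportArchiveRoutine",
        "DiskReportRetrieveRoutine",
        "RepackExpandRoutine",
        "RepackReportRoutine",
        "GarbageCollectRoutine",
        "QueueCleanupRoutine",
    ],
    "postgres": [
        "DiskReportArchiveRoutine",
        "DiskReportRetrieveRoutine",
        "RepackExpandRoutine",
        "RepackReportRoutine",
    ],
}


def select_routines(backend: str, enabled: Dict[str, bool]) -> List[str]:
    """Return the routines to run for `backend`.

    `enabled` is a hypothetical mapping of routine name -> on/off switch.
    Routines that are not mentioned default to enabled in this sketch.
    """
    supported = SUPPORTED_ROUTINES[backend]
    unknown = sorted(set(enabled) - set(supported))
    if unknown:
        raise ValueError(f"routines not supported by {backend}: {unknown}")
    return [name for name in supported if enabled.get(name, True)]
```

Rejecting unsupported names early means that, for instance, enabling GarbageCollectRoutine with the Postgres scheduler fails loudly instead of being silently ignored.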