
Introduction

A test of high-rate concurrent archives and retrieves was carried out against an EOSCTA instance. Under this load the instance misbehaved: one of the two operation types, either archives or retrieves, was starved.

The starvation problem was solved by rate-limiting access to the EOS MGM. Rate limits are set with the following EOS command:

eos access set limit

Each of the deployed EOSCTA instances at CERN has the following idempotent script, which can be used to configure EOS:

/root/configure_eoscta.sh
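The rate-limiting lines found in these scripts (shown in the transcript below) could be organised as in the following bash sketch. It is not the actual contents of /root/configure_eoscta.sh: the `run` and `configure_rate_limits` functions and the `DRY_RUN` switch are hypothetical additions, and only the `eos access set limit` and `eos access ls` commands and the value 100 are taken from the deployed configurations. It assumes that setting an already-configured limit again is harmless, which is what would keep re-running the script safe.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the rate-limiting part of a configure script.
# Limit value and operation names are copied from the deployed configs;
# DRY_RUN, run() and configure_rate_limits() are illustrative only.

DRY_RUN=${DRY_RUN:-1}   # default: only print the commands (sketch's choice)
RATE_LIMIT=100
OPERATIONS=(OpenWriteCreate Prepare)

# run CMD ARGS...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ $DRY_RUN == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_rate_limits() {
  local op
  for op in "${OPERATIONS[@]}"; do
    # Quoted so that the shell does not glob-expand the '*' wildcard.
    # Assumption: re-setting an existing limit to the same value is a no-op.
    run eos access set limit "$RATE_LIMIT" "rate:user:*:$op"
  done
  run eos access ls   # list the resulting access rules and limits
}

configure_rate_limits
```

Defaulting to a dry run is a design choice of this sketch: running it as-is prints the commands for review, while `DRY_RUN=0` executes them.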

Here are the current rate-limiting configurations of the deployed EOSCTA instances. Please note that some of these configurations may not have been applied or may have been overridden:

[itctabuild02] ~ > date; ssh root@eosctaalicepps grep access configure_eoscta.sh
Thu Mar 12 14:09:56 CET 2020
[itctabuild02] ~ > 
[itctabuild02] ~ > date; ssh root@eosctaatlas grep access configure_eoscta.sh
Thu Mar 12 14:10:09 CET 2020
eos access set limit 100 rate:user:*:OpenWriteCreate
eos access set limit 100 rate:user:*:Prepare
eos access ls
[itctabuild02] ~ > 
[itctabuild02] ~ > date; ssh root@eosctaatlaspps grep access configure_eoscta.sh
Thu Mar 12 14:10:15 CET 2020
eos access set limit 100 rate:user:*:OpenWriteCreate
eos access set limit 100 rate:user:*:Prepare
eos access ls
[itctabuild02] ~ > 
[itctabuild02] ~ > date; ssh root@eosctacmspps grep access configure_eoscta.sh
Thu Mar 12 14:10:27 CET 2020
eos access set limit 100 rate:user:*:OpenWriteCreate
eos access set limit 100 rate:user:*:Prepare
eos access ls
[itctabuild02] ~ > 
The problem with the rate-limiting system is that it does not behave the way one might expect given the configurations above.

The rate-limiting system is a "punishment system": if a user exceeds any one of their operation-specific rate limits, then ALL operation types for that user are "penalized". The above configurations give the false impression that a given user is limited to 100Hz for "OpenWriteCreate" operations and, independently, to 100Hz for "Prepare" operations. This is not true. As soon as 100Hz is exceeded for either "OpenWriteCreate" or "Prepare", the concerned user is stalled for all types of operations.
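The difference between the expected and the actual behaviour can be illustrated with a toy model. The bash sketch below is not EOS code: the fixed time window, the `request` and `new_window` function names and the global `RESULT` variable are hypothetical, and the real duration and mechanics of an EOS stall are not modelled. It only encodes the rule stated above: exceeding any one operation-specific limit stalls the user for every operation type.

```bash
#!/usr/bin/env bash
# Toy model of the "punishment" semantics described above; NOT EOS code.
# Assumptions: one fixed time window at a time (new_window starts the next
# one), one limit per operation type, and a stall that lasts until the end
# of the window in which any limit was exceeded. Requires bash >= 4.

declare -A LIMIT=([OpenWriteCreate]=100 [Prepare]=100)
declare -A COUNT=()    # "user:op" -> requests counted in the current window
declare -A STALLED=()  # "user"    -> 1 once any of the user's limits is exceeded
RESULT=""

# new_window: forget all counters and stalls, i.e. start the next window.
new_window() {
  COUNT=()
  STALLED=()
}

# request USER OP: sets RESULT to "ok" or "stall". A global variable is used
# instead of stdout so that the counters are not lost in a subshell.
request() {
  local user=$1 op=$2 key="$1:$2"
  if [[ -n ${STALLED[$user]:-} ]]; then
    RESULT=stall               # already penalised: every operation stalls
    return
  fi
  COUNT[$key]=$(( ${COUNT[$key]:-0} + 1 ))
  if [[ -n ${LIMIT[$op]:-} && ${COUNT[$key]} -gt ${LIMIT[$op]} ]]; then
    STALLED[$user]=1           # penalise ALL of the user's operation types
    RESULT=stall
    return
  fi
  RESULT=ok
}

# Demo: one user sends 101 OpenWriteCreate requests within one window.
for ((i = 1; i <= 101; i++)); do request alice OpenWriteCreate; done
echo "alice, 101st OpenWriteCreate: $RESULT"  # stall: 101 > 100
request alice Prepare
echo "alice, 1st Prepare:           $RESULT"  # stall, although Prepare was at 0/100
request bob Prepare
echo "bob, 1st Prepare:             $RESULT"  # ok: bob has his own counters
```

Under the per-operation reading suggested by the configuration files, alice's first "Prepare" would have been accepted; in this model, as in the behaviour described above, it is stalled because her "OpenWriteCreate" limit was exceeded.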