Deprecated

This page is deprecated and may contain information that is no longer up to date.

Truncated disk replica

How to reproduce the problem

Archive a file to tape:

[itctabuild02] ~ > echo -n '1234567890' > ten_byte_file.txt
[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> xrdcp ten_byte_file.txt root://localhost//eos/dev/userfiles/testdir_1
[10B/10B][100%][==================================================][10B/s]  
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >

Observe that the file is only on tape (d0::t1):

[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt
d0::t1   -rw-r--r--   1 eosuser1 eosuser1           10 Apr 16 11:14 ten_byte_file.txt
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >

Request that the file be retrieved from tape:

[itctabuild02] ~ > run_eospoweruser1_shell
[itctabuild02] ~ (krb5=eospoweruser1)> xrdfs localhost prepare -s /eos/dev/userfiles/testdir_1/ten_byte_file.txt
eos:044620011458020202280000000001000042:e03bfa81.5e982134:11
[itctabuild02] ~ (krb5=eospoweruser1)> exit
exit
[itctabuild02] ~ >

Observe that the file is both on disk and on tape (d1::t1):

[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt
d1::t1   -rw-r--r--   2 eosuser1 eosuser1           10 Apr 16 11:14 ten_byte_file.txt
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >
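
The first field of the `eos ls -y` listing is what distinguishes the two states above. The following is a minimal sketch of how a script might pull the two counts out of such a line. It assumes the field always has the form d<N>::t<M>, with N the number of disk replicas and M the number of tape copies, which is inferred from the listings on this page and not from the EOS documentation. The function name `replica_counts` is hypothetical.

```shell
#!/bin/sh
# Extract the disk and tape counts from one line of `eos ls -y` output.
# Assumption: the first field looks like d<N>::t<M>, as in the listings above.
replica_counts() {
    # $1: one full line of `eos ls -y` output
    status=${1%% *}                      # first space-delimited field, e.g. d1::t1
    disk=${status#d}                     # strip the leading "d"
    disk=${disk%%::*}                    # keep what precedes "::"
    tape=${status##*::t}                 # keep what follows "::t"
    printf '%s %s\n' "$disk" "$tape"
}

# Lines copied from the listings above
replica_counts 'd0::t1   -rw-r--r--   1 eosuser1 eosuser1           10 Apr 16 11:14 ten_byte_file.txt'   # prints "0 1"
replica_counts 'd1::t1   -rw-r--r--   2 eosuser1 eosuser1           10 Apr 16 11:14 ten_byte_file.txt'   # prints "1 1"
```

A disk count of 0 with a tape count of 1 corresponds to the "only on tape" state, and 1 and 1 to the "both on disk and on tape" state.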

Determine the location of the underlying physical file on the EOS FST and truncate it:

[itctabuild02] ~ > sudo eos root://localhost fileinfo /eos/dev/userfiles/testdir_1/ten_byte_file.txt --fullpath
  File: '/eos/dev/userfiles/testdir_1/ten_byte_file.txt'  Flags: 0644
  Size: 10
Modify: Thu Apr 16 11:14:35 2020 Timestamp: 1587028475.686152000
Change: Thu Apr 16 11:15:11 2020 Timestamp: 1587028511.310849367
Birth : Thu Apr 16 11:14:35 2020 Timestamp: 1587028475.645430951
  CUid: 19227 CGid: 1487  Fxid: 00000010 Fid: 16    Pid: 15   Pxid: 0000000f
XStype: adler    XS: 0b 2c 02 0e    ETAGs: "4294967296:0b2c020e"
Layout: replica Stripes: 1 Blocksize: 4k LayoutId: 00100012
  #Rep: 2
┌───┬──────┬────────────────────────┬────────────────┬────────────────────────────────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┬──────────────────────────────────────────────────────────────┐
│no.│ fs-id│                    host│      schedgroup│                                        path│      boot│  configstatus│       drain│  active│                  geotag│                                             physical location│
└───┴──────┴────────────────────────┴────────────────┴────────────────────────────────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┴──────────────────────────────────────────────────────────────┘
 0    65535                localhost           tape.0                              /does_not_exist                       off      nodrain  offline                                                       /does_not_exist/00000000/00000010 
 1        2     itctabuild02.cern.ch        spinner.0 /run/media/smurray/250GB/fst_spinner_storage     booted             rw      nodrain   online                     flat /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010 

*******
[itctabuild02] ~ > echo -n | sudo tee /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010
[itctabuild02] ~ >
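
The `echo -n | sudo tee <path>` command empties the file because tee opens its target for writing, which truncates it, and then receives no input to write. The sketch below shows the same effect on a scratch file created with mktemp, so it can be tried without touching an FST. The scratch file and variable names are illustrative only.

```shell
#!/bin/sh
# Demonstrate truncation through tee on a scratch file (no EOS involved).
scratch=$(mktemp)
printf '1234567890' > "$scratch"
size_before=$(( $(wc -c < "$scratch") ))

# tee truncates the file on open and is given nothing to write
printf '' | tee "$scratch" > /dev/null
size_after=$(( $(wc -c < "$scratch") ))

echo "before=$size_before after=$size_after"   # prints "before=10 after=0"
rm -f "$scratch"
```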

Copy the file out as an end user and print the exit code. The copy reports success (exit code 0) when it should fail:

[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> xrdcp root://localhost//eos/dev/userfiles/testdir_1/ten_byte_file.txt /tmp/tmp_ten_byte_file.txt
[0B/0B][100%][==================================================][0B/s]  
[itctabuild02] ~ (krb5=eosuser1)> echo $?
0
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ > 

Observe that the copied-out file has zero length even though the copy reported success, which is an error:

[itctabuild02] ~ > stat /tmp/tmp_ten_byte_file.txt
  File: ‘/tmp/tmp_ten_byte_file.txt’
  Size: 0           Blocks: 0          IO Block: 4096   regular empty file
Device: 801h/2049d  Inode: 3014674     Links: 1
Access: (0644/-rw-r--r--)  Uid: (19214/ smurray)   Gid: ( 1000/ smurray)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2020-04-16 11:16:49.369981786 +0200
Modify: 2020-04-16 11:16:49.369981786 +0200
Change: 2020-04-16 11:16:49.369981786 +0200
 Birth: -
[itctabuild02] ~ > 
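
Because the exit code of the copy cannot be trusted in this situation, a client can compare the local copy against the size and checksum that EOS records for the file (Size: 10 and XS: 0b 2c 02 0e in the fileinfo output above). The following is a minimal sketch of such a check in plain shell. The function names `adler32` and `verify_copy` are hypothetical, and the Adler-32 loop is written out by hand for illustration; it is slow and only suitable for small test files.

```shell
#!/bin/sh
# Adler-32 of a file using shell arithmetic over the bytes printed by od.
adler32() {
    a=1
    b=0
    for byte in $(od -An -v -tu1 "$1"); do
        a=$(( (a + byte) % 65521 ))
        b=$(( (b + a) % 65521 ))
    done
    printf '%08x\n' $(( b * 65536 + a ))
}

# Compare a local file with the expected size and Adler-32 checksum.
verify_copy() {
    # $1: local file, $2: expected size in bytes, $3: expected checksum (8 hex digits)
    actual_size=$(( $(wc -c < "$1") ))
    if [ "$actual_size" -ne "$2" ]; then
        echo "size mismatch: expected $2, got $actual_size"
        return 1
    fi
    actual_xs=$(adler32 "$1")
    if [ "$actual_xs" != "$3" ]; then
        echo "checksum mismatch: expected $3, got $actual_xs"
        return 1
    fi
    echo "ok"
}

# Scratch files standing in for a good copy and a truncated copy
good=$(mktemp);  printf '1234567890' > "$good"
empty=$(mktemp); : > "$empty"
verify_copy "$good"  10 0b2c020e            # prints "ok"
verify_copy "$empty" 10 0b2c020e || true    # prints "size mismatch: expected 10, got 0"
rm -f "$good" "$empty"
```

Run against /tmp/tmp_ten_byte_file.txt from the step above, the size check would be expected to fail in the same way as for the empty scratch file.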

Observe that EOS ignores an end user’s request to retrieve the file from tape because EOS believes the disk replica already exists. The prepare command still returns a request ID, but the MGM log shows that nothing was queued:

[itctabuild02] ~ > run_eospoweruser1_shell
[itctabuild02] ~ (krb5=eospoweruser1)> xrdfs localhost prepare -s /eos/dev/userfiles/testdir_1/ten_byte_file.txt
eos:044620011458020202280000000001000042:e03bfa81.5e982134:12
[itctabuild02] ~ (krb5=eospoweruser1)> exit
exit
[itctabuild02] ~ > 
[itctabuild02] ~ > grep 'nothing to prepare' /var/log/eos/mgm/xrdlog.mgm
200416 11:39:05 time=1587029945.244306 func=HandleProtoMethodPrepareEvent level=INFO  logid=static.............................. unit=mgm@itctabuild02.cern.ch:1094 tid=00007f5c442fa700 source=WFE:1666                       tident= sec=(null) uid=99 gid=99 name=- geo="" File /eos/dev/userfiles/testdir_1/ten_byte_file.txt is already on disk, nothing to prepare.
[itctabuild02] ~ > 

How an end user can recover the data

Ask EOS to evict the disk replica:

[itctabuild02] ~ > run_eospoweruser1_shell
[itctabuild02] ~ (krb5=eospoweruser1)> xrdfs localhost prepare -e /eos/dev/userfiles/testdir_1/ten_byte_file.txt
[itctabuild02] ~ (krb5=eospoweruser1)> exit
exit
[itctabuild02] ~ > 

Observe that EOS now recognises that the disk replica is gone (d0::t1):

[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt
d0::t1   -rw-r--r--   1 eosuser1 eosuser1           10 Apr 16 11:14 ten_byte_file.txt
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ > 

Request that the file be retrieved from tape:

[itctabuild02] ~ > run_eospoweruser1_shell
[itctabuild02] ~ (krb5=eospoweruser1)> xrdfs localhost prepare -s /eos/dev/userfiles/testdir_1/ten_byte_file.txt
eos:044620011458020202280000000001000042:e03bfa81.5e982134:13
[itctabuild02] ~ (krb5=eospoweruser1)> exit
exit
[itctabuild02] ~ > 

Observe that the file is both on disk and on tape (d1::t1):

[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt
d1::t1   -rw-r--r--   2 eosuser1 eosuser1           10 Apr 16 11:14 ten_byte_file.txt
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ > 

Copy the recovered file out:

[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> xrdcp root://localhost//eos/dev/userfiles/testdir_1/ten_byte_file.txt /tmp/tmp_ten_byte_file.txt
[10B/10B][100%][==================================================][10B/s]  
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ > cat /tmp/tmp_ten_byte_file.txt; echo
1234567890
[itctabuild02] ~ > 
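
The recovery steps above can be sketched as one small script. This is an illustration under several assumptions and not a supported tool: the `HOST` and `DRY_RUN` variables and the `run` and `recover_replica` helpers are hypothetical, the caller is assumed to already hold credentials that are allowed to issue prepare requests, and the retrieve from tape is assumed to be asynchronous, so the final listing may still show d0::t1 until the retrieve completes. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical knobs for this sketch: target host and dry-run switch.
HOST=${HOST:-localhost}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_replica() {
    # $1: EOS path of the file whose disk replica is truncated
    run xrdfs "$HOST" prepare -e "$1" || return 1   # evict the bad disk replica
    run xrdfs "$HOST" prepare -s "$1" || return 1   # request a new retrieve from tape
    run eos "root://$HOST" ls -y "$1"               # show the current d<N>::t<M> state
}

recover_replica /eos/dev/userfiles/testdir_1/ten_byte_file.txt
```

Setting DRY_RUN=0 makes the helper execute the commands instead of printing them. Evicting first matters because, as shown earlier, a prepare request is ignored while EOS still believes a disk replica exists.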

What a tape operator can do to recover the data

Follow the same steps as an end user: evict the disk replica, then request that the file be retrieved from tape again.