Deprecated
This page is deprecated and may contain information that is no longer up to date.
Wrong checksum disk replica
How to reproduce the problem
Archive a file to tape:
[itctabuild02] ~ > echo -n '1234567890' > ten_byte_file.txt
[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> xrdcp ten_byte_file.txt root://localhost//eos/dev/userfiles/testdir_1
[10B/10B][100%][==================================================][10B/s]
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >
Observe that the file is only on tape (d0::t1):
[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt
d0::t1 -rw-r--r-- 1 eosuser1 eosuser1 10 Apr 16 11:55 ten_byte_file.txt
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >
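The prefix that eos ls -y prints before the permissions shows where the file lives. This page reads the number after d as the count of disk replicas and the number after t as the count of tape copies, which is how it interprets d0::t1 and d1::t1. Under that reading, the check can be scripted. Below is a minimal POSIX shell sketch. It assumes the flags are always the first space-separated field of the line. replica_counts is a hypothetical helper, not an EOS command.

```shell
# Hypothetical helper: print "<disk> <tape>" for one line of `eos ls -y` output,
# e.g. "d1::t1 -rw-r--r-- 2 eosuser1 ..." -> "1 1".
# Assumes the dN::tM flags are the first space-separated field.
replica_counts() {
  flags=${1%% *}                       # first field, e.g. "d1::t1"
  case $flags in
    d[0-9]*::t[0-9]*) ;;               # looks like dN::tM, carry on
    *) echo "unexpected ls -y flags: $flags" >&2; return 1 ;;
  esac
  disk=${flags%%::*}                   # "d1"
  tape=${flags##*::}                   # "t1"
  printf '%s %s\n' "${disk#d}" "${tape#t}"
}
```

For example, replica_counts "$(eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt)" should print 0 1 at this point in the walkthrough.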
Request that the file be retrieved from tape:
[itctabuild02] ~ > run_eospoweruser1_shell
[itctabuild02] ~ (krb5=eospoweruser1)> xrdfs localhost prepare -s /eos/dev/userfiles/testdir_1/ten_byte_file.txt
eos:044620011458020202280000000001000042:e03bfa81.5e982b25:11
[itctabuild02] ~ (krb5=eospoweruser1)> exit
exit
[itctabuild02] ~ >
Observe that the file is both on disk and on tape (d1::t1):
[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt
d1::t1 -rw-r--r-- 2 eosuser1 eosuser1 10 Apr 16 11:55 ten_byte_file.txt
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >
Determine the location of the underlying physical file on the EOS FST and give it different contents whilst preserving its size:
[itctabuild02] ~ > sudo eos root://localhost fileinfo /eos/dev/userfiles/testdir_1/ten_byte_file.txt --fullpath
File: '/eos/dev/userfiles/testdir_1/ten_byte_file.txt' Flags: 0644
Size: 10
Modify: Thu Apr 16 11:55:32 2020 Timestamp: 1587030932.74291000
Change: Thu Apr 16 11:56:14 2020 Timestamp: 1587030974.410272330
Birth : Thu Apr 16 11:55:32 2020 Timestamp: 1587030932.36661398
CUid: 19227 CGid: 1487 Fxid: 00000010 Fid: 16 Pid: 15 Pxid: 0000000f
XStype: adler XS: 0b 2c 02 0e ETAGs: "4294967296:0b2c020e"
Layout: replica Stripes: 1 Blocksize: 4k LayoutId: 00100012
#Rep: 2
┌───┬──────┬────────────────────────┬────────────────┬────────────────────────────────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┬──────────────────────────────────────────────────────────────┐
│no.│ fs-id│ host│ schedgroup│ path│ boot│ configstatus│ drain│ active│ geotag│ physical location│
└───┴──────┴────────────────────────┴────────────────┴────────────────────────────────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┴──────────────────────────────────────────────────────────────┘
0 65535 localhost tape.0 /does_not_exist off nodrain offline /does_not_exist/00000000/00000010
1 2 itctabuild02.cern.ch spinner.0 /run/media/smurray/250GB/fst_spinner_storage booted rw nodrain online flat /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010
*******
[itctabuild02] ~ > echo -n RUBBISH890 | sudo tee /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010
RUBBISH890[itctabuild02] ~ >
[itctabuild02] ~ >
Try to copy out the disk replica as an end user and print the exit code of the failing command:
[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> xrdcp root://localhost//eos/dev/userfiles/testdir_1/ten_byte_file.txt /tmp/tmp_ten_byte_file.txt
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3007] Unable to read file - wrong file checksum fn= /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010; input/output error (source)
[itctabuild02] ~ (krb5=eosuser1)> echo $?
54
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >
Observe that EOS still believes the disk replica has the original checksum (XS: 0b 2c 02 0e), when in fact the file has been corrupted. According to xrdadler32, its actual checksum is 0fd802b1:
[itctabuild02] ~ > sudo eos root://localhost fileinfo /eos/dev/userfiles/testdir_1/ten_byte_file.txt --fullpath
File: '/eos/dev/userfiles/testdir_1/ten_byte_file.txt' Flags: 0644
Size: 10
Modify: Thu Apr 16 11:55:32 2020 Timestamp: 1587030932.74291000
Change: Thu Apr 16 11:56:14 2020 Timestamp: 1587030974.410272330
Birth : Thu Apr 16 11:55:32 2020 Timestamp: 1587030932.36661398
CUid: 19227 CGid: 1487 Fxid: 00000010 Fid: 16 Pid: 15 Pxid: 0000000f
XStype: adler XS: 0b 2c 02 0e ETAGs: "4294967296:0b2c020e"
Layout: replica Stripes: 1 Blocksize: 4k LayoutId: 00100012
#Rep: 2
┌───┬──────┬────────────────────────┬────────────────┬────────────────────────────────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┬──────────────────────────────────────────────────────────────┐
│no.│ fs-id│ host│ schedgroup│ path│ boot│ configstatus│ drain│ active│ geotag│ physical location│
└───┴──────┴────────────────────────┴────────────────┴────────────────────────────────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┴──────────────────────────────────────────────────────────────┘
0 65535 localhost tape.0 /does_not_exist off nodrain offline /does_not_exist/00000000/00000010
1 2 itctabuild02.cern.ch spinner.0 /run/media/smurray/250GB/fst_spinner_storage booted rw nodrain online flat /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010
*******
[itctabuild02] ~ >
[itctabuild02] ~ > sudo xrdadler32 /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010
0fd802b1 /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010
[itctabuild02] ~ >
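The value XStype: adler means EOS stores an Adler-32 checksum, so the mismatch can be checked by hand. 0b2c020e is the Adler-32 of the original ten bytes 1234567890. 0fd802b1, reported by xrdadler32, is the Adler-32 of RUBBISH890. The sketch below computes Adler-32 in pure POSIX shell (modulus 65521, as in RFC 1950) and compares the result against the value shown by eos fileinfo. It loops over the input one byte at a time, so it is only meant for tiny files like this one. adler32 and replica_matches are hypothetical helper names.

```shell
# Hypothetical helper: Adler-32 of stdin, printed as 8 lowercase hex digits
# (the same format as xrdadler32 and the ETAG suffix in `eos fileinfo`).
# Byte-by-byte shell arithmetic: fine for tiny test files, far too slow for real data.
adler32() {
  a=1 b=0
  for byte in $(od -An -tu1 -v); do    # each input byte as an unsigned decimal
    a=$(( (a + byte) % 65521 ))
    b=$(( (b + a) % 65521 ))
  done
  printf '%08x\n' $(( (b << 16) | a ))
}

# Hypothetical helper: compare the checksum EOS recorded (e.g. "0b2c020e")
# with the checksum of a physical replica file.
replica_matches() {
  expected=$1 file=$2
  actual=$(adler32 < "$file") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file has checksum $actual"
  else
    echo "MISMATCH: $file has checksum $actual, EOS expects $expected"
    return 1
  fi
}
```

For instance, printf '1234567890' | adler32 prints 0b2c020e, and printf 'RUBBISH890' | adler32 prints 0fd802b1. Since the FST file was read with sudo above, checking the real replica would take something like sudo cat /run/media/smurray/250GB/fst_spinner_storage/00000000/00000010 | adler32.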
Observe that EOS ignores an end user’s request to retrieve the file from tape because EOS believes the disk replica already exists:
[itctabuild02] ~ > run_eospoweruser1_shell
[itctabuild02] ~ (krb5=eospoweruser1)> xrdfs localhost prepare -s /eos/dev/userfiles/testdir_1/ten_byte_file.txt
eos:044620011458020202280000000001000042:e03bfa81.5e982b25:12
[itctabuild02] ~ (krb5=eospoweruser1)> exit
exit
[itctabuild02] ~ >
[itctabuild02] ~ > grep 'nothing to prepare' /var/log/eos/mgm/xrdlog.mgm
200416 12:02:09 time=1587031329.929413 func=HandleProtoMethodPrepareEvent level=INFO logid=static.............................. unit=mgm@itctabuild02.cern.ch:1094 tid=00007fd8b76fd700 source=WFE:1666 tident= sec=(null) uid=99 gid=99 name=- geo="" File /eos/dev/userfiles/testdir_1/ten_byte_file.txt is already on disk, nothing to prepare.
[itctabuild02] ~ >
How an end user can recover the data
Ask EOS to evict the disk replica:
[itctabuild02] ~ > run_eospoweruser1_shell
[itctabuild02] ~ (krb5=eospoweruser1)> xrdfs localhost prepare -e /eos/dev/userfiles/testdir_1/ten_byte_file.txt
[itctabuild02] ~ (krb5=eospoweruser1)> exit
exit
[itctabuild02] ~ >
Observe that EOS now recognises that the disk replica is gone:
[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt
d0::t1 -rw-r--r-- 1 eosuser1 eosuser1 10 Apr 16 11:55 ten_byte_file.txt
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >
Request that the file be retrieved from tape:
[itctabuild02] ~ > run_eospoweruser1_shell
[itctabuild02] ~ (krb5=eospoweruser1)> xrdfs localhost prepare -s /eos/dev/userfiles/testdir_1/ten_byte_file.txt
eos:044620011458020202280000000001000042:e03bfa81.5e982b25:13
[itctabuild02] ~ (krb5=eospoweruser1)> exit
exit
[itctabuild02] ~ >
Observe that the file is once again both on disk and on tape (d1::t1):
[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> eos root://localhost ls -y /eos/dev/userfiles/testdir_1/ten_byte_file.txt
d1::t1 -rw-r--r-- 2 eosuser1 eosuser1 10 Apr 16 11:55 ten_byte_file.txt
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >
Copy the recovered file out:
[itctabuild02] ~ > run_eosuser1_shell
[itctabuild02] ~ (krb5=eosuser1)> xrdcp root://localhost//eos/dev/userfiles/testdir_1/ten_byte_file.txt /tmp/tmp_ten_byte_file.txt
[10B/10B][100%][==================================================][10B/s]
[itctabuild02] ~ (krb5=eosuser1)> exit
exit
[itctabuild02] ~ >
[itctabuild02] ~ > cat /tmp/tmp_ten_byte_file.txt; echo
1234567890
[itctabuild02] ~ >
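The recovery above is a fixed sequence: evict the disk replica, wait until ls -y shows no disk replica, request a new stage from tape, wait until a disk replica reappears, then copy the file out. The sketch below writes that sequence as POSIX shell functions. It assumes the MGM is on localhost, as in the examples. It also assumes the caller holds credentials that can both issue prepare requests and read the file; on this page those took two different Kerberos identities, eospoweruser1 and eosuser1. wait_for_flags, recover_from_tape, MAX_POLLS and POLL_SECONDS are hypothetical names chosen for illustration, not part of EOS or CTA.

```shell
# Hypothetical helpers sketching the end-user recovery steps shown above.
# Assumptions: the MGM is on localhost, and the caller holds credentials that
# may both issue prepare requests and read the file.

# Poll `eos ls -y` until its dN::tM prefix matches a shell glob pattern.
# MAX_POLLS and POLL_SECONDS are illustrative knobs, not EOS settings.
wait_for_flags() {
  path=$1 pattern=$2 n=0
  while :; do
    line=$(eos root://localhost ls -y "$path") || return 1
    flags=${line%% *}                  # first field, e.g. "d0::t1"
    case $flags in $pattern) return 0 ;; esac
    n=$((n + 1))
    if [ "$n" -ge "${MAX_POLLS:-60}" ]; then
      echo "timed out waiting for $path (last flags: $flags)" >&2
      return 1
    fi
    sleep "${POLL_SECONDS:-10}"
  done
}

# Usage: recover_from_tape EOS_PATH LOCAL_DEST
recover_from_tape() {
  path=$1 dest=$2
  xrdfs localhost prepare -e "$path" || return 1   # evict the corrupted disk replica
  wait_for_flags "$path" 'd0::*' || return 1       # EOS no longer counts a disk replica
  xrdfs localhost prepare -s "$path" || return 1   # stage the file again from tape
  wait_for_flags "$path" 'd[1-9]*::*' || return 1  # a fresh disk replica exists
  xrdcp "root://localhost/$path" "$dest"           # copy the recovered file out
}
```

The two waits mirror the two ls -y checks in the walkthrough. Waiting for d0 before staging guards against the stage request being skipped with "nothing to prepare" while EOS still counts the corrupted replica. The defaults of 60 polls every 10 seconds are arbitrary.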
What a tape operator can do to recover the data
The same as an end user: evict the corrupted disk replica, then retrieve the file from tape again.