Bug #7350
osd: scrub does not detect recently touched and then renamed backend files
Description
This is on Dumpling (0.67.5-1precise).
Steps to reproduce:
Create a single-byte RADOS object and read it back:
root@ubuntu-ceph1:~# rados -p test put onebyte - <<< 'A'
root@ubuntu-ceph1:~# rados -p test get onebyte -
A
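A small nit about the transcript above: a bash here-string appends a trailing newline, so the "single-byte" object is actually two bytes as stored, which matches the "2 bytes data" figure in the pgmap output further down. A quick sketch:

```shell
# Here-strings ('<<<') append a newline, so 'A' is stored as "A\n" (two bytes).
printf '%s' 'A' | wc -c   # no newline: 1 byte
wc -c <<< 'A'             # here-string: 2 bytes
```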
Figure out where it's stored:
root@ubuntu-ceph1:~# ceph osd map test onebyte
osdmap e120 pool 'test' (3) object 'onebyte' -> pg 3.ed47d009 (3.1) -> up [0,2] acting [0,2]
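The on-disk filename used in the next step can be read straight off this mapping: under the FileStore layout seen here on Dumpling, the object lives in .../current/&lt;pgid&gt;_head/ and its filename encodes the object name, the snapshot ("head"), the object hash from the pg mapping (uppercased), and the pool id. A rough sketch of the format, as observed in this report:

```shell
# Reconstruct the FileStore filename from the `ceph osd map` output above.
# Observed format on Dumpling: <name>__head_<HASH>__<poolid>
obj=onebyte
pool_id=3
hash=ed47d009   # from "pg 3.ed47d009" above
printf '%s__head_%s__%s\n' "$obj" "$(printf '%s' "$hash" | tr a-z A-Z)" "$pool_id"
# -> onebyte__head_ED47D009__3
```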
Simulate data corruption/bitrot: open the file in vi and replace 'A' with 'B'.
root@ubuntu-ceph1:~# vi /var/lib/ceph/osd/ceph-0/current/3.1_head/onebyte__head_ED47D009__3
Read the object back:
root@ubuntu-ceph1:~# rados -p test get onebyte -
A
Interesting. We modified the primary, yet it silently reads back the (correct) copy from the replica.
OK, let's scrub. After all, the mtime was modified.
root@ubuntu-ceph1:~# ceph pg scrub 3.1; ceph -w
instructing pg 3.1 on osd.0 to scrub
cluster bd70ea39-58fc-4117-ade1-03a4d429cb49
health HEALTH_OK
monmap e4: 3 mons at {ubuntu-ceph1=192.168.122.201:6789/0,ubuntu-ceph2=192.168.122.202:6789/0,ubuntu-ceph3=192.168.122.203:6789/0}, election epoch 168, quorum 0,1,2 ubuntu-ceph1,ubuntu-ceph2,ubuntu-ceph3
osdmap e120: 3 osds: 3 up, 3 in
pgmap v893: 200 pgs: 200 active+clean; 2 bytes data, 108 MB used, 15218 MB / 15326 MB avail; 34B/s rd, 0op/s
mdsmap e1: 0/0/1 up
2014-02-06 12:55:25.815616 mon.0 [INF] pgmap v893: 200 pgs: 200 active+clean; 2 bytes data, 108 MB used, 15218 MB / 15326 MB avail; 34B/s rd, 0op/s
2014-02-06 12:55:45.827785 mon.0 [INF] pgmap v894: 200 pgs: 200 active+clean; 2 bytes data, 108 MB used, 15218 MB / 15326 MB avail; 20B/s rd, 0op/s
2014-02-06 12:55:42.768414 osd.0 [INF] 3.1 scrub ok
Interesting again. Scrub completes OK, no errors. Let's read it back again:
root@ubuntu-ceph1:~# rados -p test get onebyte -
A
This, despite the primary copy on disk being different:
root@ubuntu-ceph1:~# cat /var/lib/ceph/osd/ceph-0/current/3.1_head/onebyte__head_ED47D009__3
B
All right, let's deep scrub.
root@ubuntu-ceph1:~# ceph pg deep-scrub 3.1; ceph -w
instructing pg 3.1 on osd.0 to deep-scrub
cluster bd70ea39-58fc-4117-ade1-03a4d429cb49
health HEALTH_OK
monmap e4: 3 mons at {ubuntu-ceph1=192.168.122.201:6789/0,ubuntu-ceph2=192.168.122.202:6789/0,ubuntu-ceph3=192.168.122.203:6789/0}, election epoch 168, quorum 0,1,2 ubuntu-ceph1,ubuntu-ceph2,ubuntu-ceph3
osdmap e120: 3 osds: 3 up, 3 in
pgmap v896: 200 pgs: 200 active+clean; 2 bytes data, 108 MB used, 15218 MB / 15326 MB avail; 40B/s rd, 0op/s
mdsmap e1: 0/0/1 up
2014-02-06 12:56:11.218615 mon.0 [INF] pgmap v896: 200 pgs: 200 active+clean; 2 bytes data, 108 MB used, 15218 MB / 15326 MB avail; 40B/s rd, 0op/s
2014-02-06 12:56:15.842525 mon.0 [INF] pgmap v897: 200 pgs: 200 active+clean; 2 bytes data, 108 MB used, 15218 MB / 15326 MB avail
2014-02-06 12:56:14.788259 osd.0 [INF] 3.1 deep-scrub ok
No errors? Really?
root@ubuntu-ceph1:~# ceph health detail
HEALTH_OK
Yes, really. Odd.
Let's stop the OSD and see if we get the same results.
root@ubuntu-ceph1:~# sudo stop ceph-osd id=0
ceph-osd stop/waiting
root@ubuntu-ceph1:~# rados -p test get onebyte -
A
Hmmm, OK. Well, what happens if we bring the OSD back up?
root@ubuntu-ceph1:~# sudo start ceph-osd id=0
ceph-osd (ceph/0) start/running, process 11150
root@ubuntu-ceph1:~# rados -p test get onebyte -
error getting test/onebyte: No such file or directory
Yeah, ouch. What's our health status?
root@ubuntu-ceph1:~# ceph health detail
HEALTH_OK
Really? I don't think so. What's the status of our PGs?
root@ubuntu-ceph1:~# ceph -w
cluster bd70ea39-58fc-4117-ade1-03a4d429cb49
health HEALTH_OK
monmap e4: 3 mons at {ubuntu-ceph1=192.168.122.201:6789/0,ubuntu-ceph2=192.168.122.202:6789/0,ubuntu-ceph3=192.168.122.203:6789/0}, election epoch 168, quorum 0,1,2 ubuntu-ceph1,ubuntu-ceph2,ubuntu-ceph3
osdmap e124: 3 osds: 3 up, 3 in
pgmap v904: 200 pgs: 200 active+clean; 2 bytes data, 110 MB used, 15216 MB / 15326 MB avail
mdsmap e1: 0/0/1 up
2014-02-06 12:57:08.263928 mon.0 [INF] pgmap v904: 200 pgs: 200 active+clean; 2 bytes data, 110 MB used, 15216 MB / 15326 MB avail
All PGs active and clean? I have a hard time believing that.
Hmm, so let's see. What was my PG again?
root@ubuntu-ceph1:~# ceph osd map test onebyte
osdmap e124 pool 'test' (3) object 'onebyte' -> pg 3.ed47d009 (3.1) -> up [0,2] acting [0,2]
Maybe I'll repair it, even though my cluster tells me all is well?
root@ubuntu-ceph1:~# ceph pg repair 3.1
instructing pg 3.1 on osd.0 to repair
root@ubuntu-ceph1:~# rados -p test get onebyte -
A
That did it. But how Joe Average User is supposed to figure out that this is the fix, I have no idea. :)
History
#1 Updated by Florian Haas about 10 years ago
- Subject changed from Corrupted object undetected by both scrub and deep-scrub, appers lost when restarting primary OSD to Corrupted object undetected by both scrub and deep-scrub, appears lost when restarting primary OSD
#2 Updated by Sage Weil about 10 years ago
- Subject changed from Corrupted object undetected by both scrub and deep-scrub, appears lost when restarting primary OSD to osd: scrub does not detect recently renamed backend files
- Status changed from New to 12
The problem is that vi is renaming the file, and we cache recently opened files. Use echo asdf >> file or similar to modify the same file/inode, or make the fdcache flush that entry by generating some load in between, e.g. with rados bench.
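Sage's point can be demonstrated outside Ceph: an in-place write reuses the file's inode, while a vi-style save (write a temp file, then rename it over the original) produces a new inode, so an OSD holding the old file descriptor in its fdcache keeps reading the stale inode. A minimal sketch using a plain temp file rather than actual OSD data:

```shell
# Demonstrate in-place write vs. write-then-rename (what vi does on save).
f=$(mktemp)
printf 'A' > "$f"
inode_orig=$(stat -c %i "$f")

# In-place overwrite (echo/printf redirection): the inode is reused,
# so a process with the file already open sees the new contents.
printf 'B' > "$f"
inode_inplace=$(stat -c %i "$f")

# vi-style save: write a new file, then rename it over the original.
# The path now points at a different inode; old fds still see the old one.
printf 'C' > "$f.new"
mv "$f.new" "$f"
inode_renamed=$(stat -c %i "$f")

[ "$inode_orig" = "$inode_inplace" ] && echo "in-place write: same inode"
[ "$inode_orig" != "$inode_renamed" ] && echo "rename-style save: new inode"
rm -f "$f"
```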
#3 Updated by Sage Weil about 10 years ago
- Subject changed from osd: scrub does not detect recently renamed backend files to osd: scrub does not detect recently touched and then renamed backend files
#4 Updated by Florian Haas about 10 years ago
- Severity changed from 3 - minor to 4 - irritation
Thanks Sage -- I can confirm that the issue does not appear when echoing directly into the file, so evidently it was indeed vi doing a rename; sorry for the noise. Since a rename would hardly be the result of bitrot -- and, for good measure, I checked whether anything failed when I flipped bits in the file's xattrs instead of the file content (nothing did) -- I'm downgrading this one to "irritation".
#5 Updated by Sage Weil almost 10 years ago
- Status changed from 12 to Won't Fix