Bug #20072 (Closed)
TestStrays.test_snapshot_remove doesn't handle head whiteout in pgls results
% Done: 0%
Regression: No
Severity: 5 - suggestion
Description
It's possible this was always the case and we only just now happened to see it affect a test.
The CephFS test TestStrays.test_snapshot_remove failed because it found that some objects which should have been purged still existed according to "rados ls".
Closer inspection showed that the MDS had indeed purged the object, but the "rados ls" result was still showing it.
Grepping the OSD logs we can see it happening:
$ zgrep -e osd_op_repl -e nls remote/smithi185/log/ceph-osd.1.log.gz | grep 10000000002.00000001
2017-05-23 13:56:23.468242 7f9cfeb01700 1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3138981625 -- osd_op_reply(4 10000000002.00000001 [write 0~4194304] v158'1 uv1 ondisk = 0) v8 -- 0x55a69305a340 con 0
2017-05-23 13:56:24.131490 7f9cfeb01700 1 -- 172.21.15.185:6800/28848 --> 172.21.15.103:6806/3881975179 -- osd_op_reply(48 10000000002.00000001 [trimtrunc 2@0] v158'3 uv3 ondisk = 0) v8 -- 0x55a6933b3400 con 0
2017-05-23 13:56:24.874618 7f9cfeb01700 1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3138981625 -- osd_op_reply(7 10000000002.00000001 [write 0~4194304 [2@0]] v158'4 uv4 ondisk = 0) v8 -- 0x55a69352da00 con 0
2017-05-23 13:56:31.412132 7f9cfeb01700 1 -- 172.21.15.185:6800/28848 --> 172.21.15.103:6806/3881975179 -- osd_op_reply(86 10000000002.00000001 [delete] v158'5 uv5 ondisk = 0) v8 -- 0x55a69352cd00 con 0
2017-05-23 13:56:32.403491 7f9d01306700 1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(11 10000000002.00000001 [read 0~786432] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352ea40 con 0
2017-05-23 13:56:32.408533 7f9cfeb01700 1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(12 10000000002.00000001 [read 786432~2097152] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352cd00 con 0
2017-05-23 13:56:32.413984 7f9d01306700 1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(13 10000000002.00000001 [read 2883584~1310720] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352e080 con 0
2017-05-23 13:56:36.968230 7f9cfeb01700 20 osd.1 pg_epoch: 158 pg[30.5( v 158'5 (0'0,158'5] local-lis/les=155/156 n=2 ec=155/155 lis/c 155/155 les/c/f 156/156/0 155/155/155) [1,2] r=0 lpr=155 crt=158'5 lcod 158'4 mlcod 158'4 active+clean] pgnls item 0xb26eefdd, rev 0xbbf7764d 10000000002.00000001
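As an aside, a timeline like the one above can be pulled out of the osd_op_reply lines mechanically. This is a rough sketch for eyeballing logs; the regex and the `op_timeline` helper are mine, not part of any Ceph tooling:

```python
import re

# Matches the timestamp, object name, and op list of an osd_op_reply line.
# (Hypothetical log-scraping helper; not part of Ceph.)
REPLY_RE = re.compile(
    r"^(?P<ts>\S+ \S+) .*osd_op_reply\(\d+ (?P<obj>\S+) \[(?P<ops>[^\]]*)\]"
)

def op_timeline(lines):
    """Yield (timestamp, object, first op name) for each osd_op_reply line."""
    for line in lines:
        m = REPLY_RE.search(line)
        if m:
            yield m.group("ts"), m.group("obj"), m.group("ops").split()[0]
```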
The delete op completes in the middle of that sequence, but several seconds later the pgnls is still including the object in its listing: the deleted head is still showing up in the pgls results, which the test does not account for.
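Since the listing can lag the purge like this, one way to harden the test side is to poll instead of checking once. A minimal sketch of that idea, assuming a `list_objects` callable wrapping "rados ls"; `wait_for_absence` and its parameters are hypothetical, not the actual qa-suite code (the real fix may instead be to teach the test about head whiteouts in pgls results):

```python
import time

def wait_for_absence(list_objects, obj, timeout=60, interval=1,
                     clock=time.time, sleep=time.sleep):
    """Poll list_objects() until obj no longer appears in the listing,
    tolerating a listing that briefly lags behind the actual purge.
    Returns True once the object is gone, False on timeout."""
    deadline = clock() + timeout
    while True:
        if obj not in list_objects():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the helper trivially testable without real delays.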