Project

General

Profile

Actions

Bug #20072

closed

TestStrays.test_snapshot_remove doesn't handle head whiteout in pgls results

Added by John Spray almost 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Immediate
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
5 - suggestion
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

It's possible this was always the case but we only just happened to see it affect a test?

The CephFS test TestStrays.test_snapshot_remove failed because it found some objects still existed (according to "rados ls") that should have been purged.

http://pulpito.ceph.com/jspray-2017-05-23_11:58:06-fs-wip-jcsp-testing-20170523-distro-basic-smithi/1221193

Closer inspection showed that the MDS had indeed purged the object, but the `ls` result was still showing it.

Grepping the OSD logs we can see it happening:

1221193$ zgrep -e osd_op_repl -e nls remote/smithi185/log/ceph-osd.1.log.gz | grep  10000000002.00000001
2017-05-23 13:56:23.468242 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3138981625 -- osd_op_reply(4 10000000002.00000001 [write 0~4194304] v158'1 uv1 ondisk = 0) v8 -- 0x55a69305a340 con 0
2017-05-23 13:56:24.131490 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.103:6806/3881975179 -- osd_op_reply(48 10000000002.00000001 [trimtrunc 2@0] v158'3 uv3 ondisk = 0) v8 -- 0x55a6933b3400 con 0
2017-05-23 13:56:24.874618 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3138981625 -- osd_op_reply(7 10000000002.00000001 [write 0~4194304 [2@0]] v158'4 uv4 ondisk = 0) v8 -- 0x55a69352da00 con 0
2017-05-23 13:56:31.412132 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.103:6806/3881975179 -- osd_op_reply(86 10000000002.00000001 [delete] v158'5 uv5 ondisk = 0) v8 -- 0x55a69352cd00 con 0
2017-05-23 13:56:32.403491 7f9d01306700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(11 10000000002.00000001 [read 0~786432] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352ea40 con 0
2017-05-23 13:56:32.408533 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(12 10000000002.00000001 [read 786432~2097152] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352cd00 con 0
2017-05-23 13:56:32.413984 7f9d01306700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(13 10000000002.00000001 [read 2883584~1310720] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352e080 con 0
2017-05-23 13:56:36.968230 7f9cfeb01700 20 osd.1 pg_epoch: 158 pg[30.5( v 158'5 (0'0,158'5] local-lis/les=155/156 n=2 ec=155/155 lis/c 155/155 les/c/f 156/156/0 155/155/155) [1,2] r=0 lpr=155 crt=158'5 lcod 158'4 mlcod 158'4 active+clean] pgnls item 0xb26eefdd, rev 0xbbf7764d 10000000002.00000001

The delete op is completing in the middle there, but then several seconds later we're seeing the pgnls including the object in its listing.

Actions

Also available in: Atom PDF