Bug #20072: TestStrays.test_snapshot_remove doesn't handle head whiteout in pgls results - CephFS - Ceph

closed

Added by John Spray almost 7 years ago. Updated over 6 years ago.

Status:

Resolved

Priority:

Immediate

Assignee:

Patrick Donnelly

Severity:

5 - suggestion

Description

It's possible this was always the case but we only just happened to see it affect a test?

The CephFS test TestStrays.test_snapshot_remove failed because it found some objects still existed (according to "rados ls") that should have been purged.

http://pulpito.ceph.com/jspray-2017-05-23_11:58:06-fs-wip-jcsp-testing-20170523-distro-basic-smithi/1221193

Closer inspection showed that the MDS had indeed purged the object, but the `ls` result was still showing it.

Grepping the OSD logs we can see it happening:

1221193$ zgrep -e osd_op_repl -e nls remote/smithi185/log/ceph-osd.1.log.gz | grep  10000000002.00000001
2017-05-23 13:56:23.468242 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3138981625 -- osd_op_reply(4 10000000002.00000001 [write 0~4194304] v158'1 uv1 ondisk = 0) v8 -- 0x55a69305a340 con 0
2017-05-23 13:56:24.131490 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.103:6806/3881975179 -- osd_op_reply(48 10000000002.00000001 [trimtrunc 2@0] v158'3 uv3 ondisk = 0) v8 -- 0x55a6933b3400 con 0
2017-05-23 13:56:24.874618 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3138981625 -- osd_op_reply(7 10000000002.00000001 [write 0~4194304 [2@0]] v158'4 uv4 ondisk = 0) v8 -- 0x55a69352da00 con 0
2017-05-23 13:56:31.412132 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.103:6806/3881975179 -- osd_op_reply(86 10000000002.00000001 [delete] v158'5 uv5 ondisk = 0) v8 -- 0x55a69352cd00 con 0
2017-05-23 13:56:32.403491 7f9d01306700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(11 10000000002.00000001 [read 0~786432] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352ea40 con 0
2017-05-23 13:56:32.408533 7f9cfeb01700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(12 10000000002.00000001 [read 786432~2097152] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352cd00 con 0
2017-05-23 13:56:32.413984 7f9d01306700  1 -- 172.21.15.185:6800/28848 --> 172.21.15.185:0/3446104548 -- osd_op_reply(13 10000000002.00000001 [read 2883584~1310720] v0'0 uv1 ondisk = 0) v8 -- 0x55a69352e080 con 0
2017-05-23 13:56:36.968230 7f9cfeb01700 20 osd.1 pg_epoch: 158 pg[30.5( v 158'5 (0'0,158'5] local-lis/les=155/156 n=2 ec=155/155 lis/c 155/155 les/c/f 156/156/0 155/155/155) [1,2] r=0 lpr=155 crt=158'5 lcod 158'4 mlcod 158'4 active+clean] pgnls item 0xb26eefdd, rev 0xbbf7764d 10000000002.00000001

The delete op completes at 13:56:31, but about five seconds later (13:56:36) the pgnls still includes the object in its listing.
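
The same check can be scripted for longer logs. What follows is only a sketch: the regular expressions are written against the log lines quoted above and may not match other Ceph versions, `listed_after_delete` is a hypothetical helper name, and treating a later write as "object recreated" is an assumption.

```python
import re

# Op name inside an osd_op_reply, e.g. "osd_op_reply(86 <oid> [delete] v158'5 ...".
REPLY_RE = re.compile(r"^(\S+ \S+) .*osd_op_reply\(\d+ (\S+) \[(\w+)")
# A pgnls listing entry, e.g. "... pgnls item 0xb26eefdd, rev 0xbbf7764d <oid>".
PGNLS_RE = re.compile(r"^(\S+ \S+) .*pgnls item \S+ rev \S+ (\S+)$")


def listed_after_delete(lines):
    """Return (oid, delete_time, list_time) for pgnls entries seen after a delete reply.

    Lines are assumed to be in log order, so no timestamp comparison is done.
    """
    deleted = {}  # oid -> timestamp of the most recent delete reply
    hits = []
    for raw in lines:
        line = raw.rstrip()
        m = REPLY_RE.match(line)
        if m:
            stamp, oid, op = m.groups()
            if op == "delete":
                deleted[oid] = stamp
            elif op.startswith("write"):
                # Assumption: a later write recreates the head object.
                deleted.pop(oid, None)
            continue
        m = PGNLS_RE.match(line)
        if m and m.group(2) in deleted:
            stamp, oid = m.groups()
            hits.append((oid, deleted[oid], stamp))
    return hits
```

Fed the `zgrep` output above, this should report `10000000002.00000001` as deleted at 13:56:31 and listed again at 13:56:36.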

Updated by Sage Weil almost 7 years ago

Assignee set to Sage Weil
Priority changed from Normal to Immediate

Updated by Ilya Dryomov almost 7 years ago

I've got a bunch of krbd test failures which I tracked down to a recent change in "rados ls" behavior.

Before:

$ rbd create --size 16 --image-feature layering a
<map and fill>
$ rados -p rbd ls
rbd_directory
rbd_data.10116b8b4567.0000000000000003
rbd_data.10116b8b4567.0000000000000001
rbd_info
rbd_data.10116b8b4567.0000000000000002
rbd_data.10116b8b4567.0000000000000000
rbd_header.10116b8b4567
rbd_id.a
$ rbd snap create a@snap
<discard -- delete HEADs>
$ rados -p rbd ls
rbd_directory
rbd_info
rbd_header.10116b8b4567
rbd_id.a

After, deleted HEADs are listed:

$ rbd create --size 16 --image-feature layering a
<map and fill>
$ rados -p rbd ls
rbd_data.100f6b8b4567.0000000000000002
rbd_data.100f6b8b4567.0000000000000003
rbd_directory
rbd_info
rbd_data.100f6b8b4567.0000000000000001
rbd_data.100f6b8b4567.0000000000000000
rbd_id.a
rbd_header.100f6b8b4567
$ rbd snap create a@snap
<discard -- delete HEADs>
$ rados -p rbd ls
rbd_data.100f6b8b4567.0000000000000002
rbd_data.100f6b8b4567.0000000000000003
rbd_directory
rbd_info
rbd_data.100f6b8b4567.0000000000000001
rbd_data.100f6b8b4567.0000000000000000
rbd_id.a
rbd_header.100f6b8b4567

This happens on both filestore and bluestore, in the 6152fd9f01fffbb1a2a96c46bb12f30a31da3257..404cee744fb6891fe7bf48cfe012cecdc1c2eba1 commit range. Josh confirms that this wasn't intended -- I suspect the snapset refactoring.

Updated by Sage Weil almost 7 years ago

Subject changed from pgnls results show objects that have been deleted (bluestore) to pgnls results show objects that have been deleted
Status changed from New to 12

Ah, yes, this is the SnapSet refactor, not bluestore. This was a semi-intentional change.

Before, pgnls would include whiteouts, but those only appeared in cache pools. After, we also have a whiteout when a clone exists but the head of the object is deleted.

We can make it not include whiteouts by default, but then tools enumerating cache pools will be confused. I think we probably want to add a flag to include whiteouts that is non-default and change all of the cache tier code to make use of that. Bleh...
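
To make the proposed flag concrete, here is a toy model. It is not Ceph code: `HeadEntry`, `list_objects` and `include_whiteouts` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeadEntry:
    """Toy stand-in for the head of an object as seen by a listing."""
    name: str
    whiteout: bool = False  # head deleted, but the entry is kept
    clones: int = 0         # number of snapshot clones still present


def list_objects(entries, include_whiteouts=False):
    """Return object names, hiding whiteouts unless the caller opts in."""
    return [e.name for e in entries if include_whiteouts or not e.whiteout]


pool = [
    HeadEntry("rbd_header.100f6b8b4567"),
    # Head discarded after "rbd snap create", one clone remains.
    HeadEntry("rbd_data.100f6b8b4567.0000000000000000", whiteout=True, clones=1),
]
```

With the default, `list_objects(pool)` hides the discarded data object, while a cache-tier caller would pass `include_whiteouts=True` to see it.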

Updated by Greg Farnum almost 7 years ago

Hmm, listing and snapshots have never gotten along well, but that makes me think we should list stuff even if it doesn't have a HEAD object — we don't support listing of non-HEAD so without this we can't see clones at all.

It is annoying for the FS and rbd tests, but I'd rather change those than have "invisible" objects.

Updated by Ilya Dryomov almost 7 years ago

I don't think we want a plain "rados ls" with no flags (--all?) to output something that can't be "rados stat"ed. Support for listing of non-HEAD needs to be thought through: how is non-HEAD distinguished from HEAD in the output, how do we expose unique versions, which commands accept non-HEAD as input and which don't, etc.

Updated by Sage Weil almost 7 years ago

Agree with Greg. Also: making pgnls skip whiteout objects means we need to load the object_info_t, which is significantly more expensive.

We do have the list-snaps operation to list snaps on an object--that should work on a whiteout. It isn't immediately clear that that's what you have to do, yes, but you can infer it from an ENOENT.
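
A rough sketch of that inference, under stated assumptions: `stat` and `list_snaps` are caller-supplied stand-ins for whatever client calls are used (for example wrappers around `rados stat` and `rados listsnaps`), they are assumed to raise `FileNotFoundError` on ENOENT, and `classify_listed_object` is a hypothetical name.

```python
def classify_listed_object(name, stat, list_snaps):
    """Classify a name returned by a pool listing.

    stat(name) is assumed to raise FileNotFoundError when the head is missing.
    list_snaps(name) is assumed to return the clones of the object and to
    raise FileNotFoundError when nothing is left of it.
    """
    try:
        stat(name)
        return "head"
    except FileNotFoundError:
        pass  # ENOENT on the head: it may still be a whiteout with clones
    try:
        clones = list_snaps(name)
    except FileNotFoundError:
        return "gone"
    return "head whiteout" if clones else "gone"
```

A test that lists a pool could use this to ignore names that come back as "head whiteout" instead of treating them as objects that were never purged.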

Also, we've yet to come across a real user of the pgnls interface that needs to do this. The only users right now are cephfs repair and all of the tests.

Updated by Sage Weil almost 7 years ago

Subject changed from pgnls results show objects that have been deleted to TestStrays.test_snapshot_remove doesn't handle head whiteout in pgls results
Assignee deleted (~~Sage Weil~~)

Updated by Sage Weil almost 7 years ago

Project changed from Ceph to CephFS
Category deleted (~~OSD~~)

Updated by John Spray almost 7 years ago

Just to check I understand -- the claim is that we are seeing the object in pgnls output because it has a snapshot that hasn't been removed yet?

If so, we should be able to just update the cephfs test to do a wait_until_true while it waits for the removed_snaps in the osdmap to propagate to the OSDs.
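
For illustration, the polling pattern could look like the sketch below. This is a standalone stand-in, not the qa suite's own `wait_until_true`: `poll_until_true` and its `period`, `sleep` and `clock` parameters are assumptions made here so the example runs on its own.

```python
import time


def poll_until_true(condition, timeout, period=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call condition() every `period` seconds until it returns True.

    Raises TimeoutError if the condition is still false after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(period)
```

In the test, the condition would be something like "the purged object no longer appears in the pool listing", so a listing taken before the OSDs have trimmed the snapshot would trigger another poll rather than an immediate failure.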
