Bug #12748
bug with cache/tiering and snapshot reads [merged, needs a test in ceph-qa-suite]
Description
From mailing list (http://article.gmane.org/gmane.comp.file-systems.ceph.user/22901), courtesy of Ilya:
I think I reproduced this on today's master.
Setup, cache mode is writeback:
$ ./ceph osd pool create foo 12 12
pool 'foo' created
$ ./ceph osd pool create foo-hot 12 12
pool 'foo-hot' created
$ ./ceph osd tier add foo foo-hot
pool 'foo-hot' is now (or already was) a tier of 'foo'
$ ./ceph osd tier cache-mode foo-hot writeback
set cache-mode for pool 'foo-hot' to writeback
$ ./ceph osd tier set-overlay foo foo-hot
overlay for 'foo' is now (or already was) 'foo-hot'
Create an image:
$ ./rbd create --size 10M --image-format 2 foo/bar
$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mkfs.ext4 /mnt/bar
$ sudo umount /mnt
Create a snapshot, take md5sum:
$ ./rbd snap create foo/bar@snap
$ ./rbd export foo/bar /tmp/foo-1
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-1
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-1
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-1
$ md5sum /tmp/snap-1
83f5d244bb65eb19eddce0dc94bf6dda /tmp/snap-1
Set the cache mode to forward and do a flush. The hashes no longer
match: the snap is empty. I suspect we bang on the hot tier and don't
get redirected to the cold tier:
$ ./ceph osd tier cache-mode foo-hot forward
set cache-mode for pool 'foo-hot' to forward
$ ./rados -p foo-hot cache-flush-evict-all
rbd_data.100a6b8b4567.0000000000000002
rbd_id.bar
rbd_directory
rbd_header.100a6b8b4567
bar.rbd
rbd_data.100a6b8b4567.0000000000000001
rbd_data.100a6b8b4567.0000000000000000
$ ./rados -p foo-hot cache-flush-evict-all
$ ./rbd export foo/bar /tmp/foo-2
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-2
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-2
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-2
$ md5sum /tmp/snap-2
f1c9645dbc14efddc7d8a322685f26eb /tmp/snap-2
$ od /tmp/snap-2
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
50000000
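The od output above can also be checked mechanically. The following is a minimal sketch, not part of the original report: `is_all_zero` and `octal_to_bytes` are hypothetical helper names, and the only assumption is a POSIX shell with `tr`, `wc` and `printf`. The first helper reports whether a file holds nothing but NUL bytes; the second converts od's octal end offset into a byte count, so that 50000000 can be compared with the 10M image size.

```shell
# Hypothetical helper: succeed if the file contains only NUL bytes.
# Note that an empty file also counts as "all zero".
is_all_zero() {
    # tr deletes every NUL byte; anything left over is real data.
    [ "$(tr -d '\000' < "$1" | wc -c)" -eq 0 ]
}

# Hypothetical helper: od prints offsets in octal, so convert the
# final offset to a decimal byte count. A leading 0 makes printf
# read the number as octal.
octal_to_bytes() {
    printf '%d\n' "0$1"
}
```

For the export above this would be used as `is_all_zero /tmp/snap-2 && echo "snapshot export is empty"` and `octal_to_bytes 50000000`.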
Disable the cache tier and we are back to normal:
$ ./ceph osd tier remove-overlay foo
there is now (or already was) no overlay for 'foo'
$ ./rbd export foo/bar /tmp/foo-3
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-3
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-3
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-3
$ md5sum /tmp/snap-3
83f5d244bb65eb19eddce0dc94bf6dda /tmp/snap-3
I first reproduced it with the kernel client, rbd export was just to
take it out of the equation.
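The image/snapshot comparison repeated in each step above could be wrapped in a small helper. This is only a sketch under assumptions: `same_md5` is a hypothetical name, and it relies on `md5sum` being installed, as it is in the transcript. It takes two already exported files and does not call `rbd` itself.

```shell
# Hypothetical helper: succeed if two files have identical MD5 digests.
# Returns 2 if either file cannot be read.
same_md5() {
    # Reading from stdin keeps the file name out of md5sum's output,
    # so the two results can be compared as plain strings.
    a=$(md5sum < "$1") || return 2
    b=$(md5sum < "$2") || return 2
    [ "$a" = "$b" ]
}
```

With the files from the forward-mode step, `same_md5 /tmp/foo-2 /tmp/snap-2 || echo "snapshot export differs from image"` would flag the mismatch shown above.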
Also, Igor sort of raised a question in his second message: if, after
setting the cache mode to forward and doing a flush, I open an image
(not a snapshot, so may not be related to the above) for write (e.g.
with rbd-fuse), I get an rbd header object in the hot pool, even though
it's in forward mode:
$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mount /mnt/bar /media
$ sudo umount /media
$ sudo umount /mnt
$ ./rados -p foo-hot ls
rbd_header.100a6b8b4567
$ ./rados -p foo ls | grep rbd_header
rbd_header.100a6b8b4567
It's been a while since I looked into tiering, is that how it's
supposed to work? It looks like it happens because rbd_header op
replies don't redirect?
Thanks,
Ilya
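To spot objects such as the rbd_header above that show up in both pools, the two listings can be intersected. A minimal sketch, assuming each listing has first been saved to a file with one object name per line (for example from `./rados -p foo-hot ls` and `./rados -p foo ls`); `objects_in_both` is a hypothetical helper name, not a Ceph command.

```shell
# Hypothetical helper: print object names that appear in both listings.
# $1 and $2 are files holding one object name per line.
objects_in_both() {
    # -F: treat names as fixed strings, -x: match whole lines only,
    # -f: read the names to look for from the first listing.
    grep -Fx -f "$1" "$2" | sort -u
}
```

For example, after `./rados -p foo-hot ls > /tmp/hot.txt` and `./rados -p foo ls > /tmp/cold.txt`, running `objects_in_both /tmp/hot.txt /tmp/cold.txt` lists the objects present in both the hot and the cold pool.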
OSD logs generated by following these instructions suggest that the initial bug is in find_object_context: it does not fill in pmissing for the snap read because it has a cached snapset and objectcontext. The snapset should be ignored in this case.
Updated by Samuel Just over 8 years ago
Specifically, ssc->exists should have been checked.
Updated by Samuel Just over 8 years ago
- Subject changed from bug with forward mode and snapshot reads to bug with cache/tiering and snapshot reads
Updated by Loïc Dachary over 8 years ago
- Status changed from 12 to In Progress
- Assignee set to Loïc Dachary
Updated by Loïc Dachary over 8 years ago
- Assignee deleted (Loïc Dachary)
Won't make progress on this in the next few days, unassigning myself for now.
Updated by Kefu Chai over 8 years ago
- Status changed from 12 to Fix Under Review
Updated by Kefu Chai over 8 years ago
- Subject changed from bug with cache/tiering and snapshot reads to bug with cache/tiering and snapshot reads [merged, needs a test in ceph-qa-suite]
Per Sage, we need to add a test that triggers this.
Maybe add something to singleton-nomsgr/all?
Updated by Kefu Chai over 8 years ago
- Status changed from Fix Under Review to 7
Updated by Kefu Chai over 8 years ago
- Status changed from 7 to Fix Under Review
Updated by Samuel Just over 8 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Nathan Cutler over 8 years ago
infernalis PR: https://github.com/ceph/ceph/pull/6261
Updated by Nathan Cutler over 8 years ago
Sam, you changed status to Pending Backport, but the backport field is empty.
Updated by Kefu Chai over 8 years ago
- Backport set to hammer
Nathan,
Sorry for the confusion. I just added hammer to the backport field.
Updated by Loïc Dachary over 8 years ago
- Copied to Backport #13693: bug with cache/tiering and snapshot reads [merged, needs a test in ceph-qa-suite] added
Updated by Loïc Dachary over 8 years ago
- Status changed from Pending Backport to Resolved