Bug #12748
Updated by Loïc Dachary over 8 years ago
From the mailing list (http://article.gmane.org/gmane.comp.file-systems.ceph.user/22901), courtesy of Ilya:
I think I reproduced this on today's master.
Setup, cache mode is writeback:
$ ./ceph osd pool create foo 12 12
pool 'foo' created
$ ./ceph osd pool create foo-hot 12 12
pool 'foo-hot' created
$ ./ceph osd tier add foo foo-hot
pool 'foo-hot' is now (or already was) a tier of 'foo'
$ ./ceph osd tier cache-mode foo-hot writeback
set cache-mode for pool 'foo-hot' to writeback
$ ./ceph osd tier set-overlay foo foo-hot
overlay for 'foo' is now (or already was) 'foo-hot'
Create an image:
$ ./rbd create --size 10M --image-format 2 foo/bar
$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mkfs.ext4 /mnt/bar
$ sudo umount /mnt
Create a snapshot, take md5sum:
$ ./rbd snap create foo/bar@snap
$ ./rbd export foo/bar /tmp/foo-1
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-1
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-1
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-1
$ md5sum /tmp/snap-1
83f5d244bb65eb19eddce0dc94bf6dda /tmp/snap-1
Set the cache mode to forward and do a flush, hashes don't match - the
snap is empty - we bang on the hot tier and don't get redirected to the
cold tier, I suspect:
$ ./ceph osd tier cache-mode foo-hot forward
set cache-mode for pool 'foo-hot' to forward
$ ./rados -p foo-hot cache-flush-evict-all
rbd_data.100a6b8b4567.0000000000000002
rbd_id.bar
rbd_directory
rbd_header.100a6b8b4567
bar.rbd
rbd_data.100a6b8b4567.0000000000000001
rbd_data.100a6b8b4567.0000000000000000
$ ./rados -p foo-hot cache-flush-evict-all
$ ./rbd export foo/bar /tmp/foo-2
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-2
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-2
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-2
$ md5sum /tmp/snap-2
f1c9645dbc14efddc7d8a322685f26eb /tmp/snap-2
$ od /tmp/snap-2
0000000 000000 000000 000000 000000 000000 000000 000000 000000
50000000
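A note on reading the dump above: od prints offsets in octal, so the final offset 50000000 is 10485760 bytes, which matches the 10M image size. The check that an export is entirely zero-filled can also be done without a dump. The following is a minimal sketch, assuming GNU stat (-c %s) and a cmp that supports -n; the helper name is_all_zero is hypothetical and not part of any Ceph tooling.

```shell
# Hypothetical helper: succeed if the file consists solely of zero bytes.
# Assumes GNU stat (-c %s) and a cmp that accepts -n (byte limit).
# The body runs in a subshell so that $size does not leak to the caller.
is_all_zero() (
    size=$(stat -c %s "$1") || exit 2
    # Compare the first $size bytes of the file against the zero stream.
    cmp -s -n "$size" "$1" /dev/zero
)

# Convert the final od offset (octal) to bytes; a leading 0 marks octal.
printf '%d\n' 050000000   # prints 10485760, i.e. 10 MiB
```

Usage against the export from the steps above would be: is_all_zero /tmp/snap-2 && echo "snapshot export is all zeros"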
Disable the cache tier and we are back to normal:
$ ./ceph osd tier remove-overlay foo
there is now (or already was) no overlay for 'foo'
$ ./rbd export foo/bar /tmp/foo-3
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-3
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-3
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-3
$ md5sum /tmp/snap-3
83f5d244bb65eb19eddce0dc94bf6dda /tmp/snap-3
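The head-versus-snapshot comparison is repeated at each stage of the reproduction, so it can be wrapped in a small helper. This is only a sketch: same_md5 is a hypothetical name, and the two exports are expected to match here only because nothing was written to the image after the snapshot was taken.

```shell
# Hypothetical helper: succeed if two files have the same MD5 digest.
# Returns 2 if either file is unreadable, so a missing export is not
# mistaken for a match.
same_md5() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Read via stdin so the file name is not part of md5sum's output.
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]
}

# Example, using the export paths from the steps above:
#   same_md5 /tmp/foo-2 /tmp/snap-2 || echo "snapshot export differs from head"
```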
I first reproduced it with the kernel client, rbd export was just to
take it out of the equation.
Also, Igor sort of raised a question in his second message: if, after
setting the cache mode to forward and doing a flush, I open an image
(not a snapshot, so may not be related to the above) for write (e.g.
with rbd-fuse), I get an rbd header object in the hot pool, even though
it's in forward mode:
$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mount /mnt/bar /media
$ sudo umount /media
$ sudo umount /mnt
$ ./rados -p foo-hot ls
rbd_header.100a6b8b4567
$ ./rados -p foo ls | grep rbd_header
rbd_header.100a6b8b4567
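To check for one specific object rather than any name containing "rbd_header", the listing can be matched exactly. A minimal sketch, where has_object is a hypothetical helper that reads an object listing (such as the output of rados ls) on stdin:

```shell
# Hypothetical helper: succeed if the listing on stdin contains a line
# that is exactly the given object name (fixed string, whole line).
has_object() {
    grep -Fxq -- "$1"
}

# Example, using the pools and object name from the steps above:
#   ./rados -p foo-hot ls | has_object rbd_header.100a6b8b4567 && echo "present in hot pool"
```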
It's been a while since I looked into tiering, is that how it's
supposed to work? It looks like it happens because rbd_header op
replies don't redirect?
Thanks,
Ilya
OSD logs generated by following these instructions suggest that the initial bug is that find_object_context does not fill in pmissing for the snap read, because it already has a cached snapset and objectcontext. The snapset should be ignored in this case.