Bug #12748 (closed): bug with cache/tiering and snapshot reads [merged, needs a test in ceph-qa-suite]

Added by Samuel Just over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version: -
% Done: 0%
Source: other
Backport: hammer
Regression: No
Severity: 3 - minor

Description

From mailing list (http://article.gmane.org/gmane.comp.file-systems.ceph.user/22901), courtesy of Ilya:

I think I reproduced this on today's master.

Setup, cache mode is writeback:

$ ./ceph osd pool create foo 12 12
pool 'foo' created
$ ./ceph osd pool create foo-hot 12 12
pool 'foo-hot' created
$ ./ceph osd tier add foo foo-hot
pool 'foo-hot' is now (or already was) a tier of 'foo'
$ ./ceph osd tier cache-mode foo-hot writeback
set cache-mode for pool 'foo-hot' to writeback
$ ./ceph osd tier set-overlay foo foo-hot
overlay for 'foo' is now (or already was) 'foo-hot'
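
(Aside, not from the original post: a quick way to confirm the overlay is in place is to look at the pool entries in the osd map.)

$ ./ceph osd dump | grep pool    # 'foo' should list read_tier/write_tier; 'foo-hot' shows tier_of and cache_mode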

Create an image:

$ ./rbd create --size 10M --image-format 2 foo/bar
$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mkfs.ext4 /mnt/bar
$ sudo umount /mnt

Create a snapshot, take md5sum:

$ ./rbd snap create foo/bar@snap
$ ./rbd export foo/bar /tmp/foo-1
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-1
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-1
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-1
$ md5sum /tmp/snap-1
83f5d244bb65eb19eddce0dc94bf6dda /tmp/snap-1

Set the cache mode to forward and do a flush; the hashes don't match and
the snap is empty. We bang on the hot tier and don't get redirected to
the cold tier, I suspect:

$ ./ceph osd tier cache-mode foo-hot forward
set cache-mode for pool 'foo-hot' to forward
$ ./rados -p foo-hot cache-flush-evict-all
rbd_data.100a6b8b4567.0000000000000002
rbd_id.bar
rbd_directory
rbd_header.100a6b8b4567
bar.rbd
rbd_data.100a6b8b4567.0000000000000001
rbd_data.100a6b8b4567.0000000000000000
$ ./rados -p foo-hot cache-flush-evict-all
$ ./rbd export foo/bar /tmp/foo-2
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-2
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-2
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-2
$ md5sum /tmp/snap-2
f1c9645dbc14efddc7d8a322685f26eb /tmp/snap-2
$ od /tmp/snap-2
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
50000000

Disable the cache tier and we are back to normal:

$ ./ceph osd tier remove-overlay foo
there is now (or already was) no overlay for 'foo'
$ ./rbd export foo/bar /tmp/foo-3
Exporting image: 100% complete...done.
$ ./rbd export foo/bar@snap /tmp/snap-3
Exporting image: 100% complete...done.
$ md5sum /tmp/foo-3
83f5d244bb65eb19eddce0dc94bf6dda /tmp/foo-3
$ md5sum /tmp/snap-3
83f5d244bb65eb19eddce0dc94bf6dda /tmp/snap-3

I first reproduced it with the kernel client; rbd export was just to
take it out of the equation.

Also, Igor sort of raised a question in his second message: if, after
setting the cache mode to forward and doing a flush, I open an image
(not a snapshot, so this may not be related to the above) for write
(e.g. with rbd-fuse), I get an rbd header object in the hot pool, even
though it's in forward mode:

$ sudo ./rbd-fuse -p foo -c $PWD/ceph.conf /mnt
$ sudo mount /mnt/bar /media
$ sudo umount /media
$ sudo umount /mnt
$ ./rados -p foo-hot ls
rbd_header.100a6b8b4567
$ ./rados -p foo ls | grep rbd_header
rbd_header.100a6b8b4567

It's been a while since I looked into tiering; is that how it's
supposed to work? It looks like it happens because rbd_header op
replies don't redirect?

Thanks,

Ilya

OSD logs generated from these instructions suggest that the initial bug is that find_object_context does not fill in pmissing for the snap read, since it has a cached snapset and object context. The cached snapset should be ignored in this case.
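
For the record, those logs can be regenerated on a vstart-style dev cluster like the one in the transcript above (a sketch, not the exact commands used; debug_osd/debug_ms are standard settings and the log path assumes vstart's out/ directory):

$ ./ceph tell osd.\* injectargs '--debug-osd 20 --debug-ms 1'
$ ./rbd export foo/bar@snap /tmp/snap-2     # re-run the failing snap read
$ grep find_object_context out/osd.*.log    # inspect the cached-snapset path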


Related issues: 1 (0 open, 1 closed)

Copied to Ceph - Backport #13693: bug with cache/tiering and snapshot reads [merged, needs a test in ceph-qa-suite] (Resolved, Abhishek Lekshmanan)

#1 Updated by Samuel Just over 8 years ago

Specifically, ssc->exists should have been checked.

#2 Updated by Sage Weil over 8 years ago

  • Status changed from New to 12

#3 Updated by Samuel Just over 8 years ago

  • Subject changed from bug with forward mode and snapshot reads to bug with cache/tiering and snapshot reads

#4 Updated by Loïc Dachary over 8 years ago

  • Status changed from 12 to In Progress
  • Assignee set to Loïc Dachary

#5 Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)

#6 Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)

#7 Updated by Loïc Dachary over 8 years ago

  • Description updated (diff)

#8 Updated by Loïc Dachary over 8 years ago

  • Status changed from In Progress to 12

#9 Updated by Loïc Dachary over 8 years ago

  • Assignee deleted (Loïc Dachary)

Won't make progress on this in the next few days, unassigning myself for now.

#10 Updated by Kefu Chai over 8 years ago

  • Assignee set to Kefu Chai

#11 Updated by Kefu Chai over 8 years ago

  • Status changed from 12 to Fix Under Review

#12 Updated by Kefu Chai over 8 years ago

  • Subject changed from bug with cache/tiering and snapshot reads to bug with cache/tiering and snapshot reads [merged, needs a test in ceph-qa-suite]

Per Sage, we need to add a test triggering this; maybe add something to singleton-nomsgr/all? A sketch of what such a test might look like follows.
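
Not the actual suite job, just a minimal sketch assembled from the reproducer in the description; the pool/image names and tiering commands are the report's own, while the bench-write population step and the final cmp assertion are my additions to turn it into a pass/fail check:

#!/bin/sh -ex
# Reproduce #12748: a snap read through a forward-mode cache tier
# comes back as zeros after cache-flush-evict-all.
ceph osd pool create foo 12 12
ceph osd pool create foo-hot 12 12
ceph osd tier add foo foo-hot
ceph osd tier cache-mode foo-hot writeback
ceph osd tier set-overlay foo foo-hot

rbd create --size 10M --image-format 2 foo/bar
rbd bench-write foo/bar --io-total 4194304    # populate the image (the report used rbd-fuse + mkfs.ext4)

rbd snap create foo/bar@snap
rbd export foo/bar@snap /tmp/snap-1

ceph osd tier cache-mode foo-hot forward
rados -p foo-hot cache-flush-evict-all
rados -p foo-hot cache-flush-evict-all        # second pass, as in the report

rbd export foo/bar@snap /tmp/snap-2
cmp /tmp/snap-1 /tmp/snap-2                   # fails while the bug is present: snap-2 is all zeros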

#13 Updated by Kefu Chai over 8 years ago

  • Status changed from Fix Under Review to 7

#14 Updated by Kefu Chai over 8 years ago

  • Status changed from 7 to Fix Under Review

#15 Updated by Samuel Just over 8 years ago

  • Status changed from Fix Under Review to Pending Backport

#17 Updated by Nathan Cutler over 8 years ago

Sam, you changed status to Pending Backport, but the backport field is empty.

#18 Updated by Kefu Chai over 8 years ago

  • Backport set to hammer

Nathan, sorry for the confusion; I just added hammer to the backport field.

#19 Updated by Loïc Dachary over 8 years ago

  • Copied to Backport #13693: bug with cache/tiering and snapshot reads [merged, needs a test in ceph-qa-suite] added

#20 Updated by Loïc Dachary over 8 years ago

  • Status changed from Pending Backport to Resolved