Feature #10585

closed

use new, more reliable version of watch/notify

Added by Josh Durgin over 9 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Category: libceph
Target version:
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:

Description

The librados interface already exposes everything the user needs,
along with a description of most of the rados-level semantics [1].
Most of this work will be in osd_client, with a small amount in rbd
to make it use the new interface.

In rbd, opening an image non-readonly causes a watch to be
established on the header object of the image. For historical
reasons, notifications were originally sent with no payload and
any notification on the image header resulted in re-reading all
the mutable image metadata. In userspace this means incrementing
the ImageCtx::refresh_seq counter, which is checked before each
operation to see if the image metadata needs to be reread. When a
watch is lost, the error callback is called and rbd compensates
for possible missed notifications by incrementing refresh_seq to
reread the header before the next operation.
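
The refresh_seq scheme described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not librbd code: ImageState, last_refresh, header_reads and the three helper functions are hypothetical names, and the actual header reread is replaced by a counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the image state (not librbd's ImageCtx).
struct ImageState {
    uint64_t refresh_seq = 0;   // bumped on every notification or watch error
    uint64_t last_refresh = 0;  // value of refresh_seq at the last reread
    unsigned header_reads = 0;  // counts simulated header rereads
};

// Any notification on the header object; the payload is ignored.
inline void on_header_notify(ImageState &img) { ++img.refresh_seq; }

// The watch was lost, so notifications may have been missed:
// behave as if one had arrived.
inline void on_watch_error(ImageState &img) { ++img.refresh_seq; }

// Run before each operation: reread mutable metadata only if it is stale.
// Returns true if a reread happened.
inline bool refresh_if_needed(ImageState &img) {
    if (img.last_refresh == img.refresh_seq)
        return false;
    // Capture the sequence first, so a notification that arrives while
    // the header is being reread triggers another refresh later.
    uint64_t seq = img.refresh_seq;
    ++img.header_reads;  // stands in for rereading the image header
    img.last_refresh = seq;
    return true;
}
```

Several notifications between two operations collapse into a single reread, and a watch error is handled the same way as a notification, which is the property krbd needs while it ignores the payload.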

In hammer and beyond the notify payload is used by images with
the exclusive lock feature bit to proxy management operations to
the lock holder, but that's a separate issue. For now the payload
can continue being ignored by krbd, and krbd doesn't need to send
notifications yet.

These details are handled by ImageWatcher in userspace. In
particular, see reregister_watch() for watch error handling [2],
and note that rbd now acks notifications explicitly with
rados_notify_ack().

In terms of the low-level implementation of watch/notify, the
usual MOSDOp message for rados operations is used to
register/unregister watches and send notifications with
watch/notify-specific fields. The client periodically pings the osds
serving its watches to make sure those connections are still alive [3].
The kernel should already be doing this. What it doesn't do yet is
report when a watch has an error and needs to be reregistered, and
the watch flush mechanism may
need to change as well. Note that in the userspace analogue of
osd_client, the Objecter, watch/notify are called "linger" ops
for historical reasons. Objecter::handle_watch_notify() takes
care of MWatchNotify [4] messages, which are notifications or
watch errors received from the OSD.
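
As a rough illustration of the missing piece (surfacing a watch error so the caller can reregister), here is a small model. It is a sketch under assumptions and not the Objecter or osd_client implementation: the Watch struct, handle_ping_reply() and reregister() are hypothetical, and the ping result is passed in directly instead of coming from an OSD.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <functional>

// Hypothetical model of one registered watch ("linger" op).
struct Watch {
    uint64_t cookie = 0;      // identifies the watch registration
    bool registered = false;  // true while the watch is believed valid
    int last_error = 0;       // last error reported for this watch
    std::function<void(uint64_t cookie, int err)> on_error;  // user callback
};

// Process the result of a periodic ping: 0 on success, a negative
// errno-style value on failure. The error callback fires once per loss.
inline void handle_ping_reply(Watch &w, int result) {
    if (result == 0 || !w.registered)
        return;
    w.registered = false;
    w.last_error = result;
    if (w.on_error)
        w.on_error(w.cookie, result);
}

// The caller reacts to the error by registering the watch again.
inline void reregister(Watch &w, uint64_t new_cookie) {
    w.cookie = new_cookie;
    w.registered = true;
    w.last_error = 0;
}
```

In this model an rbd-like user would, inside on_error, schedule a reregistration and force a header reread, since notifications may have been missed while the watch was down.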

[1] https://github.com/ceph/ceph/blob/7e5b81b38106654c0b6760b597058ad6e7655dda/src/include/rados/librados.h#L1869

[2] https://github.com/ceph/ceph/blob/796f810398cc4c828a0047ca7a4cc188a805c2af/src/librbd/ImageWatcher.cc#L987

[3] https://github.com/ceph/ceph/blob/780576ba62a3de8decdedae4545af5a853465738/src/osdc/Objecter.cc#L548

[4] https://github.com/ceph/ceph/blob/889cd874e2ded7a1350659449d777af8f4a7a918/src/messages/MWatchNotify.h


Related issues 2 (0 open, 2 closed)

Related to Linux kernel client - Bug #13328: fix notify completion race (Resolved, Douglas Fuller, 10/01/2015)

Blocked by Linux kernel client - Feature #9779: libceph: sync up with objecter (Resolved, Ilya Dryomov, 10/14/2014)

#1

Updated by Josh Durgin over 9 years ago

  • Target version set to sprint2

#2

Updated by Josh Durgin almost 9 years ago

  • Assignee set to Douglas Fuller

#3

Updated by Ilya Dryomov almost 9 years ago

  • Category set to libceph

A high-level discussion with some links:

http://www.spinics.net/lists/ceph-devel/msg21422.html

#4

Updated by Josh Durgin almost 9 years ago

  • Description updated (diff)

#5

Updated by Douglas Fuller almost 9 years ago

  • Status changed from New to In Progress

#6

Updated by Douglas Fuller almost 9 years ago

  • Status changed from In Progress to Fix Under Review

#7

Updated by Ilya Dryomov almost 8 years ago

  • Status changed from Fix Under Review to Resolved

Done in 4.7 by way of #9779.
