Bug #4870

closed

rbd: watch request error handling bugs

Added by Alex Elder almost 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: High
Assignee:
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

These are error conditions that may not be likely, but...

In rbd_dev_header_watch_sync():
1) If we are initiating a watch and there is a failure,
we should unregister the linger status on the request.
2) If we are cancelling a watch request and a failure
occurs when submitting the cancel request or waiting
for its completion, we shouldn't cancel the event.

The reason for (1) is that I think the request will basically
leak, because the osd client will hold an extra reference on
it.

The reason for (2) is that an event could eventually arrive
and conceivably handling that might lead to dereferencing
invalid memory.

Actions #1

Updated by Alex Elder almost 11 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Alex Elder almost 11 years ago

  • Status changed from New to Resolved

The first problem listed will be addressed by changes
done for http://tracker.ceph.com/issues/3859.

The second problem mentioned is a little off. When
an event gets registered, the osd client keeps a
record of it. But if a notify arrives and there
is no registered event associated with the notification,
the osd client simply ignores it. Therefore, if we
cancel the event even though our osd request to clear
the watch fails, the watch may remain on the osd,
but it won't cause any problem on the client.
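
To illustrate the point about unregistered events, here is a toy Python model of the bookkeeping described above. It is only a sketch, not the kernel C code: the class OsdClientModel and its methods create_event, cancel_event and handle_notify are hypothetical names invented for this example.

```python
class OsdClientModel:
    """Toy stand-in for the osd client's event bookkeeping (hypothetical)."""

    def __init__(self):
        self._events = {}       # cookie -> callback for each registered event
        self._next_cookie = 1

    def create_event(self, callback):
        """Register an event and return the cookie that identifies it."""
        cookie = self._next_cookie
        self._next_cookie += 1
        self._events[cookie] = callback
        return cookie

    def cancel_event(self, cookie):
        """Forget the event; cancelling an unknown cookie is a no-op here."""
        self._events.pop(cookie, None)

    def handle_notify(self, cookie, payload):
        """Deliver a notification, or drop it if no event is registered."""
        callback = self._events.get(cookie)
        if callback is None:
            return False        # nothing registered: ignore the notify
        callback(payload)
        return True


client = OsdClientModel()
seen = []
cookie = client.create_event(seen.append)

delivered_before = client.handle_notify(cookie, "header changed")

# Cancel the event even though (in this scenario) the request that
# clears the watch on the osd is assumed to have failed.
client.cancel_event(cookie)

# A notify that still arrives from the stale watch is simply dropped.
delivered_after = client.handle_notify(cookie, "late notify")
```

In this model the late notification never reaches the callback, which matches the reasoning that a watch left behind on the osd is harmless to the client once the event is cancelled.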

Finally, when tearing down the watch request, we
unregister the linger request before submitting
the "don't watch any more" request. That just
means the osd client will no longer re-submit
the watch request, and having done this call it
will no longer hold an extra reference to it.
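
The reference counting involved can be sketched with another toy Python model. This is an illustration under stated assumptions, not the actual rbd or osd client code: Request, OsdClientModel, set_linger, unregister_linger, start_watch and stop_watch are all hypothetical names, and the submission outcome is passed in as a flag.

```python
class Request:
    """Toy request with a bare reference count (hypothetical)."""

    def __init__(self):
        self.refs = 1           # the caller's own reference

    def get(self):
        self.refs += 1

    def put(self):
        self.refs -= 1


class OsdClientModel:
    """Toy osd client that holds one extra reference per lingering request."""

    def __init__(self):
        self.lingering = set()

    def set_linger(self, req):
        if req not in self.lingering:
            self.lingering.add(req)
            req.get()           # extra reference held while lingering

    def unregister_linger(self, req):
        if req in self.lingering:
            self.lingering.remove(req)
            req.put()           # extra reference dropped


def start_watch(client, req, submit_ok, unregister_on_error):
    """Initiate a watch; submit_ok simulates whether submission succeeds."""
    client.set_linger(req)
    if not submit_ok:
        if unregister_on_error:
            client.unregister_linger(req)
        req.put()               # caller drops its reference on failure
        return False
    return True


def stop_watch(client, req):
    """Tear down: unregister linger first, then send the 'unwatch' request."""
    client.unregister_linger(req)
    # (the "don't watch any more" request would be submitted here)
    req.put()


client = OsdClientModel()

leaky = Request()
start_watch(client, leaky, submit_ok=False, unregister_on_error=False)

fixed = Request()
start_watch(client, fixed, submit_ok=False, unregister_on_error=True)

torn_down = Request()
start_watch(client, torn_down, submit_ok=True, unregister_on_error=True)
stop_watch(client, torn_down)
```

In this model, leaky ends with one outstanding reference (the leak described in item 1 of the description), while fixed and torn_down both drop to zero.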

So the net of all of this is that bug 3859 is
addressing one piece of this, and on further
consideration, the second piece is not an
issue after all.

So I'm marking this resolved (well, it will
be when 3859 is).
