Bug #2796

osd: watch state not reestablished when registration op resent

Added by Sage Weil about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: Objecter
Target version:
% Done: 0%
Source: Development
Tags:
Backport: argonaut
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If the client does not receive the watch ack and resends the registration op, the OSD ignores the resend as a dup op, and the watch session state is not reestablished.

Associated revisions

Revision 5dd68b95 (diff)
Added by Sage Weil about 11 years ago

objecter: always resend linger registrations

If a linger op (watch) is sent to the OSD and updates the object, and then
the client loses the reply, it will resend the request. The OSD will see
that it is a dup, however, and not set up the in-memory session state for
the watch. This in turn will break the watch (i.e., notifies won't
get delivered).
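
The failure described above can be illustrated with a toy model. This is only a sketch, not Ceph code: `ToyOSD`, its methods, and the idea that the in-memory watch state is keyed by a connection that goes away together with the lost reply are all assumptions made for illustration.

```python
class ToyOSD:
    """Hypothetical model: a durable dup log plus in-memory watch sessions."""

    def __init__(self):
        # reqids of ops that already updated the object (assumed to survive resets)
        self.completed_reqids = set()
        # connection id -> set of watched objects (assumed in-memory only)
        self.sessions = {}

    def handle_watch(self, conn, reqid, obj):
        # A resent op carries the same reqid, so it is treated as a dup
        # and the session setup below is skipped.
        if reqid in self.completed_reqids:
            return "dup"
        self.completed_reqids.add(reqid)
        self.sessions.setdefault(conn, set()).add(obj)
        return "ack"

    def drop_connection(self, conn):
        # Losing the socket discards the in-memory session state.
        self.sessions.pop(conn, None)

    def notify_targets(self, obj):
        # Connections that would receive a notify for obj.
        return sorted(c for c, objs in self.sessions.items() if obj in objs)


def replay_lost_ack():
    osd = ToyOSD()
    reqid = ("client.1", 10)  # hypothetical (client, tid) pair
    first = osd.handle_watch("conn-1", reqid, "obj")
    osd.drop_connection("conn-1")  # the ack is lost with the socket
    second = osd.handle_watch("conn-2", reqid, "obj")  # resend, same reqid
    return first, second, osd.notify_targets("obj")
```

In this model `replay_lost_ack()` returns `("ack", "dup", [])`: the object update happened once, but no connection is left registered to receive notifies.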

Instead, always resend linger registration ops, so that we always have a
unique reqid and do the correct session registration for each session.

  • track the tid of the registration op for each LingerOp
  • mark registration ops as should_resend=false; cancel as needed
  • when we send a new registration op, cancel the old one to ensure we
    ignore the reply. This is needed because we resend linger ops on any
    pg change, not just a primary change.
  • drop the first_send arg to send_linger(), as we can now infer that
    from register_tid == 0.
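
The bookkeeping in the list above might look roughly like the following. This is a simplified sketch rather than the actual Objecter code: `ToyObjecter`, `in_flight`, `sent`, and `handle_reply` are hypothetical names, and only `LingerOp`, `send_linger()`, `register_tid`, and `first_send` come from the commit message.

```python
import itertools


class LingerOp:
    def __init__(self, obj):
        self.obj = obj
        self.register_tid = 0   # 0 means no registration op has been sent yet
        self.registered = False


class ToyObjecter:
    """Hypothetical client-side model of linger registration."""

    def __init__(self):
        self._tids = itertools.count(1)
        self.in_flight = {}  # tid -> LingerOp awaiting a reply
        self.sent = []       # (tid, obj, first_send) for each op sent

    def send_linger(self, linger):
        # first_send is inferred from register_tid instead of being passed in.
        first_send = linger.register_tid == 0
        if not first_send:
            # Cancel the previous registration so its reply is ignored.
            self.in_flight.pop(linger.register_tid, None)
        tid = next(self._tids)  # new tid, hence a unique reqid per send
        linger.register_tid = tid
        self.in_flight[tid] = linger
        self.sent.append((tid, linger.obj, first_send))
        return tid

    def handle_reply(self, tid):
        linger = self.in_flight.pop(tid, None)
        if linger is None:
            return False  # reply to a cancelled op: ignore it
        linger.registered = True
        return True


def demo_resend():
    objecter = ToyObjecter()
    watch = LingerOp("obj")
    t1 = objecter.send_linger(watch)   # initial registration
    t2 = objecter.send_linger(watch)   # new op, e.g. after a pg change
    stale = objecter.handle_reply(t1)  # late reply to the cancelled op
    fresh = objecter.handle_reply(t2)
    return t1, t2, stale, fresh, watch.registered
```

Here `demo_resend()` returns `(1, 2, False, True, True)`: the second send gets its own tid, the late reply to the first op is dropped, and only the reply to the latest registration marks the watch as registered.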

The bug was easily reproduced with ms inject socket failures = 500 and the
test_stress_watch utility.
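
As a sketch, the injection setting named above could be placed in ceph.conf as follows. Putting it in the `[global]` section is an assumption; the commit message does not say where the option was set or how test_stress_watch was invoked.

```ini
[global]
        ; assumed placement; value taken from the commit message
        ms inject socket failures = 500
```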

Fixes: #2796
Signed-off-by: Sage Weil <>
Reviewed-by: Josh Durgin <>

Revision 682609a9 (diff)
Added by Sage Weil about 11 years ago

objecter: always resend linger registrations

If a linger op (watch) is sent to the OSD and updates the object, and then
the client loses the reply, it will resend the request. The OSD will see
that it is a dup, however, and not set up the in-memory session state for
the watch. This in turn will break the watch (i.e., notifies won't
get delivered).

Instead, always resend linger registration ops, so that we always have a
unique reqid and do the correct session registration for each session.

  • track the tid of the registration op for each LingerOp
  • mark registration ops as should_resend=false; cancel as needed
  • when we send a new registration op, cancel the old one to ensure we
    ignore the reply. This is needed because we resend linger ops on any
    pg change, not just a primary change.
  • drop the first_send arg to send_linger(), as we can now infer that
    from register_tid == 0.

The bug was easily reproduced with ms inject socket failures = 500 and the
test_stress_watch utility.

Fixes: #2796
Signed-off-by: Sage Weil <>
Reviewed-by: Josh Durgin <>

History

#1 Updated by Sage Weil about 11 years ago

  • Status changed from New to Fix Under Review
  • Assignee deleted (Sage Weil)

#2 Updated by Sage Weil about 11 years ago

  • Backport set to argonaut

#3 Updated by Sage Weil about 11 years ago

  • Target version set to v0.49

#4 Updated by Sage Weil about 11 years ago

  • Status changed from Fix Under Review to 7

#5 Updated by Sage Weil about 11 years ago

  • Status changed from 7 to Resolved
