Bug #13114

LibRadosWatchNotify.WatchNotify2Timeout

Added by Sage Weil about 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: hammer
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2015-09-15T22:36:47.195 INFO:tasks.ceph.osd.4.mira072.stderr:2015-09-15 22:36:47.197454 7f9748469980 -1 osd.4 993 log_to_monitors {default=true}
2015-09-15T22:36:49.723 INFO:tasks.workunit.client.0.plana43.stdout:watch_notify2_test_cb from 4796 notify_id 4333622001666 cookie 140222505007120
2015-09-15T22:36:50.726 INFO:tasks.workunit.client.0.plana43.stdout:test/librados/watch_notify.cc:501: Failure
2015-09-15T22:36:51.325 INFO:tasks.workunit.client.0.plana43.stdout:Value of: rados_notify2(ioctx, notify_oid, "notify", 6, 300000, &reply_buf, &reply_buf_len)
2015-09-15T22:36:51.325 INFO:tasks.workunit.client.0.plana43.stdout:  Actual: -110
2015-09-15T22:36:51.325 INFO:tasks.workunit.client.0.plana43.stdout:Expected: 0
2015-09-15T22:36:51.326 INFO:tasks.workunit.client.0.plana43.stdout:[  FAILED  ] LibRadosWatchNotify.WatchNotify2Timeout (4106 ms)

/a/sage-2015-09-15_18:56:52-rados-wip-sage-testing---basic-multi/1058498

Related issues

Copied to Ceph - Backport #13535: LibRadosWatchNotify.WatchNotify2Timeout Resolved

Associated revisions

Revision e86d0338 (diff)
Added by Sage Weil about 7 years ago

osdc/Objecter: distinguish between multiple notify completions

We may send a notify to the cluster multiple times due to OSDMap
changes. In some cases, earlier notify attempts may complete with
an error, while later attempts succeed. We need to only pay
attention to the most recently sent notify's completion.

Do this by making note of the notify_id in the initial ACK (only
present when talking to newer OSDs). When we get a notify
completion, match it against our expected notify_id (if we have
one) or else discard it.

This is important because in some cases an early notify completion
may be an error while a later one succeeds.

Note that if we are talking to an old cluster we will simply not record a
notify_id and our behavior will be the same as before (we will trust any
notify completion we get).

Fixes: #13114
Signed-off-by: Sage Weil <>

Revision 7ffd072a (diff)
Added by Sage Weil about 7 years ago

osdc/Objecter: distinguish between multiple notify completions

We may send a notify to the cluster multiple times due to OSDMap
changes. In some cases, earlier notify attempts may complete with
an error, while later attempts succeed. We need to only pay
attention to the most recently sent notify's completion.

Do this by making note of the notify_id in the initial ACK (only
present when talking to newer OSDs). When we get a notify
completion, match it against our expected notify_id (if we have
one) or else discard it.

This is important because in some cases an early notify completion
may be an error while a later one succeeds.

Note that if we are talking to an old cluster we will simply not record a
notify_id and our behavior will be the same as before (we will trust any
notify completion we get).

Fixes: #13114
Signed-off-by: Sage Weil <>
(cherry picked from commit e86d033854c76f344c678e92016c4e5c5e0385e2)

Conflicts:
src/osdc/Objecter.cc
In Objecter::handle_watch_notify, the conflict arose from a comment modified by commit 47277c51db7bb2725ea117e4e8834869ae93e006, which was not backported.

History

#1 Updated by Kefu Chai about 7 years ago

  • Assignee set to Kefu Chai

#2 Updated by Kefu Chai about 7 years ago

/a/kchai-2015-09-22_03:49:41-rados-wip-kefu-testing---basic-multi/1064089/ seems to be a different failure, though.

#3 Updated by Kefu Chai about 7 years ago

  • Assignee deleted (Kefu Chai)

Not able to reproduce locally. Unassigning myself from this ticket for now.

#4 Updated by Sage Weil about 7 years ago

  • Assignee set to Sage Weil

#5 Updated by Sage Weil about 7 years ago

  • Status changed from New to Fix Under Review
  • Backport set to hammer

#7 Updated by Sage Weil about 7 years ago

  • Priority changed from Normal to Urgent

#8 Updated by Sage Weil about 7 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Assignee deleted (Sage Weil)
  • Priority changed from Urgent to High

#9 Updated by Loïc Dachary almost 7 years ago

  • Status changed from Pending Backport to Resolved
