Bug #3613

Objecter::scan_requests crash

Added by Greg Farnum over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Category: Objecter
% Done: 0%
Source: Development
Regression: No
Severity: 3 - minor

Description

One of the failures in #3459 was due to testrados_watch_notify crashing:

#0  0x00007fa9efc199ec in Objecter::scan_requests (this=0xdaef80, skipped_map=false, need_resend=..., need_resend_linger=...) at osdc/Objecter.cc:445
#1  0x00007fa9efc1a9e8 in Objecter::handle_osd_map (this=0xdaef80, m=0x7fa9e0000bf0) at osdc/Objecter.cc:528
#2  0x00007fa9efbf8a0e in librados::RadosClient::_dispatch (this=0xda9b30, m=0x7fa9e0000bf0) at librados/RadosClient.cc:313
#3  0x00007fa9efbf87df in librados::RadosClient::ms_dispatch (this=0xda9b30, m=0x7fa9e0000bf0) at librados/RadosClient.cc:280
#4  0x00007fa9efd25fe7 in Messenger::ms_deliver_dispatch (this=0xdae1b0, m=0x7fa9e0000bf0) at msg/Messenger.h:549
#5  0x00007fa9efd256f5 in DispatchQueue::entry (this=0xdae298) at msg/DispatchQueue.cc:107
#6  0x00007fa9efdc074c in DispatchQueue::DispatchThread::entry (this=0xdae3b0) at msg/DispatchQueue.h:85
#7  0x00007fa9efe2b119 in Thread::_entry_func (arg=0xdae3b0) at common/Thread.cc:41
#8  0x00007fa9ee9dbe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007fa9eece34bd in klogctl () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000000000 in ?? ()

See ubuntu@teuthology:/a/sage-2012-12-11_19:47:13-rados-wip-3459-testing-basic/11922. I cannot for the life of me make sense of the claimed segfault, although the op does look a little suspect:

(gdb) p op->pgid
$1 = {m_pool = 33, m_seed = 0, m_preferred = 0}

I do notice, though, that the watch-notify handling in the Objecter was just changed, so that may be involved.

Associated revisions

Revision 4bf90782
Added by Sage Weil over 11 years ago

osdc/Objecter: prevent pool dne check from invalidating scan_requests iterator

We iterate over ops and, if the pool dne and other conditions are true,
we will immediately return ENOENT and cancel an op. Increment the
iterator at the top of the loop to avoid invalidating it.

We also need to switch to a map<>, because hash_map<> mutations may
invalidate any/all iterators.

Fixes: #3613
Signed-off-by: Sage Weil <>
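The commit message describes a general C++ pattern: when a loop body may erase the element it is visiting, advance the iterator before running the body. Below is a minimal sketch of that loop shape under simplified assumptions. `Op`, `cancel_op`, and this `scan_requests` signature are hypothetical stand-ins, not Ceph's actual Objecter code. It relies on the `std::map` guarantee that erasing an element invalidates only iterators to that element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical stand-in for an in-flight op; only the target pool matters here.
struct Op {
  int64_t pool;
};

// Hypothetical helper: "completes" the op with ENOENT by dropping it from the
// tracking map. Any iterator still pointing at this element becomes invalid.
void cancel_op(std::map<uint64_t, Op>& ops, uint64_t tid) {
  assert(ops.count(tid) == 1);  // the op must still be tracked
  ops.erase(tid);
}

// Sketch of the fixed loop: the iterator is advanced at the top of the loop,
// before the body can erase the current element. With std::map, erase only
// invalidates iterators to the erased element, so `p` stays valid.
std::size_t scan_requests(std::map<uint64_t, Op>& ops,
                          const std::set<int64_t>& existing_pools) {
  std::size_t cancelled = 0;
  auto p = ops.begin();
  while (p != ops.end()) {
    uint64_t tid = p->first;
    int64_t pool = p->second.pool;
    ++p;  // increment first, as in the fix
    if (existing_pools.count(pool) == 0) {  // pool does not exist
      cancel_op(ops, tid);                  // erases the element we just left
      ++cancelled;
    }
  }
  return cancelled;
}
```

If the increment instead ran after the body (for example in a `for` header), `++p` would operate on an iterator whose element `cancel_op` had just erased, which is undefined behavior and can crash far from the actual mistake.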

History

#1 Updated by Sage Weil over 11 years ago

  • Priority changed from High to Urgent

#2 Updated by Josh Durgin over 11 years ago

I don't see any evidence pointing to the recent objecter changes yet. The op->ops vector seems to be invalid though.

#3 Updated by Sage Weil over 11 years ago

  • Status changed from New to 7
  • Assignee changed from Josh Durgin to Sage Weil

The pool dne check invalidated the iterator. Switching to map<> and incrementing the iterator at the top of the loop.

#4 Updated by Sage Weil over 11 years ago

  • Status changed from 7 to Resolved

4bf9078286d58c2cd4e85cb8b31411220a377092

passed 100 iterations of the test (previously failed after ~15).