Bug #3613

closed

Objecter::scan_requests crash

Added by Greg Farnum over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Category: Objecter
% Done: 0%
Source: Development

Description

One of the failures in #3459 was due to testrados_watch_notify crashing:

#0  0x00007fa9efc199ec in Objecter::scan_requests (this=0xdaef80, skipped_map=false, need_resend=..., need_resend_linger=...) at osdc/Objecter.cc:445
#1  0x00007fa9efc1a9e8 in Objecter::handle_osd_map (this=0xdaef80, m=0x7fa9e0000bf0) at osdc/Objecter.cc:528
#2  0x00007fa9efbf8a0e in librados::RadosClient::_dispatch (this=0xda9b30, m=0x7fa9e0000bf0) at librados/RadosClient.cc:313
#3  0x00007fa9efbf87df in librados::RadosClient::ms_dispatch (this=0xda9b30, m=0x7fa9e0000bf0) at librados/RadosClient.cc:280
#4  0x00007fa9efd25fe7 in Messenger::ms_deliver_dispatch (this=0xdae1b0, m=0x7fa9e0000bf0) at msg/Messenger.h:549
#5  0x00007fa9efd256f5 in DispatchQueue::entry (this=0xdae298) at msg/DispatchQueue.cc:107
#6  0x00007fa9efdc074c in DispatchQueue::DispatchThread::entry (this=0xdae3b0) at msg/DispatchQueue.h:85
#7  0x00007fa9efe2b119 in Thread::_entry_func (arg=0xdae3b0) at common/Thread.cc:41
#8  0x00007fa9ee9dbe9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007fa9eece34bd in klogctl () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000000000 in ?? ()

See ubuntu@teuthology:/a/sage-2012-12-11_19:47:13-rados-wip-3459-testing-basic/11922. I cannot for the life of me make sense of the claimed segfault, although the op does look a little suspect:

(gdb) p op->pgid
$1 = {m_pool = 33, m_seed = 0, m_preferred = 0}

I do notice that the watch-notify handling in the Objecter was just changed, though, so maybe that's involved.

Actions #1

Updated by Sage Weil over 11 years ago

  • Priority changed from High to Urgent

Actions #2

Updated by Josh Durgin over 11 years ago

I don't see any evidence pointing to the recent objecter changes yet. The op->ops vector seems to be invalid though.

Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from New to 7
  • Assignee changed from Josh Durgin to Sage Weil

The pool DNE (does not exist) check invalidated the iterator. Switching to map<> and incrementing the iterator at the top of the loop.
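For anyone reading this later, here is a minimal sketch of the pattern described above: advance the iterator before the loop body can erase the current entry. It is not the actual Objecter code or the committed fix. `Op`, `scan_ops`, and `live_pools` are hypothetical names made up for illustration, and the sketch assumes a `std::map`, where erasing an element invalidates only iterators to that element.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for an in-flight request; not the real Objecter::Op.
struct Op {
    int pool;
};

// Hypothetical helper: drop every op whose pool no longer exists and
// return the ids of the ops that were dropped.
std::vector<unsigned long> scan_ops(std::map<unsigned long, Op>& ops,
                                    const std::set<int>& live_pools) {
    std::vector<unsigned long> cancelled;
    auto p = ops.begin();
    while (p != ops.end()) {
        auto cur = p;
        ++p;  // advance at the top of the loop, before cur can be erased
        if (live_pools.count(cur->second.pool) == 0) {
            cancelled.push_back(cur->first);
            ops.erase(cur);  // invalidates only cur; p is still valid for std::map
        }
    }
    return cancelled;
}
```

With ops in pools 33, 7, and 33 and only pool 7 still present, the two pool-33 ops are dropped and the loop keeps walking safely, because `p` never points at an erased element.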

Actions #4

Updated by Sage Weil over 11 years ago

  • Status changed from 7 to Resolved

4bf9078286d58c2cd4e85cb8b31411220a377092

Passed 100 iterations of the test (previously it failed after ~15).
