Bug #9096

closed

OSD::require_same_peer_instance fails to acquire lock

Added by Loïc Dachary over 9 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
OSD
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The crash can be reproduced by running qa/workunits/cephtool/test.sh -t mon_osd a few times (fewer than 5). It eventually crashes one or more OSDs, all with the following stack trace:

ceph version 0.83-655-ga006fe4 (a006fe4a7df3e0020325a5d9cf2956545b4cac47)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x95) [0x18e9cb1]
2: (Mutex::Lock(bool)+0x14e) [0x188eb78]
3: (OSD::require_same_peer_instance(std::tr1::shared_ptr<OpRequest>&, std::tr1::shared_ptr<OSDMap const>&)+0x396) [0x12e55f8] <-- acquire lock again
4: (OSD::require_up_osd_peer(std::tr1::shared_ptr<OpRequest>&, std::tr1::shared_ptr<OSDMap const>&, unsigned int)+0x8c) [0x12e5760]
5: (void OSD::handle_replica_op<MOSDSubOp, 76>(std::tr1::shared_ptr<OpRequest>&, std::tr1::shared_ptr<OSDMap const>&)+0x349) [0x132eda3]
6: (OSD::dispatch_op_fast(std::tr1::shared_ptr<OpRequest>&, std::tr1::shared_ptr<OSDMap const>&)+0x198) [0x12db4e8]
7: (OSD::dispatch_session_waiting(OSD::Session*, std::tr1::shared_ptr<OSDMap const>)+0x11f) [0x12d9085] <-- assert lock is in place
8: OSD::ms_fast_dispatch(Message*) <-- acquire lock
9: (Messenger::ms_fast_dispatch(Message*)+0x74) [0x19e76a8]
10: (DispatchQueue::fast_dispatch(Message*)+0x3e) [0x19e6306]
11: (Pipe::reader()+0x1a5d) [0x1a07211]
12: (Pipe::Reader::entry()+0x1c) [0x1a0e63e]
13: (Thread::entry_wrapper()+0x79) [0x18d2451]
14: (Thread::_entry_func(void*)+0x18) [0x18d23ce]
15: (()+0x8182) [0x7fdf6d055182]
16: (clone()+0x6d) [0x7fdf6b64330d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
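
The annotations in the trace suggest a re-entrant acquisition: the lock is taken in OSD::ms_fast_dispatch, asserted as held in OSD::dispatch_session_waiting, and then requested again in OSD::require_same_peer_instance on the same thread, where Mutex::Lock ends in an assertion failure. The following is a minimal C++ sketch of that pattern, not Ceph code. CheckedMutex, Session and the three free functions are hypothetical stand-ins, and the sketch throws an exception where the trace shows an assert so that the failure can be observed without aborting.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical non-recursive mutex that detects a second lock() by its owner.
// (Assumption: this only imitates the behaviour visible in the trace.)
class CheckedMutex {
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};

public:
  bool is_locked_by_me() const {
    return owner_.load() == std::this_thread::get_id();
  }
  void lock() {
    if (is_locked_by_me())
      throw std::logic_error("recursive lock attempt");
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    owner_.store(std::thread::id());
    m_.unlock();
  }
};

// Hypothetical session object carrying the lock.
struct Session {
  CheckedMutex lock;
};

// Corresponds to the "acquire lock again" frame: locks unconditionally.
bool require_same_peer_instance(Session& s) {
  std::lock_guard<CheckedMutex> g(s.lock);
  return true;
}

// Corresponds to the "assert lock is in place" frame: expects the caller
// to hold the lock already, then calls down the chain.
bool dispatch_session_waiting(Session& s) {
  assert(s.lock.is_locked_by_me());
  return require_same_peer_instance(s);
}

// Corresponds to the "acquire lock" frame: takes the lock, then dispatches.
// Returns false when the nested acquisition is detected.
bool fast_dispatch(Session& s) {
  std::lock_guard<CheckedMutex> g(s.lock);
  try {
    return dispatch_session_waiting(s);
  } catch (const std::logic_error&) {
    return false;
  }
}
```

Calling fast_dispatch on a Session returns false in this sketch because the inner lock_guard tries to lock a mutex the thread already owns. Typical ways out of this pattern are to have the inner function assume the lock is held, or to release the lock before calling down; which approach the actual fix took is not stated in this report.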
#2

Updated by Loïc Dachary over 9 years ago

  • Description updated (diff)
#3

Updated by Sage Weil over 9 years ago

  • Assignee set to Sage Weil
#4

Updated by Samuel Just over 9 years ago

  • Assignee changed from Sage Weil to Samuel Just
#5

Updated by Samuel Just over 9 years ago

  • Status changed from 12 to Fix Under Review
#6

Updated by Sage Weil over 9 years ago

  • Status changed from Fix Under Review to Resolved