Bug #8346
OSD crashes on master (FAILED assert(ip_op.waiting_for_commit.count(from)))
Status: closed
% Done: 0%
Source: other
Severity: 3 - minor
Description
Revision bfce3d4, home-built packages on Fedora 20. I wasn't actually trying to test Ceph, but since I saw some crashes, I've opened this ticket and attached logs.
Things started going wrong after doing a couple of spurious "ceph osd down" and "ceph osd out" operations, although some of the crashes happened a few minutes later, possibly in response to the resulting relocation of PGs.
osd.1:
osd/OSD.cc: 4768: FAILED assert(osd->is_active())
 ceph version 0.80-419-gbfce3d4 (bfce3d4facad93ce6528a9595e9b3feb9d0884ca)
 1: (OSDService::share_map(entity_name_t, Connection*, unsigned int, std::tr1::shared_ptr<OSDMap const>&, unsigned int*)+0x5b3) [0x6335a3]
 2: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x187) [0x633827]
 3: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x633f93]
 4: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x6689bc]
 5: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xa75d30]
 6: (ThreadPool::WorkThread::entry()+0x10) [0xa76c20]
 7: (()+0x7f33) [0x7f58ad293f33]
 8: (clone()+0x6d) [0x7f58abd24ded]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
osd.3:
osd/ReplicatedBackend.cc: 641: FAILED assert(ip_op.waiting_for_commit.count(from))
 ceph version 0.80-419-gbfce3d4 (bfce3d4facad93ce6528a9595e9b3feb9d0884ca)
 1: (ReplicatedBackend::sub_op_modify_reply(std::tr1::shared_ptr<OpRequest>)+0x5ab) [0x92242b]
 2: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x1f6) [0x922736]
 3: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x303) [0x7bb403]
 4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x475) [0x633b15]
 5: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x633f93]
 6: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x6689bc]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xa75d30]
 8: (ThreadPool::WorkThread::entry()+0x10) [0xa76c20]
 9: (()+0x7f33) [0x7f4a1600ef33]
 10: (clone()+0x6d) [0x7f4a14a9fead]
osd.4:
osd/ReplicatedBackend.cc: 641: FAILED assert(ip_op.waiting_for_commit.count(from))
 ceph version 0.80-419-gbfce3d4 (bfce3d4facad93ce6528a9595e9b3feb9d0884ca)
 1: (ReplicatedBackend::sub_op_modify_reply(std::tr1::shared_ptr<OpRequest>)+0x5ab) [0x92242b]
 2: (ReplicatedBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x1f6) [0x922736]
 3: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x303) [0x7bb403]
 4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x475) [0x633b15]
 5: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x633f93]
 6: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x6689bc]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xa75d30]
 8: (ThreadPool::WorkThread::entry()+0x10) [0xa76c20]
 9: (()+0x7f33) [0x7f3e79608f33]
 10: (clone()+0x6d) [0x7f3e78099ded]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
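For readers unfamiliar with the check that fails on osd.3 and osd.4, here is a rough Python model of what an assertion of the form assert(ip_op.waiting_for_commit.count(from)) tests: a commit reply is only expected from a peer that is still in the pending set. This is a toy sketch, not Ceph code. The class InProgressOp, the method handle_commit_reply, and the peer names are hypothetical, and the duplicate-reply scenario is just one way such a check could trip. This report does not establish that it is what happened here.

```python
class InProgressOp:
    """Toy stand-in for a replicated write waiting on peer commits (hypothetical)."""

    def __init__(self, peers):
        # Peers whose commit acknowledgement has not arrived yet.
        self.waiting_for_commit = set(peers)

    def handle_commit_reply(self, sender):
        # Same shape as the failed check: the sender must still be pending.
        assert sender in self.waiting_for_commit, (
            f"unexpected commit reply from {sender}"
        )
        self.waiting_for_commit.discard(sender)
        # True once every peer has acknowledged the commit.
        return not self.waiting_for_commit


def demo():
    op = InProgressOp(peers=["peer-a", "peer-b"])  # hypothetical peer names
    op.handle_commit_reply("peer-a")
    done = op.handle_commit_reply("peer-b")
    try:
        # A second reply from the same peer is no longer expected.
        op.handle_commit_reply("peer-b")
    except AssertionError:
        tripped = True
    else:
        tripped = False
    return done, tripped
```

In this model, demo() completes the op normally and then trips the assertion on the extra reply. In the real daemon a failed assert aborts the process, which matches the crashes reported above.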