Bug #1088
osd: assert(is_up) failed when sending queries
Status: Closed
% Done: 0%
Description
This happened while I was stress-testing the peering code with 10 OSDs running off one disk, streaming writes, and marking OSDs in and out.
2011-05-13 14:59:17.011631 7ff1fd10d710 osd6 510 do_queries querying osd0 on 1 PGs
osd/OSDMap.h: In function 'entity_inst_t OSDMap::get_cluster_inst(int)', in thread '0x7ff1fd10d710'
osd/OSDMap.h: 482: FAILED assert(is_up(osd))
ceph version 0.27.1-357-g98acbc9 (commit:98acbc996e00ec9ba4a8682755bcca9300628c00)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x58) [0x961530]
2: (OSDMap::get_cluster_inst(int)+0x8b) [0x73b699]
3: (OSD::do_queries(std::map<int, std::map<pg_t, PG::Query, std::less<pg_t>, std::allocator<std::pair<pg_t const, PG::Query> > >, std::less<int>, std::allocator<std::pair<int const, std::map<pg_t, PG::Query, std::less<pg_t>, std::allocator<std::pair<pg_t const, PG::Query> > > > > >&)+0x1d2) [0x7bbd6a]
4: (OSD::activate_map(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x3bd) [0x7b5e53]
5: (OSD::handle_osd_map(MOSDMap*)+0x1917) [0x7b3b6d]
6: (OSD::_dispatch(Message*)+0x358) [0x7b053c]
7: (OSD::ms_dispatch(Message*)+0x125) [0x7af8a5]
8: (Messenger::ms_deliver_dispatch(Message*)+0x68) [0x703112]
9: (SimpleMessenger::dispatch_entry()+0x702) [0x6f0a56]
10: (SimpleMessenger::DispatchThread::entry()+0x31) [0x6e5c1b]
11: (Thread::_entry_func(void*)+0x28) [0x701ea9]
12: (()+0x68ba) [0x7ff2077728ba]
13: (clone()+0x6d) [0x7ff20640702d]
The crashed OSD is osd.6.
Logs for this are in vit:/home/joshd/osd_bugs/ossert_is_up_failed3.
This was run with the msgr fixes (commit:8bc4f711b044c6cf33f027279c4e8f5ff0f07226 and commit:98acbc996e00ec9ba4a8682755bcca9300628c00) applied.
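The backtrace shows OSD::activate_map calling OSD::do_queries, which calls OSDMap::get_cluster_inst, and that function asserts is_up(osd). The log line just before the crash has osd6 at epoch 510 querying osd0, so presumably osd0 was not up in that map when the query was sent. The sketch below illustrates that failure mode and one possible guard. It uses simplified, hypothetical types (ToyOSDMap, EntityInst, QueryMap and send_queries_guarded are not Ceph's API), and it is not the fix that was actually applied for this bug.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for entity_inst_t.
struct EntityInst {
  int osd;
};

// Hypothetical, simplified stand-in for OSDMap: tracks only which OSDs are up.
class ToyOSDMap {
 public:
  void set_up(int osd, bool up) {
    if (up) {
      up_.insert(osd);
    } else {
      up_.erase(osd);
    }
  }
  bool is_up(int osd) const { return up_.count(osd) > 0; }
  // Same precondition as the failing assert: only valid for OSDs that are up.
  EntityInst get_cluster_inst(int osd) const {
    assert(is_up(osd));
    return EntityInst{osd};
  }

 private:
  std::set<int> up_;
};

using PGId = int;                                   // hypothetical stand-in for pg_t
using QueryMap = std::map<int, std::vector<PGId>>;  // peer OSD -> PGs to query

// Guarded variant: before looking up a peer's address, re-check that the peer
// is up in the current map. Queries addressed to down peers are dropped rather
// than tripping the assert. Returns the OSDs that would actually be queried.
std::vector<int> send_queries_guarded(const ToyOSDMap& map, QueryMap& queries) {
  std::vector<int> sent;
  for (auto it = queries.begin(); it != queries.end();) {
    if (!map.is_up(it->first)) {
      it = queries.erase(it);  // peer is down in this map; skip it
      continue;
    }
    EntityInst inst = map.get_cluster_inst(it->first);  // safe: peer is up
    sent.push_back(inst.osd);
    ++it;
  }
  return sent;
}
```

In the sketch, calling get_cluster_inst(0) directly on a map where osd0 is down would hit the same kind of assert shown in the backtrace. The guarded loop instead skips that peer. Whether dropping such queries or re-queuing them for a later epoch is correct depends on the peering logic, which this sketch does not model.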