Bug #566
osd: build_prior needs to be wary of nonexistent osds
Description
2010-11-08 22:34:35.332280 7f84cec17710 osd0 50 pg[3.0p3( empty n=0 ec=2 les=23 50/50/50) [0,1] r=0 mlcod 0'0 !hml crashed+peering] build_prior interval(21-34 []/[3,0] maybe_went_rw)
2010-11-08 22:34:35.332298 7f84cec17710 filestore(/data/osd0) read /data/osd0/current/meta/osdmap.34_0 0~0
2010-11-08 22:34:35.332329 7f84cec17710 filestore(/data/osd0) read /data/osd0/current/meta/osdmap.34_0 0~2538 = 2538
osd/OSDMap.h: In function 'osd_info_t& OSDMap::get_info(int)':
osd/OSDMap.h:490: FAILED assert(osd < max_osd)
ceph version 0.22.1 (commit:7464f9688001aa89f9673ba14e6d075d0ee33541)
1: (PG::peer(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&, std::map<int, std::map<pg_t, PG::Query, std::less<pg_t>, std::allocator<std::pair<pg_t const, PG::Query> > >, std::less<int>, std::allocator<std::pair<int const, std::map<pg_t, PG::Query, std::less<pg_t>, std::allocator<std::pair<pg_t const, PG::Query> > > > > >&, std::map<int, MOSDPGInfo*, std::less<int>, std::allocator<std::pair<int const, MOSDPGInfo*> > >*)+0x8e0) [0x54a560]
2: (OSD::activate_map(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x47d) [0x4e3bdd]
3: (OSD::handle_osd_map(MOSDMap*)+0x2815) [0x4f6795]
4: (OSD::_dispatch(Message*)+0x2ab) [0x4f89bb]
5: (OSD::ms_dispatch(Message*)+0x39) [0x4f9429]
6: (SimpleMessenger::dispatch_entry()+0x79b) [0x46a2db]
7: (SimpleMessenger::DispatchThread::entry()+0x1f) [0x45d53f]
8: (Thread::_entry_func(void*)+0xa) [0x470caa]
9: (()+0x7971) [0x7f84d64f2971]
10: (clone()+0x6d) [0x7f84d572391d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
*** Caught signal (ABRT) ***
ceph version 0.22.1 (commit:7464f9688001aa89f9673ba14e6d075d0ee33541)
1: (sigabrt_handler(int)+0xde) [0x5e06de]
2: (()+0x33c20) [0x7f84d5670c20]
3: (gsignal()+0x35) [0x7f84d5670ba5]
4: (abort()+0x180) [0x7f84d56746b0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f84d5f146bd]
6: (()+0xb9906) [0x7f84d5f12906]
7: (()+0xb9933) [0x7f84d5f12933]
8: (()+0xb9a3e) [0x7f84d5f12a3e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x69c) [0x5ce05c]
10: (PG::build_prior()+0xa64) [0x546e64]
11: (PG::peer(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&, std::map<int, std::map<pg_t, PG::Query, std::less<pg_t>, std::allocator<std::pair<pg_t const, PG::Query> > >, std::less<int>, std::allocator<std::pair<int const, std::map<pg_t, PG::Query, std::less<pg_t>, std::allocator<std::pair<pg_t const, PG::Query> > > > > >&, std::map<int, MOSDPGInfo*, std::less<int>, std::allocator<std::pair<int const, MOSDPGInfo*> > >*)+0x8e0) [0x54a560]
12: (OSD::activate_map(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x47d) [0x4e3bdd]
13: (OSD::handle_osd_map(MOSDMap*)+0x2815) [0x4f6795]
14: (OSD::_dispatch(Message*)+0x2ab) [0x4f89bb]
15: (OSD::ms_dispatch(Message*)+0x39) [0x4f9429]
16: (SimpleMessenger::dispatch_entry()+0x79b) [0x46a2db]
17: (SimpleMessenger::DispatchThread::entry()+0x1f) [0x45d53f]
18: (Thread::_entry_func(void*)+0xa) [0x470caa]
19: (()+0x7971) [0x7f84d64f2971]
(gdb) up
#11 PG::build_prior (this=0x23126e0) at osd/PG.cc:949
warning: Source file is more recent than executable.
949       const osd_info_t& pinfo = osd->osdmap->get_info(o);
(gdb) p o
$2 = 3
(gdb) p osd->osdmap->epoch
$3 = 50
(gdb) list
944     }
945
946     // consider ACTING osds
947     for (unsigned i=0; i<interval.acting.size(); i++) {
948       int o = interval.acting[i];
949       const osd_info_t& pinfo = osd->osdmap->get_info(o);
950
951       // if the osd restarted after this interval but is not known to have
952       // cleanly survived through this interval, we mark the pg crashed.
953       if (pinfo.up_from > interval.last &&
(gdb) p o
$4 = 3
and that map is 50:
root@cephdisk02:~# osdmaptool -p /data/osd0/current/meta/osdmap.50_0
osdmaptool: osdmap file '/data/osd0/current/meta/osdmap.50_0'
epoch 50
fsid e93c55d3-7255-edf2-4603-41bff032e92e
created 2010-10-29 16:19:56.133231
modifed 2010-11-08 22:36:02.107112
flags
pg_pool 0 'data' pg_pool(rep pg_size 2 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 lpg_num 2 lpgp_num 2 last_change 1 owner 0)
pg_pool 1 'metadata' pg_pool(rep pg_size 2 crush_ruleset 1 object_hash rjenkins pg_num 256 pgp_num 256 lpg_num 2 lpgp_num 2 last_change 1 owner 0)
pg_pool 2 'casdata' pg_pool(rep pg_size 2 crush_ruleset 2 object_hash rjenkins pg_num 256 pgp_num 256 lpg_num 2 lpgp_num 2 last_change 1 owner 0)
pg_pool 3 'rbd' pg_pool(rep pg_size 2 crush_ruleset 3 object_hash rjenkins pg_num 256 pgp_num 256 lpg_num 2 lpgp_num 2 last_change 1 owner 0)
max_osd 3
osd0 in weight 1 up (up_from 50 up_thru 30 down_at 49 last_clean 21-37) 192.168.100.15:6804/23979 192.168.100.15:6805/23979
osd1 in weight 1 up (up_from 3 up_thru 3 down_at 0 last_clean 0-0) 192.168.100.16:6803/2171 192.168.100.16:6804/2171
osd2 in weight 1 up (up_from 41 up_thru 33 down_at 40 last_clean 26-35) 192.168.100.17:6801/2279 192.168.100.17:6802/2279
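i.e., max_osd went down to 3, so osd3, which is in the acting set [3,0] of interval 21-34, no longer exists in the epoch 50 map, and get_info(3) trips the assert(osd < max_osd).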
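The kind of guard the title asks for could look roughly like the sketch below. It is not the fix that was committed (the report does not show it). MockOSDMap, its exists() helper and usable_acting() are hypothetical, simplified stand-ins, and only the assert in get_info() mirrors what the trace shows at osd/OSDMap.h:490.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for the types seen in the backtrace.
struct osd_info_t {
  int up_from = 0;
  int up_thru = 0;
};

struct MockOSDMap {
  int max_osd = 0;
  std::vector<osd_info_t> info;  // indexed by osd id, size == max_osd

  // Assumption: an osd "exists" here if its id is within [0, max_osd).
  bool exists(int osd) const { return osd >= 0 && osd < max_osd; }

  // Mirrors the failing assertion reported at osd/OSDMap.h:490.
  const osd_info_t& get_info(int osd) const {
    assert(osd < max_osd);
    return info[osd];
  }
};

// Returns the acting osds of a past interval that the current map can
// still describe; ids at or beyond max_osd are skipped instead of asserting.
std::vector<int> usable_acting(const MockOSDMap& map,
                               const std::vector<int>& acting) {
  std::vector<int> out;
  for (int o : acting) {
    if (!map.exists(o))
      continue;  // osd no longer exists in this map: do not call get_info
    const osd_info_t& pinfo = map.get_info(o);
    (void)pinfo;  // real code would compare pinfo.up_from with the interval
    out.push_back(o);
  }
  return out;
}
```

With max_osd 3 and the acting set [3,0] from the log, this sketch looks up only osd0. Whether a vanished osd should simply be skipped or should influence the crashed/peering decision is a policy question this report does not settle.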
Updated by Sage Weil over 13 years ago
- Status changed from New to Resolved