https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2015-04-13T15:32:21Z
Ceph
Ceph - Bug #11373: OSD crash in OSDService::get_map
https://tracker.ceph.com/issues/11373?journal_id=50574
2015-04-13T15:32:21Z
Kefu Chai
tchaikov@gmail.com
<p>The backtrace:<br /><pre>
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 7f4bf83a7880
ceph version 0.94 (e61c4f093f88e44961d157f65091733580cea79a)
1: ceph-osd() [0xac51c2]
2: (()+0xf130) [0x7f4bf6d3e130]
3: (gsignal()+0x37) [0x7f4bf57585d7]
4: (abort()+0x148) [0x7f4bf5759cc8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f4bf605c9b5]
6: (()+0x5e926) [0x7f4bf605a926]
7: (()+0x5e953) [0x7f4bf605a953]
8: (()+0x5eb73) [0x7f4bf605ab73]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27a) [0xbc538a]
10: (OSDService::get_map(unsigned int)+0x3f) [0x6ff77f]
11: (OSD::load_pgs()+0x17c9) [0x6b7479]
12: (OSD::init()+0x729) [0x6b8b99]
13: (main()+0x27f3) [0x643b63]
14: (__libc_start_main()+0xf5) [0x7f4bf5744af5]
15: ceph-osd() [0x65cdc9]
2015-04-11 20:59:43.350376 7f4bf83a7880 -1 *** Caught signal (Aborted) **
in thread 7f4bf83a7880
</pre></p>
<p>Most recent log entries:<br /><pre>
-13> 2015-04-11 20:59:43.320175 7f4bf83a7880 2 osd.30 0 boot
-12> 2015-04-11 20:59:43.322833 7f4bf83a7880 1 <cls> cls/refcount/cls_refcount.cc:231: Loaded refcount class!
-11> 2015-04-11 20:59:43.322946 7f4bf83a7880 1 <cls> cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
-10> 2015-04-11 20:59:43.323048 7f4bf83a7880 1 <cls> cls/statelog/cls_statelog.cc:306: Loaded log class!
-9> 2015-04-11 20:59:43.323386 7f4bf83a7880 1 <cls> cls/log/cls_log.cc:312: Loaded log class!
-8> 2015-04-11 20:59:43.325417 7f4bf83a7880 1 <cls> cls/rgw/cls_rgw.cc:3046: Loaded rgw class!
-7> 2015-04-11 20:59:43.325541 7f4bf83a7880 1 <cls> cls/version/cls_version.cc:227: Loaded version class!
-6> 2015-04-11 20:59:43.325667 7f4bf83a7880 0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
-5> 2015-04-11 20:59:43.325767 7f4bf83a7880 1 <cls> cls/user/cls_user.cc:367: Loaded user class!
-4> 2015-04-11 20:59:43.326800 7f4bf83a7880 0 osd.30 28642 crush map has features 1107558400, adjusting msgr requires for clients
-3> 2015-04-11 20:59:43.326810 7f4bf83a7880 0 osd.30 28642 crush map has features 1107558400 was 8705, adjusting msgr requires for mons
-2> 2015-04-11 20:59:43.326817 7f4bf83a7880 0 osd.30 28642 crush map has features 1107558400, adjusting msgr requires for osds
-1> 2015-04-11 20:59:43.326833 7f4bf83a7880 0 osd.30 28642 load_pgs
0> 2015-04-11 20:59:43.346451 7f4bf83a7880 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f4bf83a7880 time 2015-04-11 20:59:43.344831
osd/OSD.h: 716: FAILED assert(ret)
</pre></p>
Ceph - Bug #11373: OSD crash in OSDService::get_map
https://tracker.ceph.com/issues/11373?journal_id=50586
2015-04-13T20:49:43Z
Ilja Slepnev
<p>I enabled more debug logging. The OSDs fail because osdmap epoch 25584 is missing on disk.<br />What could cause an osdmap to be lost, and how can I work around it with minimal data loss?</p>
<p>Startup log from OSD.58<br /><pre>
-9> 2015-04-13 23:40:29.333979 7f4fc5205880 10 osd.58 28557 pgid 5.1 coll 5.1_head
-8> 2015-04-13 23:40:29.333993 7f4fc5205880 15 filestore(/var/lib/ceph/osd/ceph-58) omap_get_values 5.1_head/1//head//5
-7> 2015-04-13 23:40:29.334021 7f4fc5205880 15 filestore(/var/lib/ceph/osd/ceph-58) collection_getattr /var/lib/ceph/osd/ceph-58/current/5.1_head 'info'
-6> 2015-04-13 23:40:29.334036 7f4fc5205880 10 filestore(/var/lib/ceph/osd/ceph-58) collection_getattr /var/lib/ceph/osd/ceph-58/current/5.1_head 'info' = 1
-5> 2015-04-13 23:40:29.334048 7f4fc5205880 15 filestore(/var/lib/ceph/osd/ceph-58) omap_get_values meta/16ef7597/infos/head//-1
-4> 2015-04-13 23:40:29.334297 7f4fc5205880 20 osd.58 0 get_map 25584 - loading and decoding 0x5618000
-3> 2015-04-13 23:40:29.334308 7f4fc5205880 15 filestore(/var/lib/ceph/osd/ceph-58) read meta/4eb33da9/osdmap.25584/0//-1 0~0
-2> 2015-04-13 23:40:29.334339 7f4fc5205880 10 filestore(/var/lib/ceph/osd/ceph-58) error opening file /var/lib/ceph/osd/ceph-58/current/meta/DIR_9/DIR_A/osdmap.25584__0_4EB33DA9__none with flags=2: (2) No such file or directory
-1> 2015-04-13 23:40:29.334353 7f4fc5205880 10 filestore(/var/lib/ceph/osd/ceph-58) FileStore::read(meta/4eb33da9/osdmap.25584/0//-1) open error: (2) No such file or directory
0> 2015-04-13 23:40:29.336211 7f4fc5205880 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f4fc5205880 time 2015-04-13 23:40:29.334365
osd/OSD.h: 716: FAILED assert(ret)
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc51f5]
2: (OSDService::get_map(unsigned int)+0x3f) [0x6ff77f]
3: (OSD::load_pgs()+0x17c9) [0x6b7479]
4: (OSD::init()+0x729) [0x6b8b99]
5: (main()+0x27f3) [0x643b63]
6: (__libc_start_main()+0xf5) [0x7f4fc25a5af5]
7: ceph-osd() [0x65cdc9]
</pre></p>
<p>Another log from OSD.59<br /><pre>
-9> 2015-04-13 23:35:16.834060 7f3db534c880 10 osd.59 28521 pgid 5.13 coll 5.13_head
-8> 2015-04-13 23:35:16.834078 7f3db534c880 15 filestore(/var/lib/ceph/osd/ceph-59) omap_get_values 5.13_head/13//head//5
-7> 2015-04-13 23:35:16.834120 7f3db534c880 15 filestore(/var/lib/ceph/osd/ceph-59) collection_getattr /var/lib/ceph/osd/ceph-59/current/5.13_head 'info'
-6> 2015-04-13 23:35:16.834142 7f3db534c880 10 filestore(/var/lib/ceph/osd/ceph-59) collection_getattr /var/lib/ceph/osd/ceph-59/current/5.13_head 'info' = 1
-5> 2015-04-13 23:35:16.834154 7f3db534c880 15 filestore(/var/lib/ceph/osd/ceph-59) omap_get_values meta/16ef7597/infos/head//-1
-4> 2015-04-13 23:35:16.834512 7f3db534c880 20 osd.59 0 get_map 25584 - loading and decoding 0x4f08000
-3> 2015-04-13 23:35:16.834527 7f3db534c880 15 filestore(/var/lib/ceph/osd/ceph-59) read meta/4eb33da9/osdmap.25584/0//-1 0~0
-2> 2015-04-13 23:35:16.834569 7f3db534c880 10 filestore(/var/lib/ceph/osd/ceph-59) error opening file /var/lib/ceph/osd/ceph-59/current/meta/DIR_9/DIR_A/osdmap.25584__0_4EB33DA9__none with flags=2: (2) No such file or directory
-1> 2015-04-13 23:35:16.834590 7f3db534c880 10 filestore(/var/lib/ceph/osd/ceph-59) FileStore::read(meta/4eb33da9/osdmap.25584/0//-1) open error: (2) No such file or directory
0> 2015-04-13 23:35:16.837324 7f3db534c880 -1 osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f3db534c880 time 2015-04-13 23:35:16.834606
osd/OSD.h: 716: FAILED assert(ret)
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0xbc51f5]
2: (OSDService::get_map(unsigned int)+0x3f) [0x6ff77f]
3: (OSD::load_pgs()+0x17c9) [0x6b7479]
4: (OSD::init()+0x729) [0x6b8b99]
5: (main()+0x27f3) [0x643b63]
6: (__libc_start_main()+0xf5) [0x7f3db26ecaf5]
7: ceph-osd() [0x65cdc9]
</pre></p>
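<p>Both logs fail on the same epoch, which is easier to confirm across many OSDs with a small script. The following is a minimal sketch, not part of Ceph: it assumes the FileStore "error opening file" message format shown in the logs above, and the function name <code>missing_osdmap_epochs</code> is made up for illustration.</p>

```python
import re

# Assumed message shape, taken from the debug logs above:
#   ... error opening file <path>/osdmap.<epoch>__0_<hash>__none
#   with flags=2: (2) No such file or directory
OSDMAP_OPEN_ERROR = re.compile(
    r"error opening file \S*/osdmap\.(\d+)__\S+ .*No such file or directory"
)


def missing_osdmap_epochs(log_lines):
    """Return the sorted, de-duplicated osdmap epochs that failed to open."""
    epochs = set()
    for line in log_lines:
        match = OSDMAP_OPEN_ERROR.search(line)
        if match:
            epochs.add(int(match.group(1)))
    return sorted(epochs)
```

<p>Feeding it the lines of each OSD's log (for example, from an open file object) gives one list of epochs per OSD, which can then be compared.</p>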
Ceph - Bug #11373: OSD crash in OSDService::get_map
https://tracker.ceph.com/issues/11373?journal_id=50593
2015-04-13T22:41:15Z
Ilja Slepnev
<p>Found the workaround.</p>
<p>I deleted some pools in the past, before Giant. I now see that, for some reason, their data was never erased from the OSDs.<br />This was not a problem until Hammer, but after the upgrade to Hammer the OSD daemons failed to start.<br />I moved the old, unused data to a safe place.<br />After that the OSDs started successfully, and all PGs are active+clean.</p>
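<p>For anyone checking for the same leftovers, here is a rough sketch of how the stale PG directories could be listed. It is an illustration under assumptions, not a Ceph tool: it assumes FileStore PG directories are named <code>&lt;pool&gt;.&lt;pg&gt;_head</code> as in the logs above (for example <code>5.1_head</code>), and the helper name <code>stray_pg_dirs</code> is hypothetical. It only reports names and does not move or delete anything.</p>

```python
import re

# Assumed FileStore naming for PG collections: "<pool id>.<pg seed in hex>_head",
# e.g. "5.1_head" and "5.13_head" in the logs above.
PG_HEAD_DIR = re.compile(r"^(\d+)\.([0-9a-f]+)_head$")


def stray_pg_dirs(dir_names, live_pool_ids):
    """Return PG directory names whose pool id is not in live_pool_ids.

    dir_names     -- entries of an OSD's current/ directory
    live_pool_ids -- ids of the pools that still exist in the cluster
    Non-PG entries (e.g. "meta") are ignored; input order is preserved.
    """
    live = set(live_pool_ids)
    stray = []
    for name in dir_names:
        match = PG_HEAD_DIR.match(name)
        if match and int(match.group(1)) not in live:
            stray.append(name)
    return stray
```

<p>In practice <code>dir_names</code> could come from <code>os.listdir()</code> on the OSD's <code>current/</code> directory while the daemon is stopped, and <code>live_pool_ids</code> from the output of <code>ceph osd lspools</code>. Anything reported should be double-checked by hand before it is moved aside.</p>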
Ceph - Bug #11373: OSD crash in OSDService::get_map
https://tracker.ceph.com/issues/11373?journal_id=50774
2015-04-20T04:00:37Z
Samuel Just
sjust@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Duplicate</i></li></ul><p>I think you are quite right. The original bug is #10617, and I opened #11429 to handle upgrades from OSDs that hit this bug.</p>