Bug #1529
closedcosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")
Added by Anonymous over 12 years ago. Updated about 12 years ago.
0%
Description
Full archive attached.
2011-09-09 12:21:25.702975 7f8b28a2f700 log [INF] : 4.7 scrub ok 2011-09-09 12:21:29.955711 7f8b28a2f700 log [INF] : 5.0 scrub ok 2011-09-09 12:21:30.955676 7f8b2822e700 log [INF] : 5.1 scrub ok os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7 f8b2e33b700' os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug") ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d) 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x572) [0x6af562] 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6b3635] 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x1a6) [0x698066] 4: (ThreadPool::worker()+0xa3a) [0x577a3a] 5: (ThreadPool::WorkThread::entry()+0xd) [0x54eefd] 6: (()+0x7971) [0x7f8b36db9971] 7: (clone()+0x6d) [0x7f8b3564992d] ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d) 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x572) [0x6af562] 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6b3635] 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x1a6) [0x698066] 4: (ThreadPool::worker()+0xa3a) [0x577a3a] 5: (ThreadPool::WorkThread::entry()+0xd) [0x54eefd] 6: (()+0x7971) [0x7f8b36db9971] 7: (clone()+0x6d) [0x7f8b3564992d] *** Caught signal (Aborted) **
Files
cosd-assert.tgz (1.35 MB) cosd-assert.tgz | Anonymous, 09/09/2011 12:27 PM |
Updated by Anonymous over 12 years ago
- File cosd-assert.tgz cosd-assert.tgz added
Updated by Samuel Just over 12 years ago
- Status changed from New to Resolved
829ebaa20272f2c6a7136513fa11ef5c187bbb67
Updated by Wido den Hollander over 12 years ago
- Status changed from Resolved to 4
I'm still seeing this one. All my 6 OSDs went down and after starting them most of them would crash:
Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) _do_transaction on 0x172a000 Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) remove 2.1b_head/rb.0.3.000000000127/7 Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) remove 2.1b_head/rb.0.3.000000000127/7 = -2 Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) clone_range 2.1b_head/rb.0.3.000000000127/head -> 2.1b_head/rb.0.3.000000000127/7 0~4194304 to 0 Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) clone_range 2.1b_head/rb.0.3.000000000127/head -> 2.1b_head/rb.0.3.000000000127/7 0~4194304 to 0 = -2 Oct 31 14:28:26 node07 osd.4[9081]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7f981f270720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug") Oct 31 14:28:26 node07 osd.4[9081]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:28:26 node07 osd.4[9081]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:28:26 node07 osd.4[9081]: *** Caught signal (Aborted) **#012 in thread 7f981f270720 Oct 31 14:28:26 node07 osd.4[9081]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7f981ee518f0]#012 3: (gsignal()+0x35) [0x7f981d615a75]#012 4: (abort()+0x180) [0x7f981d6195c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f981decb8e5]#012 6: (()+0xcad16) [0x7f981dec9d16]#012 7: (()+0xcad43) [0x7f981dec9d43]#012 8: (()+0xcae3e) [0x7f981dec9e3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) _do_transaction on 0x18070a0 Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) remove 2.19_head/rb.0.3.000000000136/7 Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) remove 2.19_head/rb.0.3.000000000136/7 = -2 Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) clone_range 2.19_head/rb.0.3.000000000136/head -> 2.19_head/rb.0.3.000000000136/7 0~4194304 to 0 Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) clone_range 2.19_head/rb.0.3.000000000136/head -> 2.19_head/rb.0.3.000000000136/7 0~4194304 to 0 = -2 Oct 31 14:40:14 node03 osd.0[11148]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7f2ccf4ca720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug") Oct 31 14:40:14 node03 osd.0[11148]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:40:14 node03 osd.0[11148]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:40:14 node03 osd.0[11148]: *** Caught signal (Aborted) **#012 in thread 7f2ccf4ca720 Oct 31 14:40:14 node03 osd.0[11148]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7f2ccf0aa8f0]#012 3: (gsignal()+0x35) [0x7f2ccd86ea75]#012 4: (abort()+0x180) [0x7f2ccd8725c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f2cce1248e5]#012 6: (()+0xcad16) [0x7f2cce122d16]#012 7: (()+0xcad43) [0x7f2cce122d43]#012 8: (()+0xcae3e) [0x7f2cce122e3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) _do_transaction on 0xf11000 Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) remove 2.6_head/rb.0.3.00000000024c/7 Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) remove 2.6_head/rb.0.3.00000000024c/7 = -2 Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) clone_range 2.6_head/rb.0.3.00000000024c/head -> 2.6_head/rb.0.3.00000000024c/7 0~4194304 to 0 Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) clone_range 2.6_head/rb.0.3.00000000024c/head -> 2.6_head/rb.0.3.00000000024c/7 0~4194304 to 0 = -2 Oct 31 14:42:37 node06 osd.3[7397]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7fa72b6c3720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug") Oct 31 14:42:37 node06 osd.3[7397]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:42:37 node06 osd.3[7397]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:42:37 node06 osd.3[7397]: *** Caught signal (Aborted) **#012 in thread 7fa72b6c3720 Oct 31 14:42:37 node06 osd.3[7397]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7fa72b2a38f0]#012 3: (gsignal()+0x35) [0x7fa729a67a75]#012 4: (abort()+0x180) [0x7fa729a6b5c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fa72a31d8e5]#012 6: (()+0xcad16) [0x7fa72a31bd16]#012 7: (()+0xcad43) [0x7fa72a31bd43]#012 8: (()+0xcae3e) [0x7fa72a31be3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]
The backtrace shows:
(gdb) bt #0 0x00007fa72b2a37bb in raise () from /lib/libpthread.so.0 #1 0x000000000067fa03 in reraise_fatal (signum=7397) at global/signal_handler.cc:59 #2 0x000000000067ffcc in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106 #3 <signal handler called> #4 0x00007fa729a67a75 in raise () from /lib/libc.so.6 #5 0x00007fa729a6b5c0 in abort () from /lib/libc.so.6 #6 0x00007fa72a31d8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6 #7 0x00007fa72a31bd16 in ?? () from /usr/lib/libstdc++.so.6 #8 0x00007fa72a31bd43 in std::terminate() () from /usr/lib/libstdc++.so.6 #9 0x00007fa72a31be3e in __cxa_throw () from /usr/lib/libstdc++.so.6 #10 0x00000000005b561f in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:70 #11 0x00000000006bdbf7 in FileStore::_do_transaction (this=<value optimized out>, t=...) at os/FileStore.cc:2396 #12 0x00000000006c1e75 in FileStore::do_transactions (this=0xf14000, tls=<value optimized out>, op_seq=<value optimized out>) at os/FileStore.cc:2089 #13 0x00000000006ccfc3 in JournalingObjectStore::journal_replay (this=<value optimized out>, fs_op_seq=<value optimized out>) at os/JournalingObjectStore.cc:74 #14 0x00000000006abb13 in FileStore::mount (this=0xf14000) at os/FileStore.cc:1692 #15 0x0000000000524f18 in do_convertfs (dev=..., jdev=...) at osd/OSD.cc:191 #16 OSD::convertfs (dev=..., jdev=...) at osd/OSD.cc:254 #17 0x00000000004a55dd in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:304 (gdb)
Anything I can do to debug this further?
Updated by Sage Weil over 12 years ago
Do you have the odd log preferring the restart?
Updated by Sage Weil over 12 years ago
Sage Weil wrote:
Do you have the odd log preferring the restart?
Er, osd log preceeding ...
Updated by Wido den Hollander over 12 years ago
Sorry for the slow response! Somehow I didn't get a e-mail update.
I do have logs preceeding the crash, but they are with a low logging.
I can search in them if you want to instead of posting them completely. Any specific lines you need?
Do note that this was a fresh installed cluster with just two RBD images on them.
Updated by Josh Durgin over 12 years ago
Thrashing with the snaps workload triggered this on several osds in run 1497 today.
Updated by Sage Weil about 12 years ago
- Status changed from 4 to Can't reproduce