Bug #1529
cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")
0%
Description
Full archive attached.
2011-09-09 12:21:25.702975 7f8b28a2f700 log [INF] : 4.7 scrub ok 2011-09-09 12:21:29.955711 7f8b28a2f700 log [INF] : 5.0 scrub ok 2011-09-09 12:21:30.955676 7f8b2822e700 log [INF] : 5.1 scrub ok os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7 f8b2e33b700' os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug") ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d) 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x572) [0x6af562] 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6b3635] 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x1a6) [0x698066] 4: (ThreadPool::worker()+0xa3a) [0x577a3a] 5: (ThreadPool::WorkThread::entry()+0xd) [0x54eefd] 6: (()+0x7971) [0x7f8b36db9971] 7: (clone()+0x6d) [0x7f8b3564992d] ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d) 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x572) [0x6af562] 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6b3635] 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x1a6) [0x698066] 4: (ThreadPool::worker()+0xa3a) [0x577a3a] 5: (ThreadPool::WorkThread::entry()+0xd) [0x54eefd] 6: (()+0x7971) [0x7f8b36db9971] 7: (clone()+0x6d) [0x7f8b3564992d] *** Caught signal (Aborted) **
History
#1 Updated by Anonymous over 12 years ago
- File cosd-assert.tgz added
#2 Updated by Samuel Just over 12 years ago
- Status changed from New to Resolved
829ebaa20272f2c6a7136513fa11ef5c187bbb67
#3 Updated by Wido den Hollander over 12 years ago
- Status changed from Resolved to 4
I'm still seeing this one. All my 6 OSDs went down and after starting them most of them would crash:
Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) _do_transaction on 0x172a000 Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) remove 2.1b_head/rb.0.3.000000000127/7 Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) remove 2.1b_head/rb.0.3.000000000127/7 = -2 Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) clone_range 2.1b_head/rb.0.3.000000000127/head -> 2.1b_head/rb.0.3.000000000127/7 0~4194304 to 0 Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) clone_range 2.1b_head/rb.0.3.000000000127/head -> 2.1b_head/rb.0.3.000000000127/7 0~4194304 to 0 = -2 Oct 31 14:28:26 node07 osd.4[9081]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7f981f270720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug") Oct 31 14:28:26 node07 osd.4[9081]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:28:26 node07 osd.4[9081]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:28:26 node07 osd.4[9081]: *** Caught signal (Aborted) **#012 in thread 7f981f270720 Oct 31 14:28:26 node07 osd.4[9081]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7f981ee518f0]#012 3: (gsignal()+0x35) [0x7f981d615a75]#012 4: (abort()+0x180) [0x7f981d6195c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f981decb8e5]#012 6: (()+0xcad16) [0x7f981dec9d16]#012 7: (()+0xcad43) [0x7f981dec9d43]#012 8: (()+0xcae3e) [0x7f981dec9e3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) _do_transaction on 0x18070a0 Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) remove 2.19_head/rb.0.3.000000000136/7 Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) remove 2.19_head/rb.0.3.000000000136/7 = -2 Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) clone_range 2.19_head/rb.0.3.000000000136/head -> 2.19_head/rb.0.3.000000000136/7 0~4194304 to 0 Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) clone_range 2.19_head/rb.0.3.000000000136/head -> 2.19_head/rb.0.3.000000000136/7 0~4194304 to 0 = -2 Oct 31 14:40:14 node03 osd.0[11148]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7f2ccf4ca720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug") Oct 31 14:40:14 node03 osd.0[11148]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:40:14 node03 osd.0[11148]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:40:14 node03 osd.0[11148]: *** Caught signal (Aborted) **#012 in thread 7f2ccf4ca720 Oct 31 14:40:14 node03 osd.0[11148]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7f2ccf0aa8f0]#012 3: (gsignal()+0x35) [0x7f2ccd86ea75]#012 4: (abort()+0x180) [0x7f2ccd8725c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f2cce1248e5]#012 6: (()+0xcad16) [0x7f2cce122d16]#012 7: (()+0xcad43) [0x7f2cce122d43]#012 8: (()+0xcae3e) [0x7f2cce122e3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) _do_transaction on 0xf11000 Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) remove 2.6_head/rb.0.3.00000000024c/7 Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) remove 2.6_head/rb.0.3.00000000024c/7 = -2 Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) clone_range 2.6_head/rb.0.3.00000000024c/head -> 2.6_head/rb.0.3.00000000024c/7 0~4194304 to 0 Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) clone_range 2.6_head/rb.0.3.00000000024c/head -> 2.6_head/rb.0.3.00000000024c/7 0~4194304 to 0 = -2 Oct 31 14:42:37 node06 osd.3[7397]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7fa72b6c3720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug") Oct 31 14:42:37 node06 osd.3[7397]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:42:37 node06 osd.3[7397]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529] Oct 31 14:42:37 node06 osd.3[7397]: *** Caught signal (Aborted) **#012 in thread 7fa72b6c3720 Oct 31 14:42:37 node06 osd.3[7397]: ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7fa72b2a38f0]#012 3: (gsignal()+0x35) [0x7fa729a67a75]#012 4: (abort()+0x180) [0x7fa729a6b5c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fa72a31d8e5]#012 6: (()+0xcad16) [0x7fa72a31bd16]#012 7: (()+0xcad43) [0x7fa72a31bd43]#012 8: (()+0xcae3e) [0x7fa72a31be3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]
The backtrace shows:
(gdb) bt #0 0x00007fa72b2a37bb in raise () from /lib/libpthread.so.0 #1 0x000000000067fa03 in reraise_fatal (signum=7397) at global/signal_handler.cc:59 #2 0x000000000067ffcc in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106 #3 <signal handler called> #4 0x00007fa729a67a75 in raise () from /lib/libc.so.6 #5 0x00007fa729a6b5c0 in abort () from /lib/libc.so.6 #6 0x00007fa72a31d8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6 #7 0x00007fa72a31bd16 in ?? () from /usr/lib/libstdc++.so.6 #8 0x00007fa72a31bd43 in std::terminate() () from /usr/lib/libstdc++.so.6 #9 0x00007fa72a31be3e in __cxa_throw () from /usr/lib/libstdc++.so.6 #10 0x00000000005b561f in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:70 #11 0x00000000006bdbf7 in FileStore::_do_transaction (this=<value optimized out>, t=...) at os/FileStore.cc:2396 #12 0x00000000006c1e75 in FileStore::do_transactions (this=0xf14000, tls=<value optimized out>, op_seq=<value optimized out>) at os/FileStore.cc:2089 #13 0x00000000006ccfc3 in JournalingObjectStore::journal_replay (this=<value optimized out>, fs_op_seq=<value optimized out>) at os/JournalingObjectStore.cc:74 #14 0x00000000006abb13 in FileStore::mount (this=0xf14000) at os/FileStore.cc:1692 #15 0x0000000000524f18 in do_convertfs (dev=..., jdev=...) at osd/OSD.cc:191 #16 OSD::convertfs (dev=..., jdev=...) at osd/OSD.cc:254 #17 0x00000000004a55dd in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:304 (gdb)
Anything I can do to debug this further?
#4 Updated by Sage Weil over 12 years ago
Do you have the odd log preferring the restart?
#5 Updated by Sage Weil over 12 years ago
Sage Weil wrote:
Do you have the odd log preferring the restart?
Er, osd log preceeding ...
#6 Updated by Wido den Hollander over 12 years ago
Sorry for the slow response! Somehow I didn't get a e-mail update.
I do have logs preceeding the crash, but they are with a low logging.
I can search in them if you want to instead of posting them completely. Any specific lines you need?
Do note that this was a fresh installed cluster with just two RBD images on them.
#7 Updated by Josh Durgin over 12 years ago
Thrashing with the snaps workload triggered this on several osds in run 1497 today.
#8 Updated by Sage Weil about 12 years ago
- Status changed from 4 to Can't reproduce