Project

General

Profile

Bug #1529

cosd: os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")

Added by Anonymous about 8 years ago. Updated almost 8 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
Category:
OSD
Target version:
-
Start date:
09/09/2011
Due date:
% Done:

0%

Spent time:
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Full archive attached.

2011-09-09 12:21:25.702975 7f8b28a2f700 log [INF] : 4.7 scrub ok
2011-09-09 12:21:29.955711 7f8b28a2f700 log [INF] : 5.0 scrub ok
2011-09-09 12:21:30.955676 7f8b2822e700 log [INF] : 5.1 scrub ok
os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '0x7
f8b2e33b700'
os/FileStore.cc: 2390: FAILED assert(0 == "ENOENT on clone suggests osd bug")
 ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x572) [0x6af562]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6b3635]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x1a6) [0x698066]
 4: (ThreadPool::worker()+0xa3a) [0x577a3a]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x54eefd]
 6: (()+0x7971) [0x7f8b36db9971]
 7: (clone()+0x6d) [0x7f8b3564992d]
 ceph version 0.34-482-g9713666 (commit:9713666c27412a121de4ffefbcfb184af238431d)
 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x572) [0x6af562]
 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6b3635]
 3: (FileStore::_do_op(FileStore::OpSequencer*)+0x1a6) [0x698066]
 4: (ThreadPool::worker()+0xa3a) [0x577a3a]
 5: (ThreadPool::WorkThread::entry()+0xd) [0x54eefd]
 6: (()+0x7971) [0x7f8b36db9971]
 7: (clone()+0x6d) [0x7f8b3564992d]
*** Caught signal (Aborted) **

cosd-assert.tgz (1.35 MB) Anonymous, 09/09/2011 12:27 PM

History

#1 Updated by Anonymous about 8 years ago

#2 Updated by Samuel Just about 8 years ago

  • Status changed from New to Resolved

829ebaa20272f2c6a7136513fa11ef5c187bbb67

#3 Updated by Wido den Hollander about 8 years ago

  • Status changed from Resolved to Feedback

I'm still seeing this one. All my 6 OSDs went down and after starting them most of them would crash:

Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) _do_transaction on 0x172a000
Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) remove 2.1b_head/rb.0.3.000000000127/7
Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) remove 2.1b_head/rb.0.3.000000000127/7 = -2
Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) clone_range 2.1b_head/rb.0.3.000000000127/head -> 2.1b_head/rb.0.3.000000000127/7 0~4194304 to 0
Oct 31 14:28:26 node07 osd.4[9081]: 7f981f270720 filestore(/var/lib/ceph/osd.4) clone_range 2.1b_head/rb.0.3.000000000127/head -> 2.1b_head/rb.0.3.000000000127/7 0~4194304 to 0 = -2
Oct 31 14:28:26 node07 osd.4[9081]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7f981f270720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug")
Oct 31 14:28:26 node07 osd.4[9081]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:28:26 node07 osd.4[9081]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:28:26 node07 osd.4[9081]: *** Caught signal (Aborted) **#012 in thread 7f981f270720
Oct 31 14:28:26 node07 osd.4[9081]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7f981ee518f0]#012 3: (gsignal()+0x35) [0x7f981d615a75]#012 4: (abort()+0x180) [0x7f981d6195c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f981decb8e5]#012 6: (()+0xcad16) [0x7f981dec9d16]#012 7: (()+0xcad43) [0x7f981dec9d43]#012 8: (()+0xcae3e) [0x7f981dec9e3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7f981d600c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) _do_transaction on 0x18070a0
Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) remove 2.19_head/rb.0.3.000000000136/7
Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) remove 2.19_head/rb.0.3.000000000136/7 = -2
Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) clone_range 2.19_head/rb.0.3.000000000136/head -> 2.19_head/rb.0.3.000000000136/7 0~4194304 to 0
Oct 31 14:40:14 node03 osd.0[11148]: 7f2ccf4ca720 filestore(/var/lib/ceph/osd.0) clone_range 2.19_head/rb.0.3.000000000136/head -> 2.19_head/rb.0.3.000000000136/7 0~4194304 to 0 = -2
Oct 31 14:40:14 node03 osd.0[11148]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7f2ccf4ca720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug")
Oct 31 14:40:14 node03 osd.0[11148]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:40:14 node03 osd.0[11148]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:40:14 node03 osd.0[11148]: *** Caught signal (Aborted) **#012 in thread 7f2ccf4ca720
Oct 31 14:40:14 node03 osd.0[11148]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7f2ccf0aa8f0]#012 3: (gsignal()+0x35) [0x7f2ccd86ea75]#012 4: (abort()+0x180) [0x7f2ccd8725c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f2cce1248e5]#012 6: (()+0xcad16) [0x7f2cce122d16]#012 7: (()+0xcad43) [0x7f2cce122d43]#012 8: (()+0xcae3e) [0x7f2cce122e3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7f2ccd859c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) _do_transaction on 0xf11000
Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) remove 2.6_head/rb.0.3.00000000024c/7
Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) remove 2.6_head/rb.0.3.00000000024c/7 = -2
Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) clone_range 2.6_head/rb.0.3.00000000024c/head -> 2.6_head/rb.0.3.00000000024c/7 0~4194304 to 0
Oct 31 14:42:37 node06 osd.3[7397]: 7fa72b6c3720 filestore(/var/lib/ceph/osd.3) clone_range 2.6_head/rb.0.3.00000000024c/head -> 2.6_head/rb.0.3.00000000024c/7 0~4194304 to 0 = -2
Oct 31 14:42:37 node06 osd.3[7397]: os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&)', in thread '7fa72b6c3720'#012os/FileStore.cc: 2396: FAILED assert(0 == "ENOENT on clone suggests osd bug")
Oct 31 14:42:37 node06 osd.3[7397]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:42:37 node06 osd.3[7397]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 2: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 3: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 4: (FileStore::mount()+0x2803) [0x6abb13]#012 5: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 6: (main()+0x2c8d) [0x4a55dd]#012 7: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 8: /usr/bin/ceph-osd() [0x4a2529]
Oct 31 14:42:37 node06 osd.3[7397]: *** Caught signal (Aborted) **#012 in thread 7fa72b6c3720
Oct 31 14:42:37 node06 osd.3[7397]:  ceph version 0.37-96-g1b846f4 (commit:1b846f43f181ac31a1a1911c74b7cf3ae063f35d)#012 1: /usr/bin/ceph-osd() [0x67fd94]#012 2: (()+0xf8f0) [0x7fa72b2a38f0]#012 3: (gsignal()+0x35) [0x7fa729a67a75]#012 4: (abort()+0x180) [0x7fa729a6b5c0]#012 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fa72a31d8e5]#012 6: (()+0xcad16) [0x7fa72a31bd16]#012 7: (()+0xcad43) [0x7fa72a31bd43]#012 8: (()+0xcae3e) [0x7fa72a31be3e]#012 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x5b561f]#012 10: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x4e7) [0x6bdbf7]#012 11: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x6c1e75]#012 12: (JournalingObjectStore::journal_replay(unsigned long)+0x793) [0x6ccfc3]#012 13: (FileStore::mount()+0x2803) [0x6abb13]#012 14: (OSD::convertfs(std::string const&, std::string const&)+0xc8) [0x524f18]#012 15: (main()+0x2c8d) [0x4a55dd]#012 16: (__libc_start_main()+0xfd) [0x7fa729a52c4d]#012 17: /usr/bin/ceph-osd() [0x4a2529]

The backtrace shows:

(gdb) bt
#0  0x00007fa72b2a37bb in raise () from /lib/libpthread.so.0
#1  0x000000000067fa03 in reraise_fatal (signum=7397) at global/signal_handler.cc:59
#2  0x000000000067ffcc in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106
#3  <signal handler called>
#4  0x00007fa729a67a75 in raise () from /lib/libc.so.6
#5  0x00007fa729a6b5c0 in abort () from /lib/libc.so.6
#6  0x00007fa72a31d8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#7  0x00007fa72a31bd16 in ?? () from /usr/lib/libstdc++.so.6
#8  0x00007fa72a31bd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#9  0x00007fa72a31be3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#10 0x00000000005b561f in ceph::__ceph_assert_fail (assertion=<value optimized out>, file=<value optimized out>, line=<value optimized out>, func=<value optimized out>) at common/assert.cc:70
#11 0x00000000006bdbf7 in FileStore::_do_transaction (this=<value optimized out>, t=...) at os/FileStore.cc:2396
#12 0x00000000006c1e75 in FileStore::do_transactions (this=0xf14000, tls=<value optimized out>, op_seq=<value optimized out>) at os/FileStore.cc:2089
#13 0x00000000006ccfc3 in JournalingObjectStore::journal_replay (this=<value optimized out>, fs_op_seq=<value optimized out>) at os/JournalingObjectStore.cc:74
#14 0x00000000006abb13 in FileStore::mount (this=0xf14000) at os/FileStore.cc:1692
#15 0x0000000000524f18 in do_convertfs (dev=..., jdev=...) at osd/OSD.cc:191
#16 OSD::convertfs (dev=..., jdev=...) at osd/OSD.cc:254
#17 0x00000000004a55dd in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:304
(gdb)

Anything I can do to debug this further?

#4 Updated by Sage Weil about 8 years ago

Do you have the odd log preferring the restart?

#5 Updated by Sage Weil about 8 years ago

Sage Weil wrote:

Do you have the odd log preferring the restart?

Er, osd log preceeding ...

#6 Updated by Wido den Hollander about 8 years ago

Sorry for the slow response! Somehow I didn't get a e-mail update.

I do have logs preceeding the crash, but they are with a low logging.

I can search in them if you want to instead of posting them completely. Any specific lines you need?

Do note that this was a fresh installed cluster with just two RBD images on them.

#7 Updated by Josh Durgin about 8 years ago

Thrashing with the snaps workload triggered this on several osds in run 1497 today.

#8 Updated by Sage Weil almost 8 years ago

  • Status changed from Feedback to Can't reproduce

Also available in: Atom PDF