Bug #1682
mds: segfault in CInode::authority
% Done: 0%
Description
From teuthology:~teuthworker/archive/nightly_coverage_2011-11-04/1469/teuthology.log:
2011-11-04T00:44:12.235 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2011-11-04T00:44:12.236 INFO:teuthology.task.ceph.mds.0.err: in thread 7fc289bc7700
2011-11-04T00:44:12.238 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.37-299-g256ac72 (commit:256ac72abc54504d613f2513fd8ac0a6a1e722fa)
2011-11-04T00:44:12.238 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x9102a4]
2011-11-04T00:44:12.238 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7fc28d43fb40]
2011-11-04T00:44:12.238 INFO:teuthology.task.ceph.mds.0.err: 3: (CInode::authority()+0x46) [0x71c2e6]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 4: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 5: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 6: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 7: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 8: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.240 INFO:teuthology.task.ceph.mds.0.err: 9: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.240 INFO:teuthology.task.ceph.mds.0.err: 10: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.240 INFO:teuthology.task.ceph.mds.0.err: 11: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.240 INFO:teuthology.task.ceph.mds.0.err: 12: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 13: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 14: (MDCache::predirty_journal_parents(Mutation*, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0xdb7) [0x5fa2c7]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 15: (Locker::_do_cap_update(CInode*, Capability*, int, snapid_t, MClientCaps*, MClientCaps*)+0xca3) [0x686213]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 16: (Locker::handle_client_caps(MClientCaps*)+0x1ebd) [0x68c37d]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 17: (Locker::dispatch(Message*)+0xb5) [0x690045]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 18: (MDS::handle_deferrable_message(Message*)+0x13df) [0x4ac80f]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 19: (MDS::_dispatch(Message*)+0xe9a) [0x4cba9a]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 20: (MDS::ms_dispatch(Message*)+0xa9) [0x4cd229]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 21: (SimpleMessenger::dispatch_entry()+0x99a) [0x81854a]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 22: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4956cc]
2011-11-04T00:44:12.243 INFO:teuthology.task.ceph.mds.0.err: 23: (Thread::_entry_func(void*)+0x12) [0x813092]
2011-11-04T00:44:12.243 INFO:teuthology.task.ceph.mds.0.err: 24: (()+0x7971) [0x7fc28d437971]
2011-11-04T00:44:12.243 INFO:teuthology.task.ceph.mds.0.err: 25: (clone()+0x6d) [0x7fc28becb92d]
2011-11-04T00:44:26.932 INFO:teuthology.task.ceph.mds.0.err:daemon-helper: command crashed with signal 11
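The alternating CInode::authority()/CDir::authority() frames (3-13) suggest the authority lookup recursing up the ancestor chain toward the nearest subtree root and hitting a bad pointer partway up. Below is a minimal sketch of that kind of mutual recursion, with made-up field names rather than the real Ceph classes, only to illustrate where a freed or zeroed CDir/CInode on the chain would turn into this segfault:

#include <utility>

struct CDir;

struct CInode {
  std::pair<int,int> inode_auth{-2, -2};   // sentinel: "not explicitly set"
  CDir *parent_dir = nullptr;              // dirfrag containing our primary dentry
  std::pair<int,int> authority() const;
};

struct CDir {
  bool subtree_root = false;
  std::pair<int,int> dir_auth{0, -2};
  CInode *inode = nullptr;                 // inode this dirfrag belongs to
  std::pair<int,int> authority() const {
    if (subtree_root)
      return dir_auth;
    return inode->authority();             // faults here if 'inode' is freed/zeroed
  }
};

std::pair<int,int> CInode::authority() const {
  if (inode_auth.first >= 0)
    return inode_auth;                     // authority set directly on this inode
  if (parent_dir)
    return parent_dir->authority();        // otherwise keep walking up the tree
  return {-2, -2};                         // root / unknown
}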
History
#1 Updated by Josh Durgin about 12 years ago
Another crash in CInode::authority happened today, although with a different backtrace.
From teuthology:~teuthworker/archive/nightly_coverage_2011-11-18-2/2660/remote/ubuntu@sepia68.ceph.dreamhost.com/log/mds.0.log.gz
2011-11-18 12:38:08.714034 7fe54d18d700 mds.0.1 beacon_kill last_acked_stamp 2011-11-18 12:37:36.812900, we are laggy!
*** Caught signal (Segmentation fault) **
in thread 7fe54eb92700
ceph version 0.38-199-gdedf2c4 (commit:dedf2c4a066876bdab9a0b0154196194cefc1340)
1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x913614]
2: (()+0xfb40) [0x7fe55260fb40]
3: (CInode::authority()+0x46) [0x71cfa6]
4: (CDir::authority()+0x56) [0x6f0396]
5: (CInode::authority()+0x49) [0x71cfa9]
6: (Locker::try_eval(SimpleLock*, bool*)+0x2a) [0x6771ea]
7: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x384) [0x680cb4]
8: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x68133d]
9: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x6917e4]
10: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x1ce) [0x691e0e]
11: (C_Locker_FileUpdate_finish::finish(int)+0x34) [0x69e514]
12: (Context::complete(int)+0x12) [0x49f1f2]
13: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e222e]
14: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d6286]
15: (Journaler::C_Flush::finish(int)+0x1d) [0x7e245d]
16: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe2e) [0x7b72de]
17: (MDS::handle_core_message(Message*)+0xebf) [0x4cb72f]
18: (MDS::_dispatch(Message*)+0x3c) [0x4cb88c]
19: (MDS::ms_dispatch(Message*)+0xa5) [0x4cde85]
20: (SimpleMessenger::dispatch_entry()+0x99a) [0x81777a]
21: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x49630c]
22: (Thread::_entry_func(void*)+0x12) [0x8122c2]
23: (()+0x7971) [0x7fe552607971]
24: (clone()+0x6d) [0x7fe550e9692d]
#2 Updated by Sage Weil about 12 years ago
- Priority changed from Normal to High
#3 Updated by Sage Weil about 12 years ago
- Assignee set to Sage Weil
#4 Updated by Sage Weil about 12 years ago
- Position set to 5
#5 Updated by Sage Weil about 12 years ago
Hrm, this has me stumped.
The log leading up to the crash is:
2011-11-19 21:02:18.741494 7fa5a5d45700 mds.0.locker revoking pAsxLsXsxFsxcrwb on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@2},l=4103 | caps 0xe66dd40]
2011-11-19 21:02:18.741515 7fa5a5d45700 mds.0.locker eval 2496 [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 needsrecover s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103 | caps 0xe66dd40]
2011-11-19 21:02:18.741524 7fa5a5d45700 mds.0.locker eval doesn't want loner
2011-11-19 21:02:18.741544 7fa5a5d45700 mds.0.locker file_eval wanted= loner_wanted= other_wanted= filelock=(ifile excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 needsrecover s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps 0xe66dd40]
2011-11-19 21:02:18.741565 7fa5a5d45700 mds.0.locker simple_sync on (ifile excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 needsrecover s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps 0xe66dd40]
2011-11-19 21:02:18.741586 7fa5a5d45700 mds.0.cache queue_file_recover [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 needsrecover s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps 0xe66dd40]
2011-11-19 21:02:18.741597 7fa5a5d45700 mds.0.cache.snaprealm(1 seq 1 0x1ca3b40) get_snaps (seq 1 cached_seq 1)
2011-11-19 21:02:18.741605 7fa5a5d45700 mds.0.cache snaps in [2,head] are
2011-11-19 21:02:18.741625 7fa5a5d45700 mds.0.cache.ino(1000000c1a8) auth_pin by 0x1ca6200 on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=1+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40] now 1+0
2011-11-19 21:02:18.741636 7fa5a5d45700 mds.0.cache do_file_recover 1186 queued, 5 recovering
2011-11-19 21:02:18.741656 7fa5a5d45700 mds.0.cache.ino(1000000c1a8) auth_pin by 0xe66e490 on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=2+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40] now 2+0
2011-11-19 21:02:18.741677 7fa5a5d45700 mds.0.locker simple_eval (iauth excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=2+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40]
2011-11-19 21:02:18.741698 7fa5a5d45700 mds.0.locker simple_eval stable, syncing (iauth excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=2+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40]
2011-11-19 21:02:18.741729 7fa5a5d45700 mds.0.locker simple_sync on (iauth excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=2+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40]
*** Caught signal (Segmentation fault) **
in thread 7fa5a5d45700
ceph version 0.38-204-g9920a16 (commit:9920a168c59807083019c62fdf381434edea12e5)
1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x913894]
2: (()+0xfb40) [0x7fa5ab1c7b40]
3: (CInode::make_path_string(std::string&, bool, CDentry*)+0x1d) [0x726f0d]
4: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
5: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
6: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
7: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
8: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
9: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
10: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
11: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
12: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
13: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
14: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
15: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
16: (CInode::make_path_string_projected(std::string&)+0x2c) [0x73854c]
17: (operator<<(std::ostream&, CInode&)+0x32) [0x738822]
18: (CInode::print(std::ostream&)+0x1a) [0x73fdea]
19: (Locker::simple_eval(SimpleLock*, bool*)+0x10b) [0x6769db]
20: (Locker::eval(SimpleLock*, bool*)+0x2a) [0x67716a]
21: (Locker::eval(CInode*, int)+0x8ed) [0x682add]
22: (Locker::try_eval(MDSCacheObject*, int)+0x50b) [0x684f9b]
23: (Locker::revoke_stale_caps(Session*)+0x35d) [0x685c0d]
24: (Server::find_idle_sessions()+0x891) [0x528c01]
25: (MDS::tick()+0x470) [0x4abfc0]
26: (MDS::C_MDS_Tick::finish(int)+0x24) [0x4df9f4]
27: (SafeTimer::timer_thread()+0x4b0) [0x886510]
28: (SafeTimerThread::entry()+0x15) [0x88a7b5]
29: (Thread::_entry_func(void*)+0x12) [0x812542]
30: (()+0x7971) [0x7fa5ab1bf971]
31: (clone()+0x6d) [0x7fa5a9a4e92d]
We crash because a CDir is zeroed out in memory:
$38 = (CDentry * const) 0x11bc1720
(gdb) p this->dir
$39 = (CDir *) 0x1ce6f80
(gdb) p this->dir->inode
$40 = (CInode *) 0x0
(In fact, all of *this->dir is zeroed.)
The dentry is #1/client.0/tmp/usr.2 and the dir is /client.0/tmp, which you'll notice was just successfully printed in the previous line of the log.
Looking through the prior simple_sync() call, and the call chain leading up to the crash (we just finished an eval on authlock and are doing linklock now), I don't see anything that could trigger a close_dirfrag.
Going to bump the logging up to 20 in the hopes that that will have a bit more info, but I suspect something else ugly is going on. May need to run this workload through valgrind?
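To make the valgrind idea concrete, here is a hedged, self-contained illustration of the suspected failure class (none of it is Ceph code; every name is invented): a dentry-like object keeps a raw pointer to a dirfrag-like object that has already been freed, and a later authority()/make_path_string()-style walk reads the freed memory. Compiled with AddressSanitizer (g++ -g -fsanitize=address repro.cc), it reports a heap-use-after-free at the marked read; valgrind's memcheck flags the same kind of invalid read, which is what running the real workload under valgrind would be looking for.

// Illustration only: a deliberate use-after-free of the same shape as the
// zeroed CDir seen in gdb above.
#include <iostream>

struct Inode;

struct Dir {
  Inode *inode;                   // back-pointer to the inode this dirfrag belongs to
  explicit Dir(Inode *in) : inode(in) {}
};

struct Dentry {
  Dir *dir;                       // raw pointer to the containing dirfrag
  explicit Dentry(Dir *d) : dir(d) {}
};

struct Inode {
  Dentry *parent = nullptr;       // primary dentry linking this inode into the tree
};

int main() {
  Inode root, child;
  Dir *rootdir = new Dir(&root);  // dirfrag of the parent inode
  child.parent = new Dentry(rootdir);

  delete rootdir;                 // a close_dirfrag-style teardown that forgets the dentry

  // An authority()/make_path_string()-style walk: child -> parent dentry ->
  // dir -> inode.  ASan/memcheck reports the use-after-free on this read.
  std::cout << child.parent->dir->inode << std::endl;

  delete child.parent;
  return 0;
}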
#6 Updated by Sage Weil about 12 years ago
- Assignee deleted (Sage Weil)
#7 Updated by Sage Weil about 12 years ago
- Target version changed from v0.39 to v0.40
#8 Updated by Sage Weil almost 12 years ago
- Position deleted (37)
- Position set to 1043
#9 Updated by Josh Durgin almost 12 years ago
Not sure if this is the same underlying problem, but here's another CInode::authority crash from teuthology:~teuthworker/archive/nightly_coverage_2011-12-29-b/5388/remote/ubuntu@sepia36.ceph.dreamhost.com/log/mds.0.log.gz during the locking test:
2011-12-29 13:21:51.535061
2011-12-29 13:21:51.689910 7fd8f6b61700 mds.0.1 ms_handle_reset on 10.3.14.170:0/923031963
2011-12-29 13:21:54.617497 7fd8f6b61700 mds.0.1 ms_handle_reset on 10.3.14.174:0/113717421
*** Caught signal (Segmentation fault) **
in thread 7fd8f6b61700
ceph version 0.39-171-gdcedda8 (commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x917f54]
2: (()+0xfb40) [0x7fd8fa5deb40]
3: (CInode::authority()+0x46) [0x71dac6]
4: (Locker::try_eval(SimpleLock*, bool*)+0x2a) [0x677b9a]
5: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x384) [0x681664]
6: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x681ced]
7: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x6827d4]
8: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x1ce) [0x688c9e]
9: (C_Locker_FileUpdate_finish::finish(int)+0x34) [0x69e9c4]
10: (Context::complete(int)+0x12) [0x49f962]
11: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e34be]
12: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d74e6]
13: (Journaler::C_Flush::finish(int)+0x1d) [0x7e36ed]
14: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xd6d) [0x7b35dd]
15: (MDS::handle_core_message(Message*)+0xebf) [0x4cbeaf]
16: (MDS::_dispatch(Message*)+0x3c) [0x4cc00c]
17: (MDS::ms_dispatch(Message*)+0xa5) [0x4ce605]
18: (SimpleMessenger::dispatch_entry()+0x99a) [0x81b77a]
19: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4969bc]
20: (Thread::_entry_func(void*)+0x12) [0x8162c2]
21: (()+0x7971) [0x7fd8fa5d6971]
22: (clone()+0x6d) [0x7fd8f8e6592d]
#10 Updated by Mark Nelson almost 12 years ago
- File ceph_1682_debug.txt View added
This probably isn't all that useful for anyone who knows the code well, but I threw together a quick rundown of the places where close_dirfrags gets called that I found while browsing through the code. Like Sage said, it might be best to just try running the mds through valgrind and see if anything turns up.
#11 Updated by Sage Weil almost 12 years ago
hit this again:
2012-01-06T20:22:15.808 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x26fc050>
2012-01-06T20:22:15.808 INFO:teuthology.task.ceph:Shutting down mds daemons...
2012-01-06T20:22:15.810 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Terminated) **
2012-01-06T20:22:15.810 INFO:teuthology.task.ceph.mds.0.err: in thread 7f4e88892780. Shutting down.
2012-01-06T20:22:15.815 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2012-01-06T20:22:15.815 INFO:teuthology.task.ceph.mds.0.err: in thread 7f4e849f5700
2012-01-06T20:22:15.817 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.39-263-g3c60e80 (commit:3c60e8046d0e64c0df01a6fced0d65f9788da8d8)
2012-01-06T20:22:15.817 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x918ef4]
2012-01-06T20:22:15.817 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7f4e88472b40]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 3: (CInode::authority()+0x46) [0x71dbc6]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 4: (Locker::try_eval(SimpleLock*, bool*)+0x2a) [0x677c9a]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 5: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x384) [0x681764]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 6: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x681ded]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 7: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x6828d4]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 8: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x1ce) [0x688d9e]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 9: (C_Locker_FileUpdate_finish::finish(int)+0x34) [0x69eac4]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 10: (Context::complete(int)+0x12) [0x49fa72]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 11: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e35de]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 12: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d7606]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 13: (Journaler::C_Flush::finish(int)+0x1d) [0x7e380d]
2012-01-06T20:22:15.820 INFO:teuthology.task.ceph.mds.0.err: 14: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xd6f) [0x7b36ff]
2012-01-06T20:22:15.820 INFO:teuthology.task.ceph.mds.0.err: 15: (MDS::handle_core_message(Message*)+0xebf) [0x4cbf8f]
2012-01-06T20:22:15.820 INFO:teuthology.task.ceph.mds.0.err: 16: (MDS::_dispatch(Message*)+0x3c) [0x4cc0ec]
2012-01-06T20:22:15.820 INFO:teuthology.task.ceph.mds.0.err: 17: (MDS::ms_dispatch(Message*)+0xa5) [0x4ce6e5]
2012-01-06T20:22:15.834 INFO:teuthology.task.ceph.mds.0.err: 18: (SimpleMessenger::dispatch_entry()+0x99a) [0x81c2da]
2012-01-06T20:22:15.835 INFO:teuthology.task.ceph.mds.0.err: 19: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x496acc]
2012-01-06T20:22:15.835 INFO:teuthology.task.ceph.mds.0.err: 20: (Thread::_entry_func(void*)+0x12) [0x816e22]
2012-01-06T20:22:15.835 INFO:teuthology.task.ceph.mds.0.err: 21: (()+0x7971) [0x7f4e8846a971]
2012-01-06T20:22:15.835 INFO:teuthology.task.ceph.mds.0.err: 22: (clone()+0x6d) [0x7f4e86cf992d]
2012-01-06T20:22:15.841 INFO:teuthology.task.ceph.mds.0.err:daemon-helper: command crashed with signal 11
on job:
kernel:
  branch: master
nuke-on-error: true
overrides:
  ceph:
    branch: testing
    btrfs: 1
    coverage: true
    log-whitelist:
    - clocks not synchronized
roles:
- - mon.0
  - mds.0
  - osd.0
  - osd.1
- - mon.1
  - client.1
- - mon.2
  - client.0
tasks:
- ceph: null
- kclient: null
- locktest:
  - client.0
  - client.1
ubuntu@teuthology:/var/lib/teuthworker/archive/testing-2012-01-06/6533
#12 Updated by Sage Weil almost 12 years ago
- Priority changed from High to Normal
#13 Updated by Sage Weil almost 12 years ago
- Target version deleted (v0.40)
- Position deleted (1090)
- Position set to 216
#14 Updated by Sage Weil almost 12 years ago
happened again on /var/lib/teuthworker/archive/nightly_coverage_2012-01-13-a/7335
2012-01-13T02:51:06.298 INFO:teuthology.task.ceph:Shutting down mds daemons...
2012-01-13T02:51:06.300 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Terminated) **
2012-01-13T02:51:06.300 INFO:teuthology.task.ceph.mds.0.err: in thread 7fe6353cc780. Shutting down.
2012-01-13T02:51:06.311 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2012-01-13T02:51:06.311 INFO:teuthology.task.ceph.mds.0.err: in thread 7fe63152f700
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.39-323-g845aa53 (commit:845aa534e3e0ddc4f652879c473f011fff9c573b)
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x91cfe4]
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7fe634facb40]
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: 3: (CInode::authority()+0x46) [0x71e706]
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: 4: (Locker::try_eval(SimpleLock*, bool*)+0x2a) [0x6787da]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 5: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x384) [0x6822a4]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 6: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x68292d]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 7: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x683414]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 8: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x1ce) [0x6898de]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 9: (C_Locker_FileUpdate_finish::finish(int)+0x34) [0x69f604]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 10: (Context::complete(int)+0x12) [0x4a0252]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 11: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e42ce]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 12: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d82f6]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 13: (Journaler::C_Flush::finish(int)+0x1d) [0x7e44fd]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 14: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xeb3) [0x7b4393]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 15: (MDS::handle_core_message(Message*)+0xedf) [0x4c418f]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 16: (MDS::_dispatch(Message*)+0x3c) [0x4c6cdc]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 17: (MDS::ms_dispatch(Message*)+0xa9) [0x4c92c9]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 18: (SimpleMessenger::dispatch_entry()+0x99a) [0x81ceba]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 19: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4972ac]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 20: (Thread::_entry_func(void*)+0x12) [0x817a02]
2012-01-13T02:51:06.317 INFO:teuthology.task.ceph.mds.0.err: 21: (()+0x7971) [0x7fe634fa4971]
2012-01-13T02:51:06.317 INFO:teuthology.task.ceph.mds.0.err: 22: (clone()+0x6d) [0x7fe63383392d]
2012-01-13T02:51:06.464 INFO:teuthology.task.ceph.mds.0.err:daemon-helper: command crashed with signal 11

with job config:

kernel:
  sha1: 28fe722b3fbdd8f891ef7c07151b1272f8e936f2
nuke-on-error: true
overrides:
  ceph:
    btrfs: 1
    coverage: true
    log-whitelist:
    - clocks not synchronized
    sha1: 845aa534e3e0ddc4f652879c473f011fff9c573b
roles:
- - mon.0
  - mon.1
  - mon.2
  - mds.0
  - osd.0
  - osd.1
- - client.1
- - client.0
tasks:
- chef: null
- ceph: null
- kclient: null
- locktest:
  - client.0
  - client.1
#15 Updated by Sage Weil almost 12 years ago
- Status changed from New to Resolved
Calling this resolved too.
#16 Updated by John Spray about 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.