Bug #1682

mds: segfault in CInode::authority

Added by Josh Durgin over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Source:
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

From teuthology:~teuthworker/archive/nightly_coverage_2011-11-04/1469/teuthology.log:

2011-11-04T00:44:12.235 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2011-11-04T00:44:12.236 INFO:teuthology.task.ceph.mds.0.err: in thread 7fc289bc7700
2011-11-04T00:44:12.238 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.37-299-g256ac72 (commit:256ac72abc54504d613f2513fd8ac0a6a1e722fa)
2011-11-04T00:44:12.238 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x9102a4]
2011-11-04T00:44:12.238 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7fc28d43fb40]
2011-11-04T00:44:12.238 INFO:teuthology.task.ceph.mds.0.err: 3: (CInode::authority()+0x46) [0x71c2e6]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 4: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 5: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 6: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 7: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.239 INFO:teuthology.task.ceph.mds.0.err: 8: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.240 INFO:teuthology.task.ceph.mds.0.err: 9: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.240 INFO:teuthology.task.ceph.mds.0.err: 10: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.240 INFO:teuthology.task.ceph.mds.0.err: 11: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.240 INFO:teuthology.task.ceph.mds.0.err: 12: (CDir::authority()+0x56) [0x6ef6d6]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 13: (CInode::authority()+0x49) [0x71c2e9]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 14: (MDCache::predirty_journal_parents(Mutation*, EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0xdb7) [0x5fa2c7]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 15: (Locker::_do_cap_update(CInode*, Capability*, int, snapid_t, MClientCaps*, MClientCaps*)+0xca3) [0x686213]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 16: (Locker::handle_client_caps(MClientCaps*)+0x1ebd) [0x68c37d]
2011-11-04T00:44:12.241 INFO:teuthology.task.ceph.mds.0.err: 17: (Locker::dispatch(Message*)+0xb5) [0x690045]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 18: (MDS::handle_deferrable_message(Message*)+0x13df) [0x4ac80f]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 19: (MDS::_dispatch(Message*)+0xe9a) [0x4cba9a]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 20: (MDS::ms_dispatch(Message*)+0xa9) [0x4cd229]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 21: (SimpleMessenger::dispatch_entry()+0x99a) [0x81854a]
2011-11-04T00:44:12.242 INFO:teuthology.task.ceph.mds.0.err: 22: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4956cc]
2011-11-04T00:44:12.243 INFO:teuthology.task.ceph.mds.0.err: 23: (Thread::_entry_func(void*)+0x12) [0x813092]
2011-11-04T00:44:12.243 INFO:teuthology.task.ceph.mds.0.err: 24: (()+0x7971) [0x7fc28d437971]
2011-11-04T00:44:12.243 INFO:teuthology.task.ceph.mds.0.err: 25: (clone()+0x6d) [0x7fc28becb92d]
2011-11-04T00:44:26.932 INFO:teuthology.task.ceph.mds.0.err:daemon-helper: command crashed with signal 11
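
Frames 3 through 13 above are the usual mutual recursion between CInode::authority() and CDir::authority(), walking up the tree until something knows its auth, entered here from predirty_journal_parents(). The following is only a minimal stand-in sketch of that walk (the types, fields, and defaults are assumptions, not the real Ceph classes); it illustrates how a freed or zeroed CDir in the chain leaves a null inode pointer for the recursion to dereference:

#include <cstdio>
#include <utility>

struct CDir;

// Simplified stand-ins (not the real Ceph classes): authority() returns a
// pair of MDS ranks, and an unset rank is negative.
struct CInode {
    CDir *parent_dir = nullptr;
    std::pair<int,int> inode_auth{-1,-1};
    std::pair<int,int> authority() const;
};

struct CDir {
    CInode *inode = nullptr;
    std::pair<int,int> dir_auth{-1,-1};
    std::pair<int,int> authority() const;
};

// The mutual recursion seen in frames 3-13: each level defers to its parent
// until something in the chain actually knows its auth.
std::pair<int,int> CInode::authority() const {
    if (inode_auth.first >= 0)
        return inode_auth;               // this inode is pinned to an MDS
    if (parent_dir)
        return parent_dir->authority();  // otherwise ask the containing dirfrag
    return {0, -1};                      // root defaults to mds.0 (assumption)
}

std::pair<int,int> CDir::authority() const {
    if (dir_auth.first >= 0)
        return dir_auth;                 // subtree root knows its auth
    return inode->authority();           // otherwise ask the owning inode
}

int main() {
    CInode root;                         // root resolves the walk
    CDir dir;    dir.inode = &root;
    CInode file; file.parent_dir = &dir;
    std::printf("auth = mds.%d\n", file.authority().first);

    // The crash scenario: a CDir in the chain has been freed or zeroed, so
    // its inode pointer is null and the recursion dereferences garbage.
    CDir stale;                          // inode == nullptr
    CInode orphan; orphan.parent_dir = &stale;
    // orphan.authority();               // would segfault here
    return 0;
}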

Files

ceph_1682_debug.txt (10.3 KB), Mark Nelson, 01/02/2012 08:01 PM
#1

Updated by Josh Durgin over 12 years ago

Another crash in CInode::authority happened today, although with a different backtrace.
From teuthology:~teut/log/mds.0.log.gz

2011-11-18 12:38:08.714034 7fe54d18d700 mds.0.1 beacon_kill last_acked_stamp 2011-11-18 12:37:36.812900, we are laggy!
*** Caught signal (Segmentation fault) **
 in thread 7fe54eb92700
 ceph version 0.38-199-gdedf2c4 (commit:dedf2c4a066876bdab9a0b0154196194cefc1340)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x913614]
 2: (()+0xfb40) [0x7fe55260fb40]
 3: (CInode::authority()+0x46) [0x71cfa6]
 4: (CDir::authority()+0x56) [0x6f0396]
 5: (CInode::authority()+0x49) [0x71cfa9]
 6: (Locker::try_eval(SimpleLock*, bool*)+0x2a) [0x6771ea]
 7: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x384) [0x680cb4]
 8: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x68133d]
 9: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x6917e4]
 10: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x1ce) [0x691e0e]
 11: (C_Locker_FileUpdate_finish::finish(int)+0x34) [0x69e514]
 12: (Context::complete(int)+0x12) [0x49f1f2]
 13: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e222e]
 14: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d6286]
 15: (Journaler::C_Flush::finish(int)+0x1d) [0x7e245d]
 16: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xe2e) [0x7b72de]
 17: (MDS::handle_core_message(Message*)+0xebf) [0x4cb72f]
 18: (MDS::_dispatch(Message*)+0x3c) [0x4cb88c]
 19: (MDS::ms_dispatch(Message*)+0xa5) [0x4cde85]
 20: (SimpleMessenger::dispatch_entry()+0x99a) [0x81777a]
 21: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x49630c]
 22: (Thread::_entry_func(void*)+0x12) [0x8122c2]
 23: (()+0x7971) [0x7fe552607971]
 24: (clone()+0x6d) [0x7fe550e9692d]
#2

Updated by Sage Weil over 12 years ago

  • Priority changed from Normal to High
#3

Updated by Sage Weil over 12 years ago

  • Assignee set to Sage Weil
#4

Updated by Sage Weil over 12 years ago

  • Position set to 5
#5

Updated by Sage Weil over 12 years ago

Hrm, this has me stumped.

The log leading up to the crash is:

2011-11-19 21:02:18.741494 7fa5a5d45700 mds.0.locker  revoking pAsxLsXsxFsxcrwb on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=pAsxLsXsxFsxcrwb/pAsxXsxFsxcrwb@2},l=4103 | caps 0xe66dd40]
2011-11-19 21:02:18.741515 7fa5a5d45700 mds.0.locker eval 2496 [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 needsrecover s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103 | caps 0xe66dd40]
2011-11-19 21:02:18.741524 7fa5a5d45700 mds.0.locker eval doesn't want loner
2011-11-19 21:02:18.741544 7fa5a5d45700 mds.0.locker file_eval wanted= loner_wanted= other_wanted=  filelock=(ifile excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 needsrecover s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps 0xe66dd40]
2011-11-19 21:02:18.741565 7fa5a5d45700 mds.0.locker simple_sync on (ifile excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 needsrecover s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps 0xe66dd40]
2011-11-19 21:02:18.741586 7fa5a5d45700 mds.0.cache queue_file_recover [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 needsrecover s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps 0xe66dd40]
2011-11-19 21:02:18.741597 7fa5a5d45700  mds.0.cache.snaprealm(1 seq 1 0x1ca3b40) get_snaps  (seq 1 cached_seq 1)
2011-11-19 21:02:18.741605 7fa5a5d45700 mds.0.cache  snaps in [2,head] are 
2011-11-19 21:02:18.741625 7fa5a5d45700 mds.0.cache.ino(1000000c1a8) auth_pin by 0x1ca6200 on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=1+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40] now 1+0
2011-11-19 21:02:18.741636 7fa5a5d45700 mds.0.cache do_file_recover 1186 queued, 5 recovering
2011-11-19 21:02:18.741656 7fa5a5d45700 mds.0.cache.ino(1000000c1a8) auth_pin by 0xe66e490 on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=2+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40] now 2+0
2011-11-19 21:02:18.741677 7fa5a5d45700 mds.0.locker simple_eval (iauth excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=2+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40]
2011-11-19 21:02:18.741698 7fa5a5d45700 mds.0.locker simple_eval stable, syncing (iauth excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=2+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40]
2011-11-19 21:02:18.741729 7fa5a5d45700 mds.0.locker simple_sync on (iauth excl) on [inode 1000000c1a8 [2,head] /client.0/tmp/usr.2/include/c++/4.4/ext/vstring_fwd.h auth v382 ap=2+0 recovering s=3216 n(v0 b3216 1=1+0) (iauth excl) (ifile excl->sync) (ixattr excl) (iversion lock) cr={4103=0-4194304@1} caps={4103=-/pAsxXsxFsxcrwb@3},l=4103(-1) | caps authpin 0xe66dd40]
*** Caught signal (Segmentation fault) **
 in thread 7fa5a5d45700
 ceph version 0.38-204-g9920a16 (commit:9920a168c59807083019c62fdf381434edea12e5)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x913894]
 2: (()+0xfb40) [0x7fa5ab1c7b40]
 3: (CInode::make_path_string(std::string&, bool, CDentry*)+0x1d) [0x726f0d]
 4: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
 5: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
 6: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
 7: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
 8: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
 9: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
 10: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
 11: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
 12: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
 13: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
 14: (CDentry::make_path_string(std::string&)+0x30) [0x6e9130]
 15: (CInode::make_path_string(std::string&, bool, CDentry*)+0x44) [0x726f34]
 16: (CInode::make_path_string_projected(std::string&)+0x2c) [0x73854c]
 17: (operator<<(std::ostream&, CInode&)+0x32) [0x738822]
 18: (CInode::print(std::ostream&)+0x1a) [0x73fdea]
 19: (Locker::simple_eval(SimpleLock*, bool*)+0x10b) [0x6769db]
 20: (Locker::eval(SimpleLock*, bool*)+0x2a) [0x67716a]
 21: (Locker::eval(CInode*, int)+0x8ed) [0x682add]
 22: (Locker::try_eval(MDSCacheObject*, int)+0x50b) [0x684f9b]
 23: (Locker::revoke_stale_caps(Session*)+0x35d) [0x685c0d]
 24: (Server::find_idle_sessions()+0x891) [0x528c01]
 25: (MDS::tick()+0x470) [0x4abfc0]
 26: (MDS::C_MDS_Tick::finish(int)+0x24) [0x4df9f4]
 27: (SafeTimer::timer_thread()+0x4b0) [0x886510]
 28: (SafeTimerThread::entry()+0x15) [0x88a7b5]
 29: (Thread::_entry_func(void*)+0x12) [0x812542]
 30: (()+0x7971) [0x7fa5ab1bf971]
 31: (clone()+0x6d) [0x7fa5a9a4e92d]

We crash because a CDir is zeroed out in memory:

$38 = (CDentry * const) 0x11bc1720
(gdb) p this->dir
$39 = (CDir *) 0x1ce6f80
(gdb) p this->dir->inode
$40 = (CInode *) 0x0

(In fact, all of *this->dir is zeros.)

The dentry is #1/client.0/tmp/usr.2 and the dir is /client.0/tmp, which you'll notice was just successfully printed in the previous line of the log.
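
For reference, frames 3 through 16 of this trace are make_path_string() hopping between an inode and its primary parent dentry while building the path string for the debug print. Below is a hypothetical, heavily simplified version of that walk (stand-in types, not the real CInode/CDentry/CDir); with a zeroed CDir, dir->inode is NULL exactly as in the gdb output above, and the recursion dereferences it:

#include <iostream>
#include <string>

struct CDir;

// Hypothetical stand-ins, not the real Ceph classes.
struct CDentry {
    std::string name;
    CDir *dir;                       // the dirfrag containing this dentry
    void make_path_string(std::string &s) const;
};

struct CInode {
    CDentry *parent;                 // primary parent dentry; null for the root
    void make_path_string(std::string &s) const {
        if (parent)
            parent->make_path_string(s);  // recurse toward the root first
        else
            s.clear();                    // root contributes nothing here
    }
};

struct CDir {
    CInode *inode;                   // the inode this dirfrag belongs to
};

void CDentry::make_path_string(std::string &s) const {
    // In the crash above *dir is all zeros, so dir->inode is NULL and the
    // next line dereferences a null pointer (innermost frames of the trace).
    dir->inode->make_path_string(s);
    s += "/" + name;
}

int main() {
    // Build a tiny #1/client.0/tmp hierarchy and print the path of "tmp".
    CInode root{nullptr};
    CDir rootdir{&root};
    CDentry d_client{"client.0", &rootdir};
    CInode client{&d_client};
    CDir clientdir{&client};
    CDentry d_tmp{"tmp", &clientdir};
    CInode tmp{&d_tmp};

    std::string path;
    tmp.make_path_string(path);
    std::cout << path << std::endl;  // prints /client.0/tmp
    return 0;
}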

Looking through the prior simple_sync() call, and the call chain leading up to the crash (we just finished an eval on authlock and are doing linklock now), I don't see anything that could trigger a close_dirfrag.

Going to bump the logging up to 20 in the hope that it will capture a bit more info, but I suspect something else ugly is going on. May need to run this workload through valgrind?

#6

Updated by Sage Weil over 12 years ago

  • Assignee deleted (Sage Weil)
#7

Updated by Sage Weil over 12 years ago

  • Target version changed from v0.39 to v0.40
#8

Updated by Sage Weil over 12 years ago

  • Position deleted (37)
  • Position set to 1043
#9

Updated by Josh Durgin over 12 years ago

Not sure if this is the same underlying problem, but here's another CInode::authority crash from teuthology:~teut/log/mds.0.log.gz during the locking test:

2011-12-29 13:21:51.535061 2011-12-29 13:21:51.689910 7fd8f6b61700 mds.0.1 ms_handle_reset on 10.3.14.170:0/923031963
2011-12-29 13:21:54.617497 7fd8f6b61700 mds.0.1 ms_handle_reset on 10.3.14.174:0/113717421
*** Caught signal (Segmentation fault) **
 in thread 7fd8f6b61700
 ceph version 0.39-171-gdcedda8 (commit:dcedda84d0e1f69af985c301276c67c1b11e7efc)
 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x917f54]
 2: (()+0xfb40) [0x7fd8fa5deb40]
 3: (CInode::authority()+0x46) [0x71dac6]
 4: (Locker::try_eval(SimpleLock*, bool*)+0x2a) [0x677b9a]
 5: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x384) [0x681664]
 6: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x681ced]
 7: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x6827d4]
 8: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x1ce) [0x688c9e]
 9: (C_Locker_FileUpdate_finish::finish(int)+0x34) [0x69e9c4]
 10: (Context::complete(int)+0x12) [0x49f962]
 11: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e34be]
 12: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d74e6]
 13: (Journaler::C_Flush::finish(int)+0x1d) [0x7e36ed]
 14: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xd6d) [0x7b35dd]
 15: (MDS::handle_core_message(Message*)+0xebf) [0x4cbeaf]
 16: (MDS::_dispatch(Message*)+0x3c) [0x4cc00c]
 17: (MDS::ms_dispatch(Message*)+0xa5) [0x4ce605]
 18: (SimpleMessenger::dispatch_entry()+0x99a) [0x81b77a]
 19: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4969bc]
 20: (Thread::_entry_func(void*)+0x12) [0x8162c2]
 21: (()+0x7971) [0x7fd8fa5d6971]
 22: (clone()+0x6d) [0x7fd8f8e6592d]
#10

Updated by Mark Nelson over 12 years ago

This probably isn't all that useful for anyone who knows the code well, but I threw together a quick rundown of places where close_dirfrags gets called while browsing through the code. Like Sage said, it might be best to just try running the mds through valgrind and see if anything turns up.

#11

Updated by Sage Weil over 12 years ago

hit this again:


2012-01-06T20:22:15.808 DEBUG:teuthology.run_tasks:Unwinding manager <contextlib.GeneratorContextManager object at 0x26fc050>
2012-01-06T20:22:15.808 INFO:teuthology.task.ceph:Shutting down mds daemons...
2012-01-06T20:22:15.810 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Terminated) **
2012-01-06T20:22:15.810 INFO:teuthology.task.ceph.mds.0.err: in thread 7f4e88892780. Shutting down.
2012-01-06T20:22:15.815 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2012-01-06T20:22:15.815 INFO:teuthology.task.ceph.mds.0.err: in thread 7f4e849f5700
2012-01-06T20:22:15.817 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.39-263-g3c60e80 (commit:3c60e8046d0e64c0df01a6fced0d65f9788da8d8)
2012-01-06T20:22:15.817 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x918ef4]
2012-01-06T20:22:15.817 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7f4e88472b40]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 3: (CInode::authority()+0x46) [0x71dbc6]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 4: (Locker::try_eval(SimpleLock*, bool*)+0x2a) [0x677c9a]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 5: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x384) [0x681764]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 6: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x681ded]
2012-01-06T20:22:15.818 INFO:teuthology.task.ceph.mds.0.err: 7: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x6828d4]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 8: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x1ce) [0x688d9e]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 9: (C_Locker_FileUpdate_finish::finish(int)+0x34) [0x69eac4]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 10: (Context::complete(int)+0x12) [0x49fa72]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 11: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e35de]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 12: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d7606]
2012-01-06T20:22:15.819 INFO:teuthology.task.ceph.mds.0.err: 13: (Journaler::C_Flush::finish(int)+0x1d) [0x7e380d]
2012-01-06T20:22:15.820 INFO:teuthology.task.ceph.mds.0.err: 14: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xd6f) [0x7b36ff]
2012-01-06T20:22:15.820 INFO:teuthology.task.ceph.mds.0.err: 15: (MDS::handle_core_message(Message*)+0xebf) [0x4cbf8f]
2012-01-06T20:22:15.820 INFO:teuthology.task.ceph.mds.0.err: 16: (MDS::_dispatch(Message*)+0x3c) [0x4cc0ec]
2012-01-06T20:22:15.820 INFO:teuthology.task.ceph.mds.0.err: 17: (MDS::ms_dispatch(Message*)+0xa5) [0x4ce6e5]
2012-01-06T20:22:15.834 INFO:teuthology.task.ceph.mds.0.err: 18: (SimpleMessenger::dispatch_entry()+0x99a) [0x81c2da]
2012-01-06T20:22:15.835 INFO:teuthology.task.ceph.mds.0.err: 19: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x496acc]
2012-01-06T20:22:15.835 INFO:teuthology.task.ceph.mds.0.err: 20: (Thread::_entry_func(void*)+0x12) [0x816e22]
2012-01-06T20:22:15.835 INFO:teuthology.task.ceph.mds.0.err: 21: (()+0x7971) [0x7f4e8846a971]
2012-01-06T20:22:15.835 INFO:teuthology.task.ceph.mds.0.err: 22: (clone()+0x6d) [0x7f4e86cf992d]
2012-01-06T20:22:15.841 INFO:teuthology.task.ceph.mds.0.err:daemon-helper: command crashed with signal 11

on job

  kernel:
    branch: master
  nuke-on-error: true
  overrides:
    ceph:
      branch: testing
      btrfs: 1
      coverage: true
      log-whitelist:
      - clocks not synchronized
  roles:
  - - mon.0
    - mds.0
    - osd.0
    - osd.1
  - - mon.1
    - client.1
  - - mon.2
    - client.0
  tasks:
  - ceph: null
  - kclient: null
  - locktest:
    - client.0
    - client.1

ubuntu@teuthology:/var/lib/teuthworker/archive/testing-2012-01-06/6533

#12

Updated by Sage Weil over 12 years ago

  • Priority changed from High to Normal
#13

Updated by Sage Weil over 12 years ago

  • Target version deleted (v0.40)
  • Position deleted (1090)
  • Position set to 216
#14

Updated by Sage Weil over 12 years ago

happened again on /var/lib/teuthworker/archive/nightly_coverage_2012-01-13-a/7335

2012-01-13T02:51:06.298 INFO:teuthology.task.ceph:Shutting down mds daemons...
2012-01-13T02:51:06.300 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Terminated) **
2012-01-13T02:51:06.300 INFO:teuthology.task.ceph.mds.0.err: in thread 7fe6353cc780. Shutting down.
2012-01-13T02:51:06.311 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2012-01-13T02:51:06.311 INFO:teuthology.task.ceph.mds.0.err: in thread 7fe63152f700
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.39-323-g845aa53 (commit:845aa534e3e0ddc4f652879c473f011fff9c573b)
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x91cfe4]
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7fe634facb40]
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: 3: (CInode::authority()+0x46) [0x71e706]
2012-01-13T02:51:06.313 INFO:teuthology.task.ceph.mds.0.err: 4: (Locker::try_eval(SimpleLock*, bool*)+0x2a) [0x6787da]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 5: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x384) [0x6822a4]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 6: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x68292d]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 7: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x683414]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 8: (Locker::file_update_finish(CInode*, Mutation*, bool, client_t, Capability*, MClientCaps*)+0x1ce) [0x6898de]
2012-01-13T02:51:06.314 INFO:teuthology.task.ceph.mds.0.err: 9: (C_Locker_FileUpdate_finish::finish(int)+0x34) [0x69f604]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 10: (Context::complete(int)+0x12) [0x4a0252]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 11: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e42ce]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 12: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d82f6]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 13: (Journaler::C_Flush::finish(int)+0x1d) [0x7e44fd]
2012-01-13T02:51:06.315 INFO:teuthology.task.ceph.mds.0.err: 14: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xeb3) [0x7b4393]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 15: (MDS::handle_core_message(Message*)+0xedf) [0x4c418f]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 16: (MDS::_dispatch(Message*)+0x3c) [0x4c6cdc]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 17: (MDS::ms_dispatch(Message*)+0xa9) [0x4c92c9]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 18: (SimpleMessenger::dispatch_entry()+0x99a) [0x81ceba]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 19: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4972ac]
2012-01-13T02:51:06.316 INFO:teuthology.task.ceph.mds.0.err: 20: (Thread::_entry_func(void*)+0x12) [0x817a02]
2012-01-13T02:51:06.317 INFO:teuthology.task.ceph.mds.0.err: 21: (()+0x7971) [0x7fe634fa4971]
2012-01-13T02:51:06.317 INFO:teuthology.task.ceph.mds.0.err: 22: (clone()+0x6d) [0x7fe63383392d]
2012-01-13T02:51:06.464 INFO:teuthology.task.ceph.mds.0.err:daemon-helper: command crashed with signal 11

  kernel:
    sha1: 28fe722b3fbdd8f891ef7c07151b1272f8e936f2
  nuke-on-error: true
  overrides:
    ceph:
      btrfs: 1
      coverage: true
      log-whitelist:
      - clocks not synchronized
      sha1: 845aa534e3e0ddc4f652879c473f011fff9c573b
  roles:
  - - mon.0
    - mon.1
    - mon.2
    - mds.0
    - osd.0
    - osd.1
  - - client.1
  - - client.0
  tasks:
  - chef: null
  - ceph: null
  - kclient: null
  - locktest:
    - client.0
    - client.1

#15

Updated by Sage Weil about 12 years ago

  • Status changed from New to Resolved

calling this resolved too.

#16

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
