Bug #1549

mds: zeroed root CDir* vtable in scatter_writebehind_finish

Added by Josh Durgin over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

Logs are in teuthology:~teuthworker/archive/nightly_coverage_2011-09-20/342/

2011-09-20T11:47:50.064 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2011-09-20T11:47:50.064 INFO:teuthology.task.ceph.mds.0.err: in thread 0x7f089d459700
2011-09-20T11:47:50.066 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.34-549-gd64237a (commit:d64237a6a555944d6d35676490bc4fb7c7db965d)
2011-09-20T11:47:50.066 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/cmds() [0x8ec204]
2011-09-20T11:47:50.066 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7f08a0cd1b40]
2011-09-20T11:47:50.066 INFO:teuthology.task.ceph.mds.0.err: 3: (Mutation::drop_local_auth_pins()+0x39) [0x5920a9]
2011-09-20T11:47:50.066 INFO:teuthology.task.ceph.mds.0.err: 4: (Mutation::cleanup()+0x11) [0x592fa1]
2011-09-20T11:47:50.067 INFO:teuthology.task.ceph.mds.0.err: 5: (Locker::scatter_writebehind_finish(ScatterLock*, Mutation*)+0x1f5) [0x68d785]
2011-09-20T11:47:50.067 INFO:teuthology.task.ceph.mds.0.err: 6: (Locker::C_Locker_ScatterWB::finish(int)+0x1d) [0x69a23d]
2011-09-20T11:47:50.067 INFO:teuthology.task.ceph.mds.0.err: 7: (Context::complete(int)+0x12) [0x49b862]
2011-09-20T11:47:50.067 INFO:teuthology.task.ceph.mds.0.err: 8: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7eadbe]
2011-09-20T11:47:50.067 INFO:teuthology.task.ceph.mds.0.err: 9: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7e2a16]
2011-09-20T11:47:50.068 INFO:teuthology.task.ceph.mds.0.err: 10: (Journaler::C_Flush::finish(int)+0x1d) [0x7eafcd]
2011-09-20T11:47:50.068 INFO:teuthology.task.ceph.mds.0.err: 11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xd8a) [0x7b160a]
2011-09-20T11:47:50.068 INFO:teuthology.task.ceph.mds.0.err: 12: (MDS::handle_core_message(Message*)+0xedf) [0x4c5fff]
2011-09-20T11:47:50.068 INFO:teuthology.task.ceph.mds.0.err: 13: (MDS::_dispatch(Message*)+0x3c) [0x4c615c]
2011-09-20T11:47:50.068 INFO:teuthology.task.ceph.mds.0.err: 14: (MDS::ms_dispatch(Message*)+0x97) [0x4c8697]
2011-09-20T11:47:50.069 INFO:teuthology.task.ceph.mds.0.err: 15: (SimpleMessenger::dispatch_entry()+0x9d2) [0x822012]
2011-09-20T11:47:50.069 INFO:teuthology.task.ceph.mds.0.err: 16: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x492b2c]
2011-09-20T11:47:50.069 INFO:teuthology.task.ceph.mds.0.err: 17: (Thread::_entry_func(void*)+0x12) [0x816052]
2011-09-20T11:47:50.069 INFO:teuthology.task.ceph.mds.0.err: 18: (()+0x7971) [0x7f08a0cc9971]
2011-09-20T11:47:50.069 INFO:teuthology.task.ceph.mds.0.err: 19: (clone()+0x6d) [0x7f089f75d92d]

History

#1 Updated by Sage Weil over 12 years ago

Grr, I ran a loop on #1464 for days and wasn't able to hit this. Want to see the mds log to see how we got into this corner.

#2 Updated by Sage Weil over 12 years ago

{CDentry,CInode,CDir}::auth_pin() pin the object too, so i'm not sure how we can have a use-after-free in the code that is dropping auth pins.
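
For reference, the pinning pattern described here looks roughly like this (a minimal sketch under simplified, hypothetical names; the real CDentry/CInode/CDir code takes a pin tag and does more bookkeeping):

struct MDSCacheObjectSketch {
  int auth_pins = 0;
  int refs = 0;                  // generic pin/reference count

  void get() { ++refs; }         // pin the object in cache
  void put() { if (--refs == 0) delete this; }

  void auth_pin() {
    if (auth_pins++ == 0)
      get();                     // taking an auth pin also pins the
  }                              // object itself...
  void auth_unpin() {
    if (--auth_pins == 0)
      put();                     // ...so it cannot be freed until the
  }                              // last auth pin is dropped
};

If that invariant holds, nothing drop_local_auth_pins() touches should have been freed, which is what makes the crash puzzling.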

#3 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.37 to v0.38

#4 Updated by Josh Durgin over 12 years ago

This happened again today after fsstress. From teuthology:~teuthworker/archive/nightly_coverage_2011-10-27/1083/teuthology.log:

2011-10-27T01:36:40.407 INFO:teuthology.task.ceph:Shutting down mds daemons...
2011-10-27T01:36:40.412 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2011-10-27T01:36:40.412 INFO:teuthology.task.ceph.mds.0.err: in thread 7fe7a7ffc700
2011-10-27T01:36:40.414 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.37-190-g11691a7 (commit:11691a7111d7329a6d11e25ad19005e3824e9dbb)
2011-10-27T01:36:40.414 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x90d164]
2011-10-27T01:36:40.414 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7fe7ab874b40]
2011-10-27T01:36:40.414 INFO:teuthology.task.ceph.mds.0.err: 3: (Mutation::drop_local_auth_pins()+0x39) [0x5952c9]
2011-10-27T01:36:40.415 INFO:teuthology.task.ceph.mds.0.err: 4: (Mutation::cleanup()+0x11) [0x5961c1]
2011-10-27T01:36:40.415 INFO:teuthology.task.ceph.mds.0.err: 5: (Locker::scatter_writebehind_finish(ScatterLock*, Mutation*)+0x1f5) [0x690af5]
2011-10-27T01:36:40.415 INFO:teuthology.task.ceph.mds.0.err: 6: (Locker::C_Locker_ScatterWB::finish(int)+0x1d) [0x69d50d]
2011-10-27T01:36:40.415 INFO:teuthology.task.ceph.mds.0.err: 7: (Context::complete(int)+0x12) [0x49e402]
2011-10-27T01:36:40.415 INFO:teuthology.task.ceph.mds.0.err: 8: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e2a6e]
2011-10-27T01:36:40.415 INFO:teuthology.task.ceph.mds.0.err: 9: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d6ac6]
2011-10-27T01:36:40.415 INFO:teuthology.task.ceph.mds.0.err: 10: (Journaler::C_Flush::finish(int)+0x1d) [0x7e2c9d]
2011-10-27T01:36:40.415 INFO:teuthology.task.ceph.mds.0.err: 11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0xf39) [0x7b2549]
2011-10-27T01:36:40.416 INFO:teuthology.task.ceph.mds.0.err: 12: (MDS::handle_core_message(Message*)+0xecf) [0x4ca93f]
2011-10-27T01:36:40.416 INFO:teuthology.task.ceph.mds.0.err: 13: (MDS::_dispatch(Message*)+0x3c) [0x4caa9c]
2011-10-27T01:36:40.416 INFO:teuthology.task.ceph.mds.0.err: 14: (MDS::ms_dispatch(Message*)+0xa9) [0x4cd089]
2011-10-27T01:36:40.416 INFO:teuthology.task.ceph.mds.0.err: 15: (SimpleMessenger::dispatch_entry()+0x9c2) [0x817ad2]
2011-10-27T01:36:40.416 INFO:teuthology.task.ceph.mds.0.err: 16: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x49551c]
2011-10-27T01:36:40.416 INFO:teuthology.task.ceph.mds.0.err: 17: (Thread::_entry_func(void*)+0x12) [0x8126a2]
2011-10-27T01:36:40.416 INFO:teuthology.task.ceph.mds.0.err: 18: (()+0x7971) [0x7fe7ab86c971]
2011-10-27T01:36:40.417 INFO:teuthology.task.ceph.mds.0.err: 19: (clone()+0x6d) [0x7fe7aa30092d]

#5 Updated by Sage Weil over 12 years ago

  • Status changed from New to Need More Info
  • Assignee set to Sage Weil

bleh. need logs... i'll start this up in a loop again.

#6 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.38 to v0.39

#7 Updated by Josh Durgin over 12 years ago

This happened after the misc workunit today.

#8 Updated by Sage Weil over 12 years ago

  • Assignee deleted (Sage Weil)

Someone needs to try to reproduce this with logs. fwiw metropolis:~sage/src/teuthology/hammer.sh is what i've been using.

#9 Updated by Anonymous over 12 years ago

This happened again on 11/16 in job 2056, kclient_workunit_kernel_untar_build:

2011-11-16T00:36:30.996 INFO:teuthology.task.ceph.mds.0.err:*** Caught signal (Segmentation fault) **
2011-11-16T00:36:30.997 INFO:teuthology.task.ceph.mds.0.err: in thread 7fbe995ef700
2011-11-16T00:36:30.998 INFO:teuthology.task.ceph.mds.0.err: ceph version 0.38-181-g2e19550 (commit:2e195500b5d3a8ab8512bcf2a219a6b7ff922c97)
2011-11-16T00:36:30.999 INFO:teuthology.task.ceph.mds.0.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x913774]
2011-11-16T00:36:30.999 INFO:teuthology.task.ceph.mds.0.err: 2: (()+0xfb40) [0x7fbe9d06cb40]
2011-11-16T00:36:30.999 INFO:teuthology.task.ceph.mds.0.err: 3: (Mutation::drop_local_auth_pins()+0x39) [0x5962a9]
2011-11-16T00:36:30.999 INFO:teuthology.task.ceph.mds.0.err: 4: (Mutation::cleanup()+0x11) [0x5971a1]
2011-11-16T00:36:30.999 INFO:teuthology.task.ceph.mds.0.err: 5: (Locker::scatter_writebehind_finish(ScatterLock*, Mutation*)+0x1f5) [0x691aa5]
2011-11-16T00:36:31.000 INFO:teuthology.task.ceph.mds.0.err: 6: (Locker::C_Locker_ScatterWB::finish(int)+0x1d) [0x69e4cd]
2011-11-16T00:36:31.000 INFO:teuthology.task.ceph.mds.0.err: 7: (Context::complete(int)+0x12) [0x49f1f2]
2011-11-16T00:36:31.000 INFO:teuthology.task.ceph.mds.0.err: 8: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e23ae]
2011-11-16T00:36:31.000 INFO:teuthology.task.ceph.mds.0.err: 9: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x206) [0x7d6406]
2011-11-16T00:36:31.001 INFO:teuthology.task.ceph.mds.0.err: 10: (Journaler::C_Flush::finish(int)+0x1d) [0x7e25dd]
2011-11-16T00:36:31.001 INFO:teuthology.task.ceph.mds.0.err: 11: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x100c) [0x7b43ac]
2011-11-16T00:36:31.001 INFO:teuthology.task.ceph.mds.0.err: 12: (MDS::handle_core_message(Message*)+0xebf) [0x4cb72f]
2011-11-16T00:36:31.001 INFO:teuthology.task.ceph.mds.0.err: 13: (MDS::_dispatch(Message*)+0x3c) [0x4cb88c]
2011-11-16T00:36:31.001 INFO:teuthology.task.ceph.mds.0.err: 14: (MDS::ms_dispatch(Message*)+0xa5) [0x4cde85]
2011-11-16T00:36:31.002 INFO:teuthology.task.ceph.mds.0.err: 15: (SimpleMessenger::dispatch_entry()+0x99a) [0x8178fa]
2011-11-16T00:36:31.002 INFO:teuthology.task.ceph.mds.0.err: 16: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x49630c]
2011-11-16T00:36:31.002 INFO:teuthology.task.ceph.mds.0.err: 17: (Thread::_entry_func(void*)+0x12) [0x812442]
2011-11-16T00:36:31.002 INFO:teuthology.task.ceph.mds.0.err: 18: (()+0x7971) [0x7fbe9d064971]
2011-11-16T00:36:31.002 INFO:teuthology.task.ceph.mds.0.err: 19: (clone()+0x6d) [0x7fbe9b8f392d]
2011-11-16T00:36:31.366 INFO:teuthology.task.ceph.mds.0.err:daemon-helper: command crashed with signal 11

#10 Updated by Sage Weil over 12 years ago

  • Status changed from Need More Info to In Progress
  • Assignee set to Sage Weil
  • Priority changed from Normal to High

#12 Updated by Sage Weil over 12 years ago

  • Target version changed from v0.39 to v0.40

#13 Updated by Sage Weil over 12 years ago

Happened twice today:

#0  0x00007f20be7fba0b in raise () from /lib/libpthread.so.0
#1  0x0000000000916a4b in reraise_fatal (signum=3392) at global/signal_handler.cc:59
#2  0x000000000091722c in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106
#3  <signal handler called>
#4  0x0000000000596929 in Mutation::drop_local_auth_pins (this=0x2df12a00) at mds/Mutation.cc:91
#5  0x0000000000597821 in Mutation::cleanup (this=0x225a000) at mds/Mutation.cc:163
#6  0x0000000000682765 in Locker::scatter_writebehind_finish (this=0x2214a00, lock=0x22497d0, mut=0x2df12a00) at mds/Locker.cc:3625
#7  0x000000000069e64d in Locker::C_Locker_ScatterWB::finish(int) ()
#8  0x000000000049f7d2 in Context::complete (this=0x225a000, r=770779648) at ./include/Context.h:41
#9  0x00000000007e28ae in finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int) ()
#10 0x00000000007d6906 in Journaler::_finish_flush (this=0x2243000, r=<value optimized out>, start=788596852, stamp=<value optimized out>) at osdc/Journaler.cc:419
#11 0x00000000007e2add in Journaler::C_Flush::finish(int) ()
#12 0x00000000007b780e in Objecter::handle_osd_op_reply (this=0x2232000, m=0x538a1c0) at osdc/Objecter.cc:1205
#13 0x00000000004cbd0f in MDS::handle_core_message (this=0x2225a00, m=0x538a1c0) at mds/MDS.cc:1695
#14 0x00000000004cbe6c in MDS::_dispatch (this=0x2225a00, m=0x538a1c0) at mds/MDS.cc:1818
#15 0x00000000004ce465 in MDS::ms_dispatch (this=0x2225a00, m=0x538a1c0) at mds/MDS.cc:1631
#16 0x000000000081ab5a in ms_deliver_dispatch (this=0x2225000) at msg/Messenger.h:102
#17 SimpleMessenger::dispatch_entry (this=0x2225000) at msg/SimpleMessenger.cc:358
#18 0x000000000049684c in SimpleMessenger::DispatchThread::entry (this=0x2225488) at ./msg/SimpleMessenger.h:549
#19 0x00000000008156a2 in Thread::_entry_func (arg=0x225a000) at common/Thread.cc:41
#20 0x00007f20be7f3971 in start_thread () from /lib/libpthread.so.0
#21 0x00007f20bd08292d in clone () from /lib/libc.so.6
#22 0x0000000000000000 in ?? ()

or
#0  0x00007f539323ea0b in raise () from /lib/libpthread.so.0
#1  0x0000000000916a4b in reraise_fatal (signum=11357) at global/signal_handler.cc:59
#2  0x000000000091722c in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106
#3  <signal handler called>
#4  0x000000000074046f in CInode::finish_scatter_gather_update (this=0x1082000, type=1024) at mds/CInode.cc:1724
#5  0x0000000000670ebe in Locker::scatter_writebehind (this=0x104ea00, lock=0x10827d0) at mds/Locker.cc:3588
#6  0x0000000000671952 in Locker::simple_lock (this=0x104ea00, lock=0x10827d0, need_issue=0x7f538f7bf9ff) at mds/Locker.cc:3471
#7  0x0000000000676d3f in Locker::scatter_eval (this=0x104ea00, lock=0x10827d0, need_issue=0x7f538f7bf9ff) at mds/Locker.cc:3665
#8  0x000000000067782d in Locker::eval (this=0x1093000, lock=0x400, need_issue=0x0) at mds/Locker.cc:971
#9  0x0000000000678070 in Locker::try_eval (this=0x104ea00, lock=0x10827d0, pneed_issue=0x7f538f7bf9ff) at mds/Locker.cc:915
#10 0x000000000067c431 in Locker::eval_gather (this=0x104ea00, lock=0x10827d0, first=<value optimized out>, pneed_issue=<value optimized out>, pfinishers=<value optimized out>) at mds/Locker.cc:751
#11 0x000000000068140d in Locker::wrlock_finish (this=0x104ea00, lock=0x10827d0, mut=0x19e7a00, pneed_issue=<value optimized out>) at mds/Locker.cc:1259
#12 0x00000000006819bd in Locker::_drop_non_rdlocks (this=0x104ea00, mut=0x19e7a00, pneed_issue=0x7f538f7bfb20) at mds/Locker.cc:491
#13 0x00000000006824a4 in Locker::drop_locks (this=0x104ea00, mut=0x19e7a00, pneed_issue=0x7f538f7bfb20) at mds/Locker.cc:524
#14 0x0000000000682755 in Locker::scatter_writebehind_finish (this=0x104ea00, lock=0x10827d0, mut=0x19e7a00) at mds/Locker.cc:3624
#15 0x000000000069e64d in Locker::C_Locker_ScatterWB::finish(int) ()
#16 0x000000000049f7d2 in Context::complete (this=0x1093000, r=1024) at ./include/Context.h:41
#17 0x00000000007e28ae in finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int) ()
#18 0x00000000007d6906 in Journaler::_finish_flush (this=0x107b000, r=<value optimized out>, start=12757588, stamp=<value optimized out>) at osdc/Journaler.cc:419
#19 0x00000000007e2add in Journaler::C_Flush::finish(int) ()
#20 0x00000000007b780e in Objecter::handle_osd_op_reply (this=0x106c000, m=0x1075a80) at osdc/Objecter.cc:1205
#21 0x00000000004cbd0f in MDS::handle_core_message (this=0x105fa00, m=0x1075a80) at mds/MDS.cc:1695
#22 0x00000000004cbe6c in MDS::_dispatch (this=0x105fa00, m=0x1075a80) at mds/MDS.cc:1818
#23 0x00000000004ce465 in MDS::ms_dispatch (this=0x105fa00, m=0x1075a80) at mds/MDS.cc:1631
#24 0x000000000081ab5a in ms_deliver_dispatch (this=0x105f000) at msg/Messenger.h:102
#25 SimpleMessenger::dispatch_entry (this=0x105f000) at msg/SimpleMessenger.cc:358
#26 0x000000000049684c in SimpleMessenger::DispatchThread::entry (this=0x105f488) at ./msg/SimpleMessenger.h:549
#27 0x00000000008156a2 in Thread::_entry_func (arg=0x1093000) at common/Thread.cc:41
#28 0x00007f5393236971 in start_thread () from /lib/libpthread.so.0
#29 0x00007f5391ac592d in clone () from /lib/libc.so.6

In both cases we crash calling a method on a CDir* whose vtable pointer has been zeroed, and in both cases dir->inode->inode.ino == 1 (the root inode).
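
For what it's worth, a zeroed vtable pointer produces exactly this failure mode; a contrived illustration (not Ceph code, hypothetical names):

#include <cstring>

struct DirSketch {
  virtual ~DirSketch() {}
  virtual void auth_unpin() {}
};

int main() {
  DirSketch* dir = new DirSketch;
  // If the object's storage is zeroed while a pointer to it is still in
  // use (e.g. the memory was freed and recycled), the vptr becomes 0:
  std::memset(static_cast<void*>(dir), 0, sizeof(DirSketch));
  dir->auth_unpin();  // virtual dispatch reads a function pointer from
                      // address 0 and jumps to garbage -> SIGSEGV
}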

#14 Updated by Sage Weil over 12 years ago

  • Subject changed from mds segfault after trivial sync workunit on cfuse to mds: zeroed root CDir* vtable in scatter_writebehind_finish

#15 Updated by Sage Weil over 12 years ago

the tasks were in nightly_coverage_2011-11-30-a

3433: collection:basic clusters:fixed-3.yaml tasks:kclient_workunit_kernel_untar_build.yaml
3435: collection:basic clusters:fixed-3.yaml tasks:kclient_workunit_suites_ffsb.yaml

#17 Updated by Sage Weil over 12 years ago

  • Assignee deleted (Sage Weil)

I think the next step here is to run the mds under valgrind.

#18 Updated by Josh Durgin over 12 years ago

Happened again; the log is in teuthology:~teuthworker/log/mds.0.log.gz

#19 Updated by Sage Weil over 12 years ago

  • Status changed from In Progress to Need More Info

#21 Updated by Sage Weil about 12 years ago

hit this again, nightly_coverage_2011-12-29-b/5388

  - kclient: null
  - locktest:
    - client.0
    - client.1

#22 Updated by Sage Weil about 12 years ago

  • Priority changed from High to Normal

#23 Updated by Sage Weil about 12 years ago

  • Target version deleted (v0.40)

#25 Updated by Sage Weil about 12 years ago

again:

2012-01-27T15:46:22.731 INFO:teuthology.task.ceph:Shutting down mds daemons...
2012-01-27T15:46:22.733 INFO:teuthology.task.ceph.mds.a.err:*** Caught signal (Terminated) **
2012-01-27T15:46:22.733 INFO:teuthology.task.ceph.mds.a.err: in thread 7fe843b50780. Shutting down.
2012-01-27T15:46:22.749 INFO:teuthology.task.ceph.mds.a.err:*** Caught signal (Segmentation fault) **
2012-01-27T15:46:22.749 INFO:teuthology.task.ceph.mds.a.err: in thread 7fe83fcb2700
2012-01-27T15:46:22.755 INFO:teuthology.task.ceph.mds.a.err: ceph version 0.40-242-g374fec4 (commit:374fec47253bad511eee52d372f182402fb17b1a)
2012-01-27T15:46:22.755 INFO:teuthology.task.ceph.mds.a.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mds() [0x9219c4]
2012-01-27T15:46:22.756 INFO:teuthology.task.ceph.mds.a.err: 2: (()+0xfb40) [0x7fe84372fb40]
2012-01-27T15:46:22.756 INFO:teuthology.task.ceph.mds.a.err: 3: (CInode::finish_scatter_gather_update(int)+0x128f) [0x7433ff]
2012-01-27T15:46:22.756 INFO:teuthology.task.ceph.mds.a.err: 4: (Locker::scatter_writebehind(ScatterLock*)+0x5ce) [0x67389e]
2012-01-27T15:46:22.756 INFO:teuthology.task.ceph.mds.a.err: 5: (Locker::simple_lock(SimpleLock*, bool*)+0x5e2) [0x674332]
2012-01-27T15:46:22.756 INFO:teuthology.task.ceph.mds.a.err: 6: (Locker::scatter_eval(ScatterLock*, bool*)+0x58f) [0x67971f]
2012-01-27T15:46:22.757 INFO:teuthology.task.ceph.mds.a.err: 7: (Locker::eval(SimpleLock*, bool*)+0x6d) [0x67a20d]
2012-01-27T15:46:22.757 INFO:teuthology.task.ceph.mds.a.err: 8: (Locker::try_eval(SimpleLock*, bool*)+0x830) [0x67aa50]
2012-01-27T15:46:22.757 INFO:teuthology.task.ceph.mds.a.err: 9: (Locker::eval_gather(SimpleLock*, bool, bool*, std::list<Context*, std::allocator<Context*> >*)+0x1c31) [0x67ee11]
2012-01-27T15:46:22.757 INFO:teuthology.task.ceph.mds.a.err: 10: (Locker::wrlock_finish(SimpleLock*, Mutation*, bool*)+0x45d) [0x683ded]
2012-01-27T15:46:22.757 INFO:teuthology.task.ceph.mds.a.err: 11: (Locker::_drop_non_rdlocks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x19d) [0x68439d]
2012-01-27T15:46:22.758 INFO:teuthology.task.ceph.mds.a.err: 12: (Locker::drop_locks(Mutation*, std::set<CInode*, std::less<CInode*>, std::allocator<CInode*> >*)+0x94) [0x684e84]
2012-01-27T15:46:22.758 INFO:teuthology.task.ceph.mds.a.err: 13: (Locker::scatter_writebehind_finish(ScatterLock*, Mutation*)+0x1e5) [0x685135]
2012-01-27T15:46:22.758 INFO:teuthology.task.ceph.mds.a.err: 14: (Locker::C_Locker_ScatterWB::finish(int)+0x1d) [0x6a102d]
2012-01-27T15:46:22.758 INFO:teuthology.task.ceph.mds.a.err: 15: (Context::complete(int)+0x12) [0x4a0cc2]
2012-01-27T15:46:22.758 INFO:teuthology.task.ceph.mds.a.err: 16: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x14e) [0x7e765e]
2012-01-27T15:46:22.759 INFO:teuthology.task.ceph.mds.a.err: 17: (Journaler::_finish_flush(int, unsigned long, utime_t)+0x1fd) [0x7d9a4d]
2012-01-27T15:46:22.759 INFO:teuthology.task.ceph.mds.a.err: 18: (Journaler::C_Flush::finish(int)+0x1d) [0x7e788d]
2012-01-27T15:46:22.759 INFO:teuthology.task.ceph.mds.a.err: 19: (Objecter::handle_osd_op_reply(MOSDOpReply*)+0x173f) [0x7bb7bf]
2012-01-27T15:46:22.759 INFO:teuthology.task.ceph.mds.a.err: 20: (MDS::handle_core_message(Message*)+0xecf) [0x4c8adf]
2012-01-27T15:46:22.759 INFO:teuthology.task.ceph.mds.a.err: 21: (MDS::_dispatch(Message*)+0x3c) [0x4c8c3c]
2012-01-27T15:46:22.760 INFO:teuthology.task.ceph.mds.a.err: 22: (MDS::ms_dispatch(Message*)+0xa9) [0x4cb229]
2012-01-27T15:46:22.760 INFO:teuthology.task.ceph.mds.a.err: 23: (SimpleMessenger::dispatch_entry()+0xa1a) [0x8206ea]
2012-01-27T15:46:22.760 INFO:teuthology.task.ceph.mds.a.err: 24: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x497b9c]
2012-01-27T15:46:22.760 INFO:teuthology.task.ceph.mds.a.err: 25: (Thread::_entry_func(void*)+0x12) [0x81ac22]
2012-01-27T15:46:22.762 INFO:teuthology.task.ceph.mds.a.err: 26: (()+0x7971) [0x7fe843727971]
2012-01-27T15:46:22.763 INFO:teuthology.task.ceph.mds.a.err: 27: (clone()+0x6d) [0x7fe841fb692d]
2012-01-27T15:46:22.958 INFO:teuthology.task.ceph.mds.a.err:daemon-helper: command crashed with signal 11

i wonder if this is just an issue with the signal handler racing with the other threads? i think most (all?) of these crashes happen when daemon-helper sends a signal to the process...
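
If that theory is right, the race would look roughly like this (a simplified sketch with hypothetical names, not the actual MDS shutdown path):

#include <csignal>
#include <thread>

struct CacheSketch {
  virtual ~CacheSketch() {}
  virtual void flush() {}
};

CacheSketch* g_cache = new CacheSketch;

extern "C" void handle_term(int) {
  delete g_cache;          // teardown in signal context (not async-
}                          // signal-safe) frees the object...

int main() {
  std::signal(SIGTERM, handle_term);
  std::thread dispatch([] {
    for (;;)
      g_cache->flush();    // ...while the dispatch thread keeps calling
  });                      // through it, as in the backtraces above
  std::raise(SIGTERM);     // what daemon-helper's kill would trigger
  dispatch.join();         // in practice the UB above crashes first
}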

#26 Updated by Sage Weil about 12 years ago

  • Status changed from Need More Info to Resolved

using clean shutdown now, yay
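
(The clean-shutdown pattern amounts to the handler only recording the request and letting a normal thread do the teardown; a generic sketch, not the actual teuthology/MDS change:)

#include <csignal>

volatile std::sig_atomic_t g_stop = 0;

extern "C" void handle_term(int) {
  g_stop = 1;              // async-signal-safe: just record the request
}

int main() {
  std::signal(SIGTERM, handle_term);
  while (!g_stop) {
    // ... dispatch messages as usual ...
  }
  // orderly teardown runs here, on a normal thread, only after the
  // dispatch loop has stopped using the cache
}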

#27 Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
