Bug #3049

closed

mds: startup+suicide failure, MDLog::handle_journaler_write_error

Added by Sage Weil over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Objecter
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description


     0> 2012-08-25 21:38:18.087785 7fba8c9a4700 -1 common/Thread.cc: In function 'int Thread::join(void**)' thread 7fba8c9a4700 time 2012-08-25 21:38:18.084887
common/Thread.cc: 122: FAILED assert(status == 0)

 ceph version 0.50-383-gaa91cf8 (commit:aa91cf81af548463ae583872e319316871771e94)
 1: (Thread::detach()+0) [0x8abee0]
 2: (Finisher::stop()+0x197) [0x951957]
 3: (MonClient::shutdown()+0xbc) [0x86503c]
 4: (MDS::suicide()+0x137) [0x4c63e7]
 5: (MDLog::handle_journaler_write_error(int)+0x1b5) [0x7a78c5]
 6: (MDLog::C_MDL_WriteError::finish(int)+0x15) [0x7b0485]
 7: (Journaler::handle_write_error(int)+0x77) [0x7b4a47]
 8: (Journaler::_finish_write_head(int, Journaler::Header&, Context*)+0x404) [0x7b5794]
 9: (Journaler::C_WriteHead::finish(int)+0x1d) [0x7bfdbd]
 10: (Context::complete(int)+0x12) [0x4b7032]
 11: (Objecter::C_Op_Map_Latest::finish(int)+0x160) [0x7d0f90]
 12: (Context::complete(int)+0x12) [0x4b7032]
 13: (C_IsLatestMap::finish(int)+0x27) [0x8719b7]
 14: (Finisher::finisher_thread_entry()+0x3ad) [0x95291d]
 15: (Finisher::FinisherThread::entry()+0x15) [0x871bf5]
 16: (Thread::_entry_func(void*)+0x12) [0x8abb52]
 17: (()+0x7e9a) [0x7fba92710e9a]
 18: (clone()+0x6d) [0x7fba90cb84bd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

See ubuntu@teuthology:/a/teuthology-2012-08-25_19:00:04-regression-master-testing-gcov/8818 for a reasonably useful log.

This looks related to the msgr failure injection, which seems to be enough to break MDS startup.
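
Reading the backtrace: frames 14 through 2 suggest that Finisher::stop() was reached from a callback running on a finisher thread, so the thread ends up joining itself and the join status is nonzero (glibc's pthread_join returns EDEADLK for a self-join). The sketch below is a rough Python analogue of that pattern, not Ceph code: the Finisher class, run_self_stop, and suicide are hypothetical names for illustration, and Python reports the self-join as a RuntimeError rather than a failed assert.

```python
import queue
import threading


class Finisher:
    """Toy callback-running worker thread (illustrative, not Ceph's class)."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._entry)

    def start(self):
        self._thread.start()

    def enqueue(self, callback):
        self._queue.put(callback)

    def stop(self):
        # Ask the worker to exit, then wait for it to finish.
        self._queue.put(None)
        # Raises RuntimeError when called from the worker thread itself.
        self._thread.join()

    def is_alive(self):
        return self._thread.is_alive()

    def _entry(self):
        while True:
            callback = self._queue.get()
            if callback is None:
                return
            callback()


def run_self_stop():
    """Queue a callback that stops the finisher from inside its own thread."""
    finisher = Finisher()
    errors = []

    def suicide():
        # Mirrors the shape of the trace: a completion callback that
        # shuts down the very thread it is running on.
        try:
            finisher.stop()
        except RuntimeError as exc:
            errors.append(str(exc))

    finisher.enqueue(suicide)
    finisher.start()
    finisher.stop()  # joining from another thread works as expected
    return finisher, errors


if __name__ == "__main__":
    _, errors = run_self_stop()
    print(errors)
```

In this toy version the worker still exits cleanly because the exit request was queued before the failed join. The usual ways out of this pattern are to run the shutdown from a different thread or to skip the join when stop() is invoked on the worker itself; which approach the actual fix takes is not stated here.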
Actions #1

Updated by Sage Weil over 11 years ago

  • Category changed from 1 to Objecter
  • Status changed from 12 to Fix Under Review

Was able to reproduce after a few attempts with:

./stop.sh ; CEPH_NUM_MDS=15 CEPH_NUM_MON=1 ./vstart.sh -d -n -x -o 'debug monc = 20' -o 'debug objecter = 20' -o 'ms inject socket failures = 250'

See fix in wip-monc-latest.

Actions #2

Updated by Sage Weil over 11 years ago

  • Status changed from Fix Under Review to 7
Actions #3

Updated by Sage Weil over 11 years ago

  • Status changed from 7 to Resolved