Bug #1023

closed

just-recovered mds fails journaler assert (to > trimming_pos);

Added by Alexandre Oliva about 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

Three MDSes were configured for standby-replay. mds.1 was active, and the other two were apparently following it (I'm not sure). mds.1 got kicked out for lack of heartbeats; mds.0 took over and immediately crashed.

backtrace:

#0 0x00000036c5288403 in __memcpy_sse2 () from /lib64/libc.so.6
#1 0x000000000075a2a4 in ceph::BackTrace::print (this=0x7f3e87e69930, out=...)
at common/BackTrace.cc:37
#2 0x000000000073db24 in handle_fatal_signal (signum=11)
at common/signal.cc:74
#3 <signal handler called>
#4 0x00000036c5288403 in __memcpy_sse2 () from /lib64/libc.so.6
#5 0x000000000075a2a4 in ceph::BackTrace::print (this=0x7f3e87e6a730, out=...)
at common/BackTrace.cc:37
#6 0x000000000073db24 in handle_fatal_signal (signum=6) at common/signal.cc:74
#7 <signal handler called>
#8 0x00000036c52330c5 in raise () from /lib64/libc.so.6
#9 0x00000036c5234a76 in abort () from /lib64/libc.so.6
#10 0x00000036cc6bc08d in __gnu_cxx::__verbose_terminate_handler() ()
from /usr/lib64/libstdc++.so.6
#11 0x00000036cc6ba2a6 in ?? () from /usr/lib64/libstdc++.so.6
#12 0x00000036cc6ba2d3 in std::terminate() () from /usr/lib64/libstdc++.so.6
#13 0x00000036cc6ba3de in __cxa_throw () from /usr/lib64/libstdc++.so.6
#14 0x0000000000724052 in ceph::__ceph_assert_fail (
assertion=<value optimized out>, file=<value optimized out>,
line=<value optimized out>,
func=0x790dc0 "void Journaler::_trim_finish(int, uint64_t)")
at common/assert.cc:86
#15 0x00000000006c1fed in Journaler::_trim_finish (this=0x2db0000, r=0,
to=92274688) at osdc/Journaler.cc:963
#16 0x00000000006ba8c7 in Filer::_do_purge_range (this=0x2db00d0,
pr=0x2c99b00, fin=<value optimized out>) at osdc/Filer.cc:278
#17 0x00000000006a22f4 in Objecter::handle_osd_op_reply (this=0x2ca0240,
m=0x2d7d940) at osdc/Objecter.cc:806
#18 0x00000000004cfc8f in MDS::handle_core_message (this=0x2cf0a00,
m=0x2d7d940) at mds/MDS.cc:1680
#19 0x00000000004cfdec in MDS::_dispatch (this=0x2cf0a00, m=0x2d7d940)
at mds/MDS.cc:1795
#20 0x00000000004d16c1 in MDS::ms_dispatch (this=0x2cf0a00, m=0x2d7d940)
at mds/MDS.cc:1616
#21 0x00000000004aa755 in ms_deliver_dispatch (this=0x2cf0000)
at msg/Messenger.h:98
#22 SimpleMessenger::dispatch_entry (this=0x2cf0000)
at msg/SimpleMessenger.cc:352
#23 0x000000000049e13c in SimpleMessenger::DispatchThread::entry (
this=<value optimized out>) at ./msg/SimpleMessenger.h:533
#24 0x00000036c5e06ccb in start_thread () from /lib64/libpthread.so.0
#25 0x00000036c52e0c2d in clone () from /lib64/libc.so.6

The entire log history of the 3 mdses is attached. My untrained eyes couldn't see anything interesting.


Files

trim-assert.tar.bz2 (1.52 MB), Alexandre Oliva, 04/22/2011 01:37 PM
Actions #1

Updated by Sage Weil about 13 years ago

  • Category set to 1
  • Assignee set to Sage Weil
  • Target version set to v0.27.1

When it went from standby-replay to replay, a second replay thread was being forked. Cleaning this up in stable.
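
For readers unfamiliar with this class of bug, here is a minimal sketch of the general guard pattern: refuse to start a replay thread while one is already owned. It is not the actual fix (see the referenced commit for that). `ReplayLog`, `start_replay`, `join`, and `replays_run` are hypothetical names invented for illustration, and the sketch assumes `start_replay` is only ever called from a single dispatch thread.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical stand-in for an object that owns a journal replay thread.
// This is NOT Ceph's real class; it only illustrates the guard.
class ReplayLog {
public:
  ~ReplayLog() { join(); }

  // Starts a replay thread unless this object already owns one.
  // Returns true if a new thread was started, false if the request was
  // ignored because a replay thread still exists (running or not yet joined).
  // Assumed to be called from one thread only (e.g. a dispatch thread).
  bool start_replay(std::function<void()> work) {
    assert(work && "replay work must be callable");
    if (replay_thread_.joinable())
      return false;  // do not fork a second replay thread
    replay_thread_ = std::thread([this, work = std::move(work)] {
      work();
      ++replays_run_;
    });
    return true;
  }

  // Waits for the current replay thread, if any, to finish.
  void join() {
    if (replay_thread_.joinable())
      replay_thread_.join();
  }

  // Number of replay passes that have completed.
  int replays_run() const { return replays_run_.load(); }

private:
  std::thread replay_thread_;
  std::atomic<int> replays_run_{0};
};
```

With a guard like this, a state transition that requests replay while a replay thread already exists becomes a no-op instead of producing two threads that both drive the same journal.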

Actions #2

Updated by Sage Weil about 13 years ago

  • Status changed from New to Resolved

Fixed by commit:e8847b2cddbadf4f31972b490d80944dcb9f992d and others.

Actions #3

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.27.1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
