Bug #255

MDS crash during journal replay

Added by Wido den Hollander over 13 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

While updating my cluster to the latest unstable, I saw a crash on both of my MDSes.

The backtrace (same on both):

root@node14:~# gdb /usr/bin/cmds /core.node14.17132 
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/cmds...Reading symbols from /usr/lib/debug/usr/bin/cmds...done.
done.
[New Thread 17156]
[New Thread 17157]
[New Thread 17159]
[New Thread 17158]
[New Thread 17160]
[New Thread 17161]
[New Thread 17165]
[New Thread 17162]
[New Thread 17163]
[New Thread 17166]
[New Thread 17164]
[New Thread 17167]
[New Thread 17132]
[New Thread 17169]
[New Thread 17133]
[New Thread 17170]
[New Thread 17135]
[New Thread 17172]
[New Thread 17136]
[New Thread 17173]
[New Thread 17137]
[New Thread 17134]
[New Thread 17141]
[New Thread 17140]
[New Thread 17145]
[New Thread 17142]
[New Thread 17147]
[New Thread 17144]
[New Thread 17148]
[New Thread 17146]
[New Thread 17149]
[New Thread 17151]
[New Thread 17152]
[New Thread 17150]
[New Thread 17153]
[New Thread 17154]
[New Thread 17168]
[New Thread 17155]
[New Thread 17143]

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.0.9.8
Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libz.so.1
Core was generated by `/usr/bin/cmds -i 1 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007fa5614b5a75 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007fa5614b5a75 in raise () from /lib/libc.so.6
#1  0x00007fa5614b95c0 in abort () from /lib/libc.so.6
#2  0x00007fa561d6a8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007fa561d68d16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007fa561d68d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007fa561d68e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x000000000069f178 in ceph::__ceph_assert_fail (assertion=0x6de4de "in", file=<value optimized out>, line=895, 
    func=<value optimized out>) at common/assert.cc:30
#7  0x0000000000629b4d in EOpen::replay (this=0x7fa5583c68e0, mds=0xb6d930) at mds/journal.cc:895
#8  0x000000000061a7b9 in MDLog::_replay_thread (this=0xb6f880) at mds/MDLog.cc:541
#9  0x00000000004a44fd in MDLog::ReplayThread::entry() ()
#10 0x00000000004877aa in Thread::_entry_func (arg=0x42ec) at ./common/Thread.h:39
#11 0x00007fa5623489ca in start_thread () from /lib/libpthread.so.0
#12 0x00007fa5615686cd in clone () from /lib/libc.so.6
#13 0x0000000000000000 in ?? ()
(gdb) 
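
Reading the backtrace: frame #6 reports the failed assertion text as "in" at mds/journal.cc:895 inside EOpen::replay, and frames #2 to #5 show the assert helper throwing an exception that nothing catches, so std::terminate() calls abort() (signal 6). One plausible reading, which is an assumption and not something this report confirms, is that replay looked up an inode in the cache and got a null pointer back. The sketch below illustrates only that pattern; Inode, InodeCache, check and replay_open are hypothetical stand-ins and not the real Ceph types or functions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an inode; NOT the real Ceph CInode.
struct Inode {
  std::uint64_t ino;
};

// Hypothetical stand-in for an inode cache: a lookup returns nullptr on a miss.
struct InodeCache {
  std::map<std::uint64_t, Inode> inodes;

  Inode* get_inode(std::uint64_t ino) {
    auto it = inodes.find(ino);
    return it == inodes.end() ? nullptr : &it->second;
  }
};

// Throwing assert helper, mirroring what frames #5 and #6 suggest:
// the assert failure path ends in __cxa_throw rather than a direct abort().
inline void check(bool ok, const char* expr) {
  if (!ok) {
    throw std::runtime_error(std::string("assert failed: ") + expr);
  }
}

// Assumed shape of a replay step: every inode number recorded in the
// journal entry is expected to already be present in the cache.
std::vector<Inode*> replay_open(InodeCache& cache,
                                const std::vector<std::uint64_t>& inos) {
  std::vector<Inode*> open_files;
  for (std::uint64_t ino : inos) {
    Inode* in = cache.get_inode(ino);
    check(in != nullptr, "in");  // throws when the journal names an uncached inode
    open_files.push_back(in);
  }
  return open_files;
}
```

If replay_open runs on a thread whose entry function has no try/catch, the exception escapes the thread and the C++ runtime calls std::terminate(), which matches the raise/abort frames at the top of the backtrace.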

The MDS logs are attached. They are the same on both MDSes, and both end with:

10.07.06_09:45:28.197582 7fa55cb52710 mds0.journal EMetaBlob.replay for [2,2] had [inode 10000000152 [2,2] ~mds0/stray/1000000000e/ubuntu-10.04-desktop-i386.iso.zsync auth v5391 s=1432653 nl=1 rb=1432653 rf=1 rd=0 (iauth sync) (ilink sync) (ifile sync) (ixattr sync) (iversion lock) | dirty 0xd0e2f0]

mds.1.log.gz (2.2 MB) Wido den Hollander, 07/06/2010 04:39 AM

mds.0.log.gz (2.21 MB) Wido den Hollander, 07/06/2010 04:39 AM

History

#1 Updated by Sage Weil over 13 years ago

  • Status changed from New to Resolved

fixed by commit:100b6776ddb095c43cf20734b48e399d359d7b1b

#2 Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
