Bug #312 (closed)

MDS crash: LogSegment::try_to_expire(MDS*)

Added by Wido den Hollander almost 14 years ago. Updated over 7 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

This morning I upgraded my cluster to the latest unstable; afterwards I tried to mount the cluster, which failed.

While mounting I saw that both of my MDSes crashed, with almost the same backtrace:

mds0

Core was generated by `/usr/bin/cmds -i 0 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000621374 in LogSegment::try_to_expire(MDS*) ()
(gdb) bt
#0  0x0000000000621374 in LogSegment::try_to_expire(MDS*) ()
#1  0x000000000061b06d in MDLog::try_expire(LogSegment*) ()
#2  0x000000000061bcc0 in MDLog::trim(int) ()
#3  0x000000000049553a in MDS::tick() ()
#4  0x000000000069bfb9 in SafeTimer::EventWrapper::finish(int) ()
#5  0x000000000069e3bc in Timer::timer_entry() ()
#6  0x0000000000474ebd in Timer::TimerThread::entry() ()
#7  0x0000000000487c2a in Thread::_entry_func(void*) ()
#8  0x00007ff8fee5f9ca in start_thread () from /lib/libpthread.so.0
#9  0x00007ff8fe07f6cd in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb)

mds1

Core was generated by `/usr/bin/cmds -i 1 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  CDentry::get_dir (this=0x94e9b0, mds=0x1476330) at mds/events/../CDentry.h:200
200    mds/events/../CDentry.h: No such file or directory.
    in mds/events/../CDentry.h
(gdb) bt
#0  CDentry::get_dir (this=0x94e9b0, mds=0x1476330) at mds/events/../CDentry.h:200
#1  LogSegment::try_to_expire (this=0x94e9b0, mds=0x1476330) at mds/journal.cc:105
#2  0x000000000061b06d in MDLog::try_expire (this=0x1475580, ls=0x2689810) at mds/MDLog.cc:363
#3  0x000000000061bcc0 in MDLog::trim (this=0x1475580, m=<value optimized out>) at mds/MDLog.cc:355
#4  0x000000000049553a in MDS::tick (this=0x1476330) at mds/MDS.cc:513
#5  0x000000000069bfb9 in SafeTimer::EventWrapper::finish (this=0x7fadc44bd780, r=0) at common/Timer.cc:295
#6  0x000000000069e3bc in Timer::timer_entry (this=0x1476378) at common/Timer.cc:100
#7  0x0000000000474ebd in Timer::TimerThread::entry (this=<value optimized out>) at ./common/Timer.h:77
#8  0x0000000000487c2a in Thread::_entry_func (arg=0x94e9b0) at ./common/Thread.h:39
#9  0x00007fadcd0eb9ca in start_thread () from /lib/libpthread.so.0
#10 0x00007fadcc30a6fd in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()
(gdb)

For mds1 I raised the log level to 20 to see what the last entries were.

The core files, binaries and log files have been uploaded to logger.ceph.widodh.nl in the directory /srv/ceph/issues/cmds_crash_logsegment_try_to_expire.

The most relevant files are:
  • core.cmds.node13.9718 (last crash of mds0, debug at 20)
  • core.cmds.node14.11525 (last crash of mds1, debug at 20; corresponds to mds.1.log)
  • mds.1.log (with debug at 20)
  • mds.0.log (with debug at 20)

I've preserved the timestamps of the core files (the two listed above), so you can compare them with the log files.

Nothing unusual happened: last night I did a sync of kernel.org, which went fine, and this morning (a few hours later) I upgraded to the latest unstable.
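
For illustration only: the symbolicated mds1 frames (CDentry::get_dir() called from LogSegment::try_to_expire() at mds/journal.cc:105) look like a dereference of a dentry pointer that is null or no longer valid at the time the log segment is expired. The sketch below is a hypothetical, simplified reconstruction of that pattern, not the actual Ceph source; Dentry, Dir and the dirty_dentries list are invented names standing in for whatever the real try_to_expire() iterates over.

// Hypothetical sketch of the suspected failure pattern; NOT the real Ceph MDS code.
// Invented names: Dentry, Dir, LogSegment::dirty_dentries.
#include <cassert>
#include <list>

struct Dir {};

struct Dentry {
  Dir* dir = nullptr;
  // Stand-in for an inlined accessor like CDentry::get_dir() (CDentry.h:200);
  // calling it through a stale or null Dentry* will typically segfault.
  Dir* get_dir() const { return dir; }
};

struct LogSegment {
  std::list<Dentry*> dirty_dentries;  // assumed: dentries pinned by this segment

  void try_to_expire() {
    for (Dentry* dn : dirty_dentries) {
      // A freed or garbage dn here would fault much like frame #0
      // of the mds1 backtrace above.
      assert(dn != nullptr);          // defensive check, for the sketch only
      Dir* dir = dn->get_dir();
      (void)dir;                      // the real code would flush/expire via the dir
    }
  }
};

int main() {
  LogSegment ls;
  Dentry dn;
  ls.dirty_dentries.push_back(&dn);
  ls.try_to_expire();                 // fine as long as every pointer is valid
  return 0;
}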
