Bug #312: MDS crash: LogSegment::try_to_expire(MDS*)
Status: Closed
% Done: 0%
Description
This morning I upgraded my cluster to the latest unstable; afterwards I tried to mount the cluster, which failed.
While mounting I saw that both of my MDSes had crashed, with almost the same backtrace:
mds0
Core was generated by `/usr/bin/cmds -i 0 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000621374 in LogSegment::try_to_expire(MDS*) ()
(gdb) bt
#0  0x0000000000621374 in LogSegment::try_to_expire(MDS*) ()
#1  0x000000000061b06d in MDLog::try_expire(LogSegment*) ()
#2  0x000000000061bcc0 in MDLog::trim(int) ()
#3  0x000000000049553a in MDS::tick() ()
#4  0x000000000069bfb9 in SafeTimer::EventWrapper::finish(int) ()
#5  0x000000000069e3bc in Timer::timer_entry() ()
#6  0x0000000000474ebd in Timer::TimerThread::entry() ()
#7  0x0000000000487c2a in Thread::_entry_func(void*) ()
#8  0x00007ff8fee5f9ca in start_thread () from /lib/libpthread.so.0
#9  0x00007ff8fe07f6cd in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb)
mds1
Core was generated by `/usr/bin/cmds -i 1 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  CDentry::get_dir (this=0x94e9b0, mds=0x1476330) at mds/events/../CDentry.h:200
200     mds/events/../CDentry.h: No such file or directory.
        in mds/events/../CDentry.h
(gdb) bt
#0  CDentry::get_dir (this=0x94e9b0, mds=0x1476330) at mds/events/../CDentry.h:200
#1  LogSegment::try_to_expire (this=0x94e9b0, mds=0x1476330) at mds/journal.cc:105
#2  0x000000000061b06d in MDLog::try_expire (this=0x1475580, ls=0x2689810) at mds/MDLog.cc:363
#3  0x000000000061bcc0 in MDLog::trim (this=0x1475580, m=<value optimized out>) at mds/MDLog.cc:355
#4  0x000000000049553a in MDS::tick (this=0x1476330) at mds/MDS.cc:513
#5  0x000000000069bfb9 in SafeTimer::EventWrapper::finish (this=0x7fadc44bd780, r=0) at common/Timer.cc:295
#6  0x000000000069e3bc in Timer::timer_entry (this=0x1476378) at common/Timer.cc:100
#7  0x0000000000474ebd in Timer::TimerThread::entry (this=<value optimized out>) at ./common/Timer.h:77
#8  0x0000000000487c2a in Thread::_entry_func (arg=0x94e9b0) at ./common/Thread.h:39
#9  0x00007fadcd0eb9ca in start_thread () from /lib/libpthread.so.0
#10 0x00007fadcc30a6fd in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()
(gdb)
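In the mds1 backtrace the fault is in CDentry::get_dir() called from LogSegment::try_to_expire() at mds/journal.cc:105, and the dentry's 'this' pointer (0x94e9b0) is the same value as the LogSegment's in frame #1, which makes me suspect a stale or bogus dentry pointer (or an inlined frame). Below is only a minimal, hypothetical C++ sketch of that path to show what I think is happening; the dirty_dentries and dir members and the loop are my assumptions, only the function names come from the backtrace:

#include <list>

class MDS;   // opaque here
class CDir;  // opaque here

// Sketch of the dentry as seen by the crashing frame. The 'dir' member is an
// assumption; only get_dir() itself is named in the backtrace.
class CDentry {
public:
  CDir *dir = nullptr;                    // parent directory fragment (assumed)
  CDir *get_dir() const { return dir; }   // frame #0: faults if 'this' is stale/freed
};

// Hypothetical shape of the expiry loop around mds/journal.cc:105.
class LogSegment {
public:
  std::list<CDentry*> dirty_dentries;     // items still pinned by this segment (assumed name)

  void try_to_expire(MDS *mds) {
    for (CDentry *dn : dirty_dentries) {
      // If 'dn' was freed or trimmed (for example across the upgrade), dereferencing
      // it here would produce exactly the SIGSEGV shown above.
      CDir *dir = dn->get_dir();
      (void)dir;  // real code would flush/commit the dir before expiring the segment
      (void)mds;
    }
  }
};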
For mds1 I raised the debug level to 20 to see what the last entries were.
The corefiles, binaries, and logfiles have been uploaded to logger.ceph.widodh.nl in the directory /srv/ceph/issues/cmds_crash_logsegment_try_to_expire.
The most relevant files are:
- core.cmds.node13.9718 (last crash of mds0, debug at 20)
- core.cmds.node14.11525 (last crash of mds1 with debug at 20, corresponding to mds.1.log)
- mds.1.log (with debug at 20)
- mds.0.log (with debug at 20)
I've preserved the timestamps of the corefiles (the two posted above), so you can compare them with the logfiles.
Nothing weird happened: last night I did a sync of kernel.org, which went fine, and then this morning (a few hours later) I upgraded to the latest unstable.