Bug #259

MDS crash during log initialize

Added by Wido den Hollander almost 14 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%

Description

Running the latest unstable (83d1ea6636dd432dcbb6a0c6046d551bee7be5c6), my MDSes crash while initializing their logfile.

The crash occurs on both MDSes, and there is enough free space available to create a new logfile.

The log rotation itself goes fine: mds.1.log is created, as is the symlink, but then the MDS crashes with the following log lines:

root@node14:/var/log/ceph# cat mds.1.log
10.07.07_11:01:59.300353 --- 6627 opened log /var/log/ceph/mds.1.log ---
ceph version 0.21~rc (83d1ea6636dd432dcbb6a0c6046d551bee7be5c6)
10.07.07_11:01:59.301495 7fa6f6058720 mds-1.0 168    MDSCacheObject
10.07.07_11:01:59.301579 7fa6f6058720 mds-1.0 1616    CInode
10.07.07_11:01:59.301587 7fa6f6058720 mds-1.0 16     elist<>::item   *7=112
10.07.07_11:01:59.301594 7fa6f6058720 mds-1.0 352     inode_t 
10.07.07_11:01:59.301601 7fa6f6058720 mds-1.0 56      nest_info_t 
10.07.07_11:01:59.301608 7fa6f6058720 mds-1.0 32      frag_info_t 
10.07.07_11:01:59.301614 7fa6f6058720 mds-1.0 40     SimpleLock   *5=200
10.07.07_11:01:59.301620 7fa6f6058720 mds-1.0 48     ScatterLock  *3=144
10.07.07_11:01:59.301627 7fa6f6058720 mds-1.0 464    CDentry
10.07.07_11:01:59.301634 7fa6f6058720 mds-1.0 16     elist<>::item
10.07.07_11:01:59.301640 7fa6f6058720 mds-1.0 40     SimpleLock
10.07.07_11:01:59.301646 7fa6f6058720 mds-1.0 1528    CDir 
10.07.07_11:01:59.301653 7fa6f6058720 mds-1.0 16     elist<>::item   *2=32
10.07.07_11:01:59.301659 7fa6f6058720 mds-1.0 192     fnode_t 
10.07.07_11:01:59.301666 7fa6f6058720 mds-1.0 56      nest_info_t *2
10.07.07_11:01:59.301672 7fa6f6058720 mds-1.0 32      frag_info_t *2
10.07.07_11:01:59.301681 7fa6f6058720 mds-1.0 168    Capability 
10.07.07_11:01:59.301688 7fa6f6058720 mds-1.0 32     xlist<>::item   *2=64
10.07.07_11:01:59.302113 7fa6f6057710 mds-1.0 MDS::ms_get_authorizer type=mon
10.07.07_11:01:59.302203 7fa6f3949710 mds-1.0 ms_handle_connect on 213.189.18.214:6789/0
10.07.07_11:01:59.303193 7fa6f3949710 monclient(hunting): found mon1
10.07.07_11:01:59.303528 7fa6f6058720 mds-1.0 beacon_send up:boot seq 1 (currently up:boot)
10.07.07_11:01:59.304045 7fa6f3949710 mds-1.0 handle_mds_map epoch 6 from mon1
10.07.07_11:01:59.304077 7fa6f3949710 mds-1.0      my compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:01:59.304095 7fa6f3949710 mds-1.0  mdsmap compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:01:59.304104 7fa6f3949710 mds-1.0 map says i am 213.189.18.214:6800/6627 mds-1 state down:dne
10.07.07_11:01:59.304122 7fa6f3949710 mds-1.0 not in map yet
10.07.07_11:01:59.369430 7fa6f3949710 mds-1.0 handle_mds_map epoch 7 from mon1
10.07.07_11:01:59.369482 7fa6f3949710 mds-1.0      my compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:01:59.369492 7fa6f3949710 mds-1.0  mdsmap compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:01:59.369501 7fa6f3949710 mds-1.0 map says i am 213.189.18.214:6800/6627 mds-1 state up:standby
10.07.07_11:01:59.369514 7fa6f3949710 mds-1.0 handle_mds_map standby
10.07.07_11:02:00.221097 7fa6f3949710 mds-1.0 handle_mds_map epoch 8 from mon1
10.07.07_11:02:00.221156 7fa6f3949710 mds-1.0      my compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:02:00.221166 7fa6f3949710 mds-1.0  mdsmap compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:02:00.221175 7fa6f3949710 mds0.0 map says i am 213.189.18.214:6800/6627 mds0 state up:creating
10.07.07_11:02:00.221250 7fa6f3949710 mds0.3 handle_mds_map i am now mds0.3
10.07.07_11:02:00.221260 7fa6f3949710 mds0.3 handle_mds_map state change up:standby --> up:creating
10.07.07_11:02:00.221267 7fa6f3949710 mds0.3 boot_create
10.07.07_11:02:00.221279 7fa6f3949710 mds0.3 boot_create creating fresh journal
10.07.07_11:02:00.221290 7fa6f3949710 mds0.log create empty log
root@node14:/var/log/ceph#

debug mds was set to 20.
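
This level is normally raised in ceph.conf and picked up when the daemon starts; below is a minimal example of the relevant section (the section name is standard, but the exact layout of your ceph.conf may differ):

[mds]
        debug mds = 20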

The backtrace for both MDSes is the same:

Core was generated by `/usr/bin/cmds -i 0 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  Logger::set (this=0x0, key=5013, v=4194304) at common/Logger.cc:347
347    common/Logger.cc: No such file or directory.
    in common/Logger.cc
(gdb) bt
#0  Logger::set (this=0x0, key=5013, v=4194304) at common/Logger.cc:347
#1  0x0000000000617235 in MDLog::create (this=0x14a7580, c=0x14b3eb0) at mds/MDLog.cc:128
#2  0x000000000049523b in MDS::boot_create (this=0x14a6010) at mds/MDS.cc:957
#3  0x000000000049f7b9 in MDS::handle_mds_map (this=0x14a6010, m=0x7f1e200010d0) at mds/MDS.cc:829
#4  0x00000000004a07d5 in MDS::_dispatch (this=0x14a6010, m=0x7f1e200010d0) at mds/MDS.cc:1391
#5  0x00000000004a206d in MDS::ms_dispatch (this=0x14a6010, m=0x7f1e200010d0) at mds/MDS.cc:1319
#6  0x000000000047e949 in Messenger::ms_deliver_dispatch (this=0x1493be0) at msg/Messenger.h:97
#7  SimpleMessenger::dispatch_entry (this=0x1493be0) at msg/SimpleMessenger.cc:342
#8  0x0000000000474bdc in SimpleMessenger::DispatchThread::entry (this=0x1494068) at msg/SimpleMessenger.h:534
#9  0x0000000000487aba in Thread::_entry_func (arg=0x1) at ./common/Thread.h:39
#10 0x00007f1e279029ca in start_thread () from /lib/libpthread.so.0
#11 0x00007f1e26b226cd in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()

The core file is attached.
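
The backtrace points at a null Logger pointer: frame #0 shows Logger::set entered with this=0x0, called from MDLog::create, i.e. the MDS tries to update a perf counter before the corresponding logger object has been created. The snippet below is only a minimal sketch of that failure mode, not the actual Ceph code; apart from the Logger::set and MDLog::create names taken from the backtrace, the members and values are simplified for illustration.

#include <cstdint>
#include <cstdio>
#include <map>

struct Logger {
  std::map<int, int64_t> vals;
  // With this == nullptr, touching 'vals' here is the null dereference
  // reported as "Logger::set (this=0x0, key=5013, v=4194304)".
  void set(int key, int64_t v) { vals[key] = v; }
};

struct MDLog {
  Logger *logger = nullptr;  // never allocated before create() is reached

  void create() {
    // Unguarded, this call reproduces the crash seen in frame #1:
    //   logger->set(5013, 4194304);   // SIGSEGV: this == 0x0
    // A defensive variant only touches the counter once the logger exists:
    if (logger)
      logger->set(5013, 4194304);
    else
      std::printf("logger not initialized, skipping counter update\n");
  }
};

int main() {
  MDLog mdlog;
  mdlog.create();  // prints the fallback message instead of segfaulting
  return 0;
}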

I tried another fresh mkcephfs, but that didn't help either; the MDSes keep crashing.


Files

core.node14.6874.gz (167 KB) - Wido den Hollander, 07/07/2010 02:37 AM
#1

Updated by Wido den Hollander almost 14 years ago

The problem seems to have been introduced by commit 83d1ea6636dd432dcbb6a0c6046d551bee7be5c6; reverting to 1ca446dd9ac2a03c47b3b6f8cc7007660da911ec "fixed" it.

#2

Updated by Sage Weil almost 14 years ago

  • Status changed from New to Resolved

Sorry, fixed for real by commit 9432a9588972860aa2fdb3f9ea18eb88073ace9a.

#3

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
