Bug #259
MDS crash during log initialize
Status: Closed
Description
Running the latest unstable (83d1ea6636dd432dcbb6a0c6046d551bee7be5c6), my MDSes crash while initializing their logfile.
The crash occurs on both MDSes, and there is enough free space available to create a new logfile.
The log rotation itself goes fine: mds.1.log is created, as is the symlink, but then the MDS crashes with the following log lines:
root@node14:/var/log/ceph# cat mds.1.log
10.07.07_11:01:59.300353 --- 6627 opened log /var/log/ceph/mds.1.log ---
ceph version 0.21~rc (83d1ea6636dd432dcbb6a0c6046d551bee7be5c6)
10.07.07_11:01:59.301495 7fa6f6058720 mds-1.0 168 MDSCacheObject
10.07.07_11:01:59.301579 7fa6f6058720 mds-1.0 1616 CInode
10.07.07_11:01:59.301587 7fa6f6058720 mds-1.0 16 elist<>::item *7=112
10.07.07_11:01:59.301594 7fa6f6058720 mds-1.0 352 inode_t
10.07.07_11:01:59.301601 7fa6f6058720 mds-1.0 56 nest_info_t
10.07.07_11:01:59.301608 7fa6f6058720 mds-1.0 32 frag_info_t
10.07.07_11:01:59.301614 7fa6f6058720 mds-1.0 40 SimpleLock *5=200
10.07.07_11:01:59.301620 7fa6f6058720 mds-1.0 48 ScatterLock *3=144
10.07.07_11:01:59.301627 7fa6f6058720 mds-1.0 464 CDentry
10.07.07_11:01:59.301634 7fa6f6058720 mds-1.0 16 elist<>::item
10.07.07_11:01:59.301640 7fa6f6058720 mds-1.0 40 SimpleLock
10.07.07_11:01:59.301646 7fa6f6058720 mds-1.0 1528 CDir
10.07.07_11:01:59.301653 7fa6f6058720 mds-1.0 16 elist<>::item *2=32
10.07.07_11:01:59.301659 7fa6f6058720 mds-1.0 192 fnode_t
10.07.07_11:01:59.301666 7fa6f6058720 mds-1.0 56 nest_info_t *2
10.07.07_11:01:59.301672 7fa6f6058720 mds-1.0 32 frag_info_t *2
10.07.07_11:01:59.301681 7fa6f6058720 mds-1.0 168 Capability
10.07.07_11:01:59.301688 7fa6f6058720 mds-1.0 32 xlist<>::item *2=64
10.07.07_11:01:59.302113 7fa6f6057710 mds-1.0 MDS::ms_get_authorizer type=mon
10.07.07_11:01:59.302203 7fa6f3949710 mds-1.0 ms_handle_connect on 213.189.18.214:6789/0
10.07.07_11:01:59.303193 7fa6f3949710 monclient(hunting): found mon1
10.07.07_11:01:59.303528 7fa6f6058720 mds-1.0 beacon_send up:boot seq 1 (currently up:boot)
10.07.07_11:01:59.304045 7fa6f3949710 mds-1.0 handle_mds_map epoch 6 from mon1
10.07.07_11:01:59.304077 7fa6f3949710 mds-1.0 my compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:01:59.304095 7fa6f3949710 mds-1.0 mdsmap compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:01:59.304104 7fa6f3949710 mds-1.0 map says i am 213.189.18.214:6800/6627 mds-1 state down:dne
10.07.07_11:01:59.304122 7fa6f3949710 mds-1.0 not in map yet
10.07.07_11:01:59.369430 7fa6f3949710 mds-1.0 handle_mds_map epoch 7 from mon1
10.07.07_11:01:59.369482 7fa6f3949710 mds-1.0 my compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:01:59.369492 7fa6f3949710 mds-1.0 mdsmap compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:01:59.369501 7fa6f3949710 mds-1.0 map says i am 213.189.18.214:6800/6627 mds-1 state up:standby
10.07.07_11:01:59.369514 7fa6f3949710 mds-1.0 handle_mds_map standby
10.07.07_11:02:00.221097 7fa6f3949710 mds-1.0 handle_mds_map epoch 8 from mon1
10.07.07_11:02:00.221156 7fa6f3949710 mds-1.0 my compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:02:00.221166 7fa6f3949710 mds-1.0 mdsmap compat compat={},rocompat={},incompat={1=base v0.20}
10.07.07_11:02:00.221175 7fa6f3949710 mds0.0 map says i am 213.189.18.214:6800/6627 mds0 state up:creating
10.07.07_11:02:00.221250 7fa6f3949710 mds0.3 handle_mds_map i am now mds0.3
10.07.07_11:02:00.221260 7fa6f3949710 mds0.3 handle_mds_map state change up:standby --> up:creating
10.07.07_11:02:00.221267 7fa6f3949710 mds0.3 boot_create
10.07.07_11:02:00.221279 7fa6f3949710 mds0.3 boot_create creating fresh journal
10.07.07_11:02:00.221290 7fa6f3949710 mds0.log create empty log
root@node14:/var/log/ceph#
debug mds was set to 20.
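For reference, that debug level would have been raised via ceph.conf, roughly like this (a sketch of the usual configuration convention, not a copy of my actual config):

```ini
[mds]
        ; verbose MDS logging, as used for the log output above
        debug mds = 20
```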
The backtrace is the same for both MDSes:
Core was generated by `/usr/bin/cmds -i 0 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  Logger::set (this=0x0, key=5013, v=4194304) at common/Logger.cc:347
347     common/Logger.cc: No such file or directory.
        in common/Logger.cc
(gdb) bt
#0  Logger::set (this=0x0, key=5013, v=4194304) at common/Logger.cc:347
#1  0x0000000000617235 in MDLog::create (this=0x14a7580, c=0x14b3eb0) at mds/MDLog.cc:128
#2  0x000000000049523b in MDS::boot_create (this=0x14a6010) at mds/MDS.cc:957
#3  0x000000000049f7b9 in MDS::handle_mds_map (this=0x14a6010, m=0x7f1e200010d0) at mds/MDS.cc:829
#4  0x00000000004a07d5 in MDS::_dispatch (this=0x14a6010, m=0x7f1e200010d0) at mds/MDS.cc:1391
#5  0x00000000004a206d in MDS::ms_dispatch (this=0x14a6010, m=0x7f1e200010d0) at mds/MDS.cc:1319
#6  0x000000000047e949 in Messenger::ms_deliver_dispatch (this=0x1493be0) at msg/Messenger.h:97
#7  SimpleMessenger::dispatch_entry (this=0x1493be0) at msg/SimpleMessenger.cc:342
#8  0x0000000000474bdc in SimpleMessenger::DispatchThread::entry (this=0x1494068) at msg/SimpleMessenger.h:534
#9  0x0000000000487aba in Thread::_entry_func (arg=0x1) at ./common/Thread.h:39
#10 0x00007f1e279029ca in start_thread () from /lib/libpthread.so.0
#11 0x00007f1e26b226cd in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()
The core file is attached.
I tried another fresh mkcephfs, but that didn't help either; the MDSes keep crashing.
Updated by Wido den Hollander almost 14 years ago
The problem seems to have been introduced in commit 83d1ea6636dd432dcbb6a0c6046d551bee7be5c6; reverting to 1ca446dd9ac2a03c47b3b6f8cc7007660da911ec "fixed" it.
Updated by Sage Weil almost 14 years ago
- Status changed from New to Resolved
Sorry, fixed for real by commit:9432a9588972860aa2fdb3f9ea18eb88073ace9a.
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.