Bug #41026
closed
MDS process crashes on 14.2.2
Added by Anonymous almost 5 years ago.
Updated almost 5 years ago.
Description
MDS processes on Ubuntu 18.04 (Nautilus 14.2.2) are crashing and we are unable to recover.
-7> 2019-07-31 13:29:46.888 7fb36a61a700 -1 --2 [v2:10.3.0.1:6800/2730552661,v1:10.3.0.1:6803/2730552661] >> [v2:10.3.0.242:$
-6> 2019-07-31 13:29:46.888 7fb367465700 1 mds.1.objecter ms_handle_reset 0x4b4df80 session 0xd91c840 osd.179
-5> 2019-07-31 13:29:46.888 7fb36a61a700 10 monclient: get_auth_request con 0x5666f600 auth_method 0
-4> 2019-07-31 13:29:46.888 7fb36ae1b700 -1 --2 [v2:10.3.0.1:6800/2730552661,v1:10.3.0.1:6803/2730552661] >> [v2:10.3.0.242:$
-3> 2019-07-31 13:29:46.888 7fb367465700 1 mds.1.objecter ms_handle_reset 0x5666e880 session 0xd91cf20 osd.0
-2> 2019-07-31 13:29:46.888 7fb367465700 4 mds.1.server handle_client_request client_request(client.25568318:505 lookup #0x2$
-1> 2019-07-31 13:29:46.888 7fb36ae1b700 10 monclient: get_auth_request con 0x5666fa80 auth_method 0
0> 2019-07-31 13:29:46.888 7fb36b61c700 -1 *** Caught signal (Aborted) **
in thread 7fb36b61c700 thread_name:msgr-worker-0
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
1: (()+0x11390) [0x7fb36f571390]
2: (gsignal()+0x38) [0x7fb36ecbe428]
3: (abort()+0x16a) [0x7fb36ecc002a]
4: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7fb3702a7155]
5: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7fb37029b136]
6: (()+0x8ad181) [0x7fb37029b181]
7: (()+0x91568e) [0x7fb37030368e]
8: (()+0x76ba) [0x7fb36f5676ba]
9: (clone()+0x6d) [0x7fb36ed9041d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
After trying to fix the server, we are now running a single-MDS setup:
-18> 2019-07-31 17:59:21.339 7f11df8bf700 4 mds.0.purge_queue operator(): open complete
-17> 2019-07-31 17:59:21.339 7f11df8bf700 1 mds.0.journaler.pq(ro) set_writeable
-16> 2019-07-31 17:59:21.343 7f11e8a80700 10 monclient: get_auth_request con 0x387a400 auth_method 0
-15> 2019-07-31 17:59:21.343 7f11de8bd700 1 mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim 27370815750144, expire 27370817825265, write 27370928226441, stream_format 1). probing for end of log (from 27370928226441)...
-14> 2019-07-31 17:59:21.343 7f11de8bd700 1 mds.0.journaler.mdlog(ro) probing for end of the log
-13> 2019-07-31 17:59:21.427 7f11de8bd700 1 mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = 27370928240437 (header had 27370928226441). recovered.
-12> 2019-07-31 17:59:21.427 7f11de0bc700 4 mds.0.log Journal 0x200 recovered.
-11> 2019-07-31 17:59:21.427 7f11de0bc700 4 mds.0.log Recovered journal 0x200 in format 1
-10> 2019-07-31 17:59:21.427 7f11de0bc700 2 mds.0.78866 Booting: 1: loading/discovering base inodes
-9> 2019-07-31 17:59:21.427 7f11de0bc700 0 mds.0.cache creating system inode with ino:0x100
-8> 2019-07-31 17:59:21.427 7f11de0bc700 0 mds.0.cache creating system inode with ino:0x1
-7> 2019-07-31 17:59:21.427 7f11de8bd700 2 mds.0.78866 Booting: 2: replaying mds log
-6> 2019-07-31 17:59:21.427 7f11de8bd700 2 mds.0.78866 Booting: 2: waiting for purge queue recovered
-5> 2019-07-31 17:59:21.479 7f11e48c9700 4 mds.0.78866 handle_osd_map epoch 208882, 0 new blacklist entries
-4> 2019-07-31 17:59:21.479 7f11e48c9700 10 monclient: _renew_subs
-3> 2019-07-31 17:59:21.479 7f11e48c9700 10 monclient: _send_mon_message to mon.km-fsn-1-dc4-m1-797678 at v2:10.3.0.1:3300/0
-2> 2019-07-31 17:59:21.663 7f11dd0ba700 -1 log_channel(cluster) log [ERR] : ESession.replay sessionmap v 825175264 - 1 > table 0
-1> 2019-07-31 17:59:21.663 7f11dd0ba700 -1 /build/ceph-14.2.2/src/mds/journal.cc: In function 'virtual void ESession::replay(MDSRank*)' thread 7f11dd0ba700 time 2019-07-31 17:59:21.666728
/build/ceph-14.2.2/src/mds/journal.cc: 1655: FAILED ceph_assert(g_conf()->mds_wipe_sessions)
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f11ed133bb2]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f11ed133d8d]
3: (ESession::replay(MDSRank*)+0xfa0) [0x809030]
4: (MDLog::_replay_thread()+0x892) [0x7a7432]
5: (MDLog::ReplayThread::entry()+0xd) [0x50ab6d]
6: (()+0x76ba) [0x7f11ec9cb6ba]
7: (clone()+0x6d) [0x7f11ec1f441d]
0> 2019-07-31 17:59:21.663 7f11dd0ba700 -1 *** Caught signal (Aborted) **
in thread 7f11dd0ba700 thread_name:md_log_replay
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
1: (()+0x11390) [0x7f11ec9d5390]
2: (gsignal()+0x38) [0x7f11ec122428]
3: (abort()+0x16a) [0x7f11ec12402a]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f11ed133c03]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f11ed133d8d]
6: (ESession::replay(MDSRank*)+0xfa0) [0x809030]
7: (MDLog::_replay_thread()+0x892) [0x7a7432]
8: (MDLog::ReplayThread::entry()+0xd) [0x50ab6d]
9: (()+0x76ba) [0x7f11ec9cb6ba]
10: (clone()+0x6d) [0x7f11ec1f441d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mds.km-fsn-1-dc4-m1-797678.log
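For context on the second abort: the MDS dies during journal replay because the journaled ESession event expects session-map version 825175264, while the SessionMap loaded from the metadata pool is at version 0, and the only escape hatch compiled into that check is the mds_wipe_sessions config option. Below is a minimal standalone sketch of that consistency check, not the actual src/mds/journal.cc code; the type and function names (SessionMapStub, EventStub, replay_event) are made up purely to show why the assert fires with these numbers.

// replay_check.cc - simplified illustration of the version check behind
// "FAILED ceph_assert(g_conf()->mds_wipe_sessions)" in ESession::replay.
#include <cassert>
#include <cstdint>
#include <iostream>

struct SessionMapStub {
    uint64_t version = 0;   // session-map version recovered from the metadata pool
};

struct EventStub {
    uint64_t cmapv = 0;     // session-map version this journal event expects
};

// Stand-in for the mds_wipe_sessions config option (default false).
static bool mds_wipe_sessions = false;

void replay_event(SessionMapStub& sm, const EventStub& ev) {
    if (sm.version >= ev.cmapv) {
        // Event already reflected in the session map: nothing to do.
        return;
    }
    if (sm.version == ev.cmapv - 1) {
        // Normal replay path: apply the event and advance the version.
        sm.version = ev.cmapv;
        return;
    }
    // Versions are out of step, e.g. cmapv - 1 = 825175263 vs table 0 above.
    std::cerr << "ESession.replay sessionmap v " << ev.cmapv
              << " - 1 > table " << sm.version << "\n";
    // Without mds_wipe_sessions the daemon asserts, which is the abort
    // in thread md_log_replay shown in the log.
    assert(mds_wipe_sessions);
    sm.version = ev.cmapv;  // destructive wipe/resync path
}

int main() {
    SessionMapStub sm;          // version 0, as in the crash
    EventStub ev{825175264};    // version the journaled ESession expects
    replay_event(sm, ev);       // aborts unless mds_wipe_sessions is set
}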
- Project changed from Ceph to CephFS
- Status changed from New to Rejected
Please seek help on ceph-users. Provide more information about your cluster and how the error came about.