Bug #41026
MDS process crashes on 14.2.2 (Closed)
% Done: 0%
Regression: No
Severity: 1 - critical
Description
MDS processes on Ubuntu 18.04 (Nautilus 14.2.2) are crashing and unable to recover.
-7> 2019-07-31 13:29:46.888 7fb36a61a700 -1 --2 [v2:10.3.0.1:6800/2730552661,v1:10.3.0.1:6803/2730552661] >> [v2:10.3.0.242:$
-6> 2019-07-31 13:29:46.888 7fb367465700 1 mds.1.objecter ms_handle_reset 0x4b4df80 session 0xd91c840 osd.179
-5> 2019-07-31 13:29:46.888 7fb36a61a700 10 monclient: get_auth_request con 0x5666f600 auth_method 0
-4> 2019-07-31 13:29:46.888 7fb36ae1b700 -1 --2 [v2:10.3.0.1:6800/2730552661,v1:10.3.0.1:6803/2730552661] >> [v2:10.3.0.242:$
-3> 2019-07-31 13:29:46.888 7fb367465700 1 mds.1.objecter ms_handle_reset 0x5666e880 session 0xd91cf20 osd.0
-2> 2019-07-31 13:29:46.888 7fb367465700 4 mds.1.server handle_client_request client_request(client.25568318:505 lookup #0x2$
-1> 2019-07-31 13:29:46.888 7fb36ae1b700 10 monclient: get_auth_request con 0x5666fa80 auth_method 0
0> 2019-07-31 13:29:46.888 7fb36b61c700 -1 *** Caught signal (Aborted) **
in thread 7fb36b61c700 thread_name:msgr-worker-0
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
1: (()+0x11390) [0x7fb36f571390]
2: (gsignal()+0x38) [0x7fb36ecbe428]
3: (abort()+0x16a) [0x7fb36ecc002a]
4: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7fb3702a7155]
5: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7fb37029b136]
6: (()+0x8ad181) [0x7fb37029b181]
7: (()+0x91568e) [0x7fb37030368e]
8: (()+0x76ba) [0x7fb36f5676ba]
9: (clone()+0x6d) [0x7fb36ed9041d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Anonymous over 4 years ago
After trying to fix the server, SINGLE MDS setup now:
-18> 2019-07-31 17:59:21.339 7f11df8bf700 4 mds.0.purge_queue operator(): open complete
-17> 2019-07-31 17:59:21.339 7f11df8bf700 1 mds.0.journaler.pq(ro) set_writeable
-16> 2019-07-31 17:59:21.343 7f11e8a80700 10 monclient: get_auth_request con 0x387a400 auth_method 0
-15> 2019-07-31 17:59:21.343 7f11de8bd700 1 mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim 27370815750144, expire 27370817825265, write 27370928226441, stream_format 1). probing for end of log (from 27370928226441)...
-14> 2019-07-31 17:59:21.343 7f11de8bd700 1 mds.0.journaler.mdlog(ro) probing for end of the log
-13> 2019-07-31 17:59:21.427 7f11de8bd700 1 mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = 27370928240437 (header had 27370928226441). recovered.
-12> 2019-07-31 17:59:21.427 7f11de0bc700 4 mds.0.log Journal 0x200 recovered.
-11> 2019-07-31 17:59:21.427 7f11de0bc700 4 mds.0.log Recovered journal 0x200 in format 1
-10> 2019-07-31 17:59:21.427 7f11de0bc700 2 mds.0.78866 Booting: 1: loading/discovering base inodes
-9> 2019-07-31 17:59:21.427 7f11de0bc700 0 mds.0.cache creating system inode with ino:0x100
-8> 2019-07-31 17:59:21.427 7f11de0bc700 0 mds.0.cache creating system inode with ino:0x1
-7> 2019-07-31 17:59:21.427 7f11de8bd700 2 mds.0.78866 Booting: 2: replaying mds log
-6> 2019-07-31 17:59:21.427 7f11de8bd700 2 mds.0.78866 Booting: 2: waiting for purge queue recovered
-5> 2019-07-31 17:59:21.479 7f11e48c9700 4 mds.0.78866 handle_osd_map epoch 208882, 0 new blacklist entries
-4> 2019-07-31 17:59:21.479 7f11e48c9700 10 monclient: _renew_subs
-3> 2019-07-31 17:59:21.479 7f11e48c9700 10 monclient: _send_mon_message to mon.km-fsn-1-dc4-m1-797678 at v2:10.3.0.1:3300/0
-2> 2019-07-31 17:59:21.663 7f11dd0ba700 -1 log_channel(cluster) log [ERR] : ESession.replay sessionmap v 825175264 - 1 > table 0
-1> 2019-07-31 17:59:21.663 7f11dd0ba700 -1 /build/ceph-14.2.2/src/mds/journal.cc: In function 'virtual void ESession::replay(MDSRank*)' thread 7f11dd0ba700 time 2019-07-31 17:59:21.666728
/build/ceph-14.2.2/src/mds/journal.cc: 1655: FAILED ceph_assert(g_conf()->mds_wipe_sessions)
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f11ed133bb2]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f11ed133d8d]
3: (ESession::replay(MDSRank*)+0xfa0) [0x809030]
4: (MDLog::_replay_thread()+0x892) [0x7a7432]
5: (MDLog::ReplayThread::entry()+0xd) [0x50ab6d]
6: (()+0x76ba) [0x7f11ec9cb6ba]
7: (clone()+0x6d) [0x7f11ec1f441d]
0> 2019-07-31 17:59:21.663 7f11dd0ba700 -1 *** Caught signal (Aborted) ** in thread 7f11dd0ba700 thread_name:md_log_replay
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
1: (()+0x11390) [0x7f11ec9d5390]
2: (gsignal()+0x38) [0x7f11ec122428]
3: (abort()+0x16a) [0x7f11ec12402a]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f11ed133c03]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f11ed133d8d]
6: (ESession::replay(MDSRank*)+0xfa0) [0x809030]
7: (MDLog::_replay_thread()+0x892) [0x7a7432]
8: (MDLog::ReplayThread::entry()+0xd) [0x50ab6d]
9: (()+0x76ba) [0x7f11ec9cb6ba]
10: (clone()+0x6d) [0x7f11ec1f441d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 rbd_mirror
0/ 5 rbd_replay
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
1/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 0 ms
1/ 5 mon
0/10 monc
1/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 1 reserver
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 rgw_sync
1/10 civetweb
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 10000
max_new 1000
log_file /var/log/ceph/ceph-mds.km-fsn-1-dc4-m1-797678.log
Updated by Patrick Donnelly over 4 years ago
- Status changed from New to Rejected
Please seek help on ceph-users. Provide more information about your cluster and how the error came about.