Testing out steps to recover a file system with multiple active MDSs after recovering the monitor store using the OSDs:
- Stop all MDSs
- Create an FSMap with defaults
fs new <fs> <metadata pool> <data pool> --force
- Set MDS rank 0 to the failed state so that the MDS reads the in-RADOS metadata on startup
fs reset <fs> --yes-i-really-mean-it
- Set the number of active MDSs of the recovered file system to 1
fs set <fs> joinable false
fs set <fs> max_mds 1
fs set <fs> joinable true
- Restart the MDSs
- Observed that each MDS reaches the rejoin state and dies, and the next MDS does the same. Expected one of the MDSs to become active, the others to become standby, and the file system to be healthy.
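For reference, the recovery steps above can be consolidated into a single sketch. The file system and pool names below are placeholders (not the actual names from this cluster), and the script assumes the MDS daemons have already been stopped:

```shell
#!/bin/sh
# Placeholder names -- substitute the real file system and pool names.
FS=cephfs
METADATA_POOL=cephfs_metadata
DATA_POOL=cephfs_data

# Recreate the FSMap with defaults after the mon store was rebuilt from OSDs.
ceph fs new "$FS" "$METADATA_POOL" "$DATA_POOL" --force

# Mark rank 0 failed so a starting MDS reads the in-RADOS metadata.
ceph fs reset "$FS" --yes-i-really-mean-it

# Pin the recovered file system to a single active MDS before restarting.
ceph fs set "$FS" joinable false
ceph fs set "$FS" max_mds 1
ceph fs set "$FS" joinable true

# Then restart the MDS daemons (e.g. via systemctl, not shown here).
```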
-23> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.d my gid is 4183
-22> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.d map says I am mds.0.20 state up:rejoin
-21> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.d msgr says I am [v2:192.168.0.10:6810/120552584,v1:192.168.0.10:6811/120552584]
-20> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.d handle_mds_map: handling map as rank 0
-19> 2021-07-19T17:41:20.203-0400 7f924ed5c640 1 mds.0.20 handle_mds_map i am now mds.0.20
-18> 2021-07-19T17:41:20.203-0400 7f924ed5c640 1 mds.0.20 handle_mds_map state change up:reconnect --> up:rejoin
-17> 2021-07-19T17:41:20.203-0400 7f924ed5c640 1 mds.0.20 rejoin_start
-16> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.0.cache rejoin_start
-15> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.0.cache process_imported_caps
-14> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.0.openfiles prefetch_inodes
-13> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.0.openfiles _prefetch_inodes state 1
-12> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.0.openfiles _prefetch_dirfrags
-11> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.0.openfiles _prefetch_inodes state 3
-10> 2021-07-19T17:41:20.203-0400 7f924ed5c640 7 mds.0.cache trim_non_auth
-9> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.0.cache number of subtrees = 12; not printing subtrees
-8> 2021-07-19T17:41:20.203-0400 7f924ed5c640 1 mds.0.20 rejoin_joint_start
-7> 2021-07-19T17:41:20.203-0400 7f924ed5c640 10 mds.0.cache rejoin_send_rejoins with recovery_set
-6> 2021-07-19T17:41:20.217-0400 7f9252d64640 1 -- [v2:192.168.0.10:6810/120552584,v1:192.168.0.10:6811/120552584] <== mon.0 v2:192.168.0.10:40360/0 21 ==== mdsbeacon(4183/d up:rejoin seq=5 v22) v8 ==== 130+0+0 (secure 0 0 0) 0x55781989b600 con 0x557814b83600
-5> 2021-07-19T17:41:20.217-0400 7f9252d64640 5 mds.beacon.d received beacon reply up:rejoin seq 5 rtt 1.046
-4> 2021-07-19T17:41:20.220-0400 7f924ed5c640 -1 ../src/mds/MDCache.cc: In function 'void MDCache::rejoin_send_rejoins()' thread 7f924ed5c640 time 2021-07-19T17:41:20.204686-0400
../src/mds/MDCache.cc: 4074: FAILED ceph_assert(auth >= 0)
ceph version 17.0.0-5874-ga0a8ba5087f (a0a8ba5087f0b82588860cda188dfdb48a964771) quincy (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19d) [0x7f925563ce17]
2: /home/rraja/ceph/build/lib/libceph-common.so.2(+0x1641099) [0x7f925563d099]
3: (MDCache::rejoin_send_rejoins()+0xd0f) [0x5578119b409b]
4: (MDSRank::rejoin_joint_start()+0x133) [0x5578117e6827]
5: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x19a7) [0x5578117eb395]
6: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0x1d3b) [0x5578117b9209]
7: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x50d) [0x5578117bb0d7]
8: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x19a) [0x5578117ba9f4]
9: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xe9) [0x7f92558236cf]
10: (DispatchQueue::entry()+0x626) [0x7f9255822116]
11: (DispatchQueue::DispatchThread::entry()+0x1c) [0x7f92559a3a74]
12: (Thread::entry_wrapper()+0x83) [0x7f92555fd551]
13: (Thread::_entry_func(void*)+0x18) [0x7f92555fd4c4]
14: /lib64/libpthread.so.0(+0x93f9) [0x7f9253c443f9]
15: clone()
-3> 2021-07-19T17:41:20.226-0400 7f924c557640 20 mgrc operator() sending 126 counters (of possible 379), 0 new, 0 removed
-2> 2021-07-19T17:41:20.233-0400 7f924c557640 20 mgrc _send_report encoded 1494 bytes
-1> 2021-07-19T17:41:20.233-0400 7f924c557640 1 -- [v2:192.168.0.10:6810/120552584,v1:192.168.0.10:6811/120552584] --> [v2:192.168.0.10:6800/1709313,v1:192.168.0.10:6801/1709313] -- mgrreport(unknown.d +0-0 packed 1494 task_status=0) v9 -- 0x557814b71c00 con 0x557814c74480
0> 2021-07-19T17:41:20.233-0400 7f924ed5c640 -1 *** Caught signal (Aborted) **
in thread 7f924ed5c640 thread_name:ms_dispatch
ceph version 17.0.0-5874-ga0a8ba5087f (a0a8ba5087f0b82588860cda188dfdb48a964771) quincy (dev)
1: /home/rraja/ceph/build/bin/ceph-mds(+0x1216fd2) [0x557811e2bfd2]
2: /lib64/libpthread.so.0(+0x141e0) [0x7f9253c4f1e0]
3: gsignal()
4: abort()
5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x36c) [0x7f925563cfe6]
6: /home/rraja/ceph/build/lib/libceph-common.so.2(+0x1641099) [0x7f925563d099]
7: (MDCache::rejoin_send_rejoins()+0xd0f) [0x5578119b409b]
8: (MDSRank::rejoin_joint_start()+0x133) [0x5578117e6827]
9: (MDSRankDispatcher::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&, MDSMap const&)+0x19a7) [0x5578117eb395]
10: (MDSDaemon::handle_mds_map(boost::intrusive_ptr<MMDSMap const> const&)+0x1d3b) [0x5578117b9209]
11: (MDSDaemon::handle_core_message(boost::intrusive_ptr<Message const> const&)+0x50d) [0x5578117bb0d7]
12: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x19a) [0x5578117ba9f4]
13: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0xe9) [0x7f92558236cf]
14: (DispatchQueue::entry()+0x626) [0x7f9255822116]
15: (DispatchQueue::DispatchThread::entry()+0x1c) [0x7f92559a3a74]
16: (Thread::entry_wrapper()+0x83) [0x7f92555fd551]
17: (Thread::_entry_func(void*)+0x18) [0x7f92555fd4c4]
18: /lib64/libpthread.so.0(+0x93f9) [0x7f9253c443f9]
19: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
- Also, on Patrick's suggestion, made sure that the file system was not mounted before stopping all MDSs, removing the file system/FSMap, and retrying the above recovery steps, but the MDSs still crash in the same way.