Bug #48711
mds: standby-replay mds abort when replay metablob
Status:
closed
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Ceph Version 14.2.15
OS: CentOS 7.6.1810
We created a file system with three active MDS daemons, three standby-replay MDS daemons, and three standby MDS daemons.
We then created a directory and exported it with Samba or NFS-Ganesha for long-running I/O testing. After the test has run for a while, a standby-replay MDS sometimes crashes with the following log:
-16> 2020-12-22 09:17:40.419 7fa2fcb1e700 5 mds.beacon.node181-0 Sending beacon up:standby-replay seq 82158
-15> 2020-12-22 09:17:40.420 7fa2fcb1e700 10 monclient: _send_mon_message to mon.node181 at v1:10.0.50.181:6789/0
-14> 2020-12-22 09:17:40.420 7fa302329700 5 mds.beacon.node181-0 received beacon reply up:standby-replay seq 82158 rtt 0.000999997
-13> 2020-12-22 09:17:40.618 7fa2fd31f700 5 mds.1.0 Restarting replay as standby-replay
-12> 2020-12-22 09:17:40.619 7fa2f9317700 1 mds.147261.journaler.mdlog(ro) probing for end of the log
-11> 2020-12-22 09:17:40.619 7fa2f9317700 1 mds.147261.journaler.mdlog(ro) _finish_reprobe new_end = 31909564363 (header had 31909185210).
-10> 2020-12-22 09:17:40.619 7fa2f9317700 2 mds.1.0 Booting: 2: replaying mds log
-9> 2020-12-22 09:17:40.646 7fa2f7b14700 0 mds.1.journal EMetaBlob.replay missing dir ino 0x1000009ba45
-8> 2020-12-22 09:17:40.646 7fa2f7b14700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
-7> 2020-12-22 09:17:40.646 7fa2f7b14700 5 mds.beacon.node181-0 set_want_state: up:standby-replay -> down:damaged
-6> 2020-12-22 09:17:40.646 7fa2f7b14700 10 log_client log_queue is 1 last_log 1 sent 0 num 1 unsent 1 sending 1
-5> 2020-12-22 09:17:40.646 7fa2f7b14700 10 log_client will send 2020-12-22 09:17:40.647702 mds.node181-0 (mds.147261) 1 : cluster [ERR] failure replaying journal (EMetaBlob)
-4> 2020-12-22 09:17:40.647 7fa2f7b14700 10 monclient: _send_mon_message to mon.node181 at v1:10.0.50.181:6789/0
-3> 2020-12-22 09:17:40.647 7fa2f7b14700 5 mds.beacon.node181-0 Sending beacon down:damaged seq 82159
-2> 2020-12-22 09:17:40.648 7fa2f7b14700 10 monclient: _send_mon_message to mon.node181 at v1:10.0.50.181:6789/0
-1> 2020-12-22 09:17:40.648 7fa302329700 5 mds.beacon.node181-0 received beacon reply down:damaged seq 82159 rtt 0.000999996
0> 2020-12-22 09:17:40.648 7fa2f7b14700 1 mds.node181-0 respawn!
--- logging levels ---
0/ 5 none
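For anyone triaging a similar dump, here is a rough sketch of how the recent-events entries above could be pulled apart to find the inode that replay could not locate. It is not part of Ceph. The names `parse_dump` and `missing_dir_inodes` are made up for illustration, and the regular expressions only assume the entry layout seen in this log (index, timestamp, thread id, debug level, message).

```python
import re

# Start of one dump entry: "<index>> <YYYY-MM-DD> ", e.g. "-9> 2020-12-22 ".
# Splitting here works whether entries are on separate lines or run together.
ENTRY_START = re.compile(r"\s+(?=-?\d+> \d{4}-\d{2}-\d{2} )")

# Fields of one entry: index, timestamp, thread id, debug level, message.
ENTRY = re.compile(
    r"(-?\d+)> (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"([0-9a-f]+) +(-?\d+) (.*)"
)

# Message text taken from the log in this report.
MISSING_DIR = re.compile(r"EMetaBlob\.replay missing dir ino (0x[0-9a-f]+)")


def parse_dump(text):
    """Split a recent-events dump into a list of entry dicts (hypothetical helper)."""
    entries = []
    for chunk in ENTRY_START.split(text.strip()):
        m = ENTRY.match(chunk)
        if m is None:
            continue  # skip anything that is not a dump entry
        index, stamp, thread, level, message = m.groups()
        entries.append({
            "index": int(index),
            "time": stamp,
            "thread": thread,
            "level": int(level),
            "message": message.strip(),
        })
    return entries


def missing_dir_inodes(entries):
    """Return the inode numbers named in 'missing dir ino' messages."""
    found = []
    for entry in entries:
        m = MISSING_DIR.search(entry["message"])
        if m:
            found.append(m.group(1))
    return found


# Three entries copied from the log above, run together on one line.
SAMPLE = (
    "-9> 2020-12-22 09:17:40.646 7fa2f7b14700 0 mds.1.journal "
    "EMetaBlob.replay missing dir ino 0x1000009ba45 "
    "-8> 2020-12-22 09:17:40.646 7fa2f7b14700 -1 log_channel(cluster) "
    "log [ERR] : failure replaying journal (EMetaBlob) "
    "-7> 2020-12-22 09:17:40.646 7fa2f7b14700 5 mds.beacon.node181-0 "
    "set_want_state: up:standby-replay -> down:damaged"
)

if __name__ == "__main__":
    for ino in missing_dir_inodes(parse_dump(SAMPLE)):
        print("replay could not find dir inode", ino)
```

In this log, the level -1 entry from the same thread (7fa2f7b14700) immediately follows the "missing dir ino" message, so filtering on `level == -1` is another way to locate the failure.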
This occurred twice during the long-running I/O testing.
The attached file contains the complete log.