Project

General

Profile

Actions

Bug #15156

closed

mds stuck in clientreplay

Added by Denis kaganovich about 8 years ago. Updated over 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

3 mds, standby-replay mode,

1) active (b) - also question (usual state):
2016-03-16 04:16:48.713071 7f6752cd2700 1 mds.b handle_mds_map i (10.227.227.101:6800/28799) dne in the mdsmap, respawning myself
2016-03-16 04:16:48.713098 7f6752cd2700 1 mds.b respawn

2) (c) - finally killed by me:
2016-03-16 04:16:31.922030 7f70c33d6700 1 mds.0.0 replay_done (as standby)
2016-03-16 04:16:32.922104 7f8f477ff700 1 mds.0.0 standby_replay_restart (as standby)
2016-03-16 04:16:48.635639 7f8f4a8c6700 1 mds.0.346 handle_mds_map i am now mds.0.346
2016-03-16 04:16:48.635642 7f8f4a8c6700 1 mds.0.346 handle_mds_map state change up:standby-replay --> up:replay
2016-03-16 04:16:50.919522 7f70c2bd5700 1 mds.0.346 replay_done (as standby)
2016-03-16 04:16:50.919530 7f70c2bd5700 1 mds.0.346 standby_replay_restart (final takeover pass)
2016-03-16 04:16:50.921496 7f8f44ffe700 1 mds.0.346 replay_done
2016-03-16 04:16:50.921499 7f8f44ffe700 1 mds.0.346 making mds journal writeable
2016-03-16 04:16:51.238121 7f8f4a8c6700 1 mds.0.346 handle_mds_map i am now mds.0.346
2016-03-16 04:16:51.238127 7f8f4a8c6700 1 mds.0.346 handle_mds_map state change up:replay --> up:reconnect
2016-03-16 04:16:51.238147 7f8f4a8c6700 1 mds.0.346 reconnect_start
2016-03-16 04:16:51.238150 7f8f4a8c6700 1 mds.0.346 reopen_log
2016-03-16 04:16:51.238165 7f8f4a8c6700 1 mds.0.server reconnect_clients -- 8 sessions
2016-03-16 04:16:51.305027 7f8f4a8c6700 0 log_channel(cluster) log [DBG] : reconnect by client.1374131 10.227.227.103:0/255555702 after 0.066820
2016-03-16 04:16:51.323679 7f8f4a8c6700 0 log_channel(cluster) log [DBG] : reconnect by client.1326992 10.227.227.200:0/2227338088 after 0.085495
2016-03-16 04:16:51.368321 7f8f4a8c6700 0 log_channel(cluster) log [DBG] : reconnect by client.1349805 10.227.227.108:0/2239800373 after 0.130136
2016-03-16 04:16:51.380533 7f8f4a8c6700 0 log_channel(cluster) log [DBG] : reconnect by client.1343818 10.227.227.104:0/3991639707 after 0.142353
2016-03-16 04:16:51.399929 7f8f4a8c6700 0 log_channel(cluster) log [DBG] : reconnect by client.1404102 10.227.227.101:0/2561368501 after 0.161751
2016-03-16 04:16:52.513141 7f8f4a8c6700 0 log_channel(cluster) log [DBG] : reconnect by client.1493824 10.227.227.205:0/3192443660 after 1.274951
2016-03-16 04:16:53.731586 7f8f4a8c6700 0 log_channel(cluster) log [DBG] : reconnect by client.1334383 10.227.227.109:0/2782514177 after 2.493360
2016-03-16 04:16:55.251840 7f8f4a8c6700 0 log_channel(cluster) log [DBG] : reconnect by client.1474629 10.227.227.204:0/116078108 after 4.013644
2016-03-16 04:16:56.117750 7f8f4a8c6700 1 mds.0.346 reconnect_done
2016-03-16 04:16:57.158544 7f8f4a8c6700 1 mds.0.346 handle_mds_map i am now mds.0.346
2016-03-16 04:16:57.158549 7f8f4a8c6700 1 mds.0.346 handle_mds_map state change up:reconnect --> up:rejoin
2016-03-16 04:16:57.158559 7f8f4a8c6700 1 mds.0.346 rejoin_start
2016-03-16 04:16:57.631866 7f8f4a8c6700 1 mds.0.346 rejoin_joint_start
2016-03-16 04:17:41.130462 7f8f457ff700 1 mds.0.346 rejoin_done
2016-03-16 04:17:41.671241 7f8f4a8c6700 1 mds.0.346 handle_mds_map i am now mds.0.346
2016-03-16 04:17:41.671255 7f8f4a8c6700 1 mds.0.346 handle_mds_map state change up:rejoin --> up:clientreplay
2016-03-16 04:17:41.671266 7f8f4a8c6700 1 mds.0.346 recovery_done -- successful recovery!
2016-03-16 04:17:41.671469 7f8f4a8c6700 1 mds.0.346 clientreplay_start
2016-03-16 04:17:42.632784 7f8f477ff700 0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 51.012003 secs
2016-03-16 04:17:42.632809 7f8f477ff700 0 log_channel(cluster) log [WRN] : slow request 51.012003 seconds old, received at 2016-03-16 04:16:51.620712: client_request(client.1493824:114711 create #100001d0270/dovecot.index.log.lock 2016-03-16 04:15:20.082935 RETRY=1 REPLAY) currently submit entry: journal_and_reply
2016-03-16 04:17:42.632816 7f8f477ff700 0 log_channel(cluster) log [WRN] : slow request 51.011956 seconds old, received at 2016-03-16 04:16:51.620759: client_request(client.1493824:114713 unlink #100001d0270/dovecot.index.log.lock 2016-03-16 04:15:20.082935 RETRY=1 REPLAY) currently initiated
2016-03-16 04:17:43.466622 7f8f477ff700 0 log_channel(cluster) log [INF] : closing stale session client.1493824 10.227.227.205:0/3192443660 after 30156.096445
2016-03-16 04:17:43.467018 7f8f477ff700 0 log_channel(cluster) log [INF] : closing stale session client.1404102 10.227.227.101:0/2561368501 after 30156.096453
2016-03-16 04:17:43.467031 7f8f477ff700 0 log_channel(cluster) log [INF] : closing stale session client.1374131 10.227.227.103:0/255555702 after 30156.096457
2016-03-16 04:17:43.467043 7f8f477ff700 0 log_channel(cluster) log [INF] : closing stale session client.1326992 10.227.227.200:0/2227338088 after 30156.096492
2016-03-16 04:17:43.467076 7f8f477ff700 0 log_channel(cluster) log [INF] : closing stale session client.1334383 10.227.227.109:0/2782514177 after 30156.096475
2016-03-16 04:17:43.467088 7f8f477ff700 0 log_channel(cluster) log [INF] : closing stale session client.1343818 10.227.227.104:0/3991639707 after 30156.096471
2016-03-16 04:17:43.467102 7f8f477ff700 0 log_channel(cluster) log [INF] : closing stale session client.1349805 10.227.227.108:0/2239800373 after 30156.096463
2016-03-16 04:17:43.467117 7f8f477ff700 0 log_channel(cluster) log [INF] : closing stale session client.1474629 10.227.227.204:0/116078108 after 20346.398960
2016-03-16 04:17:50.087931 7f7f0e393700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.108:0/2239800373 pipe(0x7f711adf4200 sd=32 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f71f70e2280).accept we reset (peer sent cseq 2), sending RESETSESSION
2016-03-16 04:17:50.088074 7f8915743700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.104:0/3991639707 pipe(0x7f70d7f85c00 sd=27 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f71f70e1e00).accept we reset (peer sent cseq 2), sending RESETSESSION
2016-03-16 04:17:50.088188 7f83a95ab700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.103:0/255555702 pipe(0x7f70d7f91f40 sd=30 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f71f70e1f80).accept we reset (peer sent cseq 2), sending RESETSESSION
2016-03-16 04:17:50.088299 7f82f99eb700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.101:0/2561368501 pipe(0x7f70d7f8b900 sd=31 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f71f70e2100).accept we reset (peer sent cseq 2), sending RESETSESSION
2016-03-16 04:17:50.089152 7f8f4a8c6700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
2016-03-16 04:17:50.089192 7f8f4a8c6700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:clientreplay) from client.1349805 10.227.227.108:0/2239800373 after 58.850986 (allowed interval 45)
2016-03-16 04:17:50.089398 7f7e26b60700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.108:0/2239800373 pipe(0x7f719fbaf4c0 sd=33 :6800 s=2 pgs=389 cs=1 l=0 c=0x7f71f70e2400).fault, server, going to standby
2016-03-16 04:17:50.532120 7f7d8657c700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.204:0/116078108 pipe(0x7f711adf4140 sd=32 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f71f70e2580).accept we reset (peer sent cseq 2), sending RESETSESSION
2016-03-16 04:17:51.887118 7f8f4a8c6700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
2016-03-16 04:17:51.887143 7f8f4a8c6700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:clientreplay) from client.1474629 10.227.227.204:0/116078108 after 60.648953 (allowed interval 45)
2016-03-16 04:17:51.890531 7f7d8657c700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.204:0/116078108 pipe(0x7f711adf4280 sd=32 :6800 s=2 pgs=457 cs=1 l=0 c=0x7f71f70e2700).fault, server, going to standby
2016-03-16 04:17:52.836210 7f8f447fd700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.109:0/2782514177 pipe(0x7f719fbb53c0 sd=34 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f72db27ea80).accept we reset (peer sent cseq 2), sending RESETSESSION
2016-03-16 04:17:52.837407 7f8f4a8c6700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
2016-03-16 04:17:52.837427 7f8f4a8c6700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:clientreplay) from client.1334383 10.227.227.109:0/2782514177 after 61.599242 (allowed interval 45)
2016-03-16 04:17:52.837644 7f8cd716a700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.109:0/2782514177 pipe(0x7f70d7f97e40 sd=35 :6800 s=2 pgs=1833 cs=1 l=0 c=0x7f72db27ec00).fault, server, going to standby
2016-03-16 04:17:52.860012 7f850d990700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.205:0/3192443660 pipe(0x7f719fbb5cc0 sd=34 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f72db27ed80).accept we reset (peer sent cseq 2), sending RESETSESSION
2016-03-16 04:17:52.986761 7f850d990700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.200:0/2227338088 pipe(0x7f719fbb5a40 sd=34 :6800 s=0 pgs=0 cs=0 l=0 c=0x7f72db27ef00).accept we reset (peer sent cseq 2), sending RESETSESSION
2016-03-16 04:17:52.988306 7f8f4a8c6700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
2016-03-16 04:17:52.988321 7f8f4a8c6700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:clientreplay) from client.1326992 10.227.227.200:0/2227338088 after 61.750140 (allowed interval 45)
2016-03-16 04:17:52.988589 7f850d990700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.200:0/2227338088 pipe(0x7f719fbb5e40 sd=34 :6800 s=2 pgs=861 cs=1 l=0 c=0x7f72db27f080).fault, server, going to standby
2016-03-16 04:17:53.245652 7f82f99eb700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.101:0/2561368501 pipe(0x7f70d7f8b900 sd=31 :6800 s=2 pgs=746 cs=1 l=0 c=0x7f71f70e2100).reader missed message? skipped from seq 0 to 1532953441
2016-03-16 04:17:54.040768 7f8f4a8c6700 1 mds.0.server no longer in reconnect state, ignoring reconnect, sending close
2016-03-16 04:17:54.040792 7f8f4a8c6700 0 log_channel(cluster) log [INF] : denied reconnect attempt (mds is up:clientreplay) from client.1493824 10.227.227.205:0/3192443660 after 62.802603 (allowed interval 45)
2016-03-16 04:17:54.043070 7f7ee0da5700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.205:0/3192443660 pipe(0x7f70d7f9d140 sd=36 :6800 s=2 pgs=1184 cs=1 l=0 c=0x7f72db27f200).fault, server, going to standby
2016-03-16 04:18:01.495332 7f8915743700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.104:0/3991639707 pipe(0x7f70d7f85c00 sd=27 :6800 s=2 pgs=669 cs=1 l=0 c=0x7f71f70e1e00).reader missed message? skipped from seq 0 to 1061258990
2016-03-16 04:18:05.016983 7f83a95ab700 0 -- 10.227.227.104:6800/2381 >> 10.227.227.103:0/255555702 pipe(0x7f70d7f91f40 sd=30 :6800 s=2 pgs=779 cs=1 l=0 c=0x7f71f70e1f80).reader missed message? skipped from seq 0 to 1826659341
2016-03-16 08:07:40.155512 7f8f467fd700 -1 mds.c * got signal Terminated *

Actions

Also available in: Atom PDF