Bug #36477

open

mds: up:standbyreplay log replay falls behind up:active

Added by qinglong li over 5 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-10-17 17:02:31.952915 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215397 : cluster [INF] Standby daemon mds.ceph-mds-active-2 is not responding, dropping it
2018-10-17 17:02:46.470143 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215400 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:04:09.836140 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215407 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:05:33.734630 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215418 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:05:57.502093 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215425 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:02.481141 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215478 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:25.524678 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215483 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:08:28.350629 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215488 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:53.765715 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215495 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:09:19.716380 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215500 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:09:44.087301 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215505 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
Actions #1

Updated by John Spray over 5 years ago

  • Project changed from Ceph to CephFS

Take a look at the log of the MDS that is restarting to see if it's saying why.

Actions #2

Updated by qinglong li over 5 years ago

The OSD replies "No such file or directory" when the standby MDS reads a journal object, for example object 201.026cf408. I found the relevant log entries on the active and standby MDS, shown below:
Active MDS:

2018-10-17 14:03:28.272 7f44b9fb3700  1 -- 11.3.112.38:6800/3712404392 --> 11.7.149.98:6832/3897644 -- osd_op(unknown.0.107197:87954096 7.61e 7:787c0b96:::201.026cf408:head [delete] snapc 0=[] ondisk+write+known_if_redirected+full_force e15459) v8 -- 0x55f2e7e6d080 con 0
2018-10-17 14:03:28.273 7f44c1fc3700  1 -- 11.3.112.38:6800/3712404392 <== osd.233 11.7.149.98:6832/3897644 123525 ==== osd_op_reply(87954096 201.026cf408 [delete] v15459'270441 uv270441 ondisk = 0) v8 ==== 156+0+0 (975714789 0 0) 0x55ebfe77b340 con 0x55e0e9a71e00

Standby MDS:
2018-10-17 14:05:33.163 7efcc4749700  1 -- 11.3.112.68:6800/731143362 --> 11.7.149.98:6832/3897644 -- osd_op(unknown.0.0:72 7.61e 7:787c0b96:::201.026cf408:head [read 0~4194304 [fadvise_dontneed]] snapc 0=[] ondisk+read+known_if_redirected+full_force e15459) v8 -- 0x558bf41a4580 con 0
2018-10-17 14:05:33.163 7efcce75d700  1 -- 11.3.112.68:6800/731143362 <== osd.233 11.7.149.98:6832/3897644 4 ==== osd_op_reply(72 201.026cf408 [read 0~4194304 [fadvise_dontneed]] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (2687212124 0 0) 0x558bf3f11b80 con 0x558bf416c000
2018-10-17 14:05:33.163 7efcc5f4c700  0 mds.531836.journaler.mdlog(ro) _finish_read got error -2
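
The two excerpts above suggest the active MDS deleted journal object 201.026cf408 about two minutes before the standby tried to read it. To check other objects for the same pattern, a rough sketch like the following could pair successful deletes in the active log with failed reads in the standby log. The regex assumes the messenger debug line format seen in these excerpts, and the function names (`parse_replies`, `reads_after_delete`) are hypothetical helpers, not part of Ceph.

```python
import re
from datetime import datetime

# Assumed format: "<timestamp> ... osd_op_reply(<tid> <object> [<op> ...] ... = <rc> ..."
REPLY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"osd_op_reply\(\d+ (\S+) \[(\w+).*? = (-?\d+)"
)

def parse_replies(lines):
    """Return (timestamp, object, op, return_code) for each osd_op_reply line."""
    out = []
    for line in lines:
        m = REPLY_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            out.append((ts, m.group(2), m.group(3), int(m.group(4))))
    return out

def reads_after_delete(active_lines, standby_lines):
    """List (object, seconds_behind) for standby reads that hit ENOENT (-2)
    on an object the active MDS had already deleted successfully."""
    deleted = {}
    for ts, obj, op, rc in parse_replies(active_lines):
        if op == "delete" and rc == 0:
            deleted[obj] = ts
    hits = []
    for ts, obj, op, rc in parse_replies(standby_lines):
        if op == "read" and rc == -2 and obj in deleted and ts > deleted[obj]:
            hits.append((obj, (ts - deleted[obj]).total_seconds()))
    return hits

# Reply lines copied from the excerpts above.
ACTIVE = [
    "2018-10-17 14:03:28.273 7f44c1fc3700  1 -- 11.3.112.38:6800/3712404392 <== osd.233 11.7.149.98:6832/3897644 123525 ==== osd_op_reply(87954096 201.026cf408 [delete] v15459'270441 uv270441 ondisk = 0) v8 ==== 156+0+0 (975714789 0 0) 0x55ebfe77b340 con 0x55e0e9a71e00",
]
STANDBY = [
    "2018-10-17 14:05:33.163 7efcce75d700  1 -- 11.3.112.68:6800/731143362 <== osd.233 11.7.149.98:6832/3897644 4 ==== osd_op_reply(72 201.026cf408 [read 0~4194304 [fadvise_dontneed]] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (2687212124 0 0) 0x558bf3f11b80 con 0x558bf416c000",
]

if __name__ == "__main__":
    for obj, lag in reads_after_delete(ACTIVE, STANDBY):
        print(f"{obj}: standby read failed {lag:.3f}s after the active MDS deleted it")
```

Timestamps come from two different hosts, so the computed lag is only as accurate as the clock sync between them.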

Actions #3

Updated by Patrick Donnelly over 5 years ago

  • Subject changed from hot standby mds restarted every few minutes(ceph version 13.2.1) to mds: hot standby mds restarted every few minutes
  • Target version set to v14.0.0
  • Start date deleted (10/17/2018)
  • Source set to Community (user)
  • Backport set to mimic,luminous
  • ceph-qa-suite deleted (fs)
  • Component(FS) MDS added
Actions #4

Updated by Patrick Donnelly over 5 years ago

  • Description updated (diff)
Actions #5

Updated by Patrick Donnelly over 5 years ago

  • Subject changed from mds: hot standby mds restarted every few minutes to mds: up:standbyreplay log replay falls behind up:active
Actions #6

Updated by Patrick Donnelly over 5 years ago

  • Assignee set to Zheng Yan
Actions #7

Updated by Patrick Donnelly about 5 years ago

  • Target version changed from v14.0.0 to v15.0.0
Actions #8

Updated by Patrick Donnelly over 4 years ago

  • Target version deleted (v15.0.0)
Actions #9

Updated by Patrick Donnelly over 3 years ago

  • Assignee deleted (Zheng Yan)