Bug #36477

mds: up:standbyreplay log replay falls behind up:active

Added by qinglong li 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

2018-10-17 17:02:31.952915 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215397 : cluster [INF] Standby daemon mds.ceph-mds-active-2 is not responding, dropping it
2018-10-17 17:02:46.470143 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215400 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:04:09.836140 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215407 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:05:33.734630 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215418 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:05:57.502093 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215425 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:02.481141 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215478 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:25.524678 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215483 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:08:28.350629 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215488 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:53.765715 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215495 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:09:19.716380 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215500 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:09:44.087301 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215505 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
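
To see at a glance which daemon is restarting and how often, the cluster log lines above can be tallied per daemon. The following is a minimal sketch, assuming the "daemon mds.<name> restarted" message format quoted above; the helper name count_restarts and the variable names are hypothetical and not part of Ceph.

```python
import re
from collections import Counter

# Matches the "daemon mds.<name> restarted" cluster log message quoted above.
# The message format is assumed from this report, not from Ceph documentation.
RESTART_RE = re.compile(r"daemon (mds\.\S+) restarted")


def count_restarts(lines):
    """Return a Counter mapping each MDS daemon name to its restart count."""
    counts = Counter()
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the description; the first one is not a restart
# message and is therefore ignored by the tally.
sample_lines = [
    "2018-10-17 17:02:31.952915 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215397 : cluster [INF] Standby daemon mds.ceph-mds-active-2 is not responding, dropping it",
    "2018-10-17 17:02:46.470143 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215400 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted",
    "2018-10-17 17:04:09.836140 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215407 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted",
    "2018-10-17 17:05:57.502093 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215425 : cluster [INF] daemon mds.ceph-mds-active-2 restarted",
]

restarts = count_restarts(sample_lines)
for daemon, n in restarts.most_common():
    print(daemon, n)
```

Feeding the whole cluster log through count_restarts instead of sample_lines gives the per-daemon totals for the full incident window.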

History

#1 Updated by John Spray about 2 months ago

  • Project changed from Ceph to fs

Take a look at the log of the MDS that is restarting to see if it's saying why.

#2 Updated by qinglong li about 2 months ago

The OSD replies "No such file or directory" when the standby MDS reads a journal object that the active MDS has already deleted, for example object 201.026cf408. The relevant log lines from the active and standby MDS are below:
Active mds:

2018-10-17 14:03:28.272 7f44b9fb3700  1 -- 11.3.112.38:6800/3712404392 --> 11.7.149.98:6832/3897644 -- osd_op(unknown.0.107197:87954096 7.61e 7:787c0b96:::201.026cf408:head [delete] snapc 0=[] ondisk+write+known_if_redirected+full_force e15459) v8 -- 0x55f2e7e6d080 con 0
2018-10-17 14:03:28.273 7f44c1fc3700  1 -- 11.3.112.38:6800/3712404392 <== osd.233 11.7.149.98:6832/3897644 123525 ==== osd_op_reply(87954096 201.026cf408 [delete] v15459'270441 uv270441 ondisk = 0) v8 ==== 156+0+0 (975714789 0 0) 0x55ebfe77b340 con 0x55e0e9a71e00

Standby mds:
2018-10-17 14:05:33.163 7efcc4749700  1 -- 11.3.112.68:6800/731143362 --> 11.7.149.98:6832/3897644 -- osd_op(unknown.0.0:72 7.61e 7:787c0b96:::201.026cf408:head [read 0~4194304 [fadvise_dontneed]] snapc 0=[] ondisk+read+known_if_redirected+full_force e15459) v8 -- 0x558bf41a4580 con 0
2018-10-17 14:05:33.163 7efcce75d700  1 -- 11.3.112.68:6800/731143362 <== osd.233 11.7.149.98:6832/3897644 4 ==== osd_op_reply(72 201.026cf408 [read 0~4194304 [fadvise_dontneed]] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (2687212124 0 0) 0x558bf3f11b80 con 0x558bf416c000
2018-10-17 14:05:33.163 7efcc5f4c700  0 mds.531836.journaler.mdlog(ro) _finish_read got error -2
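
The pattern in the two excerpts above (a successful delete from the active MDS followed by a failed read from the standby) can be searched for across both logs. The following is a minimal sketch, assuming the osd_op_reply line format quoted above and that the lines from both daemons have been merged in time order; the function and variable names are hypothetical.

```python
import errno
import re
from datetime import datetime

# Captures timestamp, object name, first op name, and result code from an
# osd_op_reply line in the format quoted above (an assumption of this sketch).
REPLY_RE = re.compile(
    r"^(\S+ \S+) .*osd_op_reply\(\d+ (\S+) \[(\w+)[^\]]*\].*? ondisk = (-?\d+)"
)


def parse_ts(text):
    """Parse a timestamp such as '2018-10-17 14:03:28.273'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def find_reads_after_delete(lines):
    """Return (object, seconds_since_delete) for reads that hit ENOENT
    on an object whose delete was acknowledged earlier in the input."""
    deleted = {}
    hits = []
    for line in lines:
        match = REPLY_RE.search(line)
        if not match:
            continue
        ts = parse_ts(match.group(1))
        obj, op, result = match.group(2), match.group(3), int(match.group(4))
        if op == "delete" and result == 0:
            deleted[obj] = ts
        elif op == "read" and result == -errno.ENOENT and obj in deleted:
            hits.append((obj, (ts - deleted[obj]).total_seconds()))
    return hits


# The two reply lines from the excerpts above, active MDS first.
sample_lines = [
    "2018-10-17 14:03:28.273 7f44c1fc3700  1 -- 11.3.112.38:6800/3712404392 <== osd.233 11.7.149.98:6832/3897644 123525 ==== osd_op_reply(87954096 201.026cf408 [delete] v15459'270441 uv270441 ondisk = 0) v8 ==== 156+0+0 (975714789 0 0) 0x55ebfe77b340 con 0x55e0e9a71e00",
    "2018-10-17 14:05:33.163 7efcce75d700  1 -- 11.3.112.68:6800/731143362 <== osd.233 11.7.149.98:6832/3897644 4 ==== osd_op_reply(72 201.026cf408 [read 0~4194304 [fadvise_dontneed]] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (2687212124 0 0) 0x558bf3f11b80 con 0x558bf416c000",
]

hits = find_reads_after_delete(sample_lines)
for obj, lag in hits:
    print(obj, "read failed", lag, "seconds after delete")
```

The reported lag is the time between the acknowledged delete and the failed read, which gives a rough measure of how far the standby's replay position trails the point up to which the active MDS has already removed journal objects.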

#3 Updated by Patrick Donnelly about 2 months ago

  • Subject changed from hot standby mds restarted every few minutes(ceph version 13.2.1) to mds: hot standby mds restarted every few minutes
  • Target version set to v14.0.0
  • Start date deleted (10/17/2018)
  • Source set to Community (user)
  • Backport set to mimic,luminous
  • ceph-qa-suite deleted (fs)
  • Component(FS) MDS added

#4 Updated by Patrick Donnelly about 2 months ago

  • Description updated (diff)

#5 Updated by Patrick Donnelly about 2 months ago

  • Subject changed from mds: hot standby mds restarted every few minutes to mds: up:standbyreplay log replay falls behind up:active

#6 Updated by Patrick Donnelly about 2 months ago

  • Assignee set to Zheng Yan
