Bug #36477
mds: up:standbyreplay log replay falls behind up:active
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2018-10-17 17:02:31.952915 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215397 : cluster [INF] Standby daemon mds.ceph-mds-active-2 is not responding, dropping it
2018-10-17 17:02:46.470143 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215400 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:04:09.836140 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215407 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:05:33.734630 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215418 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:05:57.502093 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215425 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:02.481141 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215478 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:25.524678 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215483 : cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:08:28.350629 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215488 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:53.765715 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215495 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:09:19.716380 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215500 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:09:44.087301 mon.A01-R21-I148-3-4000114 mon.0 11.7.148.3:6789/0 215505 : cluster [INF] daemon mds.ceph-mds-active-2 restarted
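To see how often each daemon cycled, the monitor entries above can be tallied with a short script. This is a hypothetical helper, not part of Ceph; it assumes the cluster-log line format shown above (the monitor prefix is trimmed here for width) and uses only the Python standard library.

```python
import re
from datetime import datetime

# Monitor cluster-log entries from the description, with the
# "mon.A01-... mon.0 11.7.148.3:6789/0 <seq> :" prefix trimmed.
LOG = """\
2018-10-17 17:02:31.952915 cluster [INF] Standby daemon mds.ceph-mds-active-2 is not responding, dropping it
2018-10-17 17:02:46.470143 cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:04:09.836140 cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:05:33.734630 cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:05:57.502093 cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:02.481141 cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:25.524678 cluster [INF] daemon mds.ceph-mds-replay-1 restarted
2018-10-17 17:08:28.350629 cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:08:53.765715 cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:09:19.716380 cluster [INF] daemon mds.ceph-mds-active-2 restarted
2018-10-17 17:09:44.087301 cluster [INF] daemon mds.ceph-mds-active-2 restarted
"""

# Timestamp (date + time) at the start, daemon name before "restarted" at the end.
RESTART_RE = re.compile(r"^(\S+ \S+) .*daemon (mds\.\S+) restarted$")
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"


def restarts_by_daemon(log: str) -> dict:
    """Map each MDS daemon name to the list of its 'restarted' timestamps."""
    events = {}
    for line in log.splitlines():
        m = RESTART_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), TS_FMT)
            events.setdefault(m.group(2), []).append(ts)
    return events


if __name__ == "__main__":
    for daemon, stamps in restarts_by_daemon(LOG).items():
        # Seconds between consecutive restarts of the same daemon.
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        print(daemon, len(stamps), "restarts, gaps (s):", [round(g) for g in gaps])
```

Run as a script, it prints each daemon with its restart count and the intervals between consecutive restarts; the "not responding" entry is ignored because it does not end in "restarted".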
Updated by John Spray over 5 years ago
- Project changed from Ceph to CephFS
Take a look at the log of the MDS that is restarting to see if it's saying why.
Updated by qinglong li over 5 years ago
The OSD replies "No such file or directory" (ENOENT) when the standby MDS reads a journal object, for example object 201.026cf408. I found the relevant entries in the logs of the active and standby MDS, as below:
Active mds:
2018-10-17 14:03:28.272 7f44b9fb3700 1 -- 11.3.112.38:6800/3712404392 --> 11.7.149.98:6832/3897644 -- osd_op(unknown.0.107197:87954096 7.61e 7:787c0b96:::201.026cf408:head [delete] snapc 0=[] ondisk+write+known_if_redirected+full_force e15459) v8 -- 0x55f2e7e6d080 con 0
2018-10-17 14:03:28.273 7f44c1fc3700 1 -- 11.3.112.38:6800/3712404392 <== osd.233 11.7.149.98:6832/3897644 123525 ==== osd_op_reply(87954096 201.026cf408 [delete] v15459'270441 uv270441 ondisk = 0) v8 ==== 156+0+0 (975714789 0 0) 0x55ebfe77b340 con 0x55e0e9a71e00
Standby mds:
2018-10-17 14:05:33.163 7efcc4749700 1 -- 11.3.112.68:6800/731143362 --> 11.7.149.98:6832/3897644 -- osd_op(unknown.0.0:72 7.61e 7:787c0b96:::201.026cf408:head [read 0~4194304 [fadvise_dontneed]] snapc 0=[] ondisk+read+known_if_redirected+full_force e15459) v8 -- 0x558bf41a4580 con 0
2018-10-17 14:05:33.163 7efcce75d700 1 -- 11.3.112.68:6800/731143362 <== osd.233 11.7.149.98:6832/3897644 4 ==== osd_op_reply(72 201.026cf408 [read 0~4194304 [fadvise_dontneed]] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) v8 ==== 156+0+0 (2687212124 0 0) 0x558bf3f11b80 con 0x558bf416c000
2018-10-17 14:05:33.163 7efcc5f4c700 0 mds.531836.journaler.mdlog(ro) _finish_read got error -2
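The two excerpts show the active MDS deleting 201.026cf408 and the standby later trying to read the same object. A small calculation makes the ordering explicit. This sketch only reuses timestamps and the return code from the logs above; it assumes the two hosts' clocks were roughly in sync, which the report does not confirm.

```python
import errno
import os
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Active MDS: OSD acknowledged the [delete] of 201.026cf408 (ondisk = 0).
deleted_at = datetime.strptime("2018-10-17 14:03:28.273", FMT)
# Standby MDS: [read 0~4194304] of the same object was sent and answered.
read_at = datetime.strptime("2018-10-17 14:05:33.163", FMT)

lag = (read_at - deleted_at).total_seconds()
print(f"standby read 201.026cf408 {lag:.3f} s after the active deleted it")

# The OSD reply and the journaler report a negative errno value.
rc = -2
print(rc, errno.errorcode[-rc], os.strerror(-rc))
```

The read comes roughly two minutes after the delete, so the object the standby still needed for replay had already been removed by the active.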
Updated by Patrick Donnelly over 5 years ago
- Subject changed from hot standby mds restarted every few minutes(ceph version 13.2.1) to mds: hot standby mds restarted every few minutes
- Target version set to v14.0.0
- Start date deleted (10/17/2018)
- Source set to Community (user)
- Backport set to mimic,luminous
- ceph-qa-suite deleted (fs)
- Component(FS) MDS added
Updated by Patrick Donnelly over 5 years ago
- Subject changed from mds: hot standby mds restarted every few minutes to mds: up:standbyreplay log replay falls behind up:active
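The new subject names the failure mode: the up:standbyreplay daemon's read position falls behind up:active, which meanwhile trims (deletes) old journal objects, so the standby's next read returns ENOENT as in the logs above. The toy model below illustrates only that race. `ToyJournal`, `ToyFollower`, `simulate` and the per-tick rates are hypothetical; this is not Ceph's Journaler or its trimming policy.

```python
import errno
from typing import Optional


class ToyJournal:
    """Hypothetical journal: one dict entry per RADOS-like object index."""

    def __init__(self) -> None:
        self.objects = {}
        self.write_obj = 0  # next object index the writer will create

    def append_object(self, data: bytes) -> None:
        self.objects[self.write_obj] = data
        self.write_obj += 1

    def trim_before(self, obj: int) -> None:
        """Delete every object below `obj`; the follower is never consulted."""
        for idx in [i for i in self.objects if i < obj]:
            del self.objects[idx]

    def read(self, obj: int) -> bytes:
        if obj not in self.objects:
            raise FileNotFoundError(errno.ENOENT, "No such file or directory", str(obj))
        return self.objects[obj]


class ToyFollower:
    """Hypothetical standby-replay reader that replays objects strictly in order."""

    def __init__(self, journal: ToyJournal) -> None:
        self.journal = journal
        self.read_obj = 0

    def replay(self, max_objects: int) -> None:
        n = 0
        while n < max_objects and self.read_obj < self.journal.write_obj:
            self.journal.read(self.read_obj)  # raises once the object is trimmed
            self.read_obj += 1
            n += 1


def simulate(ticks: int, write_rate: int, replay_rate: int, keep: int) -> Optional[int]:
    """Return the tick at which the follower first hits ENOENT, or None."""
    journal = ToyJournal()
    follower = ToyFollower(journal)
    for tick in range(ticks):
        for _ in range(write_rate):
            journal.append_object(b"event")
        # Writer keeps only the newest `keep` objects, regardless of the follower.
        journal.trim_before(journal.write_obj - keep)
        try:
            follower.replay(replay_rate)
        except FileNotFoundError:
            return tick
    return None
```

With the writer producing 3 objects per tick, the follower replaying 2, and only the newest 4 objects kept, `simulate(10, 3, 2, 4)` fails at tick 2; when the follower replays faster than the writer appends, `simulate(10, 2, 3, 4)` returns None.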
Updated by Patrick Donnelly about 5 years ago
- Target version changed from v14.0.0 to v15.0.0