Bug #24856

mds may get discontinuous mdsmap

Added by Zheng Yan over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
% Done:

0%

Source:
Development
Tags:
Backport:
mimic,luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Steps to reproduce (two active MDS, no standby); a shell sketch follows the list:

- gdb attach mds.0, wait until 'ceph -w' shows it's laggy
- restart mds.1, wait until it enters resolve state
- continue execution of mds.0
- check the log of mds.0
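
A minimal transcript-style sketch of the sequence above, assuming a systemd deployment with one ceph-mds per host (the unit name, pidof usage, and log path are assumptions, not from the report):

    # Attach gdb to mds.0; attaching stops the daemon, which makes it laggy.
    gdb -p "$(pidof ceph-mds)"         # run on the mds.0 host, leave gdb sitting here

    # In another terminal, watch cluster events until mds.0 is reported laggy.
    ceph -w

    # Restart mds.1 and wait for it to reach the resolve state.
    systemctl restart ceph-mds@1       # unit name is an assumption
    ceph status                        # repeat until mds.1 shows up:resolve

    # Back in gdb, let mds.0 run again, then inspect its log.
    (gdb) continue
    less /var/log/ceph/ceph-mds.0.log  # default log location; adjust as needed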


Related issues

Related to CephFS - Bug #25228: mds: recovering mds receive export_cancel message Resolved 08/02/2018
Copied to CephFS - Backport #25047: mimic: mds may get discontinuous mdsmap Resolved
Copied to CephFS - Backport #25048: luminous: mds may get discontinuous mdsmap Resolved

History

#1 Updated by Patrick Donnelly over 5 years ago

  • Subject changed from mds may get discontinuous mdsmap to mds: may get discontinuous mdsmap
  • Assignee set to Zheng Yan
  • Priority changed from Normal to Urgent
  • Target version set to v14.0.0
  • Source set to Development
  • Backport set to mimic,luminous
  • Component(FS) MDS added

#2 Updated by Zheng Yan over 5 years ago

  • Subject changed from mds: may get discontinuous mdsmap to mds may get discontinuous mdsmap
  • Status changed from New to Fix Under Review
  • Assignee deleted (Zheng Yan)
  • Priority changed from Urgent to Normal
  • Target version deleted (v14.0.0)
  • Source deleted (Development)
  • Component(FS) deleted (MDS)

#3 Updated by Patrick Donnelly over 5 years ago

  • Assignee set to Zheng Yan
  • Priority changed from Normal to Urgent
  • Target version set to v14.0.0
  • Source set to Development
  • Component(FS) MDS added

#4 Updated by Zheng Yan over 5 years ago

Steps to reproduce an MDS stuck in reconnect state, for mimic and later (the snap cache code gets confused); a config sketch follows the list:

- 2 active mds, assign each mds to a fixed rank (mds_standby_for_rank config), no standby mds
- gdb attach mds.0, wait until 'ceph -w' shows it's laggy
- restart mds.1, wait until it enters resolve state
- continue execution of mds.0
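
A minimal ceph.conf sketch for the fixed-rank assignment above, assuming MDS daemon ids "a" and "b" (the ids are assumptions; mds_standby_for_rank is the option named in the list):

    # /etc/ceph/ceph.conf additions on the MDS hosts: pin each daemon to
    # one rank so neither can take over the other's rank as a standby.
    [mds.a]
    mds_standby_for_rank = 0
    [mds.b]
    mds_standby_for_rank = 1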

Steps to reproduce an MDS stuck in resolve state, for luminous (Migrator::{import_reverse_discovering,import_reverse_discovered} do not call maybe_send_pending_resolves); a shell sketch follows the list:

- 2 active mds, assign each mds to a fixed rank (mds_standby_for_rank config), no standby mds
- mkdir 1; setfattr -n ceph.dir.pin -v 1 1
- gdb attach mds.0, set a breakpoint at Migrator::handle_export_discover, continue execution
- setfattr -n ceph.dir.pin -v 0 1, wait until the breakpoint above is triggered
- restart mds.1, wait until it enters resolve state
- continue execution of mds.0
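
A transcript-style sketch of the breakpoint sequence above, assuming the filesystem is mounted at /mnt/cephfs and a systemd deployment (the mount point, pidof usage, and unit name are assumptions):

    # Create directory "1" inside the mount and pin it to rank 1.
    cd /mnt/cephfs
    mkdir 1
    setfattr -n ceph.dir.pin -v 1 1

    # Attach gdb to mds.0, break where the importer handles export_discover,
    # and let the daemon keep running until the breakpoint fires.
    gdb -p "$(pidof ceph-mds)"          # on the mds.0 host
    (gdb) break Migrator::handle_export_discover
    (gdb) continue

    # Re-pin the directory to rank 0; the resulting export from mds.1 to
    # mds.0 triggers the breakpoint and freezes mds.0 mid-import.
    setfattr -n ceph.dir.pin -v 0 1

    # With mds.0 stuck at the breakpoint, restart mds.1 and wait until
    # 'ceph status' shows it in up:resolve.
    systemctl restart ceph-mds@1        # unit name is an assumption

    # Finally, resume mds.0.
    (gdb) continue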

#5 Updated by Patrick Donnelly over 5 years ago

  • Status changed from Fix Under Review to Pending Backport

#6 Updated by Patrick Donnelly over 5 years ago

  • Copied to Backport #25047: mimic: mds may get discontinuous mdsmap added

#7 Updated by Patrick Donnelly over 5 years ago

  • Copied to Backport #25048: luminous: mds may get discontinuous mdsmap added

#8 Updated by Patrick Donnelly over 5 years ago

  • Related to Bug #25228: mds: recovering mds receive export_cancel message added

#9 Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved
