Bug #24856

mds may get discontinuous mdsmap

Added by Zheng Yan 5 months ago. Updated 4 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 07/10/2018
Due date:
% Done: 0%
Source: Development
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:

Description

Steps to reproduce (two active mds, no standby):

- gdb attach mds.0, wait until 'ceph -w' shows it's laggy
- restart mds.1, wait until it enters the resolve state
- continue execution of mds.0
- check the mds.0 log
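
What to look for in the mds.0 log is not spelled out in this report. A minimal sketch of one possible check, assuming "discontinuous" means that mds.0 skipped one or more mdsmap epochs while it was stopped in gdb, and assuming the epochs it handled have already been extracted from the log into a list of integers (the log line format is not given here, so that step is left out; `find_epoch_gaps` and the sample values are hypothetical):

```python
def find_epoch_gaps(epochs):
    """Return (previous, current) pairs where the epoch jumps by more than one."""
    gaps = []
    # Compare each epoch with the one that follows it.
    for prev, cur in zip(epochs, epochs[1:]):
        if cur - prev > 1:
            gaps.append((prev, cur))
    return gaps


# Hypothetical epoch sequence, not taken from a real log.
observed = [10, 11, 12, 15, 16]
print(find_epoch_gaps(observed))
```

An empty result would mean every consecutive epoch was seen; any pair in the result marks a range of maps that mds.0 never processed.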


Related issues

Related to fs - Bug #25228: mds: recovering mds receive export_cancel message Resolved 08/02/2018
Copied to fs - Backport #25047: mimic: mds may get discontinuous mdsmap Resolved
Copied to fs - Backport #25048: luminous: mds may get discontinuous mdsmap Resolved

History

#1 Updated by Patrick Donnelly 5 months ago

  • Subject changed from mds may get discontinuous mdsmap to mds: may get discontinuous mdsmap
  • Assignee set to Zheng Yan
  • Priority changed from Normal to Urgent
  • Target version set to v14.0.0
  • Source set to Development
  • Backport set to mimic,luminous
  • Component(FS) MDS added

#2 Updated by Zheng Yan 5 months ago

  • Subject changed from mds: may get discontinuous mdsmap to mds may get discontinuous mdsmap
  • Status changed from New to Need Review
  • Assignee deleted (Zheng Yan)
  • Priority changed from Urgent to Normal
  • Target version deleted (v14.0.0)
  • Source deleted (Development)
  • Component(FS) deleted (MDS)

#3 Updated by Patrick Donnelly 5 months ago

  • Assignee set to Zheng Yan
  • Priority changed from Normal to Urgent
  • Target version set to v14.0.0
  • Source set to Development
  • Component(FS) MDS added

#4 Updated by Zheng Yan 5 months ago

Steps to reproduce mds stuck in the reconnect state on mimic and later (the snap cache code gets confused):

- 2 active mds, assign each mds to a fixed rank (mds_standby_for_rank config), no standby mds
- gdb attach mds.0, wait until 'ceph -w' shows it's laggy
- restart mds.1, wait until it enters the resolve state
- continue execution of mds.0

Steps to reproduce mds stuck in the resolve state on luminous (Migrator::{import_reverse_discovering,import_reverse_discovered} do not call maybe_send_pending_resolves):

- 2 active mds, assign each mds to a fixed rank (mds_standby_for_rank config), no standby mds
- mkdir 1; setfattr -n ceph.dir.pin -v 1 1
- gdb attach mds.0, set a breakpoint at Migrator::handle_export_discover, continue execution
- setfattr -n ceph.dir.pin -v 0 1, wait until the breakpoint above is hit
- restart mds.1, wait until it enters the resolve state
- continue execution of mds.0

#5 Updated by Patrick Donnelly 5 months ago

  • Status changed from Need Review to Pending Backport

#6 Updated by Patrick Donnelly 5 months ago

#7 Updated by Patrick Donnelly 5 months ago

  • Copied to Backport #25048: luminous: mds may get discontinuous mdsmap added

#8 Updated by Patrick Donnelly 5 months ago

  • Related to Bug #25228: mds: recovering mds receive export_cancel message added

#9 Updated by Nathan Cutler 4 months ago

  • Status changed from Pending Backport to Resolved
