Bug #21222

MDS: standby-replay mds should avoid initiating subtree export

Added by Jianyu Li over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
-
Target version:
% Done:

0%

Source:
Community (user)
Tags:
Backport:
luminous
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Environment: Jewel 10.2.7, with two active MDS daemons and two corresponding standby-replay MDS daemons.

When the standby-replay MDS replays the mdlog and trims its cache, it tries to re-export an empty subtree back to its original location. Below is a snippet from the log of a running daemon:
<snip>
...
2017-08-31 23:15:02.101027 7ff54b58a700 1 mds.1.0 standby_replay_restart (as standby)
2017-08-31 23:15:02.139591 7ff548d85700 7 mds.1.migrator export_empty_import [dir 1000005c621 /xxx/xxx_OfficialAccount/ [2,head] auth v=14131143 cv=0/0 dir_auth=1 state=1073741824 f(v0 m2017-08-31 00:01:20.733889 184=0+184) n(v919958 rc2017-08-31 23:11:32.712676 b5354147370371 61869748=61822461+47287) hs=0+0,ss=0+0 | child=0 subtree=1 dirty=0 0x7ff561640ff0]
2017-08-31 23:15:02.139650 7ff548d85700 7 mds.1.migrator really empty, exporting to -2
2017-08-31 23:15:02.139651 7ff548d85700 7 mds.1.migrator exporting to mds.-2 empty import [dir 1000005c621 /xxx/xxx_OfficialAccount/ [2,head] auth v=14131143 cv=0/0 dir_auth=1 state=1073741824 f(v0 m2017-08-31 00:01:20.733889 184=0+184) n(v919958 rc2017-08-31 23:11:32.712676 b5354147370371 61869748=61822461+47287) hs=0+0,ss=0+0 | child=0 subtree=1 dirty=0 0x7ff561640ff0]
2017-08-31 23:15:02.139682 7ff548d85700 7 mds.1.migrator export_dir [dir 1000005c621 /xxx/xxx_OfficialAccount/ [2,head] auth v=14131143 cv=0/0 dir_auth=1 state=1073741824 f(v0 m2017-08-31 00:01:20.733889 184=0+184) n(v919958 rc2017-08-31 23:11:32.712676 b5354147370371 61869748=61822461+47287) hs=0+0,ss=0+0 | child=0 subtree=1 dirty=0 0x7ff561640ff0] to -2
2017-08-31 23:15:02.139713 7ff548d85700 7 mds.1.migrator dispatch_export_dir mutation(0x7ff5648c9e00)
2017-08-31 23:15:02.160331 7ff546d81700 1 mds.1.0 replay_done (as standby)
...
</snip>

But as a standby MDS, it cannot complete the migration, so the operation above blocks forever, as the log shows:
<snip>
...
2017-09-01 01:23:02.602787 7ff54b58a700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 7680.463065 secs
2017-09-01 01:23:02.602794 7ff54b58a700 0 log_channel(cluster) log [WRN] : slow request 7680.463065 seconds old, received at 2017-08-31 23:15:02.139696: internal op exportdir:mds.1:1 currently requesting remote authpins
</snip>
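
For anyone checking their own logs for the same symptom, the following is a minimal sketch of a matcher for the slow-request line quoted above. The function name blocked_export_age and the regular expression are assumptions made for illustration; they are not part of Ceph, and the pattern is derived only from the single log line shown in this report.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper, not part of Ceph: returns the reported age in seconds
// if the line is a slow-request warning for an internal exportdir op,
// otherwise std::nullopt. The pattern is based only on the log line quoted
// in this report and may not cover other versions or formats.
std::optional<double> blocked_export_age(const std::string& line) {
  static const std::regex pattern(
      R"(slow request ([0-9]+(?:\.[0-9]+)?) seconds old.*internal op exportdir)");
  std::smatch m;
  if (!std::regex_search(line, m, pattern)) {
    return std::nullopt;
  }
  return std::stod(m[1].str());
}
```

Feeding it each line of the MDS log and flagging any result above a chosen threshold would surface an export that has been stuck requesting remote authpins.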

To keep the overall system state correct, a standby MDS should never initiate a migration.


Related issues

Related to CephFS - Bug #21378: mds: up:stopping MDS cannot export directories Resolved 09/13/2017
Copied to CephFS - Backport #21322: luminous: MDS: standby-replay mds should avoid initiating subtree export Resolved

History

#1 Updated by Jianyu Li over 6 years ago

In the latest code on the master branch, this issue happens to be avoided by the destination check in export_dir:

void Migrator::export_dir(CDir *dir, mds_rank_t dest) {
  ...

  if (!mds->mdsmap->is_active(dest)) {
    dout(7) << "dest not active, no exports for now" << dendl;
    return;
  }
  ...
}

This works because the auth of the parent inode is CDIR_AUTH_UNKNOWN (-2) for an empty subtree root on a standby-replay MDS, so the destination rank is never active.

But that is more of a coincidence. We should prevent the migration attempt explicitly; otherwise the issue may reappear whenever the export target is an active MDS, e.g. when someone explicitly specifies the destination MDS through the export dir command:
ceph daemon mds.<standby_replay_mds> export dir /xxx <active_mds_id>
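
The explicit check argued for above can be sketched as follows. This is a simplified, self-contained model, not the Ceph code and not necessarily what the eventual fix looks like: DaemonState, Rank, ExportDecision and can_start_export are hypothetical names introduced only for illustration. The point it shows is ordering: the local-state check runs before the destination check, so the result no longer depends on which destination is requested.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified stand-ins; these are not the Ceph types.
enum class DaemonState { Active, StandbyReplay };

using Rank = int;
constexpr Rank kAuthUnknown = -2;  // mirrors CDIR_AUTH_UNKNOWN(-2) from the report

struct ExportDecision {
  bool allowed;
  std::string reason;
};

// Decide whether an export may start. The check on the local daemon state
// comes first, so a standby-replay daemon is refused even when the requested
// destination is an active rank.
ExportDecision can_start_export(DaemonState self, Rank dest,
                                const std::set<Rank>& active_ranks) {
  if (self == DaemonState::StandbyReplay) {
    return {false, "standby-replay daemon must not initiate exports"};
  }
  if (active_ranks.count(dest) == 0) {
    return {false, "dest not active, no exports for now"};
  }
  return {true, "ok"};
}
```

With only the destination check, a standby-replay daemon asked to export to an active rank would pass; with the state check in front, both the -2 case from the log and the explicit-destination case are refused for the same reason.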

#2 Updated by Jianyu Li over 6 years ago

Here is a pull request with the fix: https://github.com/ceph/ceph/pull/17452. @Patrick, could you review it?

#3 Updated by Patrick Donnelly over 6 years ago

  • Status changed from New to Fix Under Review
  • Target version set to v12.2.1
  • Source set to Community (user)
  • Backport set to luminous

#4 Updated by Patrick Donnelly over 6 years ago

  • Status changed from Fix Under Review to Pending Backport

#5 Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21322: luminous: MDS: standby-replay mds should avoid initiating subtree export added

#6 Updated by Patrick Donnelly over 6 years ago

  • Related to Bug #21378: mds: up:stopping MDS cannot export directories added

#9 Updated by Patrick Donnelly over 6 years ago

  • Status changed from Pending Backport to Resolved
