Bug #20376

last_epoch_(over|under) in MDBalancer should be updated if mds0 has failed

Added by Jianyu Li almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

When mds0 fails and starts up again, it resets beat_epoch to zero. The other MDSes should then update their last_epoch_(over|under) state; otherwise the stale values block their balancing attempts. The effect is worst when the previous last_epoch_under is very large, e.g. because the MDS cluster has been running for a long time: balancing is then delayed until the new beat_epoch catches up with the previous last_epoch_under:

// am i over long enough?
if (last_epoch_under && beat_epoch - last_epoch_under < 2) {
  dout(5) << " i am overloaded, but only for " << (beat_epoch - last_epoch_under) << " epochs" << dendl;
  return;
}

Here is a snippet from an actual log that shows the problem:
[ceph@c152 /var/log/ceph]$ grep 'i am overloaded, but only for' ceph-mds.c152.log-20170621
...
2017-06-20 22:08:03.964654 7f220598a700 5 mds.1.bal i am overloaded, but only for -79 epochs
2017-06-20 22:08:13.964962 7f220598a700 5 mds.1.bal i am overloaded, but only for -78 epochs
2017-06-20 22:08:23.965255 7f220598a700 5 mds.1.bal i am overloaded, but only for -77 epochs
...
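
To make the failure mode concrete, here is a minimal, self-contained sketch of the check quoted above together with one possible remedy. It is not the actual MDBalancer code or the fix that was merged: BalancerState, blocked_by_under and on_heartbeat are hypothetical names, and the rule "clear the markers when the received epoch goes backwards" is an assumption made only for illustration.

```cpp
// Hypothetical, simplified stand-in for the balancer state discussed above.
// Only the three field names come from this report; the rest is illustrative.
struct BalancerState {
  int beat_epoch = 0;        // epoch taken from mds0's heartbeats
  int last_epoch_under = 0;  // epoch at which this MDS was last underloaded
  int last_epoch_over = 0;   // epoch at which this MDS was last overloaded
};

// Mirrors the condition quoted above: returns true when the balancer would
// log "i am overloaded, but only for N epochs" and return without balancing.
bool blocked_by_under(const BalancerState& s) {
  return s.last_epoch_under && s.beat_epoch - s.last_epoch_under < 2;
}

// Assumed remedy (not necessarily what the real fix does): if the epoch
// received from mds0 is lower than the one already seen, treat that as an
// mds0 restart and drop the stale markers before adopting the new epoch.
void on_heartbeat(BalancerState& s, int received_epoch) {
  if (received_epoch < s.beat_epoch) {
    s.last_epoch_under = 0;
    s.last_epoch_over = 0;
  }
  s.beat_epoch = received_epoch;
}
```

With made-up values last_epoch_under = 100 and a restarted beat_epoch = 21, the difference is -79, so blocked_by_under stays true for roughly 80 more epochs. After on_heartbeat(s, 21) the markers are cleared and the check no longer blocks.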

History

#1 Updated by Patrick Donnelly almost 7 years ago

  • Status changed from New to 12
  • Assignee set to Patrick Donnelly

#2 Updated by Jianyu Li almost 7 years ago

There is a pull request for this bug fix: https://github.com/ceph/ceph/pull/15825. Could you review it? @Patrick

#3 Updated by Patrick Donnelly almost 7 years ago

  • Status changed from 12 to Fix Under Review

#4 Updated by Patrick Donnelly over 6 years ago

  • Status changed from Fix Under Review to Resolved
