Bug #24440

common/DecayCounter: set last_decay to current time when decoding decay counter

Added by Zhi Zhang almost 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source: Community (dev)
Tags:
Backport: mimic,luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We recently found that an MDS's load can show up as zero on another MDS in a multi-MDS setup. The Ceph version is Luminous.

In the log below, MDS.1 computes its own load, which it then sends to MDS.0.

2018-05-13 17:20:31.252804 7ffa471d7700  0 mds.1.bal mds.1 epoch 17 load mdsload<[0,6245.74 12491.5]/[0,33581.5 67163], req 13440, hr 0, qlen 18, cpu 0.04>

When MDS.0 handled the heartbeat and read MDS.1's load from the message, the load values had become zero.

2018-05-13 17:20:30.988828 7f65d8b45700  0 mds.0.bal mds.0 epoch 17 load mdsload<[5580.31,18907.2 43394.8]/[33213.8,109221 251656], req 42543, hr 0, qlen 36, cpu 0.75>
2018-05-13 17:20:31.280096 7f65db34a700  0 mds.0.bal   mds.0 mdsload<[5580.31,18907.2 43394.8]/[33213.8,109221 251656], req 42543, hr 0, qlen 36, cpu 0.75> = 127950 ~ 43394.8
2018-05-13 17:20:31.280113 7f65db34a700  0 mds.0.bal   mds.1 mdsload<[0,0 0]/[0,0 0], req 13440, hr 0, qlen 18, cpu 0.04> = 37045.8 ~ 12564.2

We found that the last_decay in this message is 0 (utime_t()), so the elapsed time is very large and the original value is decayed to 0.
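
The effect can be reproduced with a minimal sketch of an exponentially decaying counter. This is not Ceph's actual DecayCounter code: the class name, the decode() helper, its reset_last_decay flag, the 5-second half-life, and the timestamp are all assumptions made for illustration. Only the value 12491.5 is taken from the MDS.1 log line above.

```python
import math


class DecayCounterSketch:
    """Toy exponentially decaying counter (hypothetical, not Ceph's class)."""

    def __init__(self, half_life, now):
        # Negative decay constant: the value halves every half_life seconds.
        self.k = math.log(0.5) / half_life
        self.val = 0.0
        self.last_decay = now

    def decay(self, now):
        # Scale the value by the time elapsed since the last decay.
        elapsed = now - self.last_decay
        if elapsed > 0:
            self.val *= math.exp(elapsed * self.k)
            self.last_decay = now

    def get(self, now):
        self.decay(now)
        return self.val


def decode(val, half_life, now, reset_last_decay):
    """Rebuild a counter from a received value (hypothetical helper)."""
    counter = DecayCounterSketch(half_life, now)
    counter.val = val
    # Without the reset, last_decay stays at 0, like a default utime_t().
    counter.last_decay = now if reset_last_decay else 0.0
    return counter


now = 1.5e9        # assumed wall-clock time in seconds since the epoch
half_life = 5.0    # assumed half-life in seconds

buggy = decode(12491.5, half_life, now, reset_last_decay=False)
fixed = decode(12491.5, half_life, now, reset_last_decay=True)

print(buggy.get(now))  # elapsed is ~1.5e9 s, so the value underflows to 0.0
print(fixed.get(now))  # elapsed is 0 s, so the value is unchanged: 12491.5
```

With last_decay left at 0 the counter sees roughly 1.5e9 seconds of elapsed time, and the decay factor underflows to zero. Setting last_decay to the current time on decode, as the issue title proposes, keeps the received value intact.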


Related issues

Copied to CephFS - Backport #24537: mimic: common/DecayCounter: set last_decay to current time when decoding decay counter Resolved
Copied to CephFS - Backport #24538: luminous: common/DecayCounter: set last_decay to current time when decoding decay counter Resolved

History

#2 Updated by Patrick Donnelly almost 6 years ago

  • Status changed from New to Pending Backport
  • Assignee set to Zhi Zhang
  • Target version set to v14.0.0
  • Backport set to mimic,luminous

#3 Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #24537: mimic: common/DecayCounter: set last_decay to current time when decoding decay counter added

#4 Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #24538: luminous: common/DecayCounter: set last_decay to current time when decoding decay counter added

#5 Updated by Nathan Cutler over 5 years ago

  • Status changed from Pending Backport to Resolved
