Bug #53811

standby-replay mds is removed from MDSMap unexpectedly

Added by 玮文 胡 11 months ago. Updated 3 months ago.

Status: Pending Backport
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy,pacific
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In `MDSMonitor::prepare_beacon`:

...
} else if ((state == MDSMap::STATE_STANDBY || state == MDSMap::STATE_STANDBY_REPLAY)
        && info.rank != MDS_RANK_NONE)
{
  dout(4) << "mds_beacon MDS can't go back into standby after taking rank: " 
             "held rank " << info.rank << " while requesting state " 
          << ceph_mds_state_name(state) << dendl;
  goto evict;
}

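This branch evicts a standby-replay MDS unexpectedly: a standby-replay daemon also holds a rank, so `info.rank != MDS_RANK_NONE` is true for it and a beacon in state standby-replay matches the condition.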
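
To make the failure mode concrete, here is a minimal, self-contained sketch of the condition. It is a simplified model, not the Ceph source: `State`, `kRankNone`, `evicts_as_reported`, and `evicts_narrowed` are hypothetical names, and the narrowed check is only one possible way to tighten the condition, not necessarily what the linked pull request does.

```cpp
// Simplified model of the beacon check quoted above; NOT the Ceph source.
// All names are hypothetical stand-ins for the MDSMap types and constants.
enum class State { Standby, StandbyReplay, Active };

constexpr int kRankNone = -1;  // stand-in for MDS_RANK_NONE

// Condition as quoted in the description: either standby state is
// rejected once the daemon holds a rank.
constexpr bool evicts_as_reported(State requested, int held_rank) {
  return (requested == State::Standby || requested == State::StandbyReplay)
         && held_rank != kRankNone;
}

// One possible narrowing (an assumption, not necessarily the merged fix):
// only a plain standby request from a rank holder is rejected.
constexpr bool evicts_narrowed(State requested, int held_rank) {
  return requested == State::Standby && held_rank != kRankNone;
}

// A standby-replay daemon following rank 1 keeps sending standby-replay beacons.
static_assert(evicts_as_reported(State::StandbyReplay, 1),
              "quoted condition evicts a healthy standby-replay daemon");
static_assert(!evicts_narrowed(State::StandbyReplay, 1),
              "narrowed condition leaves standby-replay alone");
static_assert(evicts_narrowed(State::Standby, 1),
              "narrowed condition still rejects rank holder going to standby");
```

The checks run at compile time, so building the file is enough to confirm that the quoted condition matches a standby-replay daemon with a rank while the narrowed variant does not.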


Related issues

Copied to CephFS - Backport #57261: pacific: standby-replay mds is removed from MDSMap unexpectedly Resolved
Copied to CephFS - Backport #57262: octopus: standby-replay mds is removed from MDSMap unexpectedly Rejected
Copied to CephFS - Backport #57370: quincy: standby-replay mds is removed from MDSMap unexpectedly Resolved

History

#1 Updated by Venky Shankar 11 months ago

  • Category set to Correctness/Safety
  • Status changed from New to Fix Under Review
  • Target version set to v17.0.0
  • Backport set to pacific,octopus
  • Pull request ID set to 44501

#2 Updated by Patrick Donnelly 11 months ago

I think you probably found this when the standby-replay daemon was "laggy" and then came back, yes?

#3 Updated by 玮文 胡 11 months ago

Patrick Donnelly wrote:

I think you probably found this when the standby-replay daemon was "laggy" and then came back, yes?

No, I found the following log line from the monitor:

mon.gpu024@0(leader).mds e86079 fail_mds_gid 8198787 mds.cephfs.gpu023.aetiph role 1

And this from the removed MDS:

mds.cephfs.gpu023.aetiph Updating MDS map to version 86080 from mon.2
mds.cephfs.gpu023.aetiph Map removed me [mds.cephfs.gpu023.aetiph{1:8198787} state up:standby-replay seq 1 join_fscid=2 addr [REMOVED IPs] compat {c=[1],r=[1],i=[7ff]}] from cluster; respawning! See cluster/monitor logs for details.

#4 Updated by Venky Shankar 3 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Target version changed from v17.0.0 to v18.0.0

#5 Updated by Backport Bot 3 months ago

  • Copied to Backport #57261: pacific: standby-replay mds is removed from MDSMap unexpectedly added

#6 Updated by Backport Bot 3 months ago

  • Copied to Backport #57262: octopus: standby-replay mds is removed from MDSMap unexpectedly added

#7 Updated by Backport Bot 3 months ago

  • Tags set to backport_processed

#8 Updated by Patrick Donnelly 3 months ago

  • Tags deleted (backport_processed)
  • Backport changed from pacific,octopus to quincy,pacific

#9 Updated by Backport Bot 3 months ago

  • Copied to Backport #57370: quincy: standby-replay mds is removed from MDSMap unexpectedly added

#10 Updated by Backport Bot 3 months ago

  • Tags set to backport_processed
