Bug #53811

open

standby-replay mds is removed from MDSMap unexpectedly

Added by 玮文 胡 over 2 years ago. Updated over 1 year ago.

Status: Pending Backport
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags: backport_processed
Backport: quincy,pacific
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In `MDSMonitor::prepare_beacon`

...
} else if ((state == MDSMap::STATE_STANDBY || state == MDSMap::STATE_STANDBY_REPLAY)
        && info.rank != MDS_RANK_NONE)
{
  dout(4) << "mds_beacon MDS can't go back into standby after taking rank: " 
             "held rank " << info.rank << " while requesting state " 
          << ceph_mds_state_name(state) << dendl;
  goto evict;
}

This evicts a standby-replay MDS unexpectedly: a standby-replay daemon also holds a rank, so `info.rank != MDS_RANK_NONE` is true for it and the branch above is taken.
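To make the failure mode concrete, here is a minimal standalone sketch of the predicate. The enum, the `RANK_NONE` constant, and both function names are hypothetical stand-ins rather than the Ceph definitions, and the narrower check is only one possible way to avoid the eviction, not necessarily what the linked pull request does.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the Ceph types; names and values are assumptions
// chosen for illustration, not copied from the Ceph headers.
enum class State { Standby, StandbyReplay, Active };
constexpr std::int32_t RANK_NONE = -1;  // stand-in for MDS_RANK_NONE

// Mirrors the condition quoted above: evict when a daemon that holds a rank
// reports either standby or standby-replay.
constexpr bool evicts_as_quoted(State requested, std::int32_t held_rank) {
  return (requested == State::Standby || requested == State::StandbyReplay)
         && held_rank != RANK_NONE;
}

// One possible narrower check (an assumption, not necessarily the merged fix):
// only a plain standby request from a rank holder is treated as invalid.
constexpr bool evicts_plain_standby_only(State requested, std::int32_t held_rank) {
  return requested == State::Standby && held_rank != RANK_NONE;
}

// A standby-replay daemon following rank 1 trips the quoted condition...
static_assert(evicts_as_quoted(State::StandbyReplay, 1),
              "quoted condition evicts standby-replay with a rank");
// ...but not the narrower one.
static_assert(!evicts_plain_standby_only(State::StandbyReplay, 1),
              "narrower condition leaves standby-replay alone");
```

With these stand-ins, a daemon that holds a rank and asks for plain standby is still rejected by both predicates, so the narrower check only changes the standby-replay case.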


Related issues 3 (0 open, 3 closed)

Copied to CephFS - Backport #57261: pacific: standby-replay mds is removed from MDSMap unexpectedly (Resolved, Venky Shankar)
Copied to CephFS - Backport #57262: octopus: standby-replay mds is removed from MDSMap unexpectedly (Rejected)
Copied to CephFS - Backport #57370: quincy: standby-replay mds is removed from MDSMap unexpectedly (Resolved, Venky Shankar)
Actions #1

Updated by Venky Shankar over 2 years ago

  • Category set to Correctness/Safety
  • Status changed from New to Fix Under Review
  • Target version set to v17.0.0
  • Backport set to pacific,octopus
  • Pull request ID set to 44501
Actions #2

Updated by Patrick Donnelly over 2 years ago

I think you probably found this when the standby-replay daemon was "laggy" and then came back, yes?

Actions #3

Updated by 玮文 胡 over 2 years ago

Patrick Donnelly wrote:

I think you probably found this when the standby-replay daemon was "laggy" and then came back, yes?

No, I found the following log line on the monitor:

mon.gpu024@0(leader).mds e86079 fail_mds_gid 8198787 mds.cephfs.gpu023.aetiph role 1

And this from the removed MDS:

mds.cephfs.gpu023.aetiph Updating MDS map to version 86080 from mon.2
mds.cephfs.gpu023.aetiph Map removed me [mds.cephfs.gpu023.aetiph{1:8198787} state up:standby-replay seq 1 join_fscid=2 addr [REMOVED IPs] compat {c=[1],r=[1],i=[7ff]}] from cluster; respawning! See cluster/monitor logs for details.

Actions #4

Updated by Venky Shankar over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
  • Target version changed from v17.0.0 to v18.0.0
Actions #5

Updated by Backport Bot over 1 year ago

  • Copied to Backport #57261: pacific: standby-replay mds is removed from MDSMap unexpectedly added
Actions #6

Updated by Backport Bot over 1 year ago

  • Copied to Backport #57262: octopus: standby-replay mds is removed from MDSMap unexpectedly added
Actions #7

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
Actions #8

Updated by Patrick Donnelly over 1 year ago

  • Tags deleted (backport_processed)
  • Backport changed from pacific,octopus to quincy,pacific
Actions #9

Updated by Backport Bot over 1 year ago

  • Copied to Backport #57370: quincy: standby-replay mds is removed from MDSMap unexpectedly added
Actions #10

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed