Bug #52874

closed

Monitor might crash after upgrade from ceph to 16.2.6

Added by Igor Fedotov over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDSMonitor
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The following assertion might be triggered:

void FSMap::sanity() const {
  ...
  if (info.state != MDSMap::STATE_STANDBY_REPLAY) {
    ...
  } else {
    ceph_assert(fs->mds_map.allows_standby_replay());
  }
  ...
}

This happens when the allow-standby-replay flag is set to false but some MDS daemons are still running in standby-replay mode.
Prior to Pacific, setting the flag to false does not force MDS daemons out of standby-replay mode.
Hence one might put the cluster (and the relevant MDS map) into an inconsistent state, which triggers the monitor assertion on upgrade.
The upgrade manual does not require standby-replay MDS daemons to be disabled manually PRIOR to the monitor upgrade either. According to the manual, the monitor upgrade is performed at step 2, while the actions on MDS daemons come at step 5:

2. Upgrade monitors by installing the new packages and restarting the monitor daemons. For example, on each monitor host:
...
5. Upgrade all CephFS MDS daemons. For each CephFS file system,

1. Disable standby_replay:
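
The inconsistent state described in this report can be sketched in a few lines of C++. This is only an illustration, not the Ceph implementation: `DaemonState`, `MdsInfo`, `Filesystem`, `find_violations` and the simplified `sanity` below are hypothetical stand-ins that model just the flag and the daemon states. The sketch lists the daemons that would trip the assertion, which is the kind of check an operator would want to pass before the monitors are upgraded.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real Ceph types; only the
// fields needed to illustrate the invariant are modelled here.
enum class DaemonState { Active, Standby, StandbyReplay };

struct MdsInfo {
  std::string name;
  DaemonState state;
};

struct Filesystem {
  std::string name;
  bool allows_standby_replay;  // models the allow-standby-replay flag
  std::vector<MdsInfo> daemons;
};

// Returns the names of daemons that are in standby-replay although the
// file system's flag forbids it, i.e. the inconsistent state from the report.
std::vector<std::string> find_violations(const Filesystem& fs) {
  std::vector<std::string> bad;
  if (fs.allows_standby_replay) {
    return bad;  // standby-replay is permitted, nothing can violate the flag
  }
  for (const auto& d : fs.daemons) {
    if (d.state == DaemonState::StandbyReplay) {
      bad.push_back(d.name);
    }
  }
  return bad;
}

// Strict variant in the spirit of the quoted assertion: aborts on any violation.
void sanity(const Filesystem& fs) {
  assert(find_violations(fs).empty());
}
```

A non-aborting function such as `find_violations` reports the offending daemons instead of crashing, whereas the strict `sanity` variant shows why a map that was accepted before the upgrade can bring the process down afterwards.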

Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #52998: pacific: Monitor might crash after upgrade from ceph to 16.2.6 (Resolved, Patrick Donnelly)
Actions #1

Updated by Venky Shankar over 2 years ago

  • Assignee set to Patrick Donnelly
Actions #2

Updated by Patrick Donnelly over 2 years ago

  • Status changed from New to Triaged
  • Priority changed from Normal to Urgent
  • Target version set to v17.0.0
  • Source set to Community (user)
  • Backport set to pacific
  • Component(FS) MDSMonitor added
Actions #3

Updated by Patrick Donnelly over 2 years ago

You can get around this problem by setting in ceph.conf (for the mons):

[mon]
    mon_mds_skip_sanity = true

Thanks for the helpful bug report, I will work on a fix.

Actions #4

Updated by Patrick Donnelly over 2 years ago

  • Status changed from Triaged to Fix Under Review
  • Pull request ID set to 43508
  • Labels (FS) crash added
Actions #5

Updated by Patrick Donnelly over 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #6

Updated by Backport Bot over 2 years ago

  • Copied to Backport #52998: pacific: Monitor might crash after upgrade from ceph to 16.2.6 added
Actions #7

Updated by Loïc Dachary over 2 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".
