Bug #55461

ceph osd crush swap-bucket {old_host} {new_host} where {old_host}={new_host} crashes monitors

Added by Josh Beaman about 2 years ago. Updated 20 days ago.

Status: Fix Under Review
Priority: Normal
Category: Monitor
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: quincy,reef
Regression: No
Severity: 2 - major
Reviewed: 04/26/2022
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If one accidentally passes the same host as both the source and the target of the ceph osd crush swap-bucket command, the monitors crash. The only remedy found so far is to recover the monitor store.db from the OSDs via https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds

These errors were found in the logs:
Mar 8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::swap_bucket(CephContext*, int, int)' thread 7f878de42700 time 2022-03-08 03:10:44.945920
Mar 8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: 1279: FAILED ceph_assert(b->size == bs)
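
The failed assertion suggests that swap_bucket captures the size of the second bucket up front, drains the first bucket, and then expects the second bucket to still hold its original children. Below is a simplified, hypothetical sketch (not the actual CrushWrapper code, only the general shape of such a swap) showing how that bookkeeping breaks as soon as both arguments name the same bucket:

#include <cassert>
#include <vector>

// Hypothetical stand-in for a CRUSH bucket: just a list of child item ids.
struct Bucket {
  std::vector<int> items;
};

// Sketch of a swap that drains one bucket before refilling the other.
void swap_children(Bucket *a, Bucket *b) {
  const size_t bs = b->items.size();   // b's size, captured before any mutation

  std::vector<int> saved_a = a->items; // stash a's children...
  a->items.clear();                    // ...then drain a

  // If a and b point at the same bucket, draining a also drained b,
  // so this check fails -- the analogue of "FAILED ceph_assert(b->size == bs)".
  assert(b->items.size() == bs);

  std::vector<int> saved_b = b->items; // stash b's children
  b->items.clear();

  a->items = saved_b;                  // cross-assign the saved children
  b->items = saved_a;
}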

I experienced this in production on a 14.2.22 Nautilus cluster, and I have reproduced it on a lab cluster of the same version as well as on another lab cluster running v16.2.7 Pacific.

It does not matter whether the target host is linked to a root in the CRUSH map.

I can provide some logs if requested, but this is very easy to reproduce.
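
An obvious mitigation would be to reject the command up front when the source and target resolve to the same bucket, returning an error instead of asserting part-way through the mutation. A minimal sketch of that idea, building on the hypothetical example above (this only illustrates the idea, not Ceph's actual handling):

#include <cerrno>

// Hypothetical guard: refuse to swap a bucket with itself.
int swap_children_checked(Bucket *a, Bucket *b) {
  if (a == b)
    return -EINVAL;    // same bucket named twice: report an error to the caller
  swap_children(a, b);
  return 0;
}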

#1

Updated by Dan van der Ster 20 days ago

  • Status changed from New to Fix Under Review
  • Assignee set to Radoslaw Zarzynski
  • Backport set to quincy,reef
  • Pull request ID set to 57191