Bug #55461

ceph osd crush swap-bucket {old_host} {new_host} where {old_host}={new_host} crashes monitors

Added by Josh Beaman almost 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Monitor
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed: 04/26/2022
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

If one accidentally passes the same host as both the source and the target of the ceph osd crush swap-bucket command, the monitors crash. The only remedy found is to recover the monitor store.db from the OSDs, per https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
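
On a test cluster this reproduces immediately; for example, with any existing host bucket (host1 below is a hypothetical name):

ceph osd crush swap-bucket host1 host1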

These errors appeared in the monitor logs:
Mar 8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::swap_bucket(CephContext*, int, int)' thread 7f878de42700 time 2022-03-08 03:10:44.945920
Mar 8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: 1279: FAILED ceph_assert(b->size == bs)
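
The assertion is consistent with swap_bucket draining the source bucket in place before re-checking the target's size: when source and target are the same bucket, emptying one empties the other. Below is a minimal, self-contained sketch of that failure mode (simplified stand-in types, not the actual CrushWrapper code); built with assertions enabled, it aborts at the second assert, matching the logged failure.

#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Minimal stand-in for a CRUSH bucket: just the item ids it contains.
struct Bucket {
  std::vector<int> items;
  std::size_t size() const { return items.size(); }
};

// Simplified swap: drain 'a' into a scratch map, then (not shown here)
// move b's items into a and the saved items into b.
void swap_bucket(Bucket *a, Bucket *b) {
  std::map<int, int> tmp;
  std::size_t bs = b->size();      // target's size, captured up front
  while (a->size() > 0) {          // drain the source bucket
    tmp[a->items.back()] = 0;
    a->items.pop_back();
  }
  assert(a->size() == 0);
  // When a == b, draining a also drained b, so b->size() is now 0
  // while bs still holds the original count -- the analogue of
  // "FAILED ceph_assert(b->size == bs)" in the monitor log.
  assert(b->size() == bs);
}

int main() {
  Bucket host{{1, 2, 3}};          // one bucket standing in for a host
  swap_bucket(&host, &host);       // same source and target: aborts here
  return 0;
}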

I experienced this in production on a 14.2.22 Nautilus cluster, and have reproduced it on a lab cluster of the same version and on another lab cluster running v16.2.7 Pacific.

It does not matter whether the target host is linked to a root in the CRUSH map.

I can provide some logs if requested, but this is very easy to reproduce.
