Bug #55461
ceph osd crush swap-bucket {old_host} {new_host} where {old_host}={new_host} crashes monitors
Description
If one accidentally passes the same host as both the source and the target of the ceph osd crush swap-bucket command, the monitors crash. The only remedy found so far is to recover the monitor store.db from the OSDs, per https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
The following errors were found in the logs:
Mar 8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::swap_bucket(CephContext*, int, int)' thread 7f878de42700 time 2022-03-08 03:10:44.945920
Mar 8 03:10:44 pistoremon-as-d01-tier1 ceph-mon[3654621]: /build/ceph-14.2.22/src/crush/CrushWrapper.cc: 1279: FAILED ceph_assert(b->size == bs)
I experienced this in production on a 14.2.22 Nautilus cluster, and have reproduced it in a lab cluster of the same version and in another lab cluster on v16.2.7 Pacific.
It does not matter if the target host is linked to a root in the crush map.
I can provide some logs if requested, but this is very easy to reproduce.