Bug #39978
closed
Adding OSD to Luminous Cluster will crash the active mon
Description
I recently upgraded my cluster to Luminous v12.2.11. While adding a new OSD, the active monitor crashes (attempt to free invalid pointer). The other mons keep running, but the OSD is stuck in the new state. Attempting to restart the OSD process crashes the monitor again.
Crash Log: https://pastebin.com/pMpth7dV
Binary: http://mirror.centos.org/centos/7/storage/x86_64/ceph-luminous/ceph-12.2.11-0.el7.x86_64.rpm
OSD Tree: https://pastebin.com/RZQX2zAz
I think it crashes at this point: https://github.com/ceph/ceph/blob/26dc3775efc7bb286a1d6d66faee0ba30ea23eee/src/crush/CrushWrapper.cc#L463
The OSD is added on a new node (not in the crush map yet). Could that be a problem?
Updated by Henry Spanka almost 5 years ago
Indeed, the issue is related to adding a new host to the crush map.
I fixed it by manually adding the host to the crush map first and then activating the new OSD. Consider this solved, but it would still be good to fix this bug, as it may cause unexpected downtime if a monitor fails because of it.
Commands to fix the issue:
ceph osd crush add-bucket newhost host
ceph osd crush move newhost root=default
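For adding several hosts, the workaround could be wrapped in a small helper. This is only a sketch: `crush_prep_commands` is a hypothetical function name, the `default` root is an assumption taken from the commands above, and the move step uses `ceph osd crush move`, the form given in the Ceph CRUSH map documentation. It only prints the commands so they can be reviewed before anything touches the monitors.

```shell
#!/bin/sh
# Print (do not run) the CRUSH commands that pre-create a host bucket
# before any OSD on that host is activated.
# crush_prep_commands is a hypothetical helper, not part of the ceph CLI.
crush_prep_commands() {
    host="$1"            # name of the new host bucket, e.g. newhost
    root="${2:-default}" # CRUSH root to attach it to; "default" is assumed
    if [ -z "$host" ]; then
        echo "usage: crush_prep_commands HOST [ROOT]" >&2
        return 1
    fi
    # Create an empty bucket of type "host" for the new node.
    printf 'ceph osd crush add-bucket %s host\n' "$host"
    # Place the new host bucket under the chosen root.
    printf 'ceph osd crush move %s root=%s\n' "$host" "$root"
}
```

Running `crush_prep_commands newhost` prints the two commands. Once they look right, they can be run by hand or piped to `sh`, and the new OSD activated afterwards.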
Updated by Greg Farnum almost 5 years ago
- Project changed from Ceph to RADOS
- Category deleted (Monitor)
Updated by Greg Farnum almost 5 years ago
- Related to Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+0x2fa) [0x7f516505614a] added
Updated by Greg Farnum over 4 years ago
- Status changed from New to Duplicate
Closing in favor of the other since we've lost all the pastebins. :(