Adding OSD to Luminous Cluster will crash the active mon
I recently upgraded my cluster to Luminous v12.2.11. While adding a new OSD, the active monitor crashes (attempt to free invalid pointer). The other mons keep running, but the OSD is stuck in the new state. Attempting to restart the OSD process crashes the monitor again.
I think it crashes at this point: https://github.com/ceph/ceph/blob/26dc3775efc7bb286a1d6d66faee0ba30ea23eee/src/crush/CrushWrapper.cc#L463
The OSD is being added on a new node (not in the crush map yet). Could that be the problem?
#1 Updated by Henry Spanka over 2 years ago
Indeed, the issue is related to adding a new host to the crush map.
I fixed it by manually adding the host to the crush map first and then activating the new OSD. Consider this solved, but it would still be good to fix the bug, as it may cause unexpected downtime if a monitor crashes because of it.
Commands to fix the issue:
ceph osd crush add-bucket newhost host
ceph osd crush move newhost root=default
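For anyone hitting the same crash, the full workaround sequence might look like this (a sketch only; "newhost" and the OSD id are placeholders, and your crush root may not be named "default"):

# 1. Create a host bucket for the new node before the OSD registers itself
ceph osd crush add-bucket newhost host

# 2. Place the new host bucket under the cluster root so it gets weight
ceph osd crush move newhost root=default

# 3. Verify the host now appears in the crush tree
ceph osd tree

# 4. Only now start/activate the new OSD on the node, e.g.:
systemctl start ceph-osd@<id>

The key point is that steps 1 and 2 run before the OSD daemon ever starts, so the monitor never takes the code path that tries to create the missing host entry itself.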