
Bug #39978

Adding OSD to Luminous Cluster will crash the active mon

Added by Henry Spanka over 2 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

I recently upgraded my cluster to Luminous v12.2.11. When I add a new OSD, the active monitor crashes (attempt to free an invalid pointer). The other monitors keep running, but the OSD remains stuck in the "new" state. Restarting the OSD process crashes the monitor again.

Crash Log: https://pastebin.com/pMpth7dV
Binary: http://mirror.centos.org/centos/7/storage/x86_64/ceph-luminous/ceph-12.2.11-0.el7.x86_64.rpm
OSD Tree: https://pastebin.com/RZQX2zAz

I think it crashes at this point: https://github.com/ceph/ceph/blob/26dc3775efc7bb286a1d6d66faee0ba30ea23eee/src/crush/CrushWrapper.cc#L463
The OSD is being added on a new node that is not yet in the CRUSH map. Could that be the problem?


Related issues

Related to RADOS - Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+0x2fa) [0x7f516505614a] (Resolved)

History

#1 Updated by Henry Spanka over 2 years ago

Indeed, the issue is related to adding a new host to the CRUSH map.
I worked around it by manually adding the host to the CRUSH map first and then activating the new OSD. Consider this solved, but it would still be good to fix the bug, since a monitor crashing this way can cause unexpected downtime.

Commands to fix the issue:

ceph osd crush add-bucket newhost host
ceph osd crush move newhost root=default
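The workaround can be scripted so the host bucket is created only when it is missing. This is a minimal sketch, not part of Ceph: the helper names are hypothetical, and it assumes that `ceph osd tree -f json` returns a `nodes` list whose entries carry `name` and `type` fields. It uses `ceph osd crush move` to place the new bucket under the root, and it only builds the command lists (a dry run) rather than executing them.

```python
import json
import shlex
import subprocess
from typing import List


def load_osd_tree() -> dict:
    """Fetch the current OSD/CRUSH tree as JSON (needs a working ceph CLI)."""
    out = subprocess.run(
        ["ceph", "osd", "tree", "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def host_in_crush_map(tree: dict, host: str) -> bool:
    """True if a bucket of type 'host' with this name already exists.

    Assumes the JSON layout described above (a 'nodes' list with
    'name' and 'type' keys).
    """
    return any(
        node.get("type") == "host" and node.get("name") == host
        for node in tree.get("nodes", [])
    )


def pre_activation_commands(tree: dict, host: str,
                            root: str = "default") -> List[List[str]]:
    """Commands to create and place the host bucket before the OSD is activated.

    Returns an empty list when the host is already in the CRUSH map.
    """
    if host_in_crush_map(tree, host):
        return []
    return [
        ["ceph", "osd", "crush", "add-bucket", host, "host"],
        ["ceph", "osd", "crush", "move", host, f"root={root}"],
    ]


# Dry run against a hand-written sample tree (hypothetical host names).
sample = {"nodes": [
    {"id": -1, "name": "default", "type": "root"},
    {"id": -2, "name": "oldhost", "type": "host"},
]}
for cmd in pre_activation_commands(sample, "newhost"):
    print(shlex.join(cmd))
```

On a real cluster you would pass `load_osd_tree()` instead of the sample, review the printed commands, and run them before activating the OSD on the new node.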

#2 Updated by Greg Farnum over 2 years ago

  • Project changed from Ceph to RADOS
  • Category deleted (Monitor)

#3 Updated by Neha Ojha over 2 years ago

  • Priority changed from Normal to Urgent

#4 Updated by Greg Farnum over 2 years ago

  • Related to Bug #40029: ceph-mon: Caught signal (Aborted) in (CrushWrapper::update_choose_args(CephContext*)+0x2fa) [0x7f516505614a] added

#5 Updated by Greg Farnum over 2 years ago

  • Status changed from New to Duplicate

Closing in favor of #40029, since we've lost all the pastebins. :(
