Bug #23467

ceph-disk: Destroyed OSDs keeps old CRUSH weight if new device is different size

Added by Wido den Hollander about 6 years ago. Updated almost 3 years ago.

Status:
Won't Fix
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%
Source:
Tags:
crush,osd,destroyed,destroy
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Before I get to the bug itself, let me explain where this is coming from.

In a cluster with 3.84TB Samsung SSDs an SSD failed. The Ceph operator asked the datacenter to replace the SSD, and they did.

The employee in the datacenter made a mistake: instead of replacing the SSD with a 3.84TB drive, he installed a 960GB SSD in that slot.

The Ceph operator did not notice this mistake and he took the following steps:

- Destroy the old OSD with 'ceph osd destroy X'
- Prepare the new SSD with the old OSD ID using 'ceph-disk prepare --osd-id X /dev/sdX' (sketched below)
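
In shell form those two steps look roughly like this; the OSD ID and device are placeholders, and the --yes-i-really-mean-it flag on destroy is how I remember the Luminous CLI, so treat this as a sketch:

ceph osd destroy X --yes-i-really-mean-it
ceph-disk prepare --osd-id X /dev/sdX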

The OSD was added again, but it kept its CRUSH weight of 3.48700 while the new SSD is only 960GB.

Backfilling started and Ceph went to HEALTH_OK. The admin thought everything was going just fine and let the cluster continue.

Suddenly the cluster went into HEALTH_ERR and I/O stopped because this OSD was 95% full. That is expected behavior for Ceph, but it was not expected in this situation.

The admin then found out that this OSD was only 960GB in size. The OSD was stopped and the problem was resolved. The datacenter later swapped the SSD again for a properly sized one.

To verify this behavior on Luminous I built a very simple Ceph cluster with a few OSDs:

ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.03918 root default                             
-3       0.01959     host alpha                           
 0   ssd 0.00980         osd.0        up  1.00000 1.00000 
 3   ssd 0.00980         osd.3        up  1.00000 1.00000 
-5       0.00980     host bravo                           
 1   ssd 0.00980         osd.1        up  1.00000 1.00000 
-7       0.00980     host charlie                         
 2   ssd 0.00980         osd.2        up  1.00000 1.00000

In this case all the OSDs are 10GB disks running inside virtual machines.

I stopped and destroyed osd.3 and re-added it with a 5GB disk.
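
Concretely this was roughly the same destroy/prepare sequence as above, run on host alpha; /dev/sdX here stands for the new 5GB virtual disk and the exact invocations are from memory, so treat them as a sketch:

systemctl stop ceph-osd@3
ceph osd destroy 3 --yes-i-really-mean-it
ceph-disk zap /dev/sdX
ceph-disk prepare --osd-id 3 /dev/sdX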

The CRUSH tree, however, remained the same, while 'ceph osd df' shows the new OSD size:

ID CLASS WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS 
 0   ssd 0.00980  1.00000 10236M 1065M  9170M 10.41 0.88  68 
 3   ssd 0.00980  1.00000  5116M 1054M  4061M 20.61 1.74  60 
 1   ssd 0.00980  1.00000 10236M 1061M  9174M 10.37 0.88 128 
 2   ssd 0.00980  1.00000 10236M 1061M  9174M 10.37 0.88 128 
                    TOTAL 35824M 4242M 31581M 11.84          
MIN/MAX VAR: 0.88/1.74  STDDEV: 4.56

Take a close look: the size of osd.3 is 5116M while its weight is still 0.00980.

Looking at the logs of the MON I can see that the OSD issued the crush create-or-move with the proper weight both times:

2018-03-26 16:14:09.622816 7faf99315700  0 mon.alpha@0(leader).osd e28 create-or-move crush item name 'osd.3' initial_weight 0.0098 at location {host=alpha,root=default}
2018-03-26 16:17:11.380765 7f9ee734f700  0 mon.alpha@0(leader).osd e41 create-or-move crush item name 'osd.3' initial_weight 0.0049 at location {host=alpha,root=default}
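
Those initial_weight values are simply the device capacity expressed in TiB, which is my understanding of how the weight is derived rather than something stated in the log, and the numbers line up once you allow for rounding:

10236 MiB / 1024 / 1024 = 0.00976 TiB  -> logged as initial_weight 0.0098
 5116 MiB / 1024 / 1024 = 0.00488 TiB  -> logged as initial_weight 0.0049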

initial_weight is only honored when the OSD is completely new; it is not taken into account when a destroyed OSD comes back under its old ID.

Shouldn't we accept initial_weight both when the OSD boots for the first time and when it comes back after being destroyed?
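
Until that is the case, the workaround after re-using a destroyed OSD ID seems to be to correct the weight by hand, for example with the value from the second log line above:

ceph osd crush reweight osd.3 0.0049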

History

#1 Updated by Wido den Hollander about 6 years ago

Wido den Hollander wrote:

Backfilling started and Ceph went to HEALTH_OK. The admin thought everything was going just fine and let the cluster continue.

I meant HEALTH_WARN here, but I cannot change the original post.

#2 Updated by Greg Farnum almost 6 years ago

  • Subject changed from Destroyed OSDs keeps old CRUSH weight if new device is different size to ceph-disk: Destroyed OSDs keeps old CRUSH weight if new device is different size

#3 Updated by Sage Weil almost 3 years ago

  • Status changed from New to Won't Fix
