Bug #64319 (open)

OSD does not move itself to crush_location on start, root=default is not applied

Added by Niklas Hambuechen 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Community (user)
Tags: -
Backport: -
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

I'm currently setting up a Ceph (16.2.7) cluster where it's important to reflect the `datacenter` location in CRUSH, and I have encountered some issues that may be code bugs or documentation bugs.

First, `crush_location` does not seem to have a reference entry anywhere in the manual.
It is mentioned in https://docs.ceph.com/en/pacific/rados/operations/crush-map/#custom-location-hooks, but unlike other ceph.conf options its syntax is never fully explained.
https://docs.ceph.com/en/pacific/rados/operations/crush-map/#crush-location describes the general bucket syntax, but does not say that this same syntax works for `crush_location`.
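For context, the syntax I eventually inferred is a space-separated list of TYPE=NAME pairs, i.e. the same form as the bucket syntax; a sketch of what I believe a reference entry would document (the bucket names here are placeholders):

[osd]
# Inferred, currently undocumented syntax: space-separated CRUSH TYPE=NAME pairs.
crush_location = root=default datacenter=dc1 host=myhost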

Second, https://docs.ceph.com/en/pacific/rados/operations/crush-map/#crush-location (and the same section for Reef) says:

"Not all keys need to be specified. For example, by default, Ceph automatically sets an OSD’s location to be root=default host=HOSTNAME (based on the output from hostname -s)."

I found that this does not work: if I specify in ceph.conf:

[osd]
crush_location = region=HEL zone=HEL1 datacenter=HEL1-DC8 host=backupfs-1

then the `root=default` bucket is empty in `ceph osd tree`:

# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME                    STATUS  REWEIGHT  PRI-AFF
 -6         439.30875  region HEL                                            
 -5         439.30875      zone HEL1                                         
-23         146.43625          datacenter HEL1-DC3                           
-22         146.43625              host backupfs-3                           
 26    hdd   14.61089                  osd.26           up   1.00000  1.00000
 27    hdd   14.61089                  osd.27           up   1.00000  1.00000
 28    hdd   14.61089                  osd.28           up   1.00000  1.00000
...
 -1                 0  root default

In the above, `region HEL` and `root default` sit at the same level of the hierarchy, which is wrong (certainly unintended).
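For comparison, a sketch of the tree I expected, with the region nested under the default root (IDs and weights are illustrative):

ID   CLASS  WEIGHT     TYPE NAME
 -1         439.30875  root default
 -6         439.30875      region HEL
 -5         439.30875          zone HEL1
-23         146.43625              datacenter HEL1-DC3
-22         146.43625                  host backupfs-3
 26    hdd   14.61089                      osd.26
...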

This causes real errors: the default CRUSH rules do not place any data, because they contain `take default` and the `default` root is empty. As a result, `ceph status` shows:

100.000% pgs unknown

and the PGs have acting = [].

So the statement "Not all keys need to be specified ... Ceph automatically sets an OSD's location to be root=default" seems either wrong or confusing: if it means "this only applies unless you defined `crush_location = ...` explicitly", why say "not all keys need to be specified"?

This can be worked around by running `ceph osd crush move HEL root=default`, as sketched below.
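The exact workaround, for the record (verification is just re-running `ceph osd tree`):

# Workaround: manually move the HEL region under the default root.
ceph osd crush move HEL root=default
# Verify that region HEL is now nested below root=default.
ceph osd tree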

Third, I expected that simply changing ceph.conf to

[osd]
crush_location = root=default region=HEL zone=HEL1 datacenter=HEL1-DC8 host=backupfs-1

and restarting the OSD daemon should work, because the docs at https://docs.ceph.com/en/pacific/rados/operations/crush-map/#crush-location say that "each time the OSD starts, it verifies it is in the correct location in the CRUSH map and, if it is not, it moves itself".

My OSD did not move itself; after restarting the OSD, `ceph osd tree` stayed exactly as shown above.
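For reference, the restart I performed; a sketch assuming systemd-managed OSDs (substitute an OSD id that actually lives on the host):

# Restart one OSD on backupfs-1 (assuming the standard systemd unit naming).
systemctl restart ceph-osd@26
# Unchanged afterwards: region HEL still sits outside root=default.
ceph osd tree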

So, in summary, the issues are:

1. `crush_location` has no explicit reference documentation.
2. The claim "Not all keys need to be specified" is wrong (or at best misleading).
3. The docs statement "verifies it is in the correct location in the CRUSH map and, if it is not, it moves itself" seems wrong.
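In the meantime, a custom location hook (per the custom-location-hooks page linked above) might serve as a workaround; a minimal sketch, assuming the hook only has to print the full location line to stdout (the script path is hypothetical):

#!/bin/sh
# /usr/local/bin/hel-crush-location (hypothetical path, installed with chmod +x).
# Ceph invokes the hook with --cluster/--id/--type arguments; ignored here.
# Print the full CRUSH location, including root=default, on a single line.
echo "root=default region=HEL zone=HEL1 datacenter=HEL1-DC8 host=$(hostname -s)"

wired up in ceph.conf via:

[osd]
crush_location_hook = /usr/local/bin/hel-crush-location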

It would be nice if somebody could point out what the intended behaviour is (docs wrong or code wrong?) so the docs can be fixed. Thank you!
