Bug #38581

open

ceph-volume: chown -R on every activating

Added by Manuel Lausch about 5 years ago. Updated over 4 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

After rebooting a host I see a `chown -R ceph:ceph` running on each OSD before the OSD daemon starts. This takes a lot of time (millions of objects per OSD), and I think it is unnecessary on every startup. In my opinion, chowning was only needed for the upgrade from hammer to jewel.

I found this commit: https://github.com/ceph/ceph/commit/100f2613a4659b3bd4e550250a41593860118010
There is no condition that would avoid unnecessary recursive chowning.

On the mailing list I was told I should open a bug.

Actions #1

Updated by Nathan Cutler about 5 years ago

  • Project changed from Ceph to ceph-volume
  • Category deleted (common)
Actions #2

Updated by Rishabh Dave over 4 years ago

Do we need to get this done? If so, I'll go ahead and assign it to myself.

Actions #3

Updated by Jan Fajerski over 4 years ago

Hmm, yeah, that would certainly help boot times. Not sure why this was implemented, though. The ceph-osd systemd unit already checks for correct ownership and fails if it's wrong. @alfredo, do you have more insight?

Actions #4

Updated by Alfredo Deza over 4 years ago

How much is "a lot of time"? The chowning wasn't put in place for the upgrade the ticket describes; ceph-volume was first released as part of Luminous. The process of "activating" an OSD always needs to ensure that proper permissions are set.

In some situations there is a race condition with UDEV that actually sets the device permissions as well. So ceph-volume has to ensure everything is correct (see: https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1767087 for an example of that).

There are other cases (like this one: https://tracker.ceph.com/issues/37486) where some step in the activation missed the proper call to chown and the OSD didn't start. It isn't easy for ceph-volume to tell whether this is a first activation or a reboot, or, even if it could, whether the permissions in the OSD are messed up. We also support moving devices with OSDs in them between machines, and not calling chown to ensure they come up on a new machine (or a re-installed OS) would cause problems.

In summary: it is too complicated to foresee every scenario where a chown can be skipped, and it is much safer to chown always.

Actions #5

Updated by Manuel Lausch over 4 years ago

Hi Alfredo,
"a lot of time" means, in our case, one to two hours of startup time. To get rid of it in our installations I removed the chown call from the ceph-volume script. Since then I haven't had any issues.

If you are concerned about wrong permissions caused by udev, why not chown only what udev touches instead of the whole data disk?

Actions #6

Updated by Alfredo Deza over 4 years ago

The thing is that it isn't just a single thing that may cause a permissions issue; it didn't take much digging to come up with two separate examples. One to two hours, though, is unacceptable, and we need to come up with some sort of middle ground here. Perhaps a way to disable the chown process via configuration? Or attempt a system call that can chown only where/if it is needed?

What this ticket should not do is simply remove the chown call.
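
To make the "chown only where/if it is needed" idea concrete, here is a minimal Python sketch. It is not ceph-volume code: the function name `chown_where_needed` and its injectable `chown` parameter are hypothetical, chosen for illustration. It compares each entry's owner with the wanted uid/gid and calls chown only on a mismatch. Note that it still has to stat every entry, so it saves the ownership writes but not the tree walk itself; whether that is enough for OSDs with millions of objects would need measuring.

```python
import os


def chown_where_needed(root, uid, gid, chown=os.lchown):
    """Walk `root` and chown only entries whose owner differs.

    Hypothetical helper, not part of ceph-volume. Returns the number of
    entries that were changed. Symlinks are not followed: the link itself
    is checked (lstat) and changed (lchown).
    """
    changed = 0

    def fix(path):
        nonlocal changed
        st = os.lstat(path)
        if st.st_uid != uid or st.st_gid != gid:
            chown(path, uid, gid)
            changed += 1

    fix(root)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            fix(os.path.join(dirpath, name))
    return changed
```

On a tree that is already owned correctly this makes no chown calls at all and returns 0, which is the common case after a plain reboot.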
