Bug #38581
ceph-volume: chown -R on every activation
Status: Open
Description
Hi,
After rebooting a host, I see a chown -R ceph:ceph running on each OSD before the OSD daemon starts. This takes a lot of time (we have millions of objects per OSD), and I think it is unnecessary on every startup. In my opinion, the chown was only needed for the upgrade from Hammer to Jewel.
I found this commit: https://github.com/ceph/ceph/commit/100f2613a4659b3bd4e550250a41593860118010
It has no condition that would skip the recursive chown when it is unnecessary.
On the mailing list, I was told to open a bug.
Updated by Nathan Cutler about 5 years ago
- Project changed from Ceph to ceph-volume
- Category deleted (common)
Updated by Rishabh Dave over 4 years ago
Do we need to get this done? If so, I'll go ahead and assign it to myself.
Updated by Jan Fajerski over 4 years ago
Hmm, yeah, that would certainly help boot times. I'm not sure why this was implemented, though. The ceph-osd systemd unit already checks for correct ownership and fails if it's wrong. @alfredo, do you have more insight?
Updated by Alfredo Deza over 4 years ago
How much is "a lot of time"? The chowning wasn't put in place for the upgrade the ticket describes; ceph-volume was first released as part of Luminous. The process of "activating" an OSD must always ensure that proper permissions are set.
In some situations there is a race condition with udev, which also sets the device permissions, so ceph-volume has to ensure everything is correct (see https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1767087 for an example).
There are other cases (like https://tracker.ceph.com/issues/37486) where some step in the activation missed the proper call to chown and the OSD didn't start. It isn't easy for ceph-volume to tell whether this is a first activation or a reboot, and even if it could, whether the permissions in the OSD are messed up. We also support moving devices that contain OSDs to another machine, and not calling chown to ensure they come up on a new machine (or a reinstalled OS) would cause problems.
In summary: it is too complicated to foresee every scenario where a chown can be skipped, and it is much safer to chown always.
Updated by Manuel Lausch over 4 years ago
Hi Alfredo,
In our case, "a lot of time" means one to two hours of startup time. To get rid of it, I removed the chown call from the ceph-volume script in our installations. Since then, I haven't had any issues.
If you are concerned about wrong permissions caused by udev, why not chown only the affected devices instead of the whole data disk?
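A minimal sketch of the targeted chown suggested above, in Python (the language ceph-volume is written in). It assumes the OSD directory contains symlinks named block, block.db and block.wal that point at the underlying device nodes; the function name targeted_chown and the link list are hypothetical, not part of ceph-volume. It chowns the OSD directory itself and the resolved device nodes but never descends into the data, so its cost does not grow with the number of objects.

```python
import os
from pathlib import Path

# Hypothetical names of symlinks inside an OSD directory that point at
# device nodes; adjust to whatever the OSD layout actually contains.
DEVICE_LINKS = ("block", "block.db", "block.wal")


def targeted_chown(osd_dir, uid, gid, links=DEVICE_LINKS):
    """Chown the OSD directory and the devices its symlinks point to.

    Unlike chown -R, this does not recurse into the data directory.
    Returns the list of paths that were chowned.
    """
    osd_dir = Path(osd_dir)
    os.chown(osd_dir, uid, gid)
    touched = [osd_dir]
    for name in links:
        link = osd_dir / name
        if not link.is_symlink():
            continue  # this OSD has no such device (e.g. no separate WAL)
        target = link.resolve()  # follow the link to the real device node
        os.chown(target, uid, gid)  # raises if the link is dangling
        touched.append(target)
    return touched
```

Note that this only covers the device-permission case; it would not repair wrong ownership inside the data directory itself, which is the kind of missed-chown case raised earlier in the thread.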
Updated by Alfredo Deza over 4 years ago
The thing is that there isn't just a single thing that may cause a permissions issue; it didn't take much digging to come up with two separate examples. One to two hours, though, is unacceptable, and we need to come up with some sort of middle ground here. Perhaps a way to disable the chown process via configuration? Or a system call that chowns only where (and if) it is needed?
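One way to read "chown only where/if it is needed" is sketched below in Python, under stated assumptions: every name here (ceph_ids, find_mismatched, chown_if_needed, and its skip flag, which stands in for a hypothetical configuration option) is illustrative, not an existing ceph-volume API. It walks the tree with lstat and calls os.lchown only on entries whose owner or group differs from the expected ids. The ids are looked up by name because the numeric ceph uid may differ between hosts, which matters when devices are moved to another machine.

```python
import grp
import os
import pwd


def ceph_ids(user="ceph", group="ceph"):
    """Look up numeric ids by name; they can differ between hosts."""
    return pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid


def find_mismatched(root, uid, gid):
    """Return every path under root (including root) not owned by uid:gid.

    lstat is used so symlinks are inspected themselves, not their targets;
    os.walk does not descend into symlinked directories by default.
    """
    mismatched = []
    st = os.lstat(root)
    if (st.st_uid, st.st_gid) != (uid, gid):
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if (st.st_uid, st.st_gid) != (uid, gid):
                mismatched.append(path)
    return mismatched


def chown_if_needed(root, uid, gid, skip=False):
    """Fix ownership only where it is wrong; skip models a config opt-out."""
    if skip:
        return []
    mismatched = find_mismatched(root, uid, gid)
    for path in mismatched:
        os.lchown(path, uid, gid)  # only touches entries that need it
    return mismatched
```

On activation this might be called as chown_if_needed(osd_dir, *ceph_ids()). It still stats every file, so it saves the metadata writes of a blanket chown but not the full traversal; whether that alone brings a multi-hour activation down to an acceptable time would need to be measured.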
What this ticket should not do is simply remove the chown call.