Bug #45129
closed
simple (ceph-disk) style OSDs adopted by cephadm don't start after reboot
Description
When running `cephadm adopt` against a simple (ceph-disk) style OSD, the adopt runs fine, and the OSD starts, but later when you reboot the system, the OSD won't start again. It fails with "bdev(0x559b02592000 /var/lib/ceph/osd/ceph-21/block) open open got: (13) Permission denied".
Looks like after the adopt is run, the device files backing the OSD are owned by ceph:ceph, but after a subsequent reboot, they revert to being owned by root:root. I assume there's something in the old-style activation that's not happening anymore?
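A quick way to confirm the ownership change is to resolve the OSD's block symlink and look at who owns the file it points to, once right after the adopt and once after a reboot. This is only a diagnostic sketch; the helper name `device_owner` is hypothetical and not part of cephadm or ceph-volume.

```python
import grp
import os
import pwd


def device_owner(path):
    """Return (real_path, 'user:group') for the file that path resolves to."""
    real = os.path.realpath(path)  # follow the block symlink to its target
    st = os.stat(real)
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)  # no passwd entry, fall back to the numeric id
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # no group entry, fall back to the numeric id
    return real, f"{user}:{group}"
```

Calling it on the block symlink of the affected OSD before and after the reboot should show the change from ceph:ceph to root:root described above.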
Updated by Tim Serong about 4 years ago
- Assignee set to Tim Serong
OK, here's what's going on: outside the container world, simple OSDs have a unit enabled named something like ceph-volume@simple-19-8ba4de01-ec7a-4b1c-be73-1d89431e3df5.service. This in turn maps to a call to `ceph-volume simple activate`. Somewhere in ceph_volume/devices/simple/activate.py, we have calls to chown each device that backs the OSD.
Once an OSD is adopted, the constructed unit file never calls `ceph-volume simple activate`. Instead, it always calls `ceph-volume lvm activate`, which is effectively a noop for simple OSDs, so when it finally starts the OSD, the chown hasn't happened.
We can't change the constructed unit file to call `ceph-volume simple activate` for simple OSDs, because that won't work anymore, as the JSON file this command needs (/etc/ceph/osd/19-8ba4de01-ec7a-4b1c-be73-1d89431e3df5.json) has been renamed to 19-8ba4de01-ec7a-4b1c-be73-1d89431e3df5.json.adopted-by-cephadm. And even if we were to bring that file back, it still wouldn't work, because we don't want the OSD to be mounted in the normal fashion, and anyway, all the necessary files have already been moved out of that OSD's data partition, to /var/lib/ceph/$FSID/osd.$ID.
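The renamed JSON file could double as a marker for telling adopted simple OSDs apart from LVM ones. The sketch below shows that idea only. It assumes the `.adopted-by-cephadm` file is still present in /etc/ceph/osd, and the function name `was_simple_osd` is hypothetical. It is not necessarily how cephadm decides this.

```python
import os


def was_simple_osd(osd_id, osd_fsid, conf_dir="/etc/ceph/osd"):
    """Guess whether an OSD was adopted from a simple (ceph-disk) layout.

    Assumption: the JSON file renamed during adoption is still on disk.
    """
    marker = f"{osd_id}-{osd_fsid}.json.adopted-by-cephadm"
    return os.path.exists(os.path.join(conf_dir, marker))
```

For the OSD in this report, the call would be `was_simple_osd(19, "8ba4de01-ec7a-4b1c-be73-1d89431e3df5")`.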
One easy thing we could do is inject something like `chown /var/lib/ceph/$FSID/osd.$ID/block*` into the generated unit file, before starting the OSD. Does anyone have an opinion on this?
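As a rough illustration of that proposal, the pre-start commands could be built so the chown is emitted only for simple OSDs. This is a sketch under assumptions, not the code from the eventual fix: the function name `prestart_lines`, its parameters, and the explicit `ceph:ceph` owner argument are all hypothetical.

```python
import shlex


def prestart_lines(fsid, osd_id, simple, owner="ceph:ceph"):
    """Return shell lines to run before the OSD starts (illustrative only)."""
    lines = []
    if simple:
        data_dir = f"/var/lib/ceph/{fsid}/osd.{osd_id}"
        # The glob stays unquoted so the shell expands it to every block* entry.
        lines.append(f"chown {shlex.quote(owner)} {shlex.quote(data_dir)}/block*")
    return lines
```

For example, `prestart_lines("abc", 19, simple=True)` yields a single chown line for /var/lib/ceph/abc/osd.19/block*, while `simple=False` yields nothing, so LVM OSDs would be unaffected.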
Updated by Sebastian Wagner about 4 years ago
Hm, we're already injecting this lvm activate into the unit file:
adding a chown there would be feasible.
Is there an easy way to do that only for ceph-disk osds?
Updated by Tim Serong about 4 years ago
Sebastian Wagner wrote:
Hm, we're already injecting this lvm activate into the unit file:
adding a chown there would be feasible.
Is there an easy way to do that only for ceph-disk osds?
I'll think of something :-)
Updated by Tim Serong about 4 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 34703
Updated by Tim Serong almost 4 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Sebastian Wagner almost 4 years ago
- Status changed from Pending Backport to Resolved
- Target version set to v15.2.4
Updated by Tim Serong over 3 years ago
- Related to Bug #46833: simple (ceph-disk style) OSDs adopted by cephadm must not call `ceph-volume lvm activate` added