Bug #45129

simple (ceph-disk) style OSDs adopted by cephadm don't start after reboot

Added by Tim Serong 6 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: cephadm
Target version:
% Done: 0%
Source:
Tags:
Backport: octopus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

When running cephadm adopt against a simple (ceph-disk) style OSD, the adoption succeeds and the OSD starts, but after a later reboot the OSD won't start again. It fails with "bdev(0x559b02592000 /var/lib/ceph/osd/ceph-21/block) open open got: (13) Permission denied".

It looks like after the adopt runs, the device files backing the OSD are owned by ceph:ceph, but after a subsequent reboot they revert to root:root. I assume some step in the old-style activation is no longer happening?
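A quick way to confirm the ownership change is to compare, before and after a reboot, who owns the devices that an OSD's block* symlinks point to. The following is a minimal diagnostic sketch, not part of cephadm or ceph-volume; the function name `block_device_owners` is hypothetical, and it assumes the OSD directory holds `block` (and optionally `block.db` / `block.wal`) symlinks to the backing devices.

```python
import glob
import os


def block_device_owners(osd_dir):
    """Map each block* symlink in osd_dir to the (uid, gid) of its target.

    Hypothetical diagnostic helper. os.stat() follows the symlink, so the
    ownership reported is that of the device node, not of the link itself.
    """
    owners = {}
    for link in sorted(glob.glob(os.path.join(osd_dir, "block*"))):
        # Skip regular files such as block_uuid; only symlinks point at devices.
        if not os.path.islink(link):
            continue
        try:
            st = os.stat(link)
        except FileNotFoundError:
            owners[link] = None  # dangling symlink: target device is missing
            continue
        owners[link] = (st.st_uid, st.st_gid)
    return owners
```

Running it against /var/lib/ceph/$FSID/osd.$ID right after adoption and again after a reboot should show the uid/gid flip described above.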


Related issues

Related to Orchestrator - Bug #46833: simple (ceph-disk style) OSDs adopted by cephadm must not call `ceph-volume lvm activate` Resolved

History

#1 Updated by Tim Serong 6 months ago

  • Assignee set to Tim Serong

OK, here's what's going on: outside the container world, simple OSDs have a systemd unit enabled, named something like . This in turn maps to a call to ceph-volume simple activate, and somewhere in ceph_volume/devices/simple/activate.py there are calls that chown each device backing the OSD.

Once an OSD is adopted, the constructed unit file never calls ceph-volume simple activate. Instead, it always calls ceph-volume lvm activate, which is effectively a no-op for simple OSDs, so by the time the OSD finally starts, the chown hasn't happened.

We can't simply change the constructed unit file to call ceph-volume simple activate for simple OSDs, because that no longer works: the JSON file this command needs (/etc/ceph/osd/19-8ba4de01-ec7a-4b1c-be73-1d89431e3df5.json) has been renamed to 19-8ba4de01-ec7a-4b1c-be73-1d89431e3df5.json.adopted-by-cephadm. Even if we brought that file back, it still wouldn't work, because we don't want the OSD to be mounted in the normal fashion, and in any case all the necessary files have already been moved out of that OSD's data partition to /var/lib/ceph/$FSID/osd.$ID.
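The renamed JSON file also offers a cheap way to tell an adopted simple OSD apart from an LVM one. A minimal sketch, assuming the naming described above (`<id>-<osd fsid>.json` under /etc/ceph/osd, with `.adopted-by-cephadm` appended on adoption); both function names are hypothetical, and this is not necessarily how cephadm itself performs the check.

```python
import os

# Suffix appended to the simple OSD's JSON file when cephadm adopts it.
SUFFIX = ".adopted-by-cephadm"


def simple_osd_json_paths(osd_id, osd_fsid, etc_osd_dir="/etc/ceph/osd"):
    """Return (original, renamed) paths of a simple OSD's JSON metadata file."""
    original = os.path.join(etc_osd_dir, f"{osd_id}-{osd_fsid}.json")
    return original, original + SUFFIX


def is_adopted_simple_osd(osd_id, osd_fsid, etc_osd_dir="/etc/ceph/osd"):
    """True if cephadm adoption left the renamed JSON file behind."""
    _, renamed = simple_osd_json_paths(osd_id, osd_fsid, etc_osd_dir)
    return os.path.exists(renamed)
```

For OSD 19 above, `simple_osd_json_paths(19, "8ba4de01-ec7a-4b1c-be73-1d89431e3df5")` returns the two paths quoted in the comment, and `is_adopted_simple_osd` is true only once the rename has happened.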

One easy thing we could do is inject something like chown /var/lib/ceph/$FSID/osd.$ID/block* into the generated unit file, before starting the OSD. Does anyone have an opinion on this?
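The proposed chown, restricted to simple OSDs, could be generated roughly as follows. This is a sketch under stated assumptions: the helper name `chown_prestart_lines`, the `is_simple` flag (left to whatever detection the caller uses) and the fixed list of link names are all hypothetical, and uid/gid are passed in rather than guessed (167:167 is commonly the ceph user in CentOS-based images, but check the image in use). Each returned line is plain shell meant to run before the OSD container starts.

```python
import os
import shlex

# Symlinks a ceph-disk OSD may have; block.db and block.wal are optional.
BLOCK_LINKS = ("block", "block.db", "block.wal")


def chown_prestart_lines(data_dir, uid, gid, is_simple):
    """Return shell lines restoring device ownership for an adopted simple OSD.

    For non-simple (LVM) OSDs nothing is emitted. chown without -h follows a
    symlink given on the command line, so the device node itself is changed.
    """
    if not is_simple:
        return []
    lines = ["# simple (ceph-disk) OSD: restore device ownership lost on reboot"]
    for name in BLOCK_LINKS:
        path = shlex.quote(os.path.join(data_dir, name))
        # Skip links that do not exist (e.g. no separate DB/WAL device).
        lines.append(f"[ ! -L {path} ] || chown {uid}:{gid} {path}")
    return lines
```

The `[ ! -L ... ] ||` guard keeps a missing DB/WAL link from producing a non-zero exit status, which matters if the generated script runs under `set -e`.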

#2 Updated by Sebastian Wagner 6 months ago

Hm, we're already injecting this lvm activate into the unit file:

adding a chown there would be feasible.

Is there an easy way to do that only for ceph-disk osds?

#3 Updated by Tim Serong 5 months ago

Sebastian Wagner wrote:

> Hm, we're already injecting this lvm activate into the unit file:
>
> adding a chown there would be feasible.
>
> Is there an easy way to do that only for ceph-disk osds?

I'll think of something :-)

#4 Updated by Tim Serong 5 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 34703

#5 Updated by Tim Serong 5 months ago

  • Status changed from Fix Under Review to Pending Backport

#6 Updated by Sebastian Wagner 4 months ago

  • Status changed from Pending Backport to Resolved
  • Target version set to v15.2.4

#7 Updated by Tim Serong about 2 months ago

  • Related to Bug #46833: simple (ceph-disk style) OSDs adopted by cephadm must not call `ceph-volume lvm activate` added
