Bug #57096
osd not restarting after upgrading to quincy due to podman args --cgroups=split
Description
I'm trying to upgrade from octopus latest to 17.2.3.
Looking at the /var/lib/ceph/xxxx/osd.6/unit.run, there is an extra argument `--cgroups=split` that wasn't there in octopus.
With this argument, the OSD daemon does not start. After removing this argument, the OSD daemon starts normally.
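For reference, this is roughly how the flag can be checked and stripped by hand on the affected host (a temporary sketch only; the fsid path mirrors the placeholder above, and cephadm will regenerate unit.run the next time it redeploys the daemon):

# check whether the flag is present in the generated unit.run (fsid path is illustrative)
grep -n -- '--cgroups=split' /var/lib/ceph/xxxx/osd.6/unit.run

# temporary manual workaround: strip the flag, then restart the unit
sed -i 's/ --cgroups=split//' /var/lib/ceph/xxxx/osd.6/unit.run
systemctl restart ceph-xxxx@osd.6.service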
Below are the podman info output and the journalctl startup logs.
podman info
host:
  arch: amd64
  buildahVersion: 1.26.2
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.2-2.module+el8.6.0+997+05c9d812.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.2, commit: 98e028a5804809ccb49bc099c0d53adc43ef8cc4'
  cpuUtilization:
    idlePercent: 97.58
    systemPercent: 0.55
    userPercent: 1.87
  cpus: 48
  distribution:
    distribution: '"rocky"'
    version: "8.6"
  eventLogger: file
  hostname: sys1.nodes.preferred.ai
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-372.16.1.el8_6.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 236812603392
  memTotal: 269990506496
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.3-2.module+el8.6.0+997+05c9d812.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.3
      spec: 1.0.2-dev
      go: go1.17.12
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-2.module+el8.6.0+997+05c9d812.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 4291006464
  swapTotal: 4294963200
  uptime: 16h 15m 10.45s (Approximately 0.67 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 6
    paused: 0
    running: 3
    stopped: 3
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 233919479808
  graphRootUsed: 93134196736
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 4
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.1
  Built: 1659426794
  BuiltTime: Tue Aug 2 15:53:14 2022
  GitCommit: ""
  GoVersion: go1.17.12
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.1
osd.6 start logs
Aug 11 11:14:20 x.com systemd[1]: Starting Ceph osd.6 for xxxx...
-- Subject: Unit ceph-xxxx@osd.6.service has begun start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-xxxx@osd.6.service has begun starting up.
Aug 11 11:14:21 x.com systemd[1]: Started libcontainer container 436ff1dc9e04f07a58006178a1c1978d704e951d467e61299b9cdb755e83358d.
-- Subject: Unit libpod-436ff1dc9e04f07a58006178a1c1978d704e951d467e61299b9cdb755e83358d.scope has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libpod-436ff1dc9e04f07a58006178a1c1978d704e951d467e61299b9cdb755e83358d.scope has finished starting up.
--
-- The start-up result is done.
Aug 11 11:14:22 x.com bash[3742940]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-6
Aug 11 11:14:22 x.com bash[3742940]: Running command: /usr/bin/ceph-bluestore-tool prime-osd-dir --path /var/lib/ceph/osd/ceph-6 --no-mon-config --dev /dev/mapper/ceph--e140738d--f7fc--428f--9ecd--192c0db697dc-osd--block--77a96052--c916--492a--aa3f--bf6>
Aug 11 11:14:22 x.com bash[3742940]: Running command: /usr/bin/chown -h ceph:ceph /dev/mapper/ceph--e140738d--f7fc--428f--9ecd--192c0db697dc-osd--block--77a96052--c916--492a--aa3f--bf6cccb0a802
Aug 11 11:14:22 x.com bash[3742940]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-2
Aug 11 11:14:22 x.com bash[3742940]: Running command: /usr/bin/ln -s /dev/mapper/ceph--e140738d--f7fc--428f--9ecd--192c0db697dc-osd--block--77a96052--c916--492a--aa3f--bf6cccb0a802 /var/lib/ceph/osd/ceph-6/block
Aug 11 11:14:22 x.com bash[3742940]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-6
Aug 11 11:14:22 x.com bash[3742940]: --> ceph-volume raw activate successful for osd ID: 6
Aug 11 11:14:22 x.com systemd[1]: libpod-436ff1dc9e04f07a58006178a1c1978d704e951d467e61299b9cdb755e83358d.scope: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit libpod-436ff1dc9e04f07a58006178a1c1978d704e951d467e61299b9cdb755e83358d.scope has successfully entered the 'dead' state.
Aug 11 11:14:22 x.com systemd[1]: libpod-436ff1dc9e04f07a58006178a1c1978d704e951d467e61299b9cdb755e83358d.scope: Consumed 1.445s CPU time
-- Subject: Resources consumed by unit runtime
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit libpod-436ff1dc9e04f07a58006178a1c1978d704e951d467e61299b9cdb755e83358d.scope completed and consumed the indicated resources.
Aug 11 11:14:22 x.com systemd[1]: var-lib-containers-storage-overlay-afe8eefb9c7e960f583589858644d28823a097316b5987e607f36cb401adaca4-merged.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The unit var-lib-containers-storage-overlay-afe8eefb9c7e960f583589858644d28823a097316b5987e607f36cb401adaca4-merged.mount has successfully entered the 'dead' state.
Aug 11 11:14:22 x.com bash[3743336]: b205a5ef766e965ca9e72a189edc8974fa4d82d8d43819c7a6ae8b10a55a0fac
Aug 11 11:14:23 x.com podman[3743417]:
Aug 11 11:14:23 x.com bash[3743417]: Error: could not find cgroup mount in "/proc/self/cgroup"
Aug 11 11:14:23 x.com systemd[1]: ceph-xxxx@osd.6.service: Control process exited, code=exited status=126
Related issues
Copied to Backport #57526: quincy: osd not restarting after upgrading to quincy due to podman args --cgroups=split
History
#1 Updated by Adam King over 1 year ago
and this is only on OSDs? Afaik we're setting --cgroups=split for every container we deploy. It seems odd it would only break for OSDs
#2 Updated by Ween Jiann Lee over 1 year ago
Adam King wrote:
and this is only on OSDs? Afaik we're setting --cgroups=split for every container we deploy. It seems odd it would only break for OSDs
Unfortunately, only the OSDs and rgw are placed on servers with podman. The OSDs are upgraded before the rgw, so I can't tell whether the problem exists for the rest of the daemon types.
I can try to upgrade the rgw manually if you would like to know.
#3 Updated by Ween Jiann Lee over 1 year ago
Adam King wrote:
and this is only on OSDs? Afaik we're setting --cgroups=split for every container we deploy. It seems odd it would only break for OSDs
Unfortunately, only the OSDs and rgw are placed on servers with podman. The OSDs are upgraded before the rgw, so I can't tell whether the problem exists for rgw as well, or for the rest of the daemon types.
I can try to upgrade the rgw manually if you would like to find out.
#4 Updated by Adam King over 1 year ago
I was discussing this with somebody today and they had a potential workaround idea. We have a way of specifying miscellaneous container args in newer versions of cephadm: https://docs.ceph.com/en/quincy/cephadm/services/#extra-container-arguments. We might be able to use this to pass `--cgroups=enabled` (which is the default behavior) to override the `--cgroups=split`. It would basically involve pausing the upgrade, modifying the service spec for the osds and rgws on this host to add
extra_container_args:
- '--cgroups=enabled'
to the end, re-applying the spec, telling cephadm to redeploy the service (`ceph orch redeploy <service-name>`) for each service that has daemons on this host, and then starting the upgrade again; see the sketch below. You could try that and see if it works. If not, I have https://github.com/ceph/ceph/pull/47640 open and could try to push to get it into the next quincy release. Personally I don't really like the patch, so I'd like to first make sure this workaround doesn't help before moving forward with it.
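A sketch of that workaround sequence as commands (the spec filename and service names here are illustrative):

ceph orch upgrade pause
# add the extra_container_args entry to the osd/rgw specs, then re-apply them
ceph orch apply -i osd-spec.yml
# redeploy each service that has daemons on the affected host
ceph orch redeploy <service-name>
ceph orch upgrade resume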
#5 Updated by Ween Jiann Lee over 1 year ago
Adam King wrote:
It would basically involve pausing the upgrade, modifying the service spec for the osds and rgws on this host to add
Thanks for getting a PR ready. The OSDs have been deployed to the default unmanaged service spec (i.e. "osd" on `ceph orch ls`), which cannot be modified.
I can add another service spec with the args above but is there any way to migrate the OSDs to another service spec without destroying the data?
#6 Updated by Adam King over 1 year ago
Ween Jiann Lee wrote:
Adam King wrote:
It would basically involve pausing the upgrade, modifying the service spec for the osds and rgws on this host to add
Thanks for getting a PR ready. The OSDs have been deployed to the default unmanaged service spec (i.e. "osd" on `ceph orch ls`), which cannot be modified.
I can add another service spec with the args above but is there any way to migrate the OSDs to another service spec without destroying the data?
If the OSD is attached to a service, you should be able to edit the existing service spec to include the args and re-apply it, and cephadm will redeploy the OSDs with the extra args. I tested this: a set of OSDs deployed using
[ceph: root@vm-00 /]# cat osd.yml
service_type: osd
service_id: foo
placement:
  hosts:
  - vm-00
  - vm-01
spec:
  data_devices:
    paths:
    - /dev/vdb
    - /dev/vdc
were deployed without --cgroups=enabled, but after re-applying this spec with the modification (see https://docs.ceph.com/en/latest/cephadm/services/#updating-service-specifications if you're not sure what I mean), they were redeployed with --cgroups=enabled:
[ceph: root@vm-00 /]# cat osd.yml
service_type: osd
service_id: foo
service_name: osd.foo
placement:
  hosts:
  - vm-00
  - vm-01
extra_container_args:
- '--cgroups=enabled'
spec:
  data_devices:
    paths:
    - /dev/vdb
    - /dev/vdc
  filter_logic: AND
  objectstore: bluestore
Note that the OSDs affected by this were known by cephadm to be part of osd.foo:
[ceph: root@vm-00 /]# ceph orch ps --service-name osd.foo
NAME   HOST   PORTS  STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION                 IMAGE ID      CONTAINER ID
osd.0  vm-01         running (10m)  9m ago     13m  48.8M    11.1G    17.0.0-14573-gf5c21acf  f05a7f2e4b06  45ccb83b661a
osd.1  vm-00         running (11m)  5m ago     13m  54.8M    7588M    17.0.0-14573-gf5c21acf  f05a7f2e4b06  3ae50021a8c1
osd.2  vm-01         running (10m)  9m ago     13m  45.4M    11.1G    17.0.0-14573-gf5c21acf  f05a7f2e4b06  761d2fd4e5bf
osd.3  vm-00         running (11m)  5m ago     13m  50.3M    7588M    17.0.0-14573-gf5c21acf  f05a7f2e4b06  d26a1fbcce32
I don't think there's currently any way to do this for OSDs not tied to any service (those that show up under `ceph orch ls --service-name osd`). I think there SHOULD be, and I will try to add something in the future. In the meantime, I guess I'll try to go forward with that other PR to address the issue.
#7 Updated by Adam King over 1 year ago
- Status changed from New to Pending Backport
- Backport set to quincy, pacific
- Pull request ID set to 47640
#8 Updated by Adam King over 1 year ago
- Backport changed from quincy, pacific to quincy
#9 Updated by Backport Bot over 1 year ago
- Copied to Backport #57526: quincy: osd not restarting after upgrading to quincy due to podman args --cgroups=split added
#10 Updated by Backport Bot over 1 year ago
- Tags set to backport_processed
#11 Updated by Adam King over 1 year ago
- Status changed from Pending Backport to Resolved
#12 Updated by Adam King over 1 year ago
@Ween I found another thing that may be helpful. For OSDs that are not attached to a spec, you can attach them to one (and therefore use extra_container_args on them) by editing the unit.meta file for the daemon.
For example, here osd.0 was not attached to any spec:
[ceph: root@vm-00 /]# ceph orch daemon add osd vm-00:/dev/vdc
Created osd(s) 0 on host 'vm-00'
[ceph: root@vm-00 /]# ceph orch ps
NAME              HOST   PORTS   STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION          IMAGE ID      CONTAINER ID
crash.vm-00       vm-00          running (4m)  2m ago     4m   6971k    -        16.2.8-80.el8cp  4826c8c29ba2  0ef00e352938
crash.vm-01       vm-01          running (3m)  2m ago     3m   7168k    -        16.2.8-80.el8cp  4826c8c29ba2  3417453e1d7d
crash.vm-02       vm-02          running (3m)  2m ago     3m   7180k    -        16.2.8-80.el8cp  4826c8c29ba2  e5f65e7122ac
mgr.vm-00.ezzehn  vm-00  *:9283  running (5m)  2m ago     5m   421M     -        16.2.8-80.el8cp  4826c8c29ba2  f4cfac7ba198
mgr.vm-01.lgesel  vm-01  *:8443  running (3m)  2m ago     3m   389M     -        16.2.8-80.el8cp  4826c8c29ba2  0347fea26d7c
mon.vm-00         vm-00          running (5m)  2m ago     5m   48.6M    2048M    16.2.8-80.el8cp  4826c8c29ba2  2bae16704025
mon.vm-01         vm-01          running (3m)  2m ago     3m   39.6M    2048M    16.2.8-80.el8cp  4826c8c29ba2  4303ce39f55c
mon.vm-02         vm-02          running (3m)  2m ago     3m   37.3M    2048M    16.2.8-80.el8cp  4826c8c29ba2  a060d6e3546a
osd.0             vm-00          running (2m)  2m ago     2m   13.1M    22.2G    16.2.8-80.el8cp  4826c8c29ba2  bdbdd6db176f
[ceph: root@vm-00 /]# ceph orch ls
NAME   PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash         3/3      58s ago    3m   *
mgr           2/2      58s ago    3m   count:2
mon           3/5      58s ago    3m   count:5
osd           1        10s ago    -    <unmanaged>
and you can see its unit.meta reports as much (a service_name of just "osd"):
[root@vm-00 ~]# cat /var/lib/ceph/3098b6e0-3904-11ed-8133-525400affcb5/osd.0/unit.meta
{
    "service_name": "osd",
    "ports": [],
    "ip": null,
    "deployed_by": [
        "quay.io/adk3798/ceph@sha256:4b100389b72cf985b4819fc145c7581d79783d1671f2b43bba5fb560ff83fac3"
    ],
    "rank": null,
    "rank_generation": null,
    "extra_container_args": null,
    "memory_request": null,
    "memory_limit": null
}
but after adding a new osd spec and changing the service_name in that osd's unit.meta file, it falls under the new spec:
[ceph: root@vm-00 /]# cat osd.yml
service_type: osd
service_id: foo
service_name: osd.foo
placement:
  hosts:
  - vm-00
spec:
  data_devices:
    paths:
    - /dev/vdc
  filter_logic: AND
  objectstore: bluestore
[ceph: root@vm-00 /]# ceph orch apply -i osd.yml
Scheduled osd.foo update...
[ceph: root@vm-00 /]# ceph orch ls
NAME     PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash           3/3      2m ago     4m   *
mgr             2/2      2m ago     4m   count:2
mon             3/5      2m ago     4m   count:5
osd             1        113s ago   -    <unmanaged>
osd.foo         0        -          62s  vm-00
[root@vm-00 ~]# vi /var/lib/ceph/3098b6e0-3904-11ed-8133-525400affcb5/osd.0/unit.meta
[root@vm-00 ~]# cat /var/lib/ceph/3098b6e0-3904-11ed-8133-525400affcb5/osd.0/unit.meta
{
    "service_name": "osd.foo",
    "ports": [],
    "ip": null,
    "deployed_by": [
        "quay.io/adk3798/ceph@sha256:4b100389b72cf985b4819fc145c7581d79783d1671f2b43bba5fb560ff83fac3"
    ],
    "rank": null,
    "rank_generation": null,
    "extra_container_args": null,
    "memory_request": null,
    "memory_limit": null
}

Run a "ceph orch ps --refresh" here to get cephadm to update its daemon metadata. Then . . .

[ceph: root@vm-00 /]# ceph orch ls
NAME     PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash           3/3      56s ago    10m  *
mgr             2/2      56s ago    10m  count:2
mon             3/5      56s ago    10m  count:5
osd.foo         1        33s ago    6m   vm-00
and now I can add extra_container_args for osd.0 using the spec for the osd.foo service. This should allow the workaround to be applied to OSDs not currently attached to any spec.
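A hypothetical non-interactive equivalent of the vi edit shown above (same fsid path as in the example; adjust the target service_name to match your spec):

sed -i 's/"service_name": "osd"/"service_name": "osd.foo"/' \
    /var/lib/ceph/3098b6e0-3904-11ed-8133-525400affcb5/osd.0/unit.meta
# then have cephadm refresh its daemon metadata
ceph orch ps --refresh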
#13 Updated by Ween Jiann Lee over 1 year ago
The unit.meta file is not yet present in Octopus. I'll try to figure something out or wait for the PR release.
Thanks for the help, Adam!
#14 Updated by Ween Jiann Lee over 1 year ago
I manually created the unit.meta, and it seems to work. Thanks again.
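For anyone else on an Octopus-deployed host with no unit.meta, a minimal hand-written file mirroring the fields from comment #12 might look like the following; the service_name must match the spec you want the OSD to fall under, and leaving deployed_by empty (as well as which fields are strictly required) is an assumption rather than something confirmed above:

{
    "service_name": "osd.foo",
    "ports": [],
    "ip": null,
    "deployed_by": [],
    "rank": null,
    "rank_generation": null,
    "extra_container_args": null,
    "memory_request": null,
    "memory_limit": null
}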