Bug #65261


qa/cephfs: cephadm related failure on fs/upgrade job

Added by Rishabh Dave about 1 month ago. Updated 5 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/rishabh-2024-03-29_18:05:24-fs-wip-rishabh-testing-20240327.051042-reef-testing-default-smithi/7629297

Failure reason -

Command failed on smithi153 with status 1: "sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:quincy shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 97c6c898-ee34-11ee-b647-cb9ed24678a4 -- bash -c 'ceph fs dump'" 

And following are the error messages that the command above printed -

    2024-03-30T01:35:57.342 DEBUG:teuthology.orchestra.run.smithi153:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:quincy shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 97c6c898-ee34-11ee-b647-cb9ed24678a4 -- bash -c 'ceph fs dump'
    ...
    2024-03-30T01:35:58.812 INFO:journalctl@ceph.mon.smithi155.smithi155.stdout:Mar 30 01:35:58 smithi155 ceph-mon[44621]: osdmap e44: 6 total, 6 up, 6 in
    2024-03-30T01:35:59.697 INFO:teuthology.orchestra.run.smithi153.stderr:Non-zero exit code 127 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:quincy -e NODE_NAME=smithi153 -e CEPH_USE_RANDOM_NONCE=1 quay.ceph.io/ceph-ci/ceph:quincy -c %u %g /var/lib/ceph
    2024-03-30T01:35:59.697 INFO:journalctl@ceph.mon.smithi153.smithi153.stdout:Mar 30 01:35:59 smithi153 ceph-mon[35159]: osdmap e45: 6 total, 6 up, 6 in
    2024-03-30T01:35:59.698 INFO:teuthology.orchestra.run.smithi153.stderr:stat: stderr Error: OCI runtime error: runc: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: Unit libpod-6c247d5691b53863988bd354b172dee32e47e6ce843b53fdbb3ddb977839bc87.scope not found.
    2024-03-30T01:35:59.698 INFO:teuthology.orchestra.run.smithi153.stderr:ERROR: Failed to extract uid/gid for path /var/lib/ceph: Failed command: /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:quincy -e NODE_NAME=smithi153 -e CEPH_USE_RANDOM_NONCE=1 quay.ceph.io/ceph-ci/ceph:quincy -c %u %g /var/lib/ceph: Error: OCI runtime error: runc: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: Unit libpod-6c247d5691b53863988bd354b172dee32e47e6ce843b53fdbb3ddb977839bc87.scope not found.
    2024-03-30T01:35:59.698 INFO:teuthology.orchestra.run.smithi153.stderr:
    2024-03-30T01:35:59.726 DEBUG:teuthology.orchestra.run:got remote process result: 1
    2024-03-30T01:35:59.726 ERROR:teuthology.run_tasks:Saw exception from tasks.
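
For context, here is a minimal sketch of the step that blew up (the helper name and exact flags are assumptions for illustration, not cephadm's actual code): before opening the shell, cephadm probes the image with a throwaway podman container running stat to learn which uid/gid owns /var/lib/ceph, and the podman cgroup error above makes that probe fail before 'ceph fs dump' ever runs -

    # Hypothetical sketch of the uid/gid probe seen in the log above
    # (not cephadm's real implementation).
    import subprocess

    def extract_uid_gid(image: str, path: str = '/var/lib/ceph') -> tuple[int, int]:
        # Run `stat -c '%u %g' <path>` inside a short-lived container of the
        # target image to find out who should own the data directory.
        cmd = [
            '/bin/podman', 'run', '--rm', '--entrypoint', 'stat', image,
            '-c', '%u %g', path,
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # This is where the "Failed to extract uid/gid" error surfaces when
            # podman hits the libpod-*.scope cgroup failure.
            raise RuntimeError(
                f'Failed to extract uid/gid for path {path}: {result.stderr.strip()}')
        uid, gid = result.stdout.split()
        return int(uid), int(gid)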

The task list (copied from the job description) is -

tasks/{0-from/quincy 1-volume/{0-create 1-ranks/1 2-allow_standby_replay/yes 3-inline/yes 4-verify} 2-client/kclient 3-upgrade-mgr-staggered 4-config-upgrade/{fail_fs} 5-upgrade-with-workload 6-verify}}

The failure occurred at 1-volume/{0-create, so the actual test didn't even begin running.


Related issues (1 open, 0 closed)

Related to Orchestrator - Bug #49287: podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found (New)

Actions #1

Updated by Rishabh Dave about 1 month ago

Copying the insight Adam shared about this failure on Slack -

the Failed to extract uid/gid probably made it misleading, but the real failure
was unable to start container process: error during container init: error
setting cgroup config for procHooks process: Unit
libpod-6c247d5691b53863988bd354b172dee32e47e6ce843b53fdbb3ddb977839bc87.scope not
found which is actually a podman bug in certain versions that we've tracked in
https://tracker.ceph.com/issues/49287
yes, that failure just happens randomly, but not super often, so good chance it
will pass in a rerun
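
As a rough illustration of the "good chance it will pass in a rerun" point, a retry wrapper that only retries on the known transient libpod .scope error could look like the sketch below (a purely hypothetical helper, not something cephadm or teuthology actually does) -

    # Minimal sketch of retrying a command that fails intermittently with the
    # libpod-*.scope cgroup error tracked in https://tracker.ceph.com/issues/49287.
    import subprocess
    import time

    def run_with_retry(cmd: list[str], attempts: int = 3,
                       delay: float = 5.0) -> subprocess.CompletedProcess:
        last = None
        for _ in range(attempts):
            last = subprocess.run(cmd, capture_output=True, text=True)
            if last.returncode == 0:
                return last
            if '.scope not found' not in last.stderr:
                break  # not the known transient podman bug; fail immediately
            time.sleep(delay)
        return last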
Actions #2

Updated by Rishabh Dave about 1 month ago

  • Related to Bug #49287: podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found added
Actions #4

Updated by Matan Breizman 5 days ago

/a//teuthology/yuriw-2024-04-20_01:10:46-rados-wip-yuri7-testing-2024-04-18-1351-reef-distro-default-smithi/7664176
