Bug #65261


qa/cephfs: cephadm related failure on fs/upgrade job

Added by Rishabh Dave about 1 month ago. Updated 5 days ago.

Status:
New
Priority:
Normal
Assignee:
Category:
-
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
qa-failure
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

https://pulpito.ceph.com/rishabh-2024-03-29_18:05:24-fs-wip-rishabh-testing-20240327.051042-reef-testing-default-smithi/7629297

Failure reason -

Command failed on smithi153 with status 1: "sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:quincy shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 97c6c898-ee34-11ee-b647-cb9ed24678a4 -- bash -c 'ceph fs dump'" 

And following are the error messages that the command above printed -

    2024-03-30T01:35:57.342 DEBUG:teuthology.orchestra.run.smithi153:> sudo /home/ubuntu/cephtest/cephadm --image quay.ceph.io/ceph-ci/ceph:quincy shell -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring --fsid 97c6c898-ee34-11ee-b647-cb9ed24678a4 -- bash -c 'ceph fs dump'
    ...
    2024-03-30T01:35:58.812 INFO:journalctl@ceph.mon.smithi155.smithi155.stdout:Mar 30 01:35:58 smithi155 ceph-mon[44621]: osdmap e44: 6 total, 6 up, 6 in
    2024-03-30T01:35:59.697 INFO:teuthology.orchestra.run.smithi153.stderr:Non-zero exit code 127 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:quincy -e NODE_NAME=smithi153 -e CEPH_USE_RANDOM_NONCE=1 quay.ceph.io/ceph-ci/ceph:quincy -c %u %g /var/lib/ceph
    2024-03-30T01:35:59.697 INFO:journalctl@ceph.mon.smithi153.smithi153.stdout:Mar 30 01:35:59 smithi153 ceph-mon[35159]: osdmap e45: 6 total, 6 up, 6 in
    2024-03-30T01:35:59.698 INFO:teuthology.orchestra.run.smithi153.stderr:stat: stderr Error: OCI runtime error: runc: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: Unit libpod-6c247d5691b53863988bd354b172dee32e47e6ce843b53fdbb3ddb977839bc87.scope not found.
    2024-03-30T01:35:59.698 INFO:teuthology.orchestra.run.smithi153.stderr:ERROR: Failed to extract uid/gid for path /var/lib/ceph: Failed command: /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint stat --init -e CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:quincy -e NODE_NAME=smithi153 -e CEPH_USE_RANDOM_NONCE=1 quay.ceph.io/ceph-ci/ceph:quincy -c %u %g /var/lib/ceph: Error: OCI runtime error: runc: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: Unit libpod-6c247d5691b53863988bd354b172dee32e47e6ce843b53fdbb3ddb977839bc87.scope not found.
    2024-03-30T01:35:59.698 INFO:teuthology.orchestra.run.smithi153.stderr:
    2024-03-30T01:35:59.726 DEBUG:teuthology.orchestra.run:got remote process result: 1
    2024-03-30T01:35:59.726 ERROR:teuthology.run_tasks:Saw exception from tasks.
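
For context, here is a minimal sketch of the step that blew up (the helper name and exact flags are assumptions for illustration, not cephadm's actual code): before opening the shell, cephadm probes the image with a throwaway podman container running stat to learn which uid/gid owns /var/lib/ceph, and the podman cgroup error above makes that probe fail before 'ceph fs dump' ever runs -

    # Hypothetical sketch of the uid/gid probe seen in the log above
    # (not cephadm's real implementation).
    import subprocess

    def extract_uid_gid(image: str, path: str = '/var/lib/ceph') -> tuple[int, int]:
        # Run `stat -c '%u %g' <path>` inside a short-lived container of the
        # target image to find out who should own the data directory.
        cmd = [
            '/bin/podman', 'run', '--rm', '--entrypoint', 'stat', image,
            '-c', '%u %g', path,
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # This is where the "Failed to extract uid/gid" error surfaces when
            # podman hits the libpod-*.scope cgroup failure.
            raise RuntimeError(
                f'Failed to extract uid/gid for path {path}: {result.stderr.strip()}')
        uid, gid = result.stdout.split()
        return int(uid), int(gid)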

The task list (copied from the job description) is -

tasks/{0-from/quincy 1-volume/{0-create 1-ranks/1 2-allow_standby_replay/yes 3-inline/yes 4-verify} 2-client/kclient 3-upgrade-mgr-staggered 4-config-upgrade/{fail_fs} 5-upgrade-with-workload 6-verify}}

The failure occurred at 1-volume/{0-create, so the actual test didn't even begin running.


Related issues (1 open, 0 closed)

Related to Orchestrator - Bug #49287: podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found (New)

Actions #1

Updated by Rishabh Dave about 1 month ago

Copying the insight Adam shared about this failure on Slack -

the Failed to extract uid/gid probably made it misleading, but the real failure
was unable to start container process: error during container init: error
setting cgroup config for procHooks process: Unit
libpod-6c247d5691b53863988bd354b172dee32e47e6ce843b53fdbb3ddb977839bc87.scope not
found which is actually a podman bug in certain versions that we've tracked in
https://tracker.ceph.com/issues/49287
yes, that failure just happens randomly, but not super often, so good chance it
will pass in a rerun
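
As a rough illustration of the "good chance it will pass in a rerun" point, a retry wrapper that only retries on the known transient libpod .scope error could look like the sketch below (a purely hypothetical helper, not something cephadm or teuthology actually does) -

    # Minimal sketch of retrying a command that fails intermittently with the
    # libpod-*.scope cgroup error tracked in https://tracker.ceph.com/issues/49287.
    import subprocess
    import time

    def run_with_retry(cmd: list[str], attempts: int = 3,
                       delay: float = 5.0) -> subprocess.CompletedProcess:
        last = None
        for _ in range(attempts):
            last = subprocess.run(cmd, capture_output=True, text=True)
            if last.returncode == 0:
                return last
            if '.scope not found' not in last.stderr:
                break  # not the known transient podman bug; fail immediately
            time.sleep(delay)
        return last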
Actions #2

Updated by Rishabh Dave about 1 month ago

  • Related to Bug #49287: podman: setting cgroup config for procHooks process caused: Unit libpod-$hash.scope not found added
Actions #4

Updated by Matan Breizman 5 days ago

/a//teuthology/yuriw-2024-04-20_01:10:46-rados-wip-yuri7-testing-2024-04-18-1351-reef-distro-default-smithi/7664176
