Bug #57300: ansible.cephlab task failing consistently on rhel 8.6 in the rgw suite - ceph-cm-ansible - Ceph

Bug #57300

closed

ansible.cephlab task failing consistently on rhel 8.6 in the rgw suite

Added by Casey Bodley over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: David Galloway
Severity: 3 - minor

Description

2022-08-25T01:17:27.115 INFO:teuthology.run_tasks:Running task ansible.cephlab...
2022-08-25T01:17:27.133 INFO:teuthology.repo_utils:/home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main was just updated or references a specific commit; assuming it is current
2022-08-25T01:17:27.135 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main to origin/main
2022-08-25T01:17:27.166 INFO:teuthology.task.ansible:Playbook: [{'import_playbook': 'ansible_managed.yml'}, {'import_playbook': 'teuthology.yml'}, {'hosts': 'testnodes', 'tasks': [{'set_fact': {'ran_from_cephlab_playbook': True}}]}, {'import_playbook': 'testnodes.yml'}, {'import_playbook': 'container-host.yml'}, {'import_playbook': 'cobbler.yml'}, {'import_playbook': 'paddles.yml'}, {'import_playbook': 'pulpito.yml'}, {'hosts': 'testnodes', 'become': True, 'tasks': [{'name': 'Touch /ceph-qa-ready', 'file': {'path': '/ceph-qa-ready', 'state': 'touch'}, 'when': 'ran_from_cephlab_playbook|bool'}]}]
2022-08-25T01:17:27.169 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit smithi164.front.sepia.ceph.com /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main/cephlab.yml
2022-08-25T01:21:26.815 INFO:teuthology.task.ansible:Archiving ansible failure log at: /home/teuthworker/archive/cbodley-2022-08-24_22:54:44-rgw-wip-cbodley-testing-distro-default-smithi/6990992/ansible_failures.yaml
2022-08-25T01:21:26.822 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/run_tasks.py", line 106, in run_tasks
    manager.__enter__()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/__init__.py", line 123, in __enter__
    self.begin()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/ansible.py", line 422, in begin
    super(CephLab, self).begin()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/ansible.py", line 265, in begin
    self.execute_playbook()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/ansible.py", line 291, in execute_playbook
    self._handle_failure(command, status)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/ansible.py", line 316, in _handle_failure
    raise AnsibleFailedError(failures)

Example: http://qa-proxy.ceph.com/teuthology/cbodley-2022-08-24_22:54:44-rgw-wip-cbodley-testing-distro-default-smithi/6990992/teuthology.log

Related issues 1 (0 open — 1 closed)

Updated by Casey Bodley over 1 year ago

Related to Bug #57301: rhel 8.6 jobs fail on ansible deployment added

Updated by David Galloway over 1 year ago

Status changed from New to Resolved
Assignee set to David Galloway

[root@smithi022 ~]# yum -v install krb5-workstation
Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, groups-manager, kpatch, needs-restarting, playground, product-id, repoclosure, repodiff, repograph, repomanage, reposync, subscription-manager, uploadprofile
Updating Subscription Management repositories.
YUM version: 4.7.0
cachedir: /var/cache/dnf
User-Agent: constructed: 'libdnf (Red Hat Enterprise Linux 8.6; generic; Linux.x86_64)'
repo: using cache for: copr:copr.fedorainfracloud.org:ceph:python3-asyncssh
copr:copr.fedorainfracloud.org:ceph:python3-asyncssh: using metadata from Tue 27 Jul 2021 07:25:39 PM UTC.
repo: using cache for: epel
epel: using metadata from Thu 25 Aug 2022 04:07:20 PM UTC.
repo: using cache for: lab-extras
lab-extras: using metadata from Thu 28 Jan 2021 11:16:50 PM UTC.
Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs)    79 kB/s | 1.5 kB     00:00
reviving: 'rhel-8-for-x86_64-baseos-rpms' can be revived - repomd matches.
loading repo 'rhel-8-for-x86_64-baseos-rpms' failure: loading of MD_TYPE_PRIMARY has failed.
Error: Loading repository 'rhel-8-for-x86_64-baseos-rpms' has failed

Absolutely no indication on the Satellite server whatsoever that something is wrong. No errors.

In the past, I've only seen this when the server hit OOM, but that is not the case for yesterday's and today's errors. I rebooted the Satellite server and ran this hack to force a resync of the repos:

for id in $(hammer -c /etc/hammer/cli.modules.d/foreman.yml repository list | grep -i "baseos" | awk '{ print $1 }'); do hammer -c /etc/hammer/cli.modules.d/foreman.yml repository synchronize --async --validate-contents true --id $id; done

This is documented for future reference: https://wiki.sepia.ceph.com/doku.php?id=services:satellite
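
For readability, the one-liner above can be unrolled into two small functions. This is only a sketch: it uses the same hammer subcommands and flags as the one-liner, and, like the one-liner, it assumes the repository ID is the first column of the `hammer repository list` output. The names `HAMMER_CONFIG`, `REPO_PATTERN`, `matching_repo_ids`, and `resync_repos` are made up here for illustration and are not part of hammer.

```shell
#!/usr/bin/env bash
# Sketch only: same hammer calls as the one-liner, split out for readability.
# HAMMER_CONFIG and REPO_PATTERN are illustrative variable names, not hammer settings.
HAMMER_CONFIG="${HAMMER_CONFIG:-/etc/hammer/cli.modules.d/foreman.yml}"
REPO_PATTERN="${REPO_PATTERN:-baseos}"

# Print the first column (assumed to be the repository ID) of every
# listing line that matches the pattern, case-insensitively.
matching_repo_ids() {
    hammer -c "$HAMMER_CONFIG" repository list \
        | grep -i -- "$REPO_PATTERN" \
        | awk '{ print $1 }'
}

# Start an asynchronous, content-validating sync for each matching repository.
resync_repos() {
    local id
    for id in $(matching_repo_ids); do
        hammer -c "$HAMMER_CONFIG" repository synchronize \
            --async --validate-contents true --id "$id"
    done
}
```

After sourcing the file on the Satellite server, running `matching_repo_ids` first shows which repositories would be touched, and `resync_repos` then starts the syncs. Setting `REPO_PATTERN` to another string would target a different set of repositories.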

Updated by Ilya Dryomov over 1 year ago

I'm seeing https://tracker.ceph.com/issues/57332 which affects both centos 8.stream and rhel 8.6. Does not look related, but all test runs are affected...

I see a bunch of recent work on centos 9.stream in the ceph-cm-ansible repo; perhaps there is a regression there? David, could you please take a look?
