Bug #57300
ansible.cephlab task failing consistently on rhel 8.6 in the rgw suite
Description
2022-08-25T01:17:27.115 INFO:teuthology.run_tasks:Running task ansible.cephlab...
2022-08-25T01:17:27.133 INFO:teuthology.repo_utils:/home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main was just updated or references a specific commit; assuming it is current
2022-08-25T01:17:27.135 INFO:teuthology.repo_utils:Resetting repo at /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main to origin/main
2022-08-25T01:17:27.166 INFO:teuthology.task.ansible:Playbook: [{'import_playbook': 'ansible_managed.yml'}, {'import_playbook': 'teuthology.yml'}, {'hosts': 'testnodes', 'tasks': [{'set_fact': {'ran_from_cephlab_playbook': True}}]}, {'import_playbook': 'testnodes.yml'}, {'import_playbook': 'container-host.yml'}, {'import_playbook': 'cobbler.yml'}, {'import_playbook': 'paddles.yml'}, {'import_playbook': 'pulpito.yml'}, {'hosts': 'testnodes', 'become': True, 'tasks': [{'name': 'Touch /ceph-qa-ready', 'file': {'path': '/ceph-qa-ready', 'state': 'touch'}, 'when': 'ran_from_cephlab_playbook|bool'}]}]
2022-08-25T01:17:27.169 DEBUG:teuthology.task.ansible:Running ansible-playbook -v --extra-vars '{"ansible_ssh_user": "ubuntu"}' -i /etc/ansible/hosts --limit smithi164.front.sepia.ceph.com /home/teuthworker/src/git.ceph.com_git_ceph-cm-ansible_main/cephlab.yml
2022-08-25T01:21:26.815 INFO:teuthology.task.ansible:Archiving ansible failure log at: /home/teuthworker/archive/cbodley-2022-08-24_22:54:44-rgw-wip-cbodley-testing-distro-default-smithi/6990992/ansible_failures.yaml
2022-08-25T01:21:26.822 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/run_tasks.py", line 106, in run_tasks
    manager.__enter__()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/__init__.py", line 123, in __enter__
    self.begin()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/ansible.py", line 422, in begin
    super(CephLab, self).begin()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/ansible.py", line 265, in begin
    self.execute_playbook()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/ansible.py", line 291, in execute_playbook
    self._handle_failure(command, status)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_b1d387f12b117399cb87c86aaa341398fa0c0919/teuthology/task/ansible.py", line 316, in _handle_failure
    raise AnsibleFailedError(failures)
Example: http://qa-proxy.ceph.com/teuthology/cbodley-2022-08-24_22:54:44-rgw-wip-cbodley-testing-distro-default-smithi/6990992/teuthology.log
Related issues
History
#1 Updated by Casey Bodley over 1 year ago
- Related to Bug #57301: rhel 8.6 jobs fail on ansible deployment added
#2 Updated by David Galloway over 1 year ago
- Status changed from New to Resolved
- Assignee set to David Galloway
[root@smithi022 ~]# yum -v install krb5-workstation
Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, groups-manager, kpatch, needs-restarting, playground, product-id, repoclosure, repodiff, repograph, repomanage, reposync, subscription-manager, uploadprofile
Updating Subscription Management repositories.
YUM version: 4.7.0
cachedir: /var/cache/dnf
User-Agent: constructed: 'libdnf (Red Hat Enterprise Linux 8.6; generic; Linux.x86_64)'
repo: using cache for: copr:copr.fedorainfracloud.org:ceph:python3-asyncssh
copr:copr.fedorainfracloud.org:ceph:python3-asyncssh: using metadata from Tue 27 Jul 2021 07:25:39 PM UTC.
repo: using cache for: epel
epel: using metadata from Thu 25 Aug 2022 04:07:20 PM UTC.
repo: using cache for: lab-extras
lab-extras: using metadata from Thu 28 Jan 2021 11:16:50 PM UTC.
Red Hat Enterprise Linux 8 for x86_64 - BaseOS (RPMs) 79 kB/s | 1.5 kB 00:00
reviving: 'rhel-8-for-x86_64-baseos-rpms' can be revived - repomd matches.
loading repo 'rhel-8-for-x86_64-baseos-rpms' failure: loading of MD_TYPE_PRIMARY has failed.
Error: Loading repository 'rhel-8-for-x86_64-baseos-rpms' has failed
Absolutely no indication on the Satellite server whatsoever that something is wrong. No errors.
In the past, I've only seen this when the server hit OOM, but that is not the case for the errors from yesterday and today. I rebooted the Satellite server and ran this hack to force a resync of the repos:
for id in $(hammer -c /etc/hammer/cli.modules.d/foreman.yml repository list | grep -i "baseos" | awk '{ print $1 }'); do
    hammer -c /etc/hammer/cli.modules.d/foreman.yml repository synchronize --async --validate-contents true --id "$id"
done
This is documented for future reference: https://wiki.sepia.ceph.com/doku.php?id=services:satellite
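If the resync hack needs to be repeated for other repositories, it could be wrapped in a small function. What follows is only a sketch: the function names repo_ids and resync_matching are hypothetical, and the numeric check on the first column is an addition that assumes the ID column of the hammer listing is a plain integer. The hammer subcommands and flags are the same ones used in the one-liner above.

```shell
#!/bin/sh
# Sketch only: repo_ids and resync_matching are hypothetical helper names.

# Read a repository listing on stdin and print the first column (assumed to
# be the numeric repository ID) of every line matching the pattern in $1,
# case-insensitively. Non-numeric first columns (headers, separators) are
# skipped.
repo_ids() {
    grep -i -- "$1" | awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Start an asynchronous, content-validating sync for every repository whose
# listing line matches the pattern in $1, using the same hammer calls as the
# original one-liner.
resync_matching() {
    pattern=$1
    conf=/etc/hammer/cli.modules.d/foreman.yml
    hammer -c "$conf" repository list | repo_ids "$pattern" |
    while read -r id; do
        hammer -c "$conf" repository synchronize --async --validate-contents true --id "$id"
    done
}
```

With these defined, running resync_matching baseos would be equivalent to the loop above, and resync_matching appstream would target a different set of repositories.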
#3 Updated by Ilya Dryomov over 1 year ago
I'm seeing https://tracker.ceph.com/issues/57332 which affects both centos 8.stream and rhel 8.6. Does not look related, but all test runs are affected...
I see a bunch of recent work on centos 9.stream in ceph-cm-ansible repo, perhaps there is a regression there? David, could you please take a look?
#4 Updated by David Galloway over 1 year ago
Ilya Dryomov wrote:
I see a bunch of recent work on centos 9.stream in ceph-cm-ansible repo, perhaps there is a regression there? David, could you please take a look?