Bug #61159
openAssertionError: failed to install new kernel version within timeout
Added by Laura Flores 12 months ago. Updated about 1 month ago.
0%
Description
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271297
2023-05-11T19:35:34.582 DEBUG:teuthology.task.kernel:Distro of this test job: rhel
2023-05-11T19:35:34.582 ERROR:teuthology.task.kernel:Saw exception
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_69caab8ec6f44c61c991ae693e8e7b8cd859a896/teuthology/task/kernel.py", line 710, in wait_for_reboot
assert not need_to_install(ctx, client, need_install[client]), \
AssertionError: failed to install new kernel version within timeout
Updated by Laura Flores 12 months ago
This failure is seen across all distros.
See https://pulpito.ceph.com/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/ for more examples.
Updated by Laura Flores 12 months ago
- Related to Bug #43335: AssertionError: failed to install new kernel version within timeout added
Updated by Laura Flores 12 months ago
- Related to Bug #38802: "failed to install new kernel version within timeout" on mimic added
Updated by Laura Flores 12 months ago
- Related to Bug #18701: failed to install new kernel version added
Updated by Laura Flores 12 months ago
- Related to Bug #12872: failed to install new kernel version within timeout added
Updated by Laura Flores 12 months ago
- Related to Bug #11957: 'AssertionError: failed to install new kernel version within timeout' added
Updated by Laura Flores 12 months ago
Laura Flores wrote:
/a/yuriw-2023-05-11_15:01:38-rados-wip-yuri8-testing-2023-05-10-1402-distro-default-smithi/7271297
[...]
Meant to include more of the log:
2023-05-11T19:35:34.563 INFO:teuthology.orchestra.remote:Successfully reconnected to host 'ubuntu@smithi103.front.sepia.ceph.com'
2023-05-11T19:35:34.564 INFO:teuthology.task.kernel:Checking kernel version of mon.b, want "4.18.0-372.32.1.el8_6.x86_64"...
2023-05-11T19:35:34.564 DEBUG:teuthology.orchestra.run.smithi103:> uname -r
2023-05-11T19:35:34.581 INFO:teuthology.orchestra.run.smithi103.stdout:4.18.0-372.9.1.el8.x86_64
2023-05-11T19:35:34.582 DEBUG:teuthology.task.kernel:current kernel version is 4.18.0-372.9.1.el8.x86_64 vs 4.18.0-372.32.1.el8_6.x86_64
2023-05-11T19:35:34.582 DEBUG:teuthology.task.kernel:Distro of this test job: rhel
2023-05-11T19:35:34.582 ERROR:teuthology.task.kernel:Saw exception
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_teuthology_69caab8ec6f44c61c991ae693e8e7b8cd859a896/teuthology/task/kernel.py", line 710, in wait_for_reboot
assert not need_to_install(ctx, client, need_install[client]), \
AssertionError: failed to install new kernel version within timeout
Updated by Ilya Dryomov 12 months ago
2023-05-11T19:38:58.798 DEBUG:teuthology.task.kernel:get_latest_image_version_rpm: 4.18.0-372.32.1.el8_6.x86_64 2023-05-11T19:38:58.798 INFO:teuthology.task.kernel:Not newest distro kernel. Current: 4.18.0-372.9.1.el8.x86_64 Expected: 4.18.0-372.32.1.el8_6.x86_64 2023-05-11T19:38:58.798 INFO:teuthology.task.kernel:Skipping firmware on distro kernel 2023-05-11T19:38:58.798 DEBUG:teuthology.task.kernel:src is distro, skipping download 2023-05-11T19:38:58.798 INFO:teuthology.task.kernel:Installing distro kernel on mon.a... 2023-05-11T19:38:58.799 DEBUG:teuthology.task.kernel:install_kernel(remote=ubuntu@smithi189.front.sepia.ceph.com, path=None, version=distro) 2023-05-11T19:38:58.799 DEBUG:teuthology.orchestra.run.smithi189:> rpm -q kernel --last | head -n 1 2023-05-11T19:38:58.936 INFO:teuthology.orchestra.run.smithi189.stdout:kernel-4.18.0-372.32.1.el8_6.x86_64 Thu 11 May 2023 07:38:43 PM UTC 2023-05-11T19:38:58.938 DEBUG:teuthology.task.kernel:get_latest_image_version_rpm: 4.18.0-372.32.1.el8_6.x86_64 2023-05-11T19:38:58.938 DEBUG:teuthology.orchestra.run.smithi189:> sudo rpm -qi grub2-tools 2023-05-11T19:38:58.970 INFO:teuthology.orchestra.run.smithi189.stdout:Name : grub2-tools 2023-05-11T19:38:58.970 INFO:teuthology.orchestra.run.smithi189.stdout:Epoch : 1 2023-05-11T19:38:58.971 INFO:teuthology.orchestra.run.smithi189.stdout:Version : 2.02 2023-05-11T19:38:58.971 INFO:teuthology.orchestra.run.smithi189.stdout:Release : 123.el8 2023-05-11T19:38:58.971 INFO:teuthology.orchestra.run.smithi189.stdout:Architecture: x86_64 2023-05-11T19:38:58.971 INFO:teuthology.orchestra.run.smithi189.stdout:Install Date: Thu 14 Jul 2022 06:22:19 PM UTC 2023-05-11T19:38:58.971 INFO:teuthology.orchestra.run.smithi189.stdout:Group : System Environment/Base 2023-05-11T19:38:58.971 INFO:teuthology.orchestra.run.smithi189.stdout:Size : 9265450 2023-05-11T19:38:58.971 INFO:teuthology.orchestra.run.smithi189.stdout:License : GPLv3+ 2023-05-11T19:38:58.972 INFO:teuthology.orchestra.run.smithi189.stdout:Signature : RSA/SHA256, Thu 31 Mar 2022 07:24:35 PM UTC, Key ID 199e2f91fd431d51 2023-05-11T19:38:58.972 INFO:teuthology.orchestra.run.smithi189.stdout:Source RPM : grub2-2.02-123.el8.src.rpm 2023-05-11T19:38:58.972 INFO:teuthology.orchestra.run.smithi189.stdout:Build Date : Mon 28 Mar 2022 02:12:09 PM UTC 2023-05-11T19:38:58.972 INFO:teuthology.orchestra.run.smithi189.stdout:Build Host : x86-vm-09.build.eng.bos.redhat.com 2023-05-11T19:38:58.972 INFO:teuthology.orchestra.run.smithi189.stdout:Relocations : (not relocatable) 2023-05-11T19:38:58.972 INFO:teuthology.orchestra.run.smithi189.stdout:Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> 2023-05-11T19:38:58.972 INFO:teuthology.orchestra.run.smithi189.stdout:Vendor : Red Hat, Inc. 2023-05-11T19:38:58.973 INFO:teuthology.orchestra.run.smithi189.stdout:URL : http://www.gnu.org/software/grub/ 2023-05-11T19:38:58.973 INFO:teuthology.orchestra.run.smithi189.stdout:Summary : Support tools for GRUB. 2023-05-11T19:38:58.973 INFO:teuthology.orchestra.run.smithi189.stdout:Description : 2023-05-11T19:38:58.973 INFO:teuthology.orchestra.run.smithi189.stdout: 2023-05-11T19:38:58.973 INFO:teuthology.orchestra.run.smithi189.stdout:The GRand Unified Bootloader (GRUB) is a highly configurable and 2023-05-11T19:38:58.973 INFO:teuthology.orchestra.run.smithi189.stdout:customizable bootloader with modular architecture. It supports a rich 2023-05-11T19:38:58.973 INFO:teuthology.orchestra.run.smithi189.stdout:variety of kernel formats, file systems, computer architectures and 2023-05-11T19:38:58.974 INFO:teuthology.orchestra.run.smithi189.stdout:hardware devices. 2023-05-11T19:38:58.974 INFO:teuthology.orchestra.run.smithi189.stdout: 2023-05-11T19:38:58.974 INFO:teuthology.orchestra.run.smithi189.stdout:This subpackage provides tools for support of all platforms. 2023-05-11T19:38:58.975 INFO:teuthology.task.kernel:Updating Grub Version: grub2 2023-05-11T19:38:58.975 INFO:teuthology.task.kernel:Updating grub on smithi189 to boot 4.18.0-372.32.1.el8_6.x86_64 2023-05-11T19:38:58.975 DEBUG:teuthology.orchestra.run.smithi189:> sudo grub2-mkconfig -o /boot/grub2/grub.cfg 2023-05-11T19:39:00.548 INFO:teuthology.orchestra.run.smithi189.stderr:Generating grub configuration file ... 2023-05-11T19:39:02.380 INFO:teuthology.orchestra.run.smithi189.stderr:done 2023-05-11T19:39:02.381 DEBUG:teuthology.orchestra.run.smithi189:> mktemp 2023-05-11T19:39:02.398 INFO:teuthology.orchestra.run.smithi189.stdout:/tmp/tmp.7krGLTVuFj 2023-05-11T19:39:02.399 DEBUG:teuthology.orchestra.run.smithi189:> sudo cp /boot/grub2/grub.cfg /tmp/tmp.7krGLTVuFj 2023-05-11T19:39:02.481 DEBUG:teuthology.orchestra.run.smithi189:> sudo chmod 0666 /tmp/tmp.7krGLTVuFj 2023-05-11T19:39:02.641 DEBUG:teuthology.orchestra.remote:smithi189:/tmp/tmp.7krGLTVuFj is 6KB 2023-05-11T19:39:02.689 DEBUG:teuthology.orchestra.run.smithi189:> rm -fr /tmp/tmp.7krGLTVuFj 2023-05-11T19:39:02.705 DEBUG:teuthology.orchestra.run.smithi189:> sudo /bin/ls /boot/loader/entries || true 2023-05-11T19:39:02.794 INFO:teuthology.orchestra.run.smithi189.stdout:5a84457b7b52416e95d47d02e2e4a60f-0-rescue.conf 2023-05-11T19:39:02.795 INFO:teuthology.orchestra.run.smithi189.stdout:5a84457b7b52416e95d47d02e2e4a60f-4.18.0-372.9.1.el8.x86_64.conf 2023-05-11T19:39:02.796 WARNING:teuthology.task.kernel:Unable to update grub2 order
2023-05-11T19:41:08.578 INFO:teuthology.task.kernel:Checking kernel version of mon.a, want "4.18.0-372.32.1.el8_6.x86_64"... 2023-05-11T19:41:08.578 DEBUG:teuthology.orchestra.run.smithi189:> uname -r 2023-05-11T19:41:08.596 INFO:teuthology.orchestra.run.smithi189.stdout:4.18.0-372.9.1.el8.x86_64 2023-05-11T19:41:08.596 DEBUG:teuthology.task.kernel:current kernel version is 4.18.0-372.9.1.el8.x86_64 vs 4.18.0-372.32.1.el8_6.x86_64
I think the "Unable to update grub2 order" error is the culprit. The node gets rebooted into the same kernel (one it was running before the reboot) and the new kernel version check just does its job...
Updated by Laura Flores 12 months ago
Thanks Ilya! Looks like the teuthology task is failing around here:
https://github.com/ceph/teuthology/blob/69caab8ec6f44c61c991ae693e8e7b8cd859a896/teuthology/task/kernel.py#L942-L961
if '\nmenuentry ' not in grub2conf:
# okay, do the newer (el8) grub2 thing
grub2conf = remote.sh('sudo /bin/ls /boot/loader/entries || true')
entry = None
for line in grub2conf.split(null):
if line.endswith('.conf') and newversion in line:
entry = line[:-5] # drop .conf suffix
break
else:
# do old menuitem counting thing
for line in grub2conf.split(null):
if line.startswith('menuentry '):
if newversion in line:
break
entry_num += 1
entry = str(entry_num)
if entry is None:
log.warning('Unable to update grub2 order')
else:
remote.run(args=['sudo', grubset, entry])
Updated by Dan Mick 12 months ago
Turn over a rock...
The fundamental problem is that, while the kernel package was installed, its postinstall script that updates /boot/loader/entries, which is what builds the grub menu, was silently failing. It turns out it was silently failing because it demands to have a machine-id to use for the filename in /boot/loader/entries (for some reason; I don't know how there could be any confusion about which machine you're booting, but, there you go), and code added in January to ceph-cm-ansible by https://github.com/ceph/ceph-cm-ansible/pull/717 was deleting and not restoring /etc/machine-id. So fix 1 is recreating /etc/machine-id (and /var/lib/dbus/machine-id) so that the kernel package install works.
Fix 2 is that when more than one kernel package exists, the "rpm -q --last kernel" command is used to identify the newest kernel (based on installation time), but since the packages may well have just been reinstalled by "yum reinstall kernel", there's no guarantee which package was installed first. What we want is the latest version kernel, which we can get with rpm -q kernel | sort -V.
However, a newly-provisioned RHEL machine cannot install kernel packages because it hasn't yet subscribed to a package server (like our local Satellite instance). So the ansible roles have to run on the host before they can successfully install a new package. However, the ansible roles aren't run before the kernel task executes.
So far, I haven't been able to untangle what ought to be happening to resolve the ordering above.
Updated by Dan Mick 12 months ago
A bit more info:
The situation only occurs when 1) a RHEL machine is able to get at its subscription-manager repo, and 2) there is a later kernel package available. This isn't normally the case; it was occurring because of some kind of confusion in the provisioning process that allowed a host to get past the lock operation without actually reimaging. In one case the job was requesting ubuntu, but it got a host running RHEL 8.6 from a past job.
We're looking at making the lock operation validate the requested distro/version, but this is a "shouldn't happen" path. There's still a thought that there's something not quite right with our reimaging paths; failures are happening and it's not yet clear why, and this was fallout from a particularly-specific sort of failure.
Updated by Dan Mick 12 months ago
https://github.com/ceph/teuthology/pull/1843 addresses the "wrong sorting of rpm -q results" aspect.
Updated by Dan Mick 11 months ago
https://github.com/ceph/teuthology/pull/1851 handles the "verify installed OS version". We've had one known failure so far.
Updated by Laura Flores about 1 month ago
/a/teuthology-2024-03-29_02:08:11-upgrade-squid-distro-default-smithi/7629101