Bug #57092

closed

teuthology reimage Ubuntu 22.04 fails w/ ssh_keyscan reached maximum tries

Added by David Galloway over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
% Done: 0%
Regression: No
Severity: 3 - minor

Description

2022-08-10 13:32:50,385.385 INFO:teuthology.provision.fog.smithi100:Scheduling deploy of ubuntu 22.04
2022-08-10 13:32:50,907.907 INFO:teuthology.orchestra.console:Power off smithi100
2022-08-10 13:33:00,388.388 INFO:teuthology.orchestra.console:Power off for smithi100 completed
2022-08-10 13:33:00,488.488 INFO:teuthology.orchestra.console:Power on smithi100
2022-08-10 13:33:06,197.197 INFO:teuthology.orchestra.console:Power on for smithi100 completed
2022-08-10 13:33:06,299.299 INFO:teuthology.provision.fog.smithi100:Waiting for deploy to finish
2022-08-10 13:38:28,179.179 INFO:teuthology.orchestra.run:Running command with timeout 600
2022-08-10 13:38:28,723.723 INFO:teuthology.provision.fog.smithi100:Node is ready
2022-08-10 13:38:28,789.789 INFO:teuthology.orchestra.run.smithi100.stdout:smithi100.front.sepia.ceph.com
2022-08-10 13:38:28,892.892 INFO:teuthology.orchestra.run.smithi100.stdout:172.21.15.100 smithi100.front.sepia.ceph.com smithi100
2022-08-10 13:38:29,420.420 INFO:teuthology.provision.fog.smithi100:Deploy complete!
Traceback (most recent call last):
  File "/home/dgalloway/git/ceph/teuthology/virtualenv/bin/teuthology-lock", line 33, in <module>
    sys.exit(load_entry_point('teuthology', 'console_scripts', 'teuthology-lock')())
  File "/home/dgalloway/git/ceph/teuthology/scripts/lock.py", line 18, in main
    sys.exit(teuthology.lock.cli.main(parse_args(sys.argv[1:])))
  File "/home/dgalloway/git/ceph/teuthology/teuthology/lock/cli.py", line 211, in main
    ctx.desc, ctx.os_type, ctx.os_version, ctx.arch)
  File "/home/dgalloway/git/ceph/teuthology/teuthology/lock/ops.py", line 142, in lock_many
    return reimage_machines(ctx, machines, machine_type)
  File "/home/dgalloway/git/ceph/teuthology/teuthology/lock/ops.py", line 325, in reimage_machines
    reimaged = do_update_keys(list(reimaged.keys()))[1]
  File "/home/dgalloway/git/ceph/teuthology/teuthology/lock/ops.py", line 288, in do_update_keys
    keys_dict = misc.ssh_keyscan(machines, _raise=_raise)
  File "/home/dgalloway/git/ceph/teuthology/teuthology/misc.py", line 1108, in ssh_keyscan
    while proceed():
  File "/home/dgalloway/git/ceph/teuthology/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: 'ssh_keyscan smithi100.front.sepia.ceph.com' reached maximum tries (5) after waiting for 5 seconds

I can reimage the node and SSH to it directly, though.

See http://qa-proxy.ceph.com/teuthology/dgalloway-2022-08-10_16:43:02-fs-main-distro-default-smithi/6965885/console_logs/

Actions #1

Updated by Zack Cerza over 1 year ago

  • Status changed from New to Resolved
  • Assignee set to Zack Cerza

Actions #2

Updated by Matan Breizman over 1 year ago

Also failed on RHEL:
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973876/

Actions #3

Updated by David Galloway over 1 year ago

Matan Breizman wrote:

Also failed on RHEL:
/a/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973876/

I think this is different. It seems smithi165 should have actually been available according to the reimage log.

http://qa-proxy.ceph.com/teuthology/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973876/console_logs/smithi165_reimage.log

+ TheTimeIs
2022-08-16T00:05:14.063
+ touch /.cephlab_net_configured
+ break
+ set +e
+ attempts=0
+ myips=
+ '[' '' '!=' '' ']'
+ '[' 0 -ge 10 ']'
++ ip -4 addr
++ grep -oP '(?<=inet\s)\d+(\.\d+){3}'
++ grep -v '127.0.0.1\|127.0.1.1'
+ myips=172.21.15.165
+ attempts=1
+ sleep 1
+ '[' 172.21.15.165 '!=' '' ']'
+ set -e
+ '[' -n 172.21.15.165 ']'
+ for ip in $myips
+ timeout 1s ping -I 172.21.15.165 -nq -c1 172.21.0.1
++ dig +short -x 172.21.15.165 @172.21.0.1
++ sed 's/\.com.*/\.com/g'
+ newhostname=smithi165.front.sepia.ceph.com
+ '[' -n smithi165.front.sepia.ceph.com ']'
+ hostname smithi165.front.sepia.ceph.com
++ hostname -d
+ newdomain=front.sepia.ceph.com
++ hostname -s
+ shorthostname=smithi165
+ echo smithi165
+ grep -q front.sepia.ceph.com /etc/hosts
+ sed -i 's/.*front.sepia.ceph.com.*/172.21.15.165 smithi165.front.sepia.ceph.com smithi165/g' /etc/hosts
+ break
+ command -v zypper
+ command -v apt-get
+ '[' -e /.cephlab_rc_local ']'
+ exit 0
[  OK  ] Started /etc/rc.d/rc.local Compatibility.

         Starting Terminate Plymouth Boot Screen...

         Starting Hold until boot process finishes up...

Red Hat Enterprise Linux 8.4 (Ootpa)
Kernel 4.18.0-372.19.1.el8_6.x86_64 on an x86_64

Activate the web console with: systemctl enable --now cockpit.socket

smithi165 login:

Yet the teuthology log shows the key scan failing:

http://qa-proxy.ceph.com/teuthology/yuriw-2022-08-15_17:54:08-rados-wip-yuri2-testing-2022-08-15-0848-quincy-distro-default-smithi/6973876/console_logs/smithi165_reimage.log

2022-08-16T00:00:29.746 INFO:teuthology.provision.fog.smithi074:Waiting for deploy to finish
2022-08-16T00:00:29.794 INFO:teuthology.orchestra.console:Power on for smithi165 completed
2022-08-16T00:00:29.896 INFO:teuthology.provision.fog.smithi165:Waiting for deploy to finish
2022-08-16T00:00:37.897 INFO:teuthology.orchestra.console:Power on for smithi066 completed
2022-08-16T00:00:37.998 INFO:teuthology.provision.fog.smithi066:Waiting for deploy to finish
2022-08-16T00:03:24.571 ERROR:teuthology.orchestra.connection:Error authenticating with smithi066.front.sepia.ceph.com: Authentication failed.
2022-08-16T00:05:01.640 INFO:teuthology.orchestra.run:Running command with timeout 600
2022-08-16T00:05:01.861 INFO:teuthology.provision.fog.smithi074:Node is ready
2022-08-16T00:05:01.877 INFO:teuthology.orchestra.run.smithi074.stdout:smithi074.front.sepia.ceph.com
2022-08-16T00:05:01.933 INFO:teuthology.orchestra.run.smithi074.stdout:172.21.15.74 smithi074.front.sepia.ceph.com smithi074
2022-08-16T00:05:02.252 INFO:teuthology.provision.fog.smithi074:Deploy complete!
2022-08-16T00:05:28.320 INFO:teuthology.orchestra.run:Running command with timeout 600
2022-08-16T00:05:28.389 INFO:teuthology.provision.fog.smithi165:Node is ready
2022-08-16T00:05:28.445 INFO:teuthology.orchestra.run.smithi165.stdout:smithi165.front.sepia.ceph.com
2022-08-16T00:05:28.501 INFO:teuthology.orchestra.run.smithi165.stdout:172.21.15.165 smithi165.front.sepia.ceph.com smithi165
2022-08-16T00:05:28.846 INFO:teuthology.provision.fog.smithi165:Deploy complete!
2022-08-16T00:05:55.963 INFO:teuthology.orchestra.run:Running command with timeout 600
2022-08-16T00:05:56.193 INFO:teuthology.provision.fog.smithi066:Node is ready
2022-08-16T00:05:56.209 INFO:teuthology.orchestra.run.smithi066.stdout:smithi066.front.sepia.ceph.com
2022-08-16T00:05:56.265 INFO:teuthology.orchestra.run.smithi066.stdout:172.21.15.66 smithi066.front.sepia.ceph.com smithi066
2022-08-16T00:05:56.591 INFO:teuthology.provision.fog.smithi066:Deploy complete!
2022-08-16T00:06:05.834 ERROR:teuthology.dispatcher.supervisor:Reimaging error. Nuking machines...
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/dispatcher/supervisor.py", line 209, in reimage
    reimaged = reimage_machines(ctx, targets, job_config['machine_type'])
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/lock/ops.py", line 325, in reimage_machines
    reimaged = do_update_keys(list(reimaged.keys()))[1]
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/lock/ops.py", line 288, in do_update_keys
    keys_dict = misc.ssh_keyscan(machines, _raise=_raise)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/misc.py", line 1108, in ssh_keyscan
    while proceed():
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_9e7483cc68a9eb6b54dacbb0bec3bf23a5d32425/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: 'ssh_keyscan smithi165.front.sepia.ceph.com' reached maximum tries (5) after waiting for 5 seconds

Maybe it just needed another minute?
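
If the node really did just need more time, a longer retry window with backoff around the key scan would cover it. The following is a minimal sketch of that idea, not teuthology's actual code: `keyscan_once` and `keyscan_with_retries` are hypothetical helpers, and the retry counts and delays are assumptions chosen for illustration. It shells out to the standard `ssh-keyscan` tool.

```python
import subprocess
import time
from typing import Callable, Optional


def keyscan_once(hostname: str, timeout: int = 5) -> Optional[str]:
    """Run ssh-keyscan once; return its output, or None if no key came back."""
    try:
        result = subprocess.run(
            ["ssh-keyscan", "-T", str(timeout), "-t", "rsa", hostname],
            capture_output=True,
            text=True,
            timeout=timeout + 5,  # hard stop in case ssh-keyscan itself hangs
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    keys = result.stdout.strip()
    return keys or None


def keyscan_with_retries(
    hostname: str,
    tries: int = 7,            # assumption: more than the 5 tries in the traceback
    sleep: float = 1.0,        # first delay, in seconds
    backoff: float = 2.0,      # each later delay is multiplied by this factor
    scan: Callable[[str], Optional[str]] = keyscan_once,
    sleeper: Callable[[float], None] = time.sleep,
) -> str:
    """Retry the scan with exponential backoff until a host key is returned."""
    delay = sleep
    for attempt in range(1, tries + 1):
        keys = scan(hostname)
        if keys:
            return keys
        if attempt < tries:
            sleeper(delay)
            delay *= backoff
    raise TimeoutError(f"ssh-keyscan {hostname!r} failed after {tries} tries")
```

With these defaults the delays between attempts are 1, 2, 4, 8, 16 and 32 seconds, so the scan keeps trying for about a minute (63 seconds of sleep, plus the time each scan takes) before giving up. The `scan` and `sleeper` parameters exist only so the loop can be exercised without a real host.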
