Bug #21317: Update VPS with latest distro: RuntimeError: Could not reconnect to ubuntu@vpm129.front.sepia.ceph.com - sepia - Ceph

Bug #21317

open

Added by Vasu Kulkarni over 6 years ago. Updated over 6 years ago.

Status: New
Priority: High
Assignee: David Galloway
Category: Infrastructure Service
Severity: 3 - minor

Description

We see this quite frequently in VPS jobs: the kernel task updates the kernel (mostly on xenial), reboots the node, and then fails to reconnect. Ilya has been looking at something similar, but can we update the VPS images to the latest distro so that the kernel update task can be skipped?


2017-09-08T17:08:53.310 INFO:teuthology.orchestra.run.vpm139:Running: 'sudo python -c \'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))\' /etc/grub.d/01_ceph_kernel && sudo chmod 755 /etc/grub.d/01_ceph_kernel'
2017-09-08T17:08:53.430 INFO:teuthology.task.kernel:Distro Kernel Version: 4.4.0-93-generic
2017-09-08T17:08:53.439 INFO:teuthology.orchestra.run.vpm139:Running: 'sudo update-grub'
2017-09-08T17:08:53.575 INFO:teuthology.orchestra.run.vpm139.stderr:Generating grub configuration file ...
2017-09-08T17:08:53.624 INFO:teuthology.orchestra.run.vpm139.stderr:Found linux image: /boot/vmlinuz-4.4.0-93-generic
2017-09-08T17:08:53.630 INFO:teuthology.orchestra.run.vpm139.stderr:Found initrd image: /boot/initrd.img-4.4.0-93-generic
2017-09-08T17:08:53.735 INFO:teuthology.orchestra.run.vpm139.stderr:Found linux image: /boot/vmlinuz-4.4.0-34-generic
2017-09-08T17:08:53.741 INFO:teuthology.orchestra.run.vpm139.stderr:Found initrd image: /boot/initrd.img-4.4.0-34-generic
2017-09-08T17:08:53.846 INFO:teuthology.orchestra.run.vpm139.stderr:done
2017-09-08T17:08:53.851 INFO:teuthology.orchestra.run.vpm139:Running: 'sudo shutdown -r now'
2017-09-08T17:08:53.859 INFO:teuthology.misc:Re-opening connections...
2017-09-08T17:08:53.864 INFO:teuthology.misc:trying to connect to ubuntu@vpm139.front.sepia.ceph.com
2017-09-08T17:08:53.870 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'vpm139.front.sepia.ceph.com', 'key_filename': ['/home/teuthworker/.ssh/id_rsa'], 'timeout': 60}
2017-09-08T17:08:54.367 INFO:teuthology.orchestra.run.vpm139:Running: 'true'
2017-09-08T17:08:54.714 INFO:teuthology.misc:trying to connect to ubuntu@vpm045.front.sepia.ceph.com
2017-09-08T17:08:54.718 DEBUG:teuthology.orchestra.connection:{'username': 'ubuntu', 'hostname': 'vpm045.front.sepia.ceph.com', 'key_filename': ['/home/teuthworker/.ssh/id_rsa'], 'timeout': 60}
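
In the log above, the reconnect attempts start within milliseconds of `sudo shutdown -r now`, each with `'timeout': 60`. As an illustration only, and not a description of how teuthology actually reconnects, a retry loop with an overall deadline might look like the sketch below. The `connect` callable, the 900-second deadline, and the 10-second interval are all hypothetical; the clock and sleep functions are injectable only so the loop can be exercised without waiting.

```python
import time


def wait_for_reconnect(connect, deadline_s=900.0, interval_s=10.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call connect() until it succeeds or deadline_s would be exceeded.

    connect is a hypothetical zero-argument callable that raises OSError
    while the host is unreachable and returns a connection once it is up.
    Returns (connection, number_of_attempts).
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except OSError as exc:
            last_error = exc
        # Give up if another sleep would take us past the deadline.
        if clock() - start + interval_s > deadline_s:
            raise RuntimeError(
                "Could not reconnect after %d attempts" % attempts
            ) from last_error
        sleep(interval_s)
```

The deadline is the value that matters here: it has to be longer than the time the slowest VPS takes to come back after a reboot.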

Updated by Vasu Kulkarni over 6 years ago

http://qa-proxy.ceph.com/teuthology/vasu-2017-09-08_17:02:15-ceph-ansible-luminous-distro-basic-vps/1609649/teuthology.log

Updated by David Galloway over 6 years ago

Category set to Infrastructure Service

Vasu Kulkarni wrote:

http://qa-proxy.ceph.com/teuthology/vasu-2017-09-08_17:02:15-ceph-ansible-luminous-distro-basic-vps/1609649/teuthology.log

That VPS eventually came back up at Sep 8 17:19:34.

Are you asking for CentOS 7.4? I'm not sure what you're asking for here.

Updated by Vasu Kulkarni over 6 years ago

David Galloway wrote:

Vasu Kulkarni wrote:

http://qa-proxy.ceph.com/teuthology/vasu-2017-09-08_17:02:15-ceph-ansible-luminous-distro-basic-vps/1609649/teuthology.log

That VPS eventually came back up at Sep 8 17:19:34.

Are you asking for CentOS 7.4? I'm not sure what you're asking for here.

Sorry, I am not asking for 7.4. I am asking to refresh the current VPS images so that they contain the latest distro kernel; the kernel task would then be skipped because the kernel is already the latest. Right now it tries to update the kernel and eventually fails during reconnect.

If you look at the smoke jobs that use VPS, you will see at least a couple of jobs that fail due to the reconnect issue,
e.g. http://pulpito.ceph.com/teuthology-2017-09-08_05:00:13-smoke-master-testing-basic-vps/
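
To make the "skip the kernel task when the image is already current" idea concrete, here is a minimal sketch of the comparison involved. It assumes Ubuntu-style release strings such as the `4.4.0-93-generic` and `4.4.0-34-generic` seen in the log; both function names are hypothetical and this is not how the teuthology kernel task decides.

```python
import re


def parse_kernel_version(release):
    """Turn an Ubuntu-style release such as '4.4.0-93-generic' into (4, 4, 0, 93)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def needs_kernel_update(running, wanted):
    """Return True only if the running kernel is older than the wanted one."""
    return parse_kernel_version(running) < parse_kernel_version(wanted)


# Hypothetical usage with the versions from the log:
# an image still on 4.4.0-34 needs the update and the reboot,
# an image already on 4.4.0-93 could skip both.
print(needs_kernel_update("4.4.0-34-generic", "4.4.0-93-generic"))
print(needs_kernel_update("4.4.0-93-generic", "4.4.0-93-generic"))
```

Comparing integer tuples rather than raw strings matters once the ABI number reaches three digits: as strings, "4.4.0-100" sorts before "4.4.0-93".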

Updated by Vasu Kulkarni over 6 years ago

Related to: http://tracker.ceph.com/issues/19918

Updated by David Galloway over 6 years ago

Is 4.4.0-92-generic the kernel you're looking for?

EDIT: Determined even the latest 16.04 cloud image ships with 4.4.0-92-generic. 4.4.0-93-generic is the latest. I've asked Vasu to test these jobs on OVH nodes instead of VPSes.
