Bug #45902
thrashosds hits watchdog_daemon_timeout during powercycle
% Done: 0%
Source: Q/A
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Description
/a/teuthology-2020-06-02_11:15:03-powercycle-master-testing-basic-smithi/5111856
The run fails with "teuthology.exceptions.CommandFailedError: Command failed (workunit test suites/ffsb.sh) on smithi116 with status 1", but earlier in the log we see the following:
2020-06-04T02:30:30.708 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.1 is failed for ~303s
2020-06-04T02:30:30.709 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
The last entry in osd.1's log shows it being told to shut down at 2020-06-04T02:25:24:
2020-06-04T02:25:24.909+0000 7f5f38726700 -1 osd.1 31 *** Got signal Terminated ***
2020-06-04T02:25:24.909+0000 7f5f38726700 -1 osd.1 31 *** Immediate shutdown (osd_fast_shutdown=true) ***
It looks like it was shut down due to a power cycle on smithi036:
2020-06-04T02:25:20.803 INFO:tasks.ceph.ceph_manager.ceph:kill_osd on osd.1 doing powercycle of ubuntu@smithi036.front.sepia.ceph.com
Power on completed at 02:29:02.
2020-06-04T02:29:02.878 INFO:teuthology.orchestra.console:Power on for smithi036 completed
But it's not until 02:30:43 that we reconnect and try to start osd.1, and by then the watchdog has already timed out (it barked at 02:30:30).
2020-06-04T02:30:43.072 DEBUG:teuthology.orchestra.console:expect after: b'smithi036 login: '
2020-06-04T02:30:43.226 INFO:teuthology.misc:Re-opening connections...
2020-06-04T02:30:43.226 INFO:teuthology.misc:trying to connect to ubuntu@smithi036.front.sepia.ceph.com
2020-06-04T02:30:43.227 INFO:teuthology.orchestra.remote:Trying to reconnect to host
2020-06-04T02:30:43.228 DEBUG:teuthology.orchestra.connection:{'hostname': 'smithi036.front.sepia.ceph.com', 'username': 'ubuntu', 'timeout': 60}
2020-06-04T02:30:43.676 INFO:teuthology.orchestra.run.smithi036:> true
2020-06-04T02:30:44.245 DEBUG:teuthology.misc:waited 1.0186738967895508
2020-06-04T02:30:45.247 DEBUG:tasks.ceph_manager:Mounting data for osd.1 on ubuntu@smithi036.front.sepia.ceph.com
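To make the timeline easier to check, here is a rough sketch that computes the gaps between the events quoted in the log excerpts in this report. The event names, the helper functions, and the 300 s and 600 s values are illustrative assumptions and not taken from teuthology; 300 s is only inferred from the watchdog barking at "~303s", and the check treats osd.1 as down from the start of the powercycle, which is a simplification of whatever the watchdog actually tracks.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"

# Timestamps copied from the teuthology log lines quoted in this report.
# The dictionary keys are hypothetical labels chosen for this sketch.
EVENTS = {
    "powercycle_start": "2020-06-04T02:25:20.803",
    "power_on_done": "2020-06-04T02:29:02.878",
    "watchdog_bark": "2020-06-04T02:30:30.709",
    "login_prompt": "2020-06-04T02:30:43.072",
    "osd_mount": "2020-06-04T02:30:45.247",
}


def seconds_between(start, end):
    """Return the elapsed seconds between two named events."""
    t0 = datetime.strptime(EVENTS[start], FMT)
    t1 = datetime.strptime(EVENTS[end], FMT)
    return (t1 - t0).total_seconds()


def would_bark(timeout_s):
    """Simplified check: does the powercycle-to-remount gap exceed the timeout?

    Assumption: the daemon counts as failed from the start of the powercycle.
    """
    return seconds_between("powercycle_start", "osd_mount") > timeout_s


if __name__ == "__main__":
    print("power off/on took:", seconds_between("powercycle_start", "power_on_done"))
    print("boot to login prompt:", seconds_between("power_on_done", "login_prompt"))
    print("powercycle to remount:", seconds_between("powercycle_start", "osd_mount"))
    # 300 is the timeout inferred from the "~303s" bark; 600 is a hypothetical candidate.
    for timeout in (300, 600):
        print(f"timeout={timeout}s -> bark: {would_bark(timeout)}")
```

Under these assumptions the gap from the start of the powercycle to remounting osd.1 is roughly 324 s, so a timeout of about 300 s fires before the OSD can be restarted, while a larger value such as the hypothetical 600 s would leave room for the reboot.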
I think we should adjust the watchdog_daemon_timeout for the powercycle tests.
Related issues
History
#1 Updated by Brad Hubbard almost 4 years ago
- Related to Bug #45900: "ERROR: (22) Invalid argument" in powercycle added
#2 Updated by Brad Hubbard almost 4 years ago
@Neha, I'll take care of this one.
#3 Updated by Neha Ojha almost 4 years ago
Brad Hubbard wrote:
@Neha, I'll take care of this one.
Thanks Brad!
#4 Updated by Deepika Upadhyay over 3 years ago
- Related to Bug #47743: Error ENXIO: problem getting command descriptions from mon added