Bug #58098 (closed)

qa/workunits/rados/test_crash.sh: crashes are never posted

Added by Laura Flores over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: Yes
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/yuriw-2022-11-23_15:09:06-rados-wip-yuri10-testing-2022-11-22-1711-distro-default-smithi/7087281

2022-11-23T18:09:31.261 INFO:tasks.workunit.client.0.smithi149.stderr:+ sudo systemctl restart ceph-crash
2022-11-23T18:09:31.261 INFO:tasks.workunit.client.0.smithi149.stderr:+ sleep 30
2022-11-23T18:09:31.994 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.0 is failed for ~5s
2022-11-23T18:09:31.995 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.1 is failed for ~5s
2022-11-23T18:09:31.995 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.2 is failed for ~5s
2022-11-23T18:09:37.197 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.0 is failed for ~10s
2022-11-23T18:09:37.198 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.1 is failed for ~10s
2022-11-23T18:09:37.198 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.2 is failed for ~10s
2022-11-23T18:09:42.400 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.0 is failed for ~16s
2022-11-23T18:09:42.400 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.1 is failed for ~16s
2022-11-23T18:09:42.401 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.2 is failed for ~16s
2022-11-23T18:09:47.603 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.0 is failed for ~21s
2022-11-23T18:09:47.603 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.1 is failed for ~21s
2022-11-23T18:09:47.603 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.2 is failed for ~21s
2022-11-23T18:09:52.805 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.0 is failed for ~26s
2022-11-23T18:09:52.805 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.1 is failed for ~26s
2022-11-23T18:09:52.805 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.2 is failed for ~26s
2022-11-23T18:09:58.008 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.0 is failed for ~31s
2022-11-23T18:09:58.008 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.1 is failed for ~31s
2022-11-23T18:09:58.008 INFO:tasks.daemonwatchdog.daemon_watchdog:daemon ceph.osd.2 is failed for ~31s
2022-11-23T18:10:01.262 INFO:tasks.workunit.client.0.smithi149.stderr:++ ceph crash ls
2022-11-23T18:10:01.263 INFO:tasks.workunit.client.0.smithi149.stderr:++ wc -l
2022-11-23T18:10:01.699 DEBUG:teuthology.orchestra.run:got remote process result: 1
2022-11-23T18:10:01.700 INFO:tasks.workunit.client.0.smithi149.stderr:+ '[' 0 = 4 ']'
2022-11-23T18:10:01.700 INFO:tasks.workunit.client.0.smithi149.stderr:+ exit 1

The issue here seems to be that we check for crashes too early. After inducing the crashes, we restart ceph-crash and sleep for 30 seconds, but it sometimes takes longer than that for the OSDs to come back and for the crashes to be posted. In this case, OSDs 0, 1, and 2 took ~31 seconds to restart, so the check ran before any crash reports existed.

A possible fix is to sleep for longer, or to check for crashes in a retry loop with a timeout, as sketched below.
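
For illustration only, a minimal sketch of the retry-loop idea. The 4-line expectation (header plus 3 crashes) comes from the failed check in the log above; the 5-second polling interval and 120-second cap are arbitrary assumptions, not values from test_crash.sh:

    sudo systemctl restart ceph-crash
    timeout=120   # assumed overall cap, not from the original script
    interval=5    # assumed polling interval
    waited=0
    # header line + 3 induced crashes = 4 lines, matching the '[ ... = 4 ]' check in the log
    while [ "$(ceph crash ls | wc -l)" -lt 4 ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "crashes were not posted within ${timeout}s"
            exit 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

This keeps the fast path fast (it exits as soon as the crashes appear) while tolerating slow OSD restarts up to the timeout.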


Files

journalctl-b0.gz (97.5 KB) - Laura Flores, 12/06/2022 03:42 PM