Bug #48871 (open)

nautilus: rados/test_crash.sh: "kill ceph-osd" times out

Added by Deepika Upadhyay over 3 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

failure reason:

failure_reason: '"2021-01-11 20:11:57.011650 osd.2 (osd.2) 1 : cluster [WRN] Monitor
  daemon marked osd.2 down, but it is still running" in cluster log'
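
This warning is flagged by teuthology's cluster-log scrape. Conceptually the check reduces to something like the following sketch (the log path is an assumption for illustration, not taken from this run):

# Hedged sketch: teuthology fails the run when a non-whitelisted WRN/ERR
# line such as the one above appears in the gathered cluster log.
grep 'Monitor daemon marked osd.2 down, but it is still running' \
    /var/log/ceph/ceph.log && echo 'run marked failed'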

gdb traceback:

2021-01-11T20:10:48.855 INFO:tasks.workunit.client.0.smithi026.stderr:Program terminated with signal SIGABRT, Aborted.
2021-01-11T20:10:48.856 INFO:tasks.workunit.client.0.smithi026.stderr:#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
2021-01-11T20:10:48.856 INFO:tasks.workunit.client.0.smithi026.stderr:[Current thread is 1 (Thread 0x7f1ee902ac00 (LWP 10980))]
2021-01-11T20:10:48.856 INFO:tasks.workunit.client.0.smithi026.stderr:(gdb)  .*terminated.*signal 6.*

2021-01-11T20:10:41.116 INFO:tasks.ceph.osd.1.smithi026.stderr: ceph version 14.2.16-149-ga45d03d (a45d03d1f4274e5799c8f24743ee4f16aa657dff) nautilus (stable)
2021-01-11T20:10:41.117 INFO:tasks.ceph.osd.1.smithi026.stderr: 1: (()+0x12980) [0x7f90a22e8980]
2021-01-11T20:10:41.117 INFO:tasks.ceph.osd.1.smithi026.stderr: 2: (pthread_cond_wait()+0x243) [0x7f90a22e3ad3]
2021-01-11T20:10:41.117 INFO:tasks.ceph.osd.1.smithi026.stderr: 3: (SimpleMessenger::wait()+0x3ef) [0x55830194631f]
2021-01-11T20:10:41.117 INFO:tasks.ceph.osd.1.smithi026.stderr: 4: (main()+0x5310) [0x5583010de1a0]
2021-01-11T20:10:41.118 INFO:tasks.ceph.osd.1.smithi026.stderr: 5: (__libc_start_main()+0xe7) [0x7f90a0f7dbf7]
2021-01-11T20:10:41.118 INFO:tasks.ceph.osd.1.smithi026.stderr: 6: (_start()+0x2a) [0x55830110fc1a]
2021-01-11T20:10:41.119 INFO:tasks.workunit.client.0.smithi026.stderr:+ sleep 5
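
The gdb output above appears to be the workunit inspecting the dumped core. A hedged reconstruction of that check (the binary and core paths are assumptions, not copied from test_crash.sh):

# Open the core in batch mode, print the backtrace, and match the
# "terminated ... signal 6" banner shown in the log. Paths are hypothetical.
gdb --batch -ex bt /usr/bin/ceph-osd /var/lib/ceph/crash/*/core 2>&1 \
    | grep '.*terminated.*signal 6.*'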

No core dump was collected, and the issue could not be reproduced on rerun.

http://qa-proxy.ceph.com/teuthology/yuriw-2021-01-08_21:07:06-rados-wip-yuri8-testing-2021-01-08-0939-nautilus-distro-basic-smithi/5766842/teuthology.log

#1

Updated by Neha Ojha over 3 years ago

  • Subject changed from nautilus: rados/test_crash.sh fails with SIGABRT in SimpleMessenger::wait() to nautilus: rados/test_crash.sh: "kill ceph-osd" times out
2021-01-11T20:11:19.625 INFO:tasks.workunit.client.0.smithi026.stdout:RECENT_CRASH 3 daemons have recently crashed
2021-01-11T20:11:19.626 INFO:tasks.workunit.client.0.smithi026.stderr:+ ceph crash archive-all
2021-01-11T20:11:20.043 INFO:tasks.workunit.client.0.smithi026.stderr:+ sleep 30
2021-01-11T20:11:50.043 INFO:tasks.workunit.client.0.smithi026.stderr:+ ceph health detail
2021-01-11T20:11:50.044 INFO:tasks.workunit.client.0.smithi026.stderr:+ grep -c RECENT_CRASH
2021-01-11T20:11:50.044 INFO:tasks.workunit.client.0.smithi026.stderr:+ grep 0
2021-01-11T20:11:50.448 INFO:tasks.workunit.client.0.smithi026.stdout:0
2021-01-11T20:11:50.449 INFO:teuthology.orchestra.run:Running command with timeout 3600
2021-01-11T20:11:50.449 DEBUG:teuthology.orchestra.run.smithi026:> sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp
2021-01-11T20:11:50.463 INFO:tasks.workunit:Stopping ['rados/test_crash.sh'] on client.0...
2021-01-11T20:11:50.464 DEBUG:teuthology.orchestra.run.smithi026:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2021-01-11T20:11:50.695 DEBUG:teuthology.parallel:result is None
2021-01-11T20:11:50.696 DEBUG:teuthology.orchestra.run.smithi026:> sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0
2021-01-11T20:11:50.710 INFO:tasks.workunit:Deleted dir /home/ubuntu/cephtest/mnt.0/client.0
2021-01-11T20:11:50.711 DEBUG:teuthology.orchestra.run.smithi026:> rmdir -- /home/ubuntu/cephtest/mnt.0
2021-01-11T20:11:50.772 INFO:tasks.workunit:Deleted artificial mount point /home/ubuntu/cephtest/mnt.0/client.0
2021-01-11T20:11:50.772 INFO:teuthology.run_tasks:Running task ceph.restart...
2021-01-11T20:11:50.783 INFO:tasks.ceph.osd.0:Restarting daemon
2021-01-11T20:11:50.784 INFO:tasks.ceph.osd.0:Stopping old one...
2021-01-11T20:11:50.784 DEBUG:tasks.ceph.osd.0:waiting for process to exit
2021-01-11T20:11:50.784 INFO:teuthology.orchestra.run:waiting for 300
2021-01-11T20:11:50.785 DEBUG:teuthology.orchestra.run:got remote process result: 1
2021-01-11T20:11:50.785 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi026 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f --cluster ceph -i 0'
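
For readability, the crash-check portion of the log above (the '+' trace lines) corresponds roughly to these workunit steps, reconstructed from the trace rather than copied from test_crash.sh:

ceph crash archive-all                               # acknowledge the recorded crashes
sleep 30                                             # let the health check refresh
ceph health detail | grep -c RECENT_CRASH | grep 0   # expect a count of 0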

The SimpleMessenger crash is expected (the test injects it); the failure to stop the OSD afterwards, i.e. the "kill ceph-osd" command timing out per the subject, is the actual problem.
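
For reference, the command that failed is teuthology's daemon supervisor invocation, copied verbatim from the CommandFailedError above. As I understand daemon-helper, it runs ceph-osd and delivers the named signal ('kill', i.e. SIGKILL) when teuthology closes its stdin, which is what the stop path in the traceback does before waiting up to 300 seconds for the process to exit:

# Copied from the CommandFailedError above; it exited with status 1 instead
# of stopping the OSD cleanly within the 300-second wait.
sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage \
    daemon-helper kill ceph-osd -f --cluster ceph -i 0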
