Bug #48871
nautilus: rados/test_crash.sh: "kill ceph-osd" times out
Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
failure reason:
failure_reason: '"2021-01-11 20:11:57.011650 osd.2 (osd.2) 1 : cluster [WRN] Monitor daemon marked osd.2 down, but it is still running" in cluster log'
gdb traceback:
2021-01-11T20:10:48.855 INFO:tasks.workunit.client.0.smithi026.stderr:Program terminated with signal SIGABRT, Aborted.
2021-01-11T20:10:48.856 INFO:tasks.workunit.client.0.smithi026.stderr:#0 raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
2021-01-11T20:10:48.856 INFO:tasks.workunit.client.0.smithi026.stderr:[Current thread is 1 (Thread 0x7f1ee902ac00 (LWP 10980))]
2021-01-11T20:10:48.856 INFO:tasks.workunit.client.0.smithi026.stderr:(gdb) .*terminated.*signal 6.*
2021-01-11T20:10:41.116 INFO:tasks.ceph.osd.1.smithi026.stderr: ceph version 14.2.16-149-ga45d03d (a45d03d1f4274e5799c8f24743ee4f16aa657dff) nautilus (stable)
2021-01-11T20:10:41.117 INFO:tasks.ceph.osd.1.smithi026.stderr: 1: (()+0x12980) [0x7f90a22e8980]
2021-01-11T20:10:41.117 INFO:tasks.ceph.osd.1.smithi026.stderr: 2: (pthread_cond_wait()+0x243) [0x7f90a22e3ad3]
2021-01-11T20:10:41.117 INFO:tasks.ceph.osd.1.smithi026.stderr: 3: (SimpleMessenger::wait()+0x3ef) [0x55830194631f]
2021-01-11T20:10:41.117 INFO:tasks.ceph.osd.1.smithi026.stderr: 4: (main()+0x5310) [0x5583010de1a0]
2021-01-11T20:10:41.118 INFO:tasks.ceph.osd.1.smithi026.stderr: 5: (__libc_start_main()+0xe7) [0x7f90a0f7dbf7]
2021-01-11T20:10:41.118 INFO:tasks.ceph.osd.1.smithi026.stderr: 6: (_start()+0x2a) [0x55830110fc1a]
2021-01-11T20:10:41.119 INFO:tasks.workunit.client.0.smithi026.stderr:+ sleep 5
No core dump was collected.
The issue could not be reproduced on a rerun.
Updated by Neha Ojha over 3 years ago
- Subject changed from nautilus: rados/test_crash.sh fails with SIGABRT in SimpleMessenger::wait() to nautilus: rados/test_crash.sh: "kill ceph-osd" times out
2021-01-11T20:11:19.625 INFO:tasks.workunit.client.0.smithi026.stdout:RECENT_CRASH 3 daemons have recently crashed
2021-01-11T20:11:19.626 INFO:tasks.workunit.client.0.smithi026.stderr:+ ceph crash archive-all
2021-01-11T20:11:20.043 INFO:tasks.workunit.client.0.smithi026.stderr:+ sleep 30
2021-01-11T20:11:50.043 INFO:tasks.workunit.client.0.smithi026.stderr:+ ceph health detail
2021-01-11T20:11:50.044 INFO:tasks.workunit.client.0.smithi026.stderr:+ grep -c RECENT_CRASH
2021-01-11T20:11:50.044 INFO:tasks.workunit.client.0.smithi026.stderr:+ grep 0
2021-01-11T20:11:50.448 INFO:tasks.workunit.client.0.smithi026.stdout:0
2021-01-11T20:11:50.449 INFO:teuthology.orchestra.run:Running command with timeout 3600
2021-01-11T20:11:50.449 DEBUG:teuthology.orchestra.run.smithi026:> sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0/tmp
2021-01-11T20:11:50.463 INFO:tasks.workunit:Stopping ['rados/test_crash.sh'] on client.0...
2021-01-11T20:11:50.464 DEBUG:teuthology.orchestra.run.smithi026:> sudo rm -rf -- /home/ubuntu/cephtest/workunits.list.client.0 /home/ubuntu/cephtest/clone.client.0
2021-01-11T20:11:50.695 DEBUG:teuthology.parallel:result is None
2021-01-11T20:11:50.696 DEBUG:teuthology.orchestra.run.smithi026:> sudo rm -rf -- /home/ubuntu/cephtest/mnt.0/client.0
2021-01-11T20:11:50.710 INFO:tasks.workunit:Deleted dir /home/ubuntu/cephtest/mnt.0/client.0
2021-01-11T20:11:50.711 DEBUG:teuthology.orchestra.run.smithi026:> rmdir -- /home/ubuntu/cephtest/mnt.0
2021-01-11T20:11:50.772 INFO:tasks.workunit:Deleted artificial mount point /home/ubuntu/cephtest/mnt.0/client.0
2021-01-11T20:11:50.772 INFO:teuthology.run_tasks:Running task ceph.restart...
2021-01-11T20:11:50.783 INFO:tasks.ceph.osd.0:Restarting daemon
2021-01-11T20:11:50.784 INFO:tasks.ceph.osd.0:Stopping old one...
2021-01-11T20:11:50.784 DEBUG:tasks.ceph.osd.0:waiting for process to exit
2021-01-11T20:11:50.784 INFO:teuthology.orchestra.run:waiting for 300
2021-01-11T20:11:50.785 DEBUG:teuthology.orchestra.run:got remote process result: 1
2021-01-11T20:11:50.785 ERROR:teuthology.orchestra.daemon.state:Error while waiting for process to exit
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 479, in wait
    proc.wait()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 161, in wait
    self._raise_for_status()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", line 183, in _raise_for_status
    node=self.hostname, label=self.label
teuthology.exceptions.CommandFailedError: Command failed on smithi026 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f --cluster ceph -i 0'
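The SimpleMessenger crash is expected; the actual problem is the failure in the log above, where stopping osd.0 during the ceph.restart task raises CommandFailedError.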
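For anyone reading the log above: the timestamps suggest the wait on osd.0 came back almost immediately with status 1, well short of the 300-second wait. The sketch below is only an illustration of that distinction using Python's standard subprocess module. It is not teuthology code; wait_for_exit, CommandFailed, and demo are hypothetical names, and the child process merely stands in for a daemon wrapper that has already exited non-zero.

```python
import subprocess
import sys


class CommandFailed(Exception):
    """Hypothetical stand-in for teuthology's CommandFailedError."""


def wait_for_exit(proc, timeout):
    """Wait up to `timeout` seconds and report how the wait ended.

    Returns "timeout" if the process is still running when the timeout
    expires, "clean exit" on status 0, and raises CommandFailed on any
    non-zero exit status.
    """
    try:
        status = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    if status != 0:
        raise CommandFailed(f"command failed with status {status}")
    return "clean exit"


def demo():
    # Assumption for illustration: a child that exits with status 1 by
    # itself, so the wait fails right away instead of timing out.
    proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])
    try:
        return wait_for_exit(proc, timeout=300)
    except CommandFailed as exc:
        return str(exc)


if __name__ == "__main__":
    print(demo())
```

In this sketch a non-zero exit surfaces as an exception long before the timeout is reached, which is one possible reading of the "got remote process result: 1" line; whether that matches what happened to osd.0 here is not established by the log alone.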