Bug #53767
qa/workunits/cls/test_cls_2pc_queue.sh: killing an osd during thrashing causes timeout
Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Q/A
Tags:
Backport: reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Description: rados/verify/{centos_latest ceph clusters/{fixed-2 openstack} d-thrash/default/{default thrashosds-health} mon_election/connectivity msgr-failures/few msgr/async objectstore/bluestore-comp-zstd rados tasks/rados_cls_all validater/valgrind}
Failure reason:
Command failed (workunit test cls/test_cls_2pc_queue.sh) on smithi023 with status 124: 'mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=923a78b748f3bb78722c7300318f17cf5730a2ce TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cls/test_cls_2pc_queue.sh'
/a/yuriw-2021-12-23_16:50:03-rados-wip-yuri6-testing-2021-12-22-1410-distro-default-smithi/6582533
2021-12-23T20:26:35.417 DEBUG:teuthology.orchestra.run.smithi023:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight
2021-12-23T20:26:35.418 INFO:teuthology.orchestra.run.smithi023.stderr:nodeep-scrub is unset
2021-12-23T20:26:35.419 INFO:tasks.ceph.osd.6.smithi125.stderr:2021-12-23T20:26:35.419+0000 ee6a700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=*tcmalloc* --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.6.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 6 (PID: 35347) UID: 0
2021-12-23T20:26:35.518 INFO:tasks.ceph.osd.6.smithi125.stderr:2021-12-23T20:26:35.520+0000 ee6a700 -1 received signal: Hangup from /usr/bin/python3 /bin/daemon-helper term env OPENSSL_ia32cap=~0x1000000000000000 valgrind --trace-children=no --child-silent-after-fork=yes --soname-synonyms=somalloc=*tcmalloc* --num-callers=50 --suppressions=/home/ubuntu/cephtest/valgrind.supp --xml=yes --xml-file=/var/log/ceph/valgrind/osd.6.log --time-stamp=yes --vgdb=yes --exit-on-first-error=yes --error-exitcode=42 --tool=memcheck ceph-osd -f --cluster ceph -i 6 (PID: 35347) UID: 0
2021-12-23T20:26:35.590 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_923a78b748f3bb78722c7300318f17cf5730a2ce/qa/tasks/ceph_manager.py", line 189, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_923a78b748f3bb78722c7300318f17cf5730a2ce/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_923a78b748f3bb78722c7300318f17cf5730a2ce/qa/tasks/ceph_manager.py", line 347, in kill_osd
    self.ceph_manager.kill_osd(osd)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_923a78b748f3bb78722c7300318f17cf5730a2ce/qa/tasks/ceph_manager.py", line 2977, in kill_osd
    self.ctx.daemons.get_daemon('osd', osd, self.cluster).stop()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/orchestra/daemon/state.py", line 139, in stop
    run.wait([self.proc], timeout=timeout)
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/orchestra/run.py", line 473, in wait
    check_time()
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_95a7d4799b562f3bbb5ec66107094963abd62fa1/teuthology/contextutil.py", line 133, in __call__
    raise MaxWhileTries(error_msg)
teuthology.exceptions.MaxWhileTries: reached maximum tries (50) after waiting for 300 seconds
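
For readers unfamiliar with the final exception: the traceback suggests that stopping the OSD ends in a bounded polling loop that gives up after a fixed number of checks. The following is a minimal sketch of that pattern only, not teuthology's actual code. The names `wait_until` and `MaxTriesReached` are hypothetical, and the 6 second interval is an assumption picked because 50 tries at 6 seconds matches the 300 seconds in the message.

```python
import time


class MaxTriesReached(Exception):
    """Hypothetical stand-in for teuthology.exceptions.MaxWhileTries."""


def wait_until(done, tries=50, sleep=6, sleeper=time.sleep):
    """Poll done() until it returns True or the tries run out.

    Assumed values: 50 tries with a 6 second sleep, chosen only because
    50 * 6 matches the 300 seconds reported in the log above.
    """
    for _ in range(tries):
        if done():
            return
        sleeper(sleep)
    raise MaxTriesReached(
        f"reached maximum tries ({tries}) after waiting for {tries * sleep} seconds"
    )


# Simulate a daemon that never exits; record the sleeps instead of waiting.
slept = []
message = None
try:
    wait_until(lambda: False, sleeper=slept.append)
except MaxTriesReached as exc:
    message = str(exc)
```

Under these assumptions, a process that never exits (here, possibly an OSD slowed down by valgrind) exhausts all 50 checks and raises the error, which is consistent with the thrasher failing in `kill_osd` rather than in the workunit itself.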