Bug #53314

closed

qa: fs/upgrade/mds_upgrade_sequence test timeout

Added by Kotresh Hiremath Ravishankar over 2 years ago. Updated over 2 years ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(FS): MDS
Labels (FS): crash, qa
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The fs/upgrade/mds_upgrade_sequence qa job goes dead with a job timeout because the MDS crashes.


ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7f1a4dbe3cdc]
2: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f1a4de6842e]
3: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f1a4de68601]
4: (DispatchQueue::pre_dispatch(boost::intrusive_ptr<Message> const&)+0x710) [0x7f1a4de1bc30]
5: (DispatchQueue::entry()+0xdeb) [0x7f1a4de1d69b]
6: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f1a4decdb71]
7: /lib64/libpthread.so.0(+0x814a) [0x7f1a4c98514a]
8: clone()
2021-11-09T02:23:05.306+0000 7f1a44f82700 -1 *** Caught signal (Aborted) **
in thread 7f1a44f82700 thread_name:ms_dispatch
ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
1: /lib64/libpthread.so.0(+0x12b20) [0x7f1a4c98fb20]
2: gsignal()
3: abort()
4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x7f1a4dbe3dad]
5: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f1a4de6842e]
6: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f1a4de68601]
7: (DispatchQueue::pre_dispatch(boost::intrusive_ptr<Message> const&)+0x710) [0x7f1a4de1bc30]
8: (DispatchQueue::entry()+0xdeb) [0x7f1a4de1d69b]
9: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f1a4decdb71]
10: /lib64/libpthread.so.0(+0x814a) [0x7f1a4c98514a]
11: clone()

On re-running the suite, the following traceback was seen.


ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x7f1c61c15cdc]
2: (operator<<(std::ostream&, ClientMetricType const&)+0x10e) [0x7f1c61e9a42e]
3: (MClientMetrics::print(std::ostream&) const+0x1a1) [0x7f1c61e9a601]
4: (DispatchQueue::pre_dispatch(boost::intrusive_ptr<Message> const&)+0x710) [0x7f1c61e4dc30]
5: (DispatchQueue::fast_dispatch(boost::intrusive_ptr<Message> const&)+0x32) [0x7f1c61e4e3f2]
6: (ProtocolV2::handle_message()+0x142a) [0x7f1c61f2e63a]
7: (ProtocolV2::handle_read_frame_dispatch()+0x258) [0x7f1c61f40658]
8: (ProtocolV2::_handle_read_frame_epilogue_main()+0x95) [0x7f1c61f40755]
9: (ProtocolV2::handle_read_frame_epilogue_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int)+0x204) [0x7f1c61f41cb4]
10: (ProtocolV2::run_continuation(Ct<ProtocolV2>&)+0x3c) [0x7f1c61f29b8c]
11: (AsyncConnection::process()+0x789) [0x7f1c61ef20a9]
12: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xcb7) [0x7f1c61f4c427]
13: /usr/lib64/ceph/libceph-common.so.2(0x5b393c) [0x7f1c61f5293c]
14: /lib64/libstdc++.so.6(+0xc2ba3) [0x7f1c5fdfaba3]
15: /lib64/libpthread.so.0(+0x814a) [0x7f1c609b714a]
16: clone()
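
Both tracebacks reach ceph::__ceph_abort from operator<<(std::ostream&, ClientMetricType const&), called while MClientMetrics::print formats an incoming message. One plausible reading, which is an assumption and not something this ticket states, is that the stream operator aborted on a metric type value it did not recognize. The sketch below is not the Ceph source. It uses a hypothetical MetricType enum and a hypothetical describe() helper to show a printer that reports an unknown wire value as a number instead of aborting the process.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for a client metric type enum.
// The names and values are illustrative, not Ceph's definitions.
enum class MetricType : uint32_t {
  CapInfo = 0,
  ReadLatency = 1,
  WriteLatency = 2,
};

// Tolerant printer: a value outside the known set is printed numerically,
// so logging a message from a newer peer cannot take the daemon down.
std::ostream& operator<<(std::ostream& os, MetricType t) {
  switch (t) {
  case MetricType::CapInfo:      return os << "CAP_INFO";
  case MetricType::ReadLatency:  return os << "READ_LATENCY";
  case MetricType::WriteLatency: return os << "WRITE_LATENCY";
  }
  // Reached only for values not listed above.
  return os << "UNKNOWN(" << static_cast<uint32_t>(t) << ")";
}

// Hypothetical helper: format a raw value as it might arrive off the wire.
std::string describe(uint32_t wire_value) {
  std::ostringstream os;
  os << static_cast<MetricType>(wire_value);
  return os.str();
}

// Small self-check of the intended behavior.
inline void self_check() {
  assert(describe(1) == "READ_LATENCY");
  assert(describe(42) == "UNKNOWN(42)");
}
```

With a printer of this shape, an unrecognized type only affects the log line; whether the message is then accepted or rejected is a separate decision for the handler.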

The teuthology runs are as below:
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-08_15:19:37-fs-wip-yuri2-testing-2021-11-06-1322-pacific-distro-basic-smithi/6491152/
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-08_15:19:37-fs-wip-yuri2-testing-2021-11-06-1322-pacific-distro-basic-smithi/6491196/

http://pulpito.front.sepia.ceph.com/yuriw-2021-11-17_14:51:21-fs-wip-yuri2-testing-2021-11-06-1322-pacific-distro-basic-smithi/6509583/
http://pulpito.front.sepia.ceph.com/yuriw-2021-11-17_14:51:21-fs-wip-yuri2-testing-2021-11-06-1322-pacific-distro-basic-smithi/6509593/

Logs:
/ceph/teuthology-archive/yuriw-2021-11-08_15:19:37-fs-wip-yuri2-testing-2021-11-06-1322-pacific-distro-basic-smithi/6491152/
/ceph/teuthology-archive/yuriw-2021-11-08_15:19:37-fs-wip-yuri2-testing-2021-11-06-1322-pacific-distro-basic-smithi/6491196/

/ceph/teuthology-archive/yuriw-2021-11-17_14:51:21-fs-wip-yuri2-testing-2021-11-06-1322-pacific-distro-basic-smithi/6509583/
/ceph/teuthology-archive/yuriw-2021-11-17_14:51:21-fs-wip-yuri2-testing-2021-11-06-1322-pacific-distro-basic-smithi/6509593/


Related issues 1 (0 open, 1 closed)

Is duplicate of CephFS - Bug #53293: qa: v16.2.4 mds crash caused by centos stream kernel (Resolved, Patrick Donnelly)

Actions #1

Updated by Kotresh Hiremath Ravishankar over 2 years ago

@Xiubo, I think the PR https://github.com/ceph/ceph/pull/43784 is causing this.

Actions #2

Updated by Kotresh Hiremath Ravishankar over 2 years ago

  • Description updated (diff)
Actions #3

Updated by Patrick Donnelly over 2 years ago

  • Is duplicate of Bug #53293: qa: v16.2.4 mds crash caused by centos stream kernel added
Actions #4

Updated by Patrick Donnelly over 2 years ago

  • Status changed from New to Duplicate
Actions #5

Updated by Kotresh Hiremath Ravishankar over 2 years ago

  • Subject changed from cephfs: fs/upgrade/mds_upgrade_sequence test dead with job timeout to cephfs: fs/upgrade/mds_upgrade_sequence test timeout
Actions #6

Updated by Kotresh Hiremath Ravishankar over 2 years ago

  • Subject changed from cephfs: fs/upgrade/mds_upgrade_sequence test timeout to qa: fs/upgrade/mds_upgrade_sequence test timeout