Bug #64814 (open)

[monclient::_reopen_session()] Client crash if there is a monitor with a weight of 0 and others with a weight > 0

Added by David Casier about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: MonClient
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

Before 2019, monitors had a weight of 10 (see https://github.com/ceph/ceph/commit/2d113dedf851995e000d3cce136b69).
Since then, monitors have a weight of 0, so a cluster that still contains older monitors can end up with a mix of weights of 0 and 10.
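One possible mechanism, which this report does not confirm: suppose `MonClient::_add_conns()` orders the candidate monitors with a weighted shuffle that draws each position from `std::discrete_distribution` over the weights not yet placed. Monitors with weight > 0 are then always drawn first. Once only weight-0 monitors remain, the next draw needs a distribution whose weights sum to zero, which violates the precondition of `std::discrete_distribution`, and a build with standard-library assertions enabled may abort there. This would explain why only a mix of weights crashes: all weights > 0 never reach a zero sum, and (assuming an all-zero set gets a plain uniform shuffle instead) all weights 0 never reach the weighted draws. The sketch below checks for that condition rather than triggering it; `weighted_order_hits_zero_sum` is a hypothetical name, not a Ceph function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch, not Ceph's MonClient code: order candidate monitor
// ranks by drawing each position from std::discrete_distribution over the
// weights that are still unplaced. Returns true if some draw would need a
// distribution whose weights sum to zero (a precondition violation).
bool weighted_order_hits_zero_sum(std::vector<unsigned> ranks,
                                  std::vector<std::uint16_t> weights,
                                  std::mt19937& gen) {
  assert(ranks.size() == weights.size());
  // Assumption: an all-zero weight set is ordered by a plain uniform
  // shuffle instead, so it never reaches the weighted draws below.
  if (std::accumulate(weights.begin(), weights.end(), 0u) == 0) {
    return false;
  }
  auto r = ranks.begin();
  for (auto w = weights.begin(); w != weights.end(); ++w, ++r) {
    if (std::accumulate(w, weights.end(), 0u) == 0) {
      return true;  // only weight-0 monitors are left to place
    }
    std::discrete_distribution<int> pick(w, weights.end());
    int n = pick(gen);  // offset, from the current position, of the chosen rank
    // Move the chosen rank (and its weight) into the current position.
    std::iter_swap(r, std::next(r, n));
    std::iter_swap(w, std::next(w, n));
  }
  return false;
}
```

For example, weights {10, 10, 0} report true, while {10, 10, 10} and {0, 0, 0} report false.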

The MGRs crash and do not recover:

Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: monclient: _reopen_session rank -1
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: *** Caught signal (Aborted) **
in thread 7f9a07a27640 thread_name:mgr-fin

ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)
1: /lib64/libc.so.6(+0x54db0) [0x7f9a2364ddb0]
2: /lib64/libc.so.6(+0xa154c) [0x7f9a2369a54c]
3: raise()
4: abort()
5: /usr/lib64/ceph/libceph-common.so.2(+0x1c1fa8) [0x7f9a23ce2fa8]
6: /usr/lib64/ceph/libceph-common.so.2(+0x444425) [0x7f9a23f65425]
7: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
8: /usr/lib64/ceph/libceph-common.so.2(+0x4442e0) [0x7f9a23f652e0]
9: (MonClient::_add_conns()+0x242) [0x7f9a23f5fa42]
10: (MonClient::_reopen_session(int)+0x428) [0x7f9a23f60518]
11: (Mgr::init()+0x384) [0x5604667a6434]
12: /usr/bin/ceph-mgr(+0x1af271) [0x5604667ae271]
13: /usr/bin/ceph-mgr(+0x11364d) [0x56046671264d]
14: (Finisher::finisher_thread_entry()+0x175) [0x7f9a23d10645]
15: /lib64/libc.so.6(+0x9f802) [0x7f9a23698802]
16: /lib64/libc.so.6(+0x3f450) [0x7f9a23638450]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The OSDs crash too, but they manage to restart:

-2> 2024-03-08T11:19:45.559+0000 7f1c76d0a640 10 monclient: _reopen_session rank -1
-1> 2024-03-08T11:19:45.573+0000 7f1c788f1640 5 prioritycache tune_memory target: 4294967296 mapped: 371105792 unmapped: 1810432 heap: 372916224 old mem: 2845415832 new mem: 2845415832
0> 2024-03-08T11:19:45.575+0000 7f1c76d0a640 -1 *** Caught signal (Aborted) **
in thread 7f1c76d0a640 thread_name:safe_timer
ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)
1: /lib64/libc.so.6(+0x54db0) [0x7f1c8848adb0]
2: /lib64/libc.so.6(+0xa154c) [0x7f1c884d754c]
3: raise()
4: abort()
5: /usr/bin/ceph-osd(+0x4dafb8) [0x5638ea881fb8]
6: /usr/bin/ceph-osd(+0xc543b5) [0x5638eaffb3b5]
7: /usr/bin/ceph-osd(+0xc54270) [0x5638eaffb270]
8: /usr/bin/ceph-osd(+0xc54270) [0x5638eaffb270]
9: /usr/bin/ceph-osd(+0xc54270) [0x5638eaffb270]
10: (MonClient::_add_conns()+0x262) [0x5638eafe8672]
11: (MonClient::_reopen_session(int)+0x488) [0x5638eafe9198]
12: (MonClient::tick()+0x638) [0x5638eafedaf8]
13: /usr/bin/ceph-osd(+0x4d6cbd) [0x5638ea87dcbd]
14: (CommonSafeTimer<std::mutex>::timer_thread()+0x11a) [0x5638eae646ea]
15: /usr/bin/ceph-osd(+0xabdfa1) [0x5638eae64fa1]
16: /lib64/libc.so.6(+0x9f802) [0x7f1c884d5802]
17: /lib64/libc.so.6(+0x3f450) [0x7f1c88475450]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
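A possible client-side guard, sketched under the unconfirmed assumption that the candidate monitors are ordered by a weighted shuffle built on `std::discrete_distribution`: stop drawing as soon as the remaining weights sum to zero and shuffle the rest uniformly. `weighted_order_with_fallback` is a hypothetical name and is not taken from any Ceph patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch, not Ceph code: order monitor ranks so that each
// position is drawn with probability proportional to the remaining weights,
// falling back to a uniform shuffle as soon as the remaining weights sum to
// zero (this also covers the all-zero case). It never constructs a
// std::discrete_distribution whose weights sum to zero.
std::vector<unsigned> weighted_order_with_fallback(
    std::vector<unsigned> ranks, std::vector<std::uint16_t> weights,
    std::mt19937& gen) {
  assert(ranks.size() == weights.size());
  auto r = ranks.begin();
  for (auto w = weights.begin(); w != weights.end(); ++w, ++r) {
    if (std::accumulate(w, weights.end(), 0u) == 0) {
      // Only weight-0 monitors remain: treat them as equally likely.
      std::shuffle(r, ranks.end(), gen);
      break;
    }
    std::discrete_distribution<int> pick(w, weights.end());
    int n = pick(gen);  // offset, from the current position, of the chosen rank
    std::iter_swap(r, std::next(r, n));
    std::iter_swap(w, std::next(w, n));
  }
  return ranks;
}
```

With weights {10, 10, 0, 0}, the two weight-10 ranks are placed first and the two weight-0 ranks follow in uniformly random order, so monitors with weight > 0 are still preferred.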
