Bug #64478

closed

Upgrading mon from v18.2.1 to latest-reef-devel image is causing mon to fail when decoding the MDSMap

Added by Travis Nielsen 3 months ago. Updated 2 months ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The Rook daily CI creates a v18.2.1 cluster with CephFS enabled, then upgrades it to the latest-reef-devel image. As soon as the mon is upgraded, it aborts during startup and the following failure appears in the mon log:

debug -1> 2024-02-16T21:27:13.319+0000 7fca997d4c80 1 mon.a@-1(???) e1 preinit fsid 8a85cb29-13cb-4904-b9e3-6b692e7bfffb
debug 0> 2024-02-16T21:27:13.343+0000 7fca997d4c80 -1 *** Caught signal (Aborted) **
in thread 7fca997d4c80 thread_name:ceph-mon

ceph version 18.2.1-593-g744c573d (744c573dfc29e50959567861c524f9e6c038171f) reef (stable)
1: /lib64/libpthread.so.0(0x12d20) [0x7fca9642dd20]
2: gsignal()
3: abort()
4: /lib64/libstdc++.so.6(0x9009b) [0x7fca95a3f09b]
5: /lib64/libstdc++.so.6(0x9654c) [0x7fca95a4554c]
6: /lib64/libstdc++.so.6(0x965a7) [0x7fca95a455a7]
7: /lib64/libstdc++.so.6(+0x96808) [0x7fca95a45808]
8: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0xa5) [0x7fca98fd2385]
9: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xa64) [0x7fca991ede94]
10: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x162) [0x7fca991fbb42]
11: (void ceph::decode<Filesystem, std::allocator<std::shared_ptr<Filesystem> > >(std::vector<std::shared_ptr<Filesystem>, std::allocator<std::shared_ptr<Filesystem> > >&, ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x145) [0x7fca99208195]
12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x161) [0x7fca991fda01]
13: (MDSMonitor::update_from_paxos(bool*)+0x26b) [0x555c9c6e55eb]
14: (Monitor::refresh_from_paxos(bool*)+0x104) [0x555c9c473764]
15: (Monitor::preinit()+0xa2b) [0x555c9c4a1b6b]
16: main()
17: __libc_start_main()
18: _start()

See attached for the full mon log.

The purpose of the test is to confirm that Rook upgrades are passing on the latest Ceph images before the next release comes out.

See also the Rook CI issue: https://github.com/rook/rook/issues/13785


Files

mon-a-crash.log (119 KB) Travis Nielsen, 02/16/2024 10:10 PM

Related issues 1 (1 open, 0 closed)

Is duplicate of CephFS - Bug #64440: mds: reversed encoding of MDSMap max_xattr_size/bal_rank_mask v18.2.1 <-> main (Pending Backport, Patrick Donnelly)
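
The trace above aborts inside the buffer iterator's copy() while MDSMap::decode is running, and the duplicate bug attributes this to max_xattr_size and bal_rank_mask being encoded in reversed order between versions. The following is a minimal Python sketch of how such a mismatch can end in a read past the end of the buffer. It is not Ceph's code or wire format: the Reader class, the helper names, the field types (an 8-byte integer and a length-prefixed string), and the values are all assumptions chosen for illustration, and the sketch does not claim which version uses which order.

```python
import struct


class EndOfBuffer(Exception):
    """Stand-in for the error a bounded buffer iterator raises on overrun."""


class Reader:
    """Hypothetical bounded reader, loosely analogous to a buffer iterator."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def copy(self, n: int) -> bytes:
        remaining = len(self.data) - self.pos
        if n > remaining:
            raise EndOfBuffer(f"need {n} bytes at offset {self.pos}, have {remaining}")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def encode_u64(value: int) -> bytes:
    return struct.pack("<Q", value)  # little-endian 64-bit integer


def encode_str(text: str) -> bytes:
    raw = text.encode()
    return struct.pack("<I", len(raw)) + raw  # 32-bit length prefix, then bytes


def decode_u64(reader: Reader) -> int:
    return struct.unpack("<Q", reader.copy(8))[0]


def decode_str(reader: Reader) -> str:
    (length,) = struct.unpack("<I", reader.copy(4))
    return reader.copy(length).decode()


def encode_writer_order(max_xattr_size: int, bal_rank_mask: str) -> bytes:
    # Assumed writer order: integer first, then string.
    return encode_u64(max_xattr_size) + encode_str(bal_rank_mask)


def decode_matching_order(data: bytes) -> tuple[int, str]:
    # Reader that agrees with the writer: integer first, then string.
    reader = Reader(data)
    size = decode_u64(reader)
    mask = decode_str(reader)
    return size, mask


def decode_reversed_order(data: bytes) -> tuple[int, str]:
    # Reader that expects the two fields in the opposite order.
    reader = Reader(data)
    mask = decode_str(reader)  # misreads the integer's low bytes as a length
    size = decode_u64(reader)
    return size, mask


if __name__ == "__main__":
    blob = encode_writer_order(65536, "-1")  # illustrative values only
    print(decode_matching_order(blob))
    try:
        decode_reversed_order(blob)
    except EndOfBuffer as exc:
        print("decode failed:", exc)
```

In the sketch, the mismatched reader interprets the first four bytes of the integer as a string length and asks for far more data than the buffer holds. If nothing catches the resulting error, the process terminates, which is consistent with the abort() frames following the libstdc++ frames in the trace.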
