Bug #64478

closed

Upgrading mon from v18.2.1 to latest-reef-devel image is causing mon to fail when decoding the MDSMap

Added by Travis Nielsen 3 months ago. Updated 3 months ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The Rook daily CI creates a v18.2.1 cluster with CephFS enabled, then upgrades it to the latest-reef-devel image. As soon as the mon is upgraded, it aborts at startup and the following failure appears in the mon log:

debug -1> 2024-02-16T21:27:13.319+0000 7fca997d4c80 1 mon.a@-1(???) e1 preinit fsid 8a85cb29-13cb-4904-b9e3-6b692e7bfffb
debug 0> 2024-02-16T21:27:13.343+0000 7fca997d4c80 -1 *** Caught signal (Aborted) **
in thread 7fca997d4c80 thread_name:ceph-mon

ceph version 18.2.1-593-g744c573d (744c573dfc29e50959567861c524f9e6c038171f) reef (stable)
1: /lib64/libpthread.so.0(+0x12d20) [0x7fca9642dd20]
2: gsignal()
3: abort()
4: /lib64/libstdc++.so.6(+0x9009b) [0x7fca95a3f09b]
5: /lib64/libstdc++.so.6(+0x9654c) [0x7fca95a4554c]
6: /lib64/libstdc++.so.6(+0x965a7) [0x7fca95a455a7]
7: /lib64/libstdc++.so.6(+0x96808) [0x7fca95a45808]
8: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0xa5) [0x7fca98fd2385]
9: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xa64) [0x7fca991ede94]
10: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x162) [0x7fca991fbb42]
11: (void ceph::decode<Filesystem, std::allocator<std::shared_ptr<Filesystem> > >(std::vector<std::shared_ptr<Filesystem>, std::allocator<std::shared_ptr<Filesystem> > >&, ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x145) [0x7fca99208195]
12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x161) [0x7fca991fda01]
13: (MDSMonitor::update_from_paxos(bool*)+0x26b) [0x555c9c6e55eb]
14: (Monitor::refresh_from_paxos(bool*)+0x104) [0x555c9c473764]
15: (Monitor::preinit()+0xa2b) [0x555c9c4a1b6b]
16: main()
17: __libc_start_main()
18: _start()

See attached for the full mon log.

The purpose of the test is to confirm that Rook upgrades are passing on the latest Ceph images before the next release comes out.

See also the Rook CI issue: https://github.com/rook/rook/issues/13785


Files

mon-a-crash.log (119 KB) Travis Nielsen, 02/16/2024 10:10 PM

Related issues 1 (1 open, 0 closed)

Is duplicate of CephFS - Bug #64440: mds: reversed encoding of MDSMap max_xattr_size/bal_rank_mask v18.2.1 <-> main (Pending Backport, Patrick Donnelly)
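
For readers unfamiliar with this class of failure, the following is a minimal Python sketch of how a reversed field order between encoder and decoder can lead to an out-of-bounds buffer copy like the one in the backtrace above. It is not Ceph code. The wire layout (a little-endian u64 and a u32-length-prefixed string), the field types, the helper names, and which order belongs to which release are all assumptions made for illustration; only the two field names come from the title of #64440.

```python
import struct


class EndOfBuffer(Exception):
    """Raised when a decode tries to copy past the end of the payload."""


class Reader:
    """Tiny cursor over a bytes payload (illustrative, not Ceph's bufferlist)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def copy(self, n: int) -> bytes:
        # Refuse to read past the end, like a bounds-checked buffer copy.
        if self.pos + n > len(self.data):
            raise EndOfBuffer(f"need {n} bytes at offset {self.pos}, "
                              f"payload has {len(self.data)}")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def u64(self) -> int:
        return struct.unpack("<Q", self.copy(8))[0]

    def string(self) -> str:
        (length,) = struct.unpack("<I", self.copy(4))
        return self.copy(length).decode()


def encode_int_first(max_xattr_size: int, bal_rank_mask: str) -> bytes:
    # Hypothetical order A: integer field, then string field.
    raw = bal_rank_mask.encode()
    return struct.pack("<Q", max_xattr_size) + struct.pack("<I", len(raw)) + raw


def decode_int_first(payload: bytes):
    # Matches order A, so the round trip succeeds.
    r = Reader(payload)
    return r.u64(), r.string()


def decode_string_first(payload: bytes):
    # Hypothetical order B: string field, then integer field.
    # Fed an order-A payload, the low bytes of the integer are
    # misread as the string length.
    r = Reader(payload)
    mask = r.string()
    size = r.u64()
    return size, mask


if __name__ == "__main__":
    payload = encode_int_first(65536, "0x1")
    print(decode_int_first(payload))
    try:
        decode_string_first(payload)
    except EndOfBuffer as err:
        print("decode failed:", err)
```

In this sketch the mismatched decoder interprets the first four bytes of the integer as a string length of 65536 and then tries to copy that many bytes out of a 15-byte payload, which is the kind of failure a bounds-checked copy turns into an exception.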

Actions #1

Updated by Patrick Donnelly 3 months ago

  • Is duplicate of Bug #64440: mds: reversed encoding of MDSMap max_xattr_size/bal_rank_mask v18.2.1 <-> main added
Actions #2

Updated by Patrick Donnelly 3 months ago

Hi Travis, I just opened an issue for this yesterday. It's great to see that Rook would have caught it, as there was a gap in our testing. I only found it by accident while upgrading a cluster.

Please track the issue in #64440. Thanks for your report!

Actions #3

Updated by Patrick Donnelly 3 months ago

  • Status changed from New to Duplicate
Actions #4

Updated by Travis Nielsen 3 months ago

Great to hear you're already investigating, thanks!

Actions #5

Updated by Ilya Dryomov 3 months ago

  • Target version deleted (v18.2.2)