Bug #64478
Status: Closed
Upgrading mon from v18.2.1 to latest-reef-devel image is causing mon to fail when decoding the MDSMap
Description
The Rook daily CI creates a v18.2.1 cluster with CephFS enabled, then upgrades it to the latest-reef-devel image. As soon as the mon is upgraded, the following failure appears in the mon log:
debug -1> 2024-02-16T21:27:13.319+0000 7fca997d4c80 1 mon.a@-1(???) e1 preinit fsid 8a85cb29-13cb-4904-b9e3-6b692e7bfffb
debug 0> 2024-02-16T21:27:13.343+0000 7fca997d4c80 -1 *** Caught signal (Aborted) **
in thread 7fca997d4c80 thread_name:ceph-mon
ceph version 18.2.1-593-g744c573d (744c573dfc29e50959567861c524f9e6c038171f) reef (stable)
1: /lib64/libpthread.so.0(+0x12d20) [0x7fca9642dd20]
2: gsignal()
3: abort()
4: /lib64/libstdc++.so.6(+0x9009b) [0x7fca95a3f09b]
5: /lib64/libstdc++.so.6(+0x9654c) [0x7fca95a4554c]
6: /lib64/libstdc++.so.6(+0x965a7) [0x7fca95a455a7]
7: /lib64/libstdc++.so.6(+0x96808) [0x7fca95a45808]
8: (ceph::buffer::v15_2_0::list::iterator_impl<true>::copy(unsigned int, char*)+0xa5) [0x7fca98fd2385]
9: (MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xa64) [0x7fca991ede94]
10: (Filesystem::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x162) [0x7fca991fbb42]
11: (void ceph::decode<Filesystem, std::allocator<std::shared_ptr<Filesystem> > >(std::vector<std::shared_ptr<Filesystem>, std::allocator<std::shared_ptr<Filesystem> > >&, ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x145) [0x7fca99208195]
12: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x161) [0x7fca991fda01]
13: (MDSMonitor::update_from_paxos(bool*)+0x26b) [0x555c9c6e55eb]
14: (Monitor::refresh_from_paxos(bool*)+0x104) [0x555c9c473764]
15: (Monitor::preinit()+0xa2b) [0x555c9c4a1b6b]
16: main()
17: __libc_start_main()
18: _start()
See attached for the full mon log.
The purpose of the test is to confirm that Rook upgrades are passing on the latest Ceph images before the next release comes out.
See also the Rook CI issue: https://github.com/rook/rook/issues/13785
Updated by Patrick Donnelly 3 months ago
- Relation added: is duplicate of Bug #64440: mds: reversed encoding of MDSMap max_xattr_size/bal_rank_mask v18.2.1 <-> main
Updated by Patrick Donnelly 3 months ago
Hi Travis, I just opened an issue for this yesterday. It's great to see that Rook would have caught it, as there was a gap in our testing. I only found it by accident while upgrading a cluster.
Please track the issue in #64440. Thanks for your report!
Updated by Travis Nielsen 2 months ago
Great to hear you're already investigating, thanks!