Bug #55140

quincy OSD won't start: what(): void pg_stat_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input

Added by Tomasz Torcz 8 months ago. Updated 8 months ago.

Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 2 - major
Component(RADOS): OSD

Description

My cluster has 3 control nodes running Fedora Rawhide (mons, mgrs, MDS) and 1 physical server with 6 HDDs running 6 OSDs (also Fedora Rawhide).
I'm using CephFS and RGW.

Two of the OSDs refuse to start:

2022-03-30T18:52:10.904+0200 7f61801af180 -1 Falling back to public interface
2022-03-30T18:52:13.305+0200 7f61801af180 -1 bluestore::NCB::__restore_allocator::Failed open_for_read with error-code -2
terminate called after throwing an instance of 'ceph::buffer::v15_2_0::malformed_input'
  what():  void pg_stat_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input
*** Caught signal (Aborted) **
 in thread 7f61801af180 thread_name:ceph-osd
 ceph version 17.1.0-123-g14f44feb (14f44febaa74e8f8931e156f1a921292708ad47a) quincy (stable)
 1: /lib64/libc.so.6(+0x42a30) [0x7f61803fba30]
 2: /lib64/libc.so.6(+0x92a8c) [0x7f618044ba8c]
 3: raise()
 4: abort()
 5: /lib64/libstdc++.so.6(+0xa2b1b) [0x7f618075bb1b]
 6: /lib64/libstdc++.so.6(+0xae33c) [0x7f618076733c]
 7: /lib64/libstdc++.so.6(+0xae3a7) [0x7f61807673a7]
 8: /lib64/libstdc++.so.6(+0xae608) [0x7f6180767608]
 9: /usr/bin/ceph-osd(+0x27bea5) [0x558bc8b58ea5]
 10: (pg_info_t::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1a6) [0x558bc8e886f6]
 11: (PG::read_info(ObjectStore*, spg_t, coll_t const&, pg_info_t&, PastIntervals&, unsigned char&)+0x504) [0x558bc8d2aaa4]
 12: (PG::read_state(ObjectStore*)+0x9b) [0x558bc8d2b31b]
 13: (OSD::load_pgs()+0x6f4) [0x558bc8cac8e4]
 14: (OSD::init()+0x1edf) [0x558bc8ca562f]
 15: main()
 16: /lib64/libc.so.6(+0x2d550) [0x7f61803e6550]
 17: __libc_start_main()
 18: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

The full OSD log is at https://pipebreaker.pl/z/ceph-osd.4.log.zst (860 MB uncompressed).
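The error text comes from Ceph's versioned encoding. Each encoded struct starts with a version byte, a compat-version byte and a 32-bit payload length. After decoding the fields, the decoder throws `malformed_input` with "<function> decode past end of struct encoding" if it consumed more bytes than that length declared. The Python sketch below is a simplified model of that framing, not Ceph's code. `MalformedInput`, `encode_mismatched`, `decode_stat` and the two-field layout are hypothetical. The sketch shows one way the error can arise: an encoder and a decoder that disagree on which fields a given struct version contains.

```python
import struct

HEADER = "<BBI"  # struct_v (u8), compat_v (u8), struct_len (u32), little-endian


class MalformedInput(Exception):
    """Hypothetical stand-in for ceph::buffer::malformed_input."""


def encode_mismatched(a: int) -> bytes:
    """Hypothetical encoder: labels the struct as version 2 but writes only field `a`."""
    payload = struct.pack("<I", a)
    header = struct.pack(HEADER, 2, 1, len(payload))
    trailing = struct.pack("<I", 0xDEADBEEF)  # bytes belonging to the next struct
    return header + payload + trailing


def decode_stat(buf: bytes, off: int = 0) -> tuple[dict, int]:
    """Decode one framed struct; return (fields, offset just past the struct)."""
    # DECODE_START-style header (compat_v is not checked in this sketch).
    struct_v, compat_v, struct_len = struct.unpack_from(HEADER, buf, off)
    off += struct.calcsize(HEADER)
    struct_end = off + struct_len

    fields = {}
    (fields["a"],) = struct.unpack_from("<I", buf, off)
    off += 4
    if struct_v >= 2:  # this decoder believes version 2 also carries field `b`
        (fields["b"],) = struct.unpack_from("<I", buf, off)
        off += 4

    # DECODE_FINISH-style check: reading beyond struct_len means the encoder
    # and decoder disagree about the layout of this struct version.
    if off > struct_end:
        raise MalformedInput("decode_stat decode past end of struct encoding")
    # Otherwise skip any fields appended by a newer encoder.
    return fields, struct_end


if __name__ == "__main__":
    try:
        decode_stat(encode_mismatched(42))
    except MalformedInput as e:
        print(f"what(): {e}: Malformed input")
```

Running the script prints `what(): decode_stat decode past end of struct encoding: Malformed input`, which mirrors the shape of the OSD abort above. The reverse case does not fail. A struct whose declared length is longer than the decoder expects is handled by jumping to `struct_end`, and that is how newer encoders can append fields.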


Related issues

Related to RADOS - Bug #53923: [Upgrade] mgr FAILED to decode MSG_PGSTATS Resolved

History

#1 Updated by Aishwarya Mathuria 8 months ago

The fix for this (tracked in https://tracker.ceph.com/issues/53923) went in yesterday. If you upgrade to the latest Quincy build, you should no longer see this issue.

#2 Updated by Aishwarya Mathuria 8 months ago

  • Status changed from New to Duplicate
  • Parent task set to #53923

#3 Updated by Aishwarya Mathuria 8 months ago

  • Parent task deleted (#53923)

#4 Updated by Aishwarya Mathuria 8 months ago

  • Related to Bug #53923: [Upgrade] mgr FAILED to decode MSG_PGSTATS added
