Bug #55140
quincy OSD won't start: what(): void pg_stat_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input
Status: Duplicate
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
My cluster has 3 control nodes running rawhide (mons, mgrs, mds) and 1 physical server with 6 HDDs running 6 OSDs (Fedora rawhide).
I'm using CephFS and RGW.
Two of the OSDs refuse to start:
2022-03-30T18:52:10.904+0200 7f61801af180 -1 Falling back to public interface
2022-03-30T18:52:13.305+0200 7f61801af180 -1 bluestore::NCB::__restore_allocator::Failed open_for_read with error-code -2
terminate called after throwing an instance of 'ceph::buffer::v15_2_0::malformed_input'
  what(): void pg_stat_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input
*** Caught signal (Aborted) **
 in thread 7f61801af180 thread_name:ceph-osd
 ceph version 17.1.0-123-g14f44feb (14f44febaa74e8f8931e156f1a921292708ad47a) quincy (stable)
 1: /lib64/libc.so.6(+0x42a30) [0x7f61803fba30]
 2: /lib64/libc.so.6(+0x92a8c) [0x7f618044ba8c]
 3: raise()
 4: abort()
 5: /lib64/libstdc++.so.6(+0xa2b1b) [0x7f618075bb1b]
 6: /lib64/libstdc++.so.6(+0xae33c) [0x7f618076733c]
 7: /lib64/libstdc++.so.6(+0xae3a7) [0x7f61807673a7]
 8: /lib64/libstdc++.so.6(+0xae608) [0x7f6180767608]
 9: /usr/bin/ceph-osd(+0x27bea5) [0x558bc8b58ea5]
 10: (pg_info_t::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1a6) [0x558bc8e886f6]
 11: (PG::read_info(ObjectStore*, spg_t, coll_t const&, pg_info_t&, PastIntervals&, unsigned char&)+0x504) [0x558bc8d2aaa4]
 12: (PG::read_state(ObjectStore*)+0x9b) [0x558bc8d2b31b]
 13: (OSD::load_pgs()+0x6f4) [0x558bc8cac8e4]
 14: (OSD::init()+0x1edf) [0x558bc8ca562f]
 15: main()
 16: /lib64/libc.so.6(+0x2d550) [0x7f61803e6550]
 17: __libc_start_main()
 18: _start()

2022-03-30T18:53:45.641+0200 7f61801af180 -1 *** Caught signal (Aborted) **
 in thread 7f61801af180 thread_name:ceph-osd
 ceph version 17.1.0-123-g14f44feb (14f44febaa74e8f8931e156f1a921292708ad47a) quincy (stable)
 1: /lib64/libc.so.6(+0x42a30) [0x7f61803fba30]
 2: /lib64/libc.so.6(+0x92a8c) [0x7f618044ba8c]
 3: raise()
 4: abort()
 5: /lib64/libstdc++.so.6(+0xa2b1b) [0x7f618075bb1b]
 6: /lib64/libstdc++.so.6(+0xae33c) [0x7f618076733c]
 7: /lib64/libstdc++.so.6(+0xae3a7) [0x7f61807673a7]
 8: /lib64/libstdc++.so.6(+0xae608) [0x7f6180767608]
 9: /usr/bin/ceph-osd(+0x27bea5) [0x558bc8b58ea5]
 10: (pg_info_t::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1a6) [0x558bc8e886f6]
 11: (PG::read_info(ObjectStore*, spg_t, coll_t const&, pg_info_t&, PastIntervals&, unsigned char&)+0x504) [0x558bc8d2aaa4]
 12: (PG::read_state(ObjectStore*)+0x9b) [0x558bc8d2b31b]
 13: (OSD::load_pgs()+0x6f4) [0x558bc8cac8e4]
 14: (OSD::init()+0x1edf) [0x558bc8ca562f]
 15: main()
 16: /lib64/libc.so.6(+0x2d550) [0x7f61803e6550]
 17: __libc_start_main()
 18: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

 0> 2022-03-30T18:53:45.641+0200 7f61801af180 -1 *** Caught signal (Aborted) **
 in thread 7f61801af180 thread_name:ceph-osd
 ceph version 17.1.0-123-g14f44feb (14f44febaa74e8f8931e156f1a921292708ad47a) quincy (stable)
 1: /lib64/libc.so.6(+0x42a30) [0x7f61803fba30]
 2: /lib64/libc.so.6(+0x92a8c) [0x7f618044ba8c]
 3: raise()
 4: abort()
 5: /lib64/libstdc++.so.6(+0xa2b1b) [0x7f618075bb1b]
 6: /lib64/libstdc++.so.6(+0xae33c) [0x7f618076733c]
 7: /lib64/libstdc++.so.6(+0xae3a7) [0x7f61807673a7]
 8: /lib64/libstdc++.so.6(+0xae608) [0x7f6180767608]
 9: /usr/bin/ceph-osd(+0x27bea5) [0x558bc8b58ea5]
 10: (pg_info_t::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1a6) [0x558bc8e886f6]
 11: (PG::read_info(ObjectStore*, spg_t, coll_t const&, pg_info_t&, PastIntervals&, unsigned char&)+0x504) [0x558bc8d2aaa4]
 12: (PG::read_state(ObjectStore*)+0x9b) [0x558bc8d2b31b]
 13: (OSD::load_pgs()+0x6f4) [0x558bc8cac8e4]
 14: (OSD::init()+0x1edf) [0x558bc8ca562f]
 15: main()
 16: /lib64/libc.so.6(+0x2d550) [0x7f61803e6550]
 17: __libc_start_main()
 18: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

-463> 2022-03-30T18:53:45.641+0200 7f61801af180 -1 *** Caught signal (Aborted) **
 in thread 7f61801af180 thread_name:ceph-osd
 ceph version 17.1.0-123-g14f44feb (14f44febaa74e8f8931e156f1a921292708ad47a) quincy (stable)
 1: /lib64/libc.so.6(+0x42a30) [0x7f61803fba30]
 2: /lib64/libc.so.6(+0x92a8c) [0x7f618044ba8c]
 3: raise()
 4: abort()
 5: /lib64/libstdc++.so.6(+0xa2b1b) [0x7f618075bb1b]
 6: /lib64/libstdc++.so.6(+0xae33c) [0x7f618076733c]
 7: /lib64/libstdc++.so.6(+0xae3a7) [0x7f61807673a7]
 8: /lib64/libstdc++.so.6(+0xae608) [0x7f6180767608]
 9: /usr/bin/ceph-osd(+0x27bea5) [0x558bc8b58ea5]
 10: (pg_info_t::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0x1a6) [0x558bc8e886f6]
 11: (PG::read_info(ObjectStore*, spg_t, coll_t const&, pg_info_t&, PastIntervals&, unsigned char&)+0x504) [0x558bc8d2aaa4]
 12: (PG::read_state(ObjectStore*)+0x9b) [0x558bc8d2b31b]
 13: (OSD::load_pgs()+0x6f4) [0x558bc8cac8e4]
 14: (OSD::init()+0x1edf) [0x558bc8ca562f]
 15: main()
 16: /lib64/libc.so.6(+0x2d550) [0x7f61803e6550]
 17: __libc_start_main()
 18: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
The OSD log is at https://pipebreaker.pl/z/ceph-osd.4.log.zst (decompresses to 860 MB).
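The crash dump in the log above repeats the same 18-frame stack several times. A small sketch for confirming that in a large log: it groups numbered frame lines into backtraces and counts identical ones. It assumes the raw log has one frame per line in the `N: frame` form; `split_backtraces` and `summarize` are hypothetical helper names, not part of any Ceph tooling.

```python
import re
from collections import Counter

# One numbered frame line, e.g. " 13: (OSD::load_pgs()+0x6f4) [0x558bc8cac8e4]".
# Timestamped lines such as "2022-03-30T18:53:45..." do not match because the
# leading digits are not followed by a colon.
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*\S)\s*$")


def split_backtraces(log_text):
    """Return a list of backtraces, each a tuple of frame strings."""
    traces, current = [], []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line)
        if m and int(m.group(1)) == len(current) + 1:
            # Frame continues the backtrace being collected.
            current.append(m.group(2))
            continue
        if current:
            traces.append(tuple(current))
        # A frame numbered 1 that did not continue the previous trace
        # starts a new one; anything else resets the collector.
        current = [m.group(2)] if m and int(m.group(1)) == 1 else []
    if current:
        traces.append(tuple(current))
    return traces


def summarize(log_text):
    """Count how many times each distinct backtrace occurs."""
    return Counter(split_backtraces(log_text))


if __name__ == "__main__":
    import sys

    for trace, count in summarize(sys.stdin.read()).most_common():
        print(f"{count} occurrence(s) of a {len(trace)}-frame backtrace:")
        for i, frame in enumerate(trace, 1):
            print(f" {i}: {frame}")
```

If the log is first decompressed, the text can be piped into the script on standard input; every distinct stack is then printed once with its occurrence count.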
Updated by Aishwarya Mathuria about 2 years ago
The fix for this went in yesterday: https://tracker.ceph.com/issues/53923. If you upgrade to the latest Quincy version, you should no longer see this issue.
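For readers unfamiliar with the message "decode past end of struct encoding": a toy model of a length-prefixed, versioned decode that fails the same way when the reader expects more fields than the writer encoded. This is only an illustration under assumptions. The (version, compat, length) header layout, the field names, and the `MalformedInput` class are hypothetical stand-ins, not Ceph's wire format, and it does not claim to reproduce the actual defect fixed in #53923.

```python
import struct

HEADER = "<BBI"  # assumed toy header: struct version, compat version, payload length
HEADER_SIZE = struct.calcsize(HEADER)


class MalformedInput(Exception):
    """Toy stand-in for ceph::buffer::malformed_input."""


def encode_v1(a):
    """Encode a version-1 struct that carries a single uint32 field."""
    payload = struct.pack("<I", a)
    return struct.pack(HEADER, 1, 1, len(payload)) + payload


def decode_expecting_two_fields(buf):
    """Reader that always reads two fields, ignoring the encoded version."""
    _version, _compat, length = struct.unpack_from(HEADER, buf, 0)
    end = HEADER_SIZE + length  # the struct may not be read beyond this offset
    off = HEADER_SIZE
    (a,) = struct.unpack_from("<I", buf, off)
    off += 4
    if off + 4 > end:
        # The second field would lie beyond the declared struct length.
        raise MalformedInput("decode past end of struct encoding")
    (b,) = struct.unpack_from("<I", buf, off)
    return a, b


def decode_version_aware(buf):
    """Reader that only reads the second field when the version provides it."""
    version, _compat, _length = struct.unpack_from(HEADER, buf, 0)
    off = HEADER_SIZE
    (a,) = struct.unpack_from("<I", buf, off)
    off += 4
    b = None
    if version >= 2:
        (b,) = struct.unpack_from("<I", buf, off)
    return a, b
```

In this toy, the version-aware reader accepts the older encoding, while the reader that assumes the newer layout raises as soon as it would step past the declared length.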
Updated by Aishwarya Mathuria about 2 years ago
- Status changed from New to Duplicate
- Parent task set to #53923
Updated by Aishwarya Mathuria about 2 years ago
- Related to Bug #53923: [Upgrade] mgr FAILED to decode MSG_PGSTATS added