Bug #42570

mgr: qa: upgrade mimic-master "src/osd/osd_types.h: 2313: FAILED ceph_assert(pos <= end)"

Added by Patrick Donnelly 5 months ago. Updated 5 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: fs
Component(RADOS):
Pull request ID:
Crash signature:

Description

2019-10-30T19:19:31.669 INFO:tasks.ceph.ceph_manager.ceph:need seq 38654705758 got 0 for osd.0
2019-10-30T19:19:32.213 INFO:tasks.ceph.mon.a.smithi201.stderr:2019-10-30T19:19:32.209+0000 7ff3a30a0700 -1 mon.a@0(leader).mgrstat failed to decode mgrstat state; luminous dev version? buffer::end_of_buffer
2019-10-30T19:19:32.214 INFO:tasks.ceph.mon.c.smithi201.stderr:2019-10-30T19:19:32.209+0000 7fbee3b6f700 -1 mon.c@2(peon).mgrstat failed to decode mgrstat state; luminous dev version? buffer::end_of_buffer
2019-10-30T19:19:32.215 INFO:tasks.ceph.mon.b.smithi201.stderr:2019-10-30T19:19:32.213+0000 7f8fe3ec1700 -1 mon.b@1(peon).mgrstat failed to decode mgrstat state; luminous dev version? buffer::end_of_buffer
2019-10-30T19:19:32.225 INFO:tasks.ceph.mon.c.smithi201.stderr:2019-10-30T19:19:32.217+0000 7fbee3b6f700 -1 mon.c@2(peon).mgrstat failed to decode mgrstat state; luminous dev version? buffer::end_of_buffer
2019-10-30T19:19:32.225 INFO:tasks.ceph.mon.a.smithi201.stderr:2019-10-30T19:19:32.217+0000 7ff3a30a0700 -1 mon.a@0(leader).mgrstat failed to decode mgrstat state; luminous dev version? buffer::end_of_buffer
2019-10-30T19:19:32.225 INFO:tasks.ceph.mon.b.smithi201.stderr:2019-10-30T19:19:32.221+0000 7f8fe3ec1700 -1 mon.b@1(peon).mgrstat failed to decode mgrstat state; luminous dev version? buffer::end_of_buffer
2019-10-30T19:19:32.226 INFO:tasks.ceph.mgr.y.smithi201.stderr:/build/ceph-15.0.0-6608-gfb29d02/src/osd/osd_types.h: In function 'static void store_statfs_t::_denc_finish(ceph::buffer::v14_2_0::ptr::const_iterator&, __u8*, __u8*, char**, uint32_t*)' thread 7f0ebff38700 time 2019-10-30T19:19:32.223103+0000
2019-10-30T19:19:32.226 INFO:tasks.ceph.mgr.y.smithi201.stderr:/build/ceph-15.0.0-6608-gfb29d02/src/osd/osd_types.h: 2313: FAILED ceph_assert(pos <= end)
2019-10-30T19:19:32.234 INFO:tasks.ceph.mgr.y.smithi201.stderr:/build/ceph-15.0.0-6608-gfb29d02/src/osd/osd_types.h: In function 'static void store_statfs_t::_denc_finish(ceph::buffer::v14_2_0::ptr::const_iterator&, __u8*, __u8*, char**, uint32_t*)' thread 7f0ec0739700 time 2019-10-30T19:19:32.226306+0000
2019-10-30T19:19:32.234 INFO:tasks.ceph.mgr.y.smithi201.stderr:/build/ceph-15.0.0-6608-gfb29d02/src/osd/osd_types.h: 2313: FAILED ceph_assert(pos <= end)
2019-10-30T19:19:32.240 INFO:tasks.ceph.mgr.y.smithi201.stderr: ceph version 15.0.0-6608-gfb29d02 (fb29d023301243910681de545c877cc84d2ca724) octopus (dev)
2019-10-30T19:19:32.241 INFO:tasks.ceph.mgr.y.smithi201.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f0ec5791e02]
2019-10-30T19:19:32.241 INFO:tasks.ceph.mgr.y.smithi201.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f0ec5791fdd]
2019-10-30T19:19:32.241 INFO:tasks.ceph.mgr.y.smithi201.stderr: 3: (std::enable_if<denc_traits<store_statfs_t, void>::supported&&denc_traits<store_statfs_t, void>::need_contiguous, void>::type ceph::decode<store_statfs_t, denc_traits<store_statfs_t, void> >(store_statfs_t&, ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x1d1) [0x7f0ec5bd92c1]
2019-10-30T19:19:32.241 INFO:tasks.ceph.mgr.y.smithi201.stderr: 4: (osd_stat_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x4a6) [0x7f0ec5bbfc16]
2019-10-30T19:19:32.241 INFO:tasks.ceph.mgr.y.smithi201.stderr: 5: (MPGStats::decode_payload()+0x6b) [0x7f0ec59e8e5b]
2019-10-30T19:19:32.241 INFO:tasks.ceph.mgr.y.smithi201.stderr: 6: (decode_message(CephContext*, int, ceph_msg_header&, ceph_msg_footer&, ceph::buffer::v14_2_0::list&, ceph::buffer::v14_2_0::list&, ceph::buffer::v14_2_0::list&, boost::intrusive_ptr<Connection>)+0x1097) [0x7f0ec59759b7]
2019-10-30T19:19:32.242 INFO:tasks.ceph.mgr.y.smithi201.stderr: 7: (ProtocolV1::handle_message_footer(char*, int)+0x129) [0x7f0ec5a17f39]
2019-10-30T19:19:32.242 INFO:tasks.ceph.mgr.y.smithi201.stderr: 8: (()+0x4eeb0d) [0x7f0ec5a10b0d]
2019-10-30T19:19:32.242 INFO:tasks.ceph.mgr.y.smithi201.stderr: 9: (AsyncConnection::process()+0x5fc) [0x7f0ec59ff85c]
2019-10-30T19:19:32.242 INFO:tasks.ceph.mgr.y.smithi201.stderr: 10: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x7dd) [0x7f0ec5a56b5d]
2019-10-30T19:19:32.242 INFO:tasks.ceph.mgr.y.smithi201.stderr: 11: (()+0x53c968) [0x7f0ec5a5e968]
2019-10-30T19:19:32.243 INFO:tasks.ceph.mgr.y.smithi201.stderr: 12: (()+0xbd66f) [0x7f0ec464666f]
2019-10-30T19:19:32.243 INFO:tasks.ceph.mgr.y.smithi201.stderr: 13: (()+0x76db) [0x7f0ec4b1d6db]
2019-10-30T19:19:32.243 INFO:tasks.ceph.mgr.y.smithi201.stderr: 14: (clone()+0x3f) [0x7f0ec3d0388f]
2019-10-30T19:19:32.243 INFO:tasks.ceph.mgr.y.smithi201.stderr: ceph version 15.0.0-6608-gfb29d02 (fb29d023301243910681de545c877cc84d2ca724) octopus (dev)
2019-10-30T19:19:32.243 INFO:tasks.ceph.mgr.y.smithi201.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f0ec5791e02]
2019-10-30T19:19:32.243 INFO:tasks.ceph.mgr.y.smithi201.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x7f0ec5791fdd]
2019-10-30T19:19:32.244 INFO:tasks.ceph.mgr.y.smithi201.stderr: 3: (std::enable_if<denc_traits<store_statfs_t, void>::supported&&denc_traits<store_statfs_t, void>::need_contiguous, void>::type ceph::decode<store_statfs_t, denc_traits<store_statfs_t, void> >(store_statfs_t&, ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x1d1) [0x7f0ec5bd92c1]
2019-10-30T19:19:32.244 INFO:tasks.ceph.mgr.y.smithi201.stderr: 4: (osd_stat_t::decode(ceph::buffer::v14_2_0::list::iterator_impl<true>&)+0x4a6) [0x7f0ec5bbfc16]
2019-10-30T19:19:32.244 INFO:tasks.ceph.mgr.y.smithi201.stderr: 5: (MPGStats::decode_payload()+0x6b) [0x7f0ec59e8e5b]
2019-10-30T19:19:32.244 INFO:tasks.ceph.mgr.y.smithi201.stderr: 6: (decode_message(CephContext*, int, ceph_msg_header&, ceph_msg_footer&, ceph::buffer::v14_2_0::list&, ceph::buffer::v14_2_0::list&, ceph::buffer::v14_2_0::list&, boost::intrusive_ptr<Connection>)+0x1097) [0x7f0ec59759b7]
2019-10-30T19:19:32.244 INFO:tasks.ceph.mgr.y.smithi201.stderr: 7: (ProtocolV1::handle_message_footer(char*, int)+0x129) [0x7f0ec5a17f39]
2019-10-30T19:19:32.244 INFO:tasks.ceph.mgr.y.smithi201.stderr: 8: (()+0x4eeb0d) [0x7f0ec5a10b0d]
2019-10-30T19:19:32.245 INFO:tasks.ceph.mgr.y.smithi201.stderr: 9: (AsyncConnection::process()+0x5fc) [0x7f0ec59ff85c]
2019-10-30T19:19:32.245 INFO:tasks.ceph.mgr.y.smithi201.stderr: 10: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x7dd) [0x7f0ec5a56b5d]
2019-10-30T19:19:32.245 INFO:tasks.ceph.mgr.y.smithi201.stderr: 11: (()+0x53c968) [0x7f0ec5a5e968]
2019-10-30T19:19:32.245 INFO:tasks.ceph.mgr.y.smithi201.stderr: 12: (()+0xbd66f) [0x7f0ec464666f]
2019-10-30T19:19:32.245 INFO:tasks.ceph.mgr.y.smithi201.stderr: 13: (()+0x76db) [0x7f0ec4b1d6db]
2019-10-30T19:19:32.245 INFO:tasks.ceph.mgr.y.smithi201.stderr: 14: (clone()+0x3f) [0x7f0ec3d0388f]
2019-10-30T19:19:32.245 INFO:tasks.ceph.mgr.y.smithi201.stderr:*** Caught signal (Aborted) **
2019-10-30T19:19:32.246 INFO:tasks.ceph.mgr.y.smithi201.stderr: in thread 7f0ebff38700 thread_name:msgr-worker-2

From: /ceph/teuthology-archive/pdonnell-2019-10-30_18:50:00-fs:upgrade-master-distro-basic-smithi/4456235/teuthology.log


Related issues

Related to RADOS - Feature #40640: Network ping monitoring Resolved 07/02/2019
Related to RADOS - Backport #41696: mimic: Network ping monitoring Resolved
Related to RADOS - Backport #41697: luminous: Network ping monitoring Resolved

History

#1 Updated by Sage Weil 5 months ago

Commit db84d9ea8f3d1d46ba4cc3116aea052e8554261d from PR 30951 [1] is the problem. It adds ping times to osd_stat_t as struct_v=9 in mimic, whereas master adds them at struct_v=14. As a result, the osd_stat_t encoded by mimic looks like gibberish to nautilus and master.

[1] Errata: db84d9ea8f3d1d46ba4cc3116aea052e8554261d is in https://github.com/ceph/ceph/pull/30225
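
To illustrate the struct_v clash described in this comment, here is a minimal toy sketch. It does not use Ceph's bufferlist or denc code. The types `Writer`, `Reader` and `ToyStat`, the `statfs` field layout, and the functions `encode_backport`, `encode_new`, `decode_new` and `decodes_cleanly` are all hypothetical names made up for the example. The only details taken from the report are the version numbers 9 and 14 and the fact that the two branches disagree about what struct_v=9 contains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy little-endian writer; NOT Ceph's bufferlist.
struct Writer {
  std::vector<uint8_t> bytes;
  void put_u8(uint8_t v) { bytes.push_back(v); }
  void put_u32(uint32_t v) {
    for (int i = 0; i < 4; ++i) bytes.push_back((v >> (8 * i)) & 0xff);
  }
};

// Toy bounds-checked reader; throws when a read would pass the end.
struct Reader {
  const std::vector<uint8_t>& bytes;
  std::size_t pos = 0;
  explicit Reader(const std::vector<uint8_t>& b) : bytes(b) {}
  uint8_t get_u8() {
    if (pos + 1 > bytes.size()) throw std::out_of_range("end_of_buffer");
    return bytes[pos++];
  }
  uint32_t get_u32() {
    if (pos + 4 > bytes.size()) throw std::out_of_range("end_of_buffer");
    uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= uint32_t(bytes[pos++]) << (8 * i);
    return v;
  }
};

// Hypothetical stand-in for osd_stat_t.
struct ToyStat {
  uint32_t kb = 0;          // present in every version
  uint32_t statfs[4] = {};  // assumed: what the newer branch added at v9
  uint32_t ping_ms = 0;     // ping time
};

// Older branch after the backport: reuses struct_v=9 for the ping time.
std::vector<uint8_t> encode_backport(const ToyStat& s) {
  Writer w;
  w.put_u8(9);
  w.put_u32(s.kb);
  w.put_u32(s.ping_ms);
  return w.bytes;
}

// Newer branch: v9 carries statfs, v14 carries the ping time.
std::vector<uint8_t> encode_new(const ToyStat& s) {
  Writer w;
  w.put_u8(14);
  w.put_u32(s.kb);
  for (uint32_t x : s.statfs) w.put_u32(x);
  w.put_u32(s.ping_ms);
  return w.bytes;
}

// Newer branch decoder: trusts struct_v to tell it which fields follow.
ToyStat decode_new(const std::vector<uint8_t>& b) {
  Reader r(b);
  ToyStat s;
  uint8_t v = r.get_u8();
  s.kb = r.get_u32();
  if (v >= 9) for (auto& x : s.statfs) x = r.get_u32();
  if (v >= 14) s.ping_ms = r.get_u32();
  return s;
}

// True if the newer decoder can consume the buffer without overrunning it.
bool decodes_cleanly(const std::vector<uint8_t>& b) {
  try { decode_new(b); return true; }
  catch (const std::out_of_range&) { return false; }
}
```

In this toy, a buffer from `encode_backport` claims version 9, so `decode_new` reads the ping time as the first `statfs` word and then runs off the end of the buffer. The log above shows the real failure as a `ceph_assert(pos <= end)` in `store_statfs_t::_denc_finish` rather than an exception. The sketch only mirrors the general idea that a version number must mean the same layout on every branch.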

#2 Updated by Sage Weil 5 months ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 31267

Reverting in mimic for now. The nautilus PR https://github.com/ceph/ceph/pull/30195 is similarly broken but hasn't merged yet.

#3 Updated by Patrick Donnelly 5 months ago

  • Project changed from mgr to RADOS
  • Assignee set to Sage Weil
  • Source set to Q/A

#4 Updated by Patrick Donnelly 5 months ago

  • Assignee changed from Sage Weil to David Zafman
  • Pull request ID changed from 31267 to 31275

#5 Updated by Patrick Donnelly 5 months ago

  • Target version changed from v15.0.0 to v13.2.7

#7 Updated by David Zafman 5 months ago

  • Status changed from Fix Under Review to Resolved
  • Backport set to luminous

I put in the backports, but they were already completed without creating backport trackers.

Nautilus: Separate fix not needed because it was rolled into the feature pull request (https://github.com/ceph/ceph/pull/30195 -> bd29e44a653d67f344fe0fd915a207aa6feae6e6)
Mimic: This fix
Luminous: https://github.com/ceph/ceph/pull/31277
