Bug #38295

luminous->(mimic,nautilus): PGMapDigest decode error on luminous end

Added by Sage Weil 11 months ago. Updated 11 months ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: mimic
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature:

Description

2019-02-13T03:49:18.605 INFO:tasks.ceph.mon.c.smithi012.stderr:terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
2019-02-13T03:49:18.607 INFO:tasks.ceph.mon.c.smithi012.stderr:  what():  buffer::end_of_buffer
2019-02-13T03:49:18.608 INFO:tasks.ceph.mon.c.smithi012.stderr:*** Caught signal (Aborted) **
2019-02-13T03:49:18.608 INFO:tasks.ceph.mon.c.smithi012.stderr: in thread 7fab1647e700 thread_name:ms_dispatch
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: ceph version 12.2.11-32-ge18688f (e18688fa4ed3217e454662037127b03cb8e34394) luminous (stable)
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 1: (()+0x964c88) [0x55cde8901c88]
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 2: (()+0x12890) [0x7fab1e772890]
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 3: (gsignal()+0xc7) [0x7fab1cbf6e97]
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 4: (abort()+0x141) [0x7fab1cbf8801]
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 5: (()+0x8c8fb) [0x7fab1d5eb8fb]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 6: (()+0x92d3a) [0x7fab1d5f1d3a]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 7: (()+0x92d95) [0x7fab1d5f1d95]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 8: (()+0x92fe8) [0x7fab1d5f1fe8]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 9: (()+0x5b9d42) [0x55cde8556d42]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 10: (()+0x5c377d) [0x55cde856077d]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 11: (PGMapDigest::decode(ceph::buffer::list::iterator&)+0x2e8) [0x55cde83eda68]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 12: (MgrStatMonitor::prepare_report(boost::intrusive_ptr<MonOpRequest>)+0x125) [0x55cde853c085]
2019-02-13T03:49:18.612 INFO:tasks.ceph.mon.c.smithi012.stderr: 13: (MgrStatMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0xe3) [0x55cde853c553]
2019-02-13T03:49:18.612 INFO:tasks.ceph.mon.c.smithi012.stderr: 14: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x9ed) [0x55cde846471d]
2019-02-13T03:49:18.612 INFO:tasks.ceph.mon.c.smithi012.stderr: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x5d2) [0x55cde8331792]
2019-02-13T03:49:18.612 INFO:tasks.ceph.mon.c.smithi012.stderr: 16: (Monitor::_ms_dispatch(Message*)+0x46b) [0x55cde8332a6b]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 17: (Monitor::ms_dispatch(Message*)+0x23) [0x55cde8362823]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 18: (DispatchQueue::entry()+0xe8a) [0x55cde88aa00a]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 19: (DispatchQueue::DispatchThread::entry()+0xd) [0x55cde864cefd]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 20: (()+0x76db) [0x7fab1e7676db]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 21: (clone()+0x3f) [0x7fab1ccd988f]

/a/sage-2019-02-13_03:14:34-upgrade:luminous-x-wip-v2-upgrade-distro-basic-smithi/3582463

or,

2019-02-13T04:22:45.901 INFO:tasks.ceph.mon.c.smithi025.stderr:terminate called after throwing an instance of 'ceph::buffer::malformed_input'
2019-02-13T04:22:45.901 INFO:tasks.ceph.mon.c.smithi025.stderr:  what():  buffer::malformed_input: void pool_stat_t::decode(ceph::buffer::list::iterator&) no longer understand old encoding version 6 < 253
2019-02-13T04:22:45.902 INFO:tasks.ceph.mon.c.smithi025.stderr:*** Caught signal (Aborted) **
2019-02-13T04:22:45.902 INFO:tasks.ceph.mon.c.smithi025.stderr: in thread 7f5895161700 thread_name:ms_dispatch
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: ceph version 12.2.11-35-g1f910bc (1f910bc2cde041d4c472ed9fde8b1c1ab21826f1) luminous (stable)
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 1: (()+0x95f281) [0x55ab89ea4281]
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 2: (()+0xf5d0) [0x7f589dedb5d0]
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 3: (gsignal()+0x37) [0x7f589b000207]
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 4: (abort()+0x148) [0x7f589b0018f8]
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f589b90f7d5]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 6: (()+0x5e746) [0x7f589b90d746]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 7: (()+0x5e773) [0x7f589b90d773]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 8: (()+0x5e993) [0x7f589b90d993]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 9: (pool_stat_t::decode(ceph::buffer::list::iterator&)+0x4d2) [0x55ab89cc0d02]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 10: (PGMapDigest::decode(ceph::buffer::list::iterator&)+0x50c) [0x55ab899fb60c]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 11: (MgrStatMonitor::prepare_report(boost::intrusive_ptr<MonOpRequest>)+0x72) [0x55ab89b38a92]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 12: (MgrStatMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0xbf) [0x55ab89b38ecf]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 13: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xaf8) [0x55ab89a695a8]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 14: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x51f) [0x55ab899430ff]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 15: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55ab8994477b]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 16: (Monitor::handle_forward(boost::intrusive_ptr<MonOpRequest>)+0xa8d) [0x55ab8994608d]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 17: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xdbd) [0x55ab8994399d]
2019-02-13T04:22:45.907 INFO:tasks.ceph.mon.c.smithi025.stderr: 18: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55ab8994477b]
2019-02-13T04:22:45.907 INFO:tasks.ceph.mon.c.smithi025.stderr: 19: (Monitor::ms_dispatch(Message*)+0x23) [0x55ab89970fc3]
2019-02-13T04:22:45.907 INFO:tasks.ceph.mon.c.smithi025.stderr: 20: (DispatchQueue::entry()+0x792) [0x55ab89e4e402]

/a/sage-2019-02-13_03:14:34-upgrade:luminous-x-wip-v2-upgrade-distro-basic-smithi/3582463

Related issues

Copied to RADOS - Backport #38342: mimic: luminous->(mimic,nautilus): PGMapDigest decode error on luminous end Resolved

History

#1 Updated by Sage Weil 11 months ago

  • Subject changed from luminous->nautilus: PGMapDigest decode error on luminous end to luminous->(mimic,nautilus): PGMapDigest decode error on luminous end

This appears to have been broken since mimic, and it triggers if you upgrade a mgr before all mons are upgraded.

We call encode_digest in mgr/DaemonServer.cc, with all features:

      // FIXME: no easy way to get mon features here.  this will do for
      // now, though, as long as we don't make a backward-incompat change.
      pg_map.encode_digest(osdmap, m->get_data(), CEPH_FEATURES_ALL);
      dout(10) << pg_map << dendl;

...but the digest encoding changed luminous->mimic:
  if (v >= 2) {
    encode(num_pg_by_state, bl);
  } else {
    uint32_t n = num_pg_by_state.size();
    encode(n, bl);
    for (auto p : num_pg_by_state) {
      encode((uint32_t)p.first, bl);
      encode(p.second, bl);
    }
  }

because the num_pg_by_state key went from int32_t to int64_t. lame!
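
To illustrate why the key width matters, here is a minimal standalone sketch of the mismatch. It is not Ceph code: `Bytes`, `put`, `get`, `encode_counts` and `decode_narrow` are hypothetical helpers, a flat byte vector stands in for the bufferlist, the 32-bit value type is an assumption, and integers are written in host byte order. The encoder mirrors the `v >= 2` branch shown above; the decoder plays the role of an older peer that still expects 32-bit keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a bufferlist: a flat byte vector.
using Bytes = std::vector<unsigned char>;

// Append the raw bytes of a fixed-width integer (host byte order).
template <typename T>
void put(Bytes& out, T value) {
  unsigned char raw[sizeof(T)];
  std::memcpy(raw, &value, sizeof(T));
  out.insert(out.end(), raw, raw + sizeof(T));
}

// Read a fixed-width integer at pos, throwing if the buffer runs out.
template <typename T>
T get(const Bytes& in, std::size_t& pos) {
  assert(pos <= in.size());
  if (in.size() - pos < sizeof(T)) {
    throw std::out_of_range("end of buffer");
  }
  T value;
  std::memcpy(&value, in.data() + pos, sizeof(T));
  pos += sizeof(T);
  return value;
}

// Encoder shaped like the snippet above: v >= 2 writes 64-bit keys,
// older versions narrow each key to 32 bits. Value width is assumed.
Bytes encode_counts(const std::map<int64_t, int32_t>& counts, int v) {
  Bytes out;
  put<uint32_t>(out, static_cast<uint32_t>(counts.size()));
  for (const auto& kv : counts) {
    if (v >= 2) {
      put<int64_t>(out, kv.first);
    } else {
      put<uint32_t>(out, static_cast<uint32_t>(kv.first));
    }
    put<int32_t>(out, kv.second);
  }
  return out;
}

// Decoder for an older peer: it always expects 32-bit keys.
std::map<int32_t, int32_t> decode_narrow(const Bytes& in, std::size_t& pos) {
  std::map<int32_t, int32_t> counts;
  const uint32_t n = get<uint32_t>(in, pos);
  for (uint32_t i = 0; i < n; ++i) {
    const int32_t key = get<int32_t>(in, pos);
    const int32_t value = get<int32_t>(in, pos);
    counts[key] = value;
  }
  return counts;
}
```

Feeding `encode_counts(m, 2)` to `decode_narrow` does not fail inside this field: the decoder consumes only 8 of the 12 bytes per entry, returns wrong keys and values, and leaves the read position short of where the next field actually starts. Whatever is decoded afterwards is read from the wrong offset, which would be consistent with the two different failures in the traces above (running off the end of the buffer in one case, an implausible struct version in the other). With `v = 1` the same decoder round-trips cleanly.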

#2 Updated by Sage Weil 11 months ago

  • Status changed from 12 to Fix Under Review
  • Backport set to mimic

fixed by a patch to MMonMgrDigest in https://github.com/ceph/ceph/pull/26389

#3 Updated by Sage Weil 11 months ago

  • Status changed from Fix Under Review to Pending Backport

the commit to backport to mimic is e4ae368ff7a5396194f8bdd5692429af5457998b

#4 Updated by Nathan Cutler 11 months ago

  • Copied to Backport #38342: mimic: luminous->(mimic,nautilus): PGMapDigest decode error on luminous end added

#5 Updated by Sage Weil 11 months ago

  • Status changed from Pending Backport to Fix Under Review

#6 Updated by Nathan Cutler 11 months ago

  • Status changed from Fix Under Review to Resolved
