Bug #38295
luminous->(mimic,nautilus): PGMapDigest decode error on luminous end
Status:
Resolved
Priority:
Urgent
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2019-02-13T03:49:18.605 INFO:tasks.ceph.mon.c.smithi012.stderr:terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
2019-02-13T03:49:18.607 INFO:tasks.ceph.mon.c.smithi012.stderr: what(): buffer::end_of_buffer
2019-02-13T03:49:18.608 INFO:tasks.ceph.mon.c.smithi012.stderr:*** Caught signal (Aborted) **
2019-02-13T03:49:18.608 INFO:tasks.ceph.mon.c.smithi012.stderr: in thread 7fab1647e700 thread_name:ms_dispatch
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: ceph version 12.2.11-32-ge18688f (e18688fa4ed3217e454662037127b03cb8e34394) luminous (stable)
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 1: (()+0x964c88) [0x55cde8901c88]
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 2: (()+0x12890) [0x7fab1e772890]
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 3: (gsignal()+0xc7) [0x7fab1cbf6e97]
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 4: (abort()+0x141) [0x7fab1cbf8801]
2019-02-13T03:49:18.610 INFO:tasks.ceph.mon.c.smithi012.stderr: 5: (()+0x8c8fb) [0x7fab1d5eb8fb]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 6: (()+0x92d3a) [0x7fab1d5f1d3a]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 7: (()+0x92d95) [0x7fab1d5f1d95]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 8: (()+0x92fe8) [0x7fab1d5f1fe8]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 9: (()+0x5b9d42) [0x55cde8556d42]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 10: (()+0x5c377d) [0x55cde856077d]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 11: (PGMapDigest::decode(ceph::buffer::list::iterator&)+0x2e8) [0x55cde83eda68]
2019-02-13T03:49:18.611 INFO:tasks.ceph.mon.c.smithi012.stderr: 12: (MgrStatMonitor::prepare_report(boost::intrusive_ptr<MonOpRequest>)+0x125) [0x55cde853c085]
2019-02-13T03:49:18.612 INFO:tasks.ceph.mon.c.smithi012.stderr: 13: (MgrStatMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0xe3) [0x55cde853c553]
2019-02-13T03:49:18.612 INFO:tasks.ceph.mon.c.smithi012.stderr: 14: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x9ed) [0x55cde846471d]
2019-02-13T03:49:18.612 INFO:tasks.ceph.mon.c.smithi012.stderr: 15: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x5d2) [0x55cde8331792]
2019-02-13T03:49:18.612 INFO:tasks.ceph.mon.c.smithi012.stderr: 16: (Monitor::_ms_dispatch(Message*)+0x46b) [0x55cde8332a6b]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 17: (Monitor::ms_dispatch(Message*)+0x23) [0x55cde8362823]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 18: (DispatchQueue::entry()+0xe8a) [0x55cde88aa00a]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 19: (DispatchQueue::DispatchThread::entry()+0xd) [0x55cde864cefd]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 20: (()+0x76db) [0x7fab1e7676db]
2019-02-13T03:49:18.613 INFO:tasks.ceph.mon.c.smithi012.stderr: 21: (clone()+0x3f) [0x7fab1ccd988f]
/a/sage-2019-02-13_03:14:34-upgrade:luminous-x-wip-v2-upgrade-distro-basic-smithi/3582463
or,
2019-02-13T04:22:45.901 INFO:tasks.ceph.mon.c.smithi025.stderr:terminate called after throwing an instance of 'ceph::buffer::malformed_input'
2019-02-13T04:22:45.901 INFO:tasks.ceph.mon.c.smithi025.stderr: what(): buffer::malformed_input: void pool_stat_t::decode(ceph::buffer::list::iterator&) no longer understand old encoding version 6 < 253
2019-02-13T04:22:45.902 INFO:tasks.ceph.mon.c.smithi025.stderr:*** Caught signal (Aborted) **
2019-02-13T04:22:45.902 INFO:tasks.ceph.mon.c.smithi025.stderr: in thread 7f5895161700 thread_name:ms_dispatch
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: ceph version 12.2.11-35-g1f910bc (1f910bc2cde041d4c472ed9fde8b1c1ab21826f1) luminous (stable)
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 1: (()+0x95f281) [0x55ab89ea4281]
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 2: (()+0xf5d0) [0x7f589dedb5d0]
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 3: (gsignal()+0x37) [0x7f589b000207]
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 4: (abort()+0x148) [0x7f589b0018f8]
2019-02-13T04:22:45.904 INFO:tasks.ceph.mon.c.smithi025.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f589b90f7d5]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 6: (()+0x5e746) [0x7f589b90d746]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 7: (()+0x5e773) [0x7f589b90d773]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 8: (()+0x5e993) [0x7f589b90d993]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 9: (pool_stat_t::decode(ceph::buffer::list::iterator&)+0x4d2) [0x55ab89cc0d02]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 10: (PGMapDigest::decode(ceph::buffer::list::iterator&)+0x50c) [0x55ab899fb60c]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 11: (MgrStatMonitor::prepare_report(boost::intrusive_ptr<MonOpRequest>)+0x72) [0x55ab89b38a92]
2019-02-13T04:22:45.905 INFO:tasks.ceph.mon.c.smithi025.stderr: 12: (MgrStatMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0xbf) [0x55ab89b38ecf]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 13: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0xaf8) [0x55ab89a695a8]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 14: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x51f) [0x55ab899430ff]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 15: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55ab8994477b]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 16: (Monitor::handle_forward(boost::intrusive_ptr<MonOpRequest>)+0xa8d) [0x55ab8994608d]
2019-02-13T04:22:45.906 INFO:tasks.ceph.mon.c.smithi025.stderr: 17: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xdbd) [0x55ab8994399d]
2019-02-13T04:22:45.907 INFO:tasks.ceph.mon.c.smithi025.stderr: 18: (Monitor::_ms_dispatch(Message*)+0x7eb) [0x55ab8994477b]
2019-02-13T04:22:45.907 INFO:tasks.ceph.mon.c.smithi025.stderr: 19: (Monitor::ms_dispatch(Message*)+0x23) [0x55ab89970fc3]
2019-02-13T04:22:45.907 INFO:tasks.ceph.mon.c.smithi025.stderr: 20: (DispatchQueue::entry()+0x792) [0x55ab89e4e402]
/a/sage-2019-02-13_03:14:34-upgrade:luminous-x-wip-v2-upgrade-distro-basic-smithi/3582463
Related issues
History
#1 Updated by Sage Weil about 5 years ago
- Subject changed from luminous->nautilus: PGMapDigest decode error on luminous end to luminous->(mimic,nautilus): PGMapDigest decode error on luminous end
This appears to have been broken since mimic, and it triggers if you upgrade a mgr before all mons are upgraded.
We call encode_digest in mgr/DaemonServer.cc, with all features:
// FIXME: no easy way to get mon features here. this will do for
// now, though, as long as we don't make a backward-incompat change.
pg_map.encode_digest(osdmap, m->get_data(), CEPH_FEATURES_ALL);
dout(10) << pg_map << dendl;
...but the digest encoding changed luminous->mimic:
if (v >= 2) {
  encode(num_pg_by_state, bl);
} else {
  uint32_t n = num_pg_by_state.size();
  encode(n, bl);
  for (auto p : num_pg_by_state) {
    encode((uint32_t)p.first, bl);
    encode(p.second, bl);
  }
}
because the num_pg_by_state key went from int32_t to int64_t. lame!
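To make the mismatch concrete, here is a minimal standalone sketch of a writer that widens a map key from 32 to 64 bits and a reader that still expects the old width. It does not use Ceph's bufferlist or encode/decode helpers: `Buffer`, `put_raw`, `get_raw`, `encode_states`, `decode_states_old` and `FEATURE_NEW_DIGEST` are all hypothetical names made up for illustration, and the byte layout is a simplification (host byte order, no version header). Gating the layout on the peer's feature bits is one generic way to stay compatible, not necessarily what the eventual fix does.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <vector>

using Buffer = std::vector<uint8_t>;

// Append a fixed-width integer in host byte order (enough for a same-machine demo).
template <typename T>
void put_raw(Buffer& bl, T v) {
  const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
  bl.insert(bl.end(), p, p + sizeof(T));
}

// Read a fixed-width integer; throws when the buffer runs out
// (a stand-in for ceph::buffer::end_of_buffer).
template <typename T>
T get_raw(const Buffer& bl, std::size_t& off) {
  if (off + sizeof(T) > bl.size()) throw std::out_of_range("end_of_buffer");
  T v;
  std::memcpy(&v, bl.data() + off, sizeof(T));
  off += sizeof(T);
  return v;
}

// Hypothetical feature bit: "peer understands 64-bit state keys".
constexpr uint64_t FEATURE_NEW_DIGEST = 1ull << 0;

// Writer: emit 64-bit keys only when the peer advertises the new layout,
// otherwise fall back to the old 32-bit keys.
Buffer encode_states(const std::map<int64_t, int32_t>& m, uint64_t features) {
  Buffer bl;
  put_raw<uint32_t>(bl, static_cast<uint32_t>(m.size()));
  for (const auto& kv : m) {
    if (features & FEATURE_NEW_DIGEST) {
      put_raw<int64_t>(bl, kv.first);
    } else {
      put_raw<uint32_t>(bl, static_cast<uint32_t>(kv.first));
    }
    put_raw<int32_t>(bl, kv.second);
  }
  return bl;
}

// Old reader: always assumes 32-bit keys, like a not-yet-upgraded peer.
std::map<int32_t, int32_t> decode_states_old(const Buffer& bl) {
  std::size_t off = 0;
  std::map<int32_t, int32_t> m;
  const uint32_t n = get_raw<uint32_t>(bl, off);
  for (uint32_t i = 0; i < n; ++i) {
    const int32_t k = get_raw<int32_t>(bl, off);
    const int32_t v = get_raw<int32_t>(bl, off);
    m[k] = v;
  }
  return m;
}
```

In this toy the old reader does not throw when handed the wide layout. It silently misreads the entries and stops before the end of the buffer, so any field that followed would be decoded from the wrong offset. That kind of misalignment is consistent with the two different exceptions in the traces above, although the sketch does not reproduce them.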
#2 Updated by Sage Weil about 5 years ago
- Status changed from 12 to Fix Under Review
- Backport set to mimic
fixed by a patch to MMonMgrDigest in https://github.com/ceph/ceph/pull/26389
#3 Updated by Sage Weil about 5 years ago
- Status changed from Fix Under Review to Pending Backport
the commit to backport to mimic is e4ae368ff7a5396194f8bdd5692429af5457998b
#4 Updated by Nathan Cutler about 5 years ago
- Copied to Backport #38342: mimic: luminous->(mimic,nautilus): PGMapDigest decode error on luminous end added
#5 Updated by Sage Weil about 5 years ago
- Status changed from Pending Backport to Fix Under Review
Follow-up fix: https://github.com/ceph/ceph/pull/26636
#6 Updated by Nathan Cutler about 5 years ago
- Status changed from Fix Under Review to Resolved