Bug #24145

closed

osdmap decode error in rados/standalone/*

Added by Sage Weil almost 6 years ago. Updated almost 6 years ago.

Status: Duplicate
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2018-05-15T22:58:39.209 INFO:tasks.workunit.client.0.smithi116.stdout:   -15> 2018-05-15 22:50:42.795 7f6d7504f700 10 osd.3 pg_epoch: 45 pg[1.1( empty local-lis/les=37/38 n=0 ec=2/2 lis/c 37/37 les/c/f 38/39/0 37/37/37) [3,0,2] r=0 lpr=39 crt=0'0 mlcod 0'0 unknown mbc={}] handle_advance_map [3,0,2]/[3,0,2] -- 3/3
2018-05-15T22:58:39.209 INFO:tasks.workunit.client.0.smithi116.stdout:   -14> 2018-05-15 22:50:42.795 7f6d7504f700 20 osd.3:1.update_pg_epoch 1.1 45 -> 46
2018-05-15T22:58:39.209 INFO:tasks.workunit.client.0.smithi116.stdout:   -13> 2018-05-15 22:50:42.795 7f6d7504f700 10 osd.3 pg_epoch: 46 pg[1.1( empty local-lis/les=37/38 n=0 ec=2/2 lis/c 37/37 les/c/f 38/39/0 37/37/37) [3,0,2] r=0 lpr=39 crt=0'0 mlcod 0'0 unknown mbc={}] state<Reset>: Reset advmap
2018-05-15T22:58:39.209 INFO:tasks.workunit.client.0.smithi116.stdout:   -12> 2018-05-15 22:50:42.795 7f6d7504f700 10 osd.3 pg_epoch: 46 pg[1.1( empty local-lis/les=37/38 n=0 ec=2/2 lis/c 37/37 les/c/f 38/39/0 37/37/37) [3,0,2] r=0 lpr=39 crt=0'0 mlcod 0'0 unknown mbc={}] check_recovery_sources no source osds () went down
2018-05-15T22:58:39.210 INFO:tasks.workunit.client.0.smithi116.stdout:   -11> 2018-05-15 22:50:42.795 7f6d7504f700 20 osd.3 53 get_map 47 - loading and decoding 0x5564e0a8c400
2018-05-15T22:58:39.210 INFO:tasks.workunit.client.0.smithi116.stdout:   -10> 2018-05-15 22:50:42.795 7f6d7504f700 10 osd.3 53 add_map_bl 47 2690 bytes
2018-05-15T22:58:39.211 INFO:tasks.workunit.client.0.smithi116.stdout:     0> 2018-05-15 22:50:42.807 7f6d7504f700 -1 *** Caught signal (Aborted) **
2018-05-15T22:58:39.211 INFO:tasks.workunit.client.0.smithi116.stdout: in thread 7f6d7504f700 thread_name:tp_osd_tp
2018-05-15T22:58:39.211 INFO:tasks.workunit.client.0.smithi116.stdout:
2018-05-15T22:58:39.211 INFO:tasks.workunit.client.0.smithi116.stdout: ceph version 13.1.0-100-g33c5549 (33c55492ab1ace07e76616c794b63f09b18b52ea) mimic (rc)
2018-05-15T22:58:39.211 INFO:tasks.workunit.client.0.smithi116.stdout: 1: (()+0x915bc0) [0x5564de4ccbc0]
2018-05-15T22:58:39.211 INFO:tasks.workunit.client.0.smithi116.stdout: 2: (()+0x11390) [0x7f6d97c92390]
2018-05-15T22:58:39.211 INFO:tasks.workunit.client.0.smithi116.stdout: 3: (gsignal()+0x38) [0x7f6d973df428]
2018-05-15T22:58:39.211 INFO:tasks.workunit.client.0.smithi116.stdout: 4: (abort()+0x16a) [0x7f6d973e102a]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f6d996c7de5]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f6d9962f5e6]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 7: (()+0x734631) [0x7f6d9962f631]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 8: (()+0x735d24) [0x7f6d99630d24]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1915) [0x7f6d9935d985]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f6d9935ece1]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5564ddf83918]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 12: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*)+0x19d) [0x5564ddf8787d]
2018-05-15T22:58:39.212 INFO:tasks.workunit.client.0.smithi116.stdout: 13: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x1a1) [0x5564ddf88041]
2018-05-15T22:58:39.213 INFO:tasks.workunit.client.0.smithi116.stdout: 14: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x50) [0x5564de1f0c70]
2018-05-15T22:58:39.213 INFO:tasks.workunit.client.0.smithi116.stdout: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x590) [0x5564ddf97840]
2018-05-15T22:58:39.213 INFO:tasks.workunit.client.0.smithi116.stdout: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x46e) [0x7f6d991e406e]
2018-05-15T22:58:39.213 INFO:tasks.workunit.client.0.smithi116.stdout: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f6d991e60f0]

/a/sage-2018-05-14_21:04:26-rados-wip-sage2-testing-2018-05-14-1426-distro-basic-smithi/2532970

The core wouldn't let me look at the buffer, but we did appear to crash on the

  DECODE_START_LEGACY_COMPAT_LEN(8, 7, 7, bl); // wrapper

line at the top of OSDMap::decode(). hrm.

Related issues 1 (0 open, 1 closed)

Is duplicate of RADOS - Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh (Resolved, David Zafman, 03/28/2018)

Actions #1

Updated by Sage Weil almost 6 years ago

  • Subject changed from osdmap decode error in rados/standalone/erasure-code.yaml to osdmap decode error in rados/standalone/*
  • Priority changed from Normal to High
2018-05-24T11:29:55.027 INFO:tasks.workunit.client.0.smithi016.stderr:terminate called after throwing an instance of 'ceph::buffer::malformed_input'
2018-05-24T11:29:55.027 INFO:tasks.workunit.client.0.smithi016.stderr:  what():  buffer::malformed_input: void OSDMap::decode(ceph::buffer::list::iterator&) no longer understand old encoding version 8 < 48
2018-05-24T11:29:55.028 INFO:tasks.workunit.client.0.smithi016.stderr:*** Caught signal (Aborted) **
2018-05-24T11:29:55.028 INFO:tasks.workunit.client.0.smithi016.stderr: in thread 7f5fe9c8c200 thread_name:ceph-osd
2018-05-24T11:29:55.033 INFO:tasks.workunit.client.0.smithi016.stderr: ceph version 13.1.1-65-ge8d43bc (e8d43bc1d0fca786ae27bd1ce8e3dcac6d87c961) mimic (rc)
2018-05-24T11:29:55.033 INFO:tasks.workunit.client.0.smithi016.stderr: 1: (()+0x913840) [0x5597aa33b840]
2018-05-24T11:29:55.033 INFO:tasks.workunit.client.0.smithi016.stderr: 2: (()+0x11390) [0x7f5fdfa81390]
2018-05-24T11:29:55.033 INFO:tasks.workunit.client.0.smithi016.stderr: 3: (gsignal()+0x38) [0x7f5fdf1ce428]
2018-05-24T11:29:55.034 INFO:tasks.workunit.client.0.smithi016.stderr: 4: (abort()+0x16a) [0x7f5fdf1d002a]
2018-05-24T11:29:55.034 INFO:tasks.workunit.client.0.smithi016.stderr: 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f5fe14b64f5]
2018-05-24T11:29:55.034 INFO:tasks.workunit.client.0.smithi016.stderr: 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f5fe141dcf6]
2018-05-24T11:29:55.034 INFO:tasks.workunit.client.0.smithi016.stderr: 7: (()+0x733d41) [0x7f5fe141dd41]
2018-05-24T11:29:55.034 INFO:tasks.workunit.client.0.smithi016.stderr: 8: (()+0x735434) [0x7f5fe141f434]
2018-05-24T11:29:55.034 INFO:tasks.workunit.client.0.smithi016.stderr: 9: (OSDMap::decode(ceph::buffer::list::iterator&)+0x1915) [0x7f5fe114a325]
2018-05-24T11:29:55.034 INFO:tasks.workunit.client.0.smithi016.stderr: 10: (OSDMap::decode(ceph::buffer::list&)+0x31) [0x7f5fe114b681]
2018-05-24T11:29:55.035 INFO:tasks.workunit.client.0.smithi016.stderr: 11: (OSDService::try_get_map(unsigned int)+0x508) [0x5597a9defa28]
2018-05-24T11:29:55.035 INFO:tasks.workunit.client.0.smithi016.stderr: 12: (OSD::load_pgs()+0x460) [0x5597a9df0a10]
2018-05-24T11:29:55.035 INFO:tasks.workunit.client.0.smithi016.stderr: 13: (OSD::init()+0xcd3) [0x5597a9dfbc63]

/a/sage-2018-05-23_14:55:44-rados-wip-sage3-testing-2018-05-22-2126-distro-basic-smithi/2577552
rados/standalone/{supported-random-distro$/{ubuntu_16.04.yaml} workloads/scrub.yaml}
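
For reading the "8 < 48" in the exception text above, here is a minimal sketch of a versioned-header check. It assumes the message is formatted as "<version the decoder understands> < <struct_compat byte read from the buffer>", which fits the 8 passed as the first argument to DECODE_START_LEGACY_COMPAT_LEN in the description. The names decode_header and the local malformed_input type are hypothetical stand-ins for illustration, not Ceph's actual macro or exception class, and the buffer is a plain byte vector rather than a ceph::buffer::list.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for ceph::buffer::malformed_input.
struct malformed_input : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Simplified header check (an assumption-laden sketch, not Ceph code).
//   v       : newest encoding version this decoder understands
//   compatv : first encoding version that carries a struct_compat byte
// Reads struct_v from byte 0 and, if struct_v >= compatv, struct_compat
// from byte 1. Throws when the encoder requires a decoder newer than v.
inline std::uint8_t decode_header(const std::vector<std::uint8_t>& bl,
                                  std::uint8_t v, std::uint8_t compatv) {
  if (bl.empty()) {
    throw malformed_input("buffer too short for struct_v");
  }
  const std::uint8_t struct_v = bl[0];
  if (struct_v >= compatv) {
    if (bl.size() < 2) {
      throw malformed_input("buffer too short for struct_compat");
    }
    const std::uint8_t struct_compat = bl[1];
    if (v < struct_compat) {
      throw malformed_input("no longer understand old encoding version " +
                            std::to_string(static_cast<int>(v)) + " < " +
                            std::to_string(static_cast<int>(struct_compat)));
    }
  }
  return struct_v;
}
```

Under this reading, a buffer starting with the bytes {8, 7} passes the check with v=8 and compatv=7, while one starting with {8, 48} throws with the same "8 < 48" wording as the log. If that interpretation holds, the 48 came out of the buffer rather than from the code, so the bytes handed to OSDMap::decode() may not have been a valid OSDMap encoding at all.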
Actions #2

Updated by Kefu Chai almost 6 years ago

  • Status changed from 12 to Duplicate
Actions #3

Updated by Kefu Chai almost 6 years ago

  • Is duplicate of Bug #23492: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh added