Bug #23716

closed

osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size) (on upgrade from luminous)

Added by Sage Weil about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

     0> 2018-04-13 20:46:14.928 7efece7db700 -1 /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.0.2-1252-g8e82189/rpm/el7/BUILD/ceph-13.0.2-1252-g8e82189/src/osd/ECUtil.cc: In function 'int ECUtil::decode(const ECUtil::stripe_info_t&, ceph::ErasureCodeInterfaceRef&, std::map<int, ceph::buffer::list>&, ceph::bufferlist*)' thread 7efece7db700 time 2018-04-13 20:46:14.925440
/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.0.2-1252-g8e82189/rpm/el7/BUILD/ceph-13.0.2-1252-g8e82189/src/osd/ECUtil.cc: 25: FAILED assert(i->second.length() == total_data_size)

 ceph version 13.0.2-1252-g8e82189 (8e821891ca4919ac02cf9ac2689581542602476b) mimic (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7efef46ba18f]
 2: (()+0x285377) [0x7efef46ba377]
 3: (ECUtil::decode(ECUtil::stripe_info_t const&, std::shared_ptr<ceph::ErasureCodeInterface>&, std::map<int, ceph::buffer::list, std::less<int>, std::allocator<std::pair<int const, ceph::buffer::list> > >&, ceph::buffer::list*)+0x306) [0x561684916bf6]
 4: (CallClientContexts::finish(std::pair<RecoveryMessages*, ECBackend::read_result_t&>&)+0x2be) [0x5616849ce3de]
 5: (ECBackend::complete_read_op(ECBackend::ReadOp&, RecoveryMessages*)+0x83) [0x5616849a3c83]
 6: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&, RecoveryMessages*, ZTracer::Trace const&)+0xc2a) [0x5616849aac4a]
 7: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0xb8) [0x5616849ba8e8]
 8: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0x5616848a9c47]
 9: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x696) [0x561684858a46]
 10: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1e7) [0x5616846bc127]
 11: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x561684929f02]
 12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x975) [0x5616846d99e5]
 13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3df) [0x7efef46bff5f]
 14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7efef46c0b20]

/a/sage-2018-04-13_18:50:42-upgrade:luminous-x-master-distro-basic-smithi/2395012
Actions #1

Updated by Sage Weil about 6 years ago

   -23> 2018-04-13 20:46:14.924 7efece7db700 20 osd.0 pg_epoch: 106 pg[2.7s0( v 106'320 (0'0,106'320] local-lis/les=25/26 n=28 ec=22/22 lis/c 25/25 les/c/f 26/26/0 25/25/25) [0,4,2]p0(0) r=0 lpr=25 crt=106'320 lcod 106'319 mlcod 106'319 active+clean ps=[1~18,1a~4,1f~6,26~8,2f~2,32~1,34~5,3b~6,42~1,45~1,49~3]] handle_sub_read_reply Complete: ReadOp(tid=676, to_read={2:edc54d16:::smithi00926773-45 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head=read_request_t(to_read=[0,8192,0], need={0(0)=[0,1],4(1)=[0,1]}, want_attrs=0)}, complete={2:edc54d16:::smithi00926773-45 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head=read_result_t(r=0, errors={}, noattrs, returned=(0, 8192, [0(0),4096, 4(1),0]))}, priority=127, obj_to_source={2:edc54d16:::smithi00926773-45 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head=0(0),4(1)}, source_to_obj={0(0)=2:edc54d16:::smithi00926773-45 oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head,4(1)=2:edc54d16:::smithi00926773-45 
oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo:head}, in_progress=)

it looks like osd.4 (which is luminous) returned 0 bytes instead of 4096.
25          assert(i->second.length() == total_data_size);
(gdb) p total_data_size
$1 = 4096

i can't examine the iterator, unfortunately, or the map itself
(gdb) p to_decode
Python Exception <class 'gdb.error'> There is no member or method named _M_value_field.: 
$2 = std::map with 2 elements
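
For illustration only, here is a minimal sketch of the kind of length check that the failed assert performs. It is not the Ceph source: `ShardBuffers` and `first_mismatched_shard` are hypothetical names, `std::string` stands in for `ceph::bufferlist`, and it assumes the expected size is taken from the first buffer in the map. Instead of asserting, it reports which shard has the odd length, which is the piece of information gdb could not show above.

```cpp
#include <cassert>   // only needed if you assert on the result, as in the usage note
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for std::map<int, ceph::bufferlist>:
// shard id -> bytes returned by that shard.
using ShardBuffers = std::map<int, std::string>;

// Returns the id of the first shard whose buffer length differs from the
// length of the first buffer in the map, or -1 if every buffer has the same
// length (or the map is empty).
int first_mismatched_shard(const ShardBuffers& to_decode) {
  if (to_decode.empty()) {
    return -1;
  }
  // Assumption: the expected size comes from the first entry.
  const std::size_t total_data_size = to_decode.begin()->second.size();
  for (const auto& kv : to_decode) {
    if (kv.second.size() != total_data_size) {
      return kv.first;  // this shard would trip the length assert
    }
  }
  return -1;
}
```

With a map modelled on the log line above (shard 0 holding 4096 bytes, shard 4 holding 0 bytes), `first_mismatched_shard` returns 4, matching the reading that osd.4 sent back an empty buffer.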
Actions #2

Updated by Josh Durgin about 6 years ago

  • Assignee set to Josh Durgin
Actions #3

Updated by Sage Weil about 6 years ago

  • Status changed from 12 to Resolved

This seems to be resolved. My guess is it's fallout from https://github.com/ceph/ceph/pull/21604
