Bug #37919 (closed)

osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid))

Added by Sage Weil over 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Urgent
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: -
Tags: -
Backport: luminous,mimic
Regression: No
Severity: 3 - minor
Reviewed: -
Affected Versions: -
ceph-qa-suite: -
Component(RADOS): -
Pull request ID: -
Crash signature (v1): -
Crash signature (v2): -

Description

 -1306> 2019-01-15 10:38:47.842 7f61d0235700 10 osd.6 pg_epoch: 567 pg[4.3s0( v 564'3104 lc 490'2889 (434'100,564'3104] local-lis/les=565/566 n=1028 ec=431/431 lis/c 565/444 les/c/f 566/451/0 532/565/532) [6,5,7]p6(0) r=0 lpr=565 pi=[444,565)/4 rops=1 crt=564'3104 lcod 490'2896 mlcod 0'0 active+degraded m=61 mbc={0={(0+1)=61,(1+0)=460,(1+1)=153},1={(1+0)=674},2={(0+0)=599,(1+0)=75}}] get_all_avail_shards: checking acting 5(1)
 -1303> 2019-01-15 10:38:47.843 7f61d0235700 10 osd.6 pg_epoch: 567 pg[4.3s0( v 564'3104 lc 490'2889 (434'100,564'3104] local-lis/les=565/566 n=1028 ec=431/431 lis/c 565/444 les/c/f 566/451/0 532/565/532) [6,5,7]p6(0) r=0 lpr=565 pi=[444,565)/4 rops=1 crt=564'3104 lcod 490'2896 mlcod 0'0 active+degraded m=61 mbc={0={(0+1)=61,(1+0)=460,(1+1)=153},1={(1+0)=674},2={(0+0)=599,(1+0)=75}}] get_all_avail_shards: checking acting 6(0)
 -1299> 2019-01-15 10:38:47.844 7f61d0235700 10 osd.6 pg_epoch: 567 pg[4.3s0( v 564'3104 lc 490'2889 (434'100,564'3104] local-lis/les=565/566 n=1028 ec=431/431 lis/c 565/444 les/c/f 566/451/0 532/565/532) [6,5,7]p6(0) r=0 lpr=565 pi=[444,565)/4 rops=1 crt=564'3104 lcod 490'2896 mlcod 0'0 active+degraded m=61 mbc={0={(0+1)=61,(1+0)=460,(1+1)=153},1={(1+0)=674},2={(0+0)=599,(1+0)=75}}] get_all_avail_shards: checking acting 7(2)
 -1254> 2019-01-15 10:38:47.845 7f61d0235700 10 osd.6 pg_epoch: 567 pg[4.3s0( v 564'3104 lc 490'2889 (434'100,564'3104] local-lis/les=565/566 n=1028 ec=431/431 lis/c 565/444 les/c/f 566/451/0 532/565/532) [6,5,7]p6(0) r=0 lpr=565 pi=[444,565)/4 rops=1 crt=564'3104 lcod 490'2896 mlcod 0'0 active+degraded m=61 mbc={0={(0+1)=61,(1+0)=460,(1+1)=153},1={(1+0)=674},2={(0+0)=599,(1+0)=75}}] get_all_avail_shards: checking missing_loc 0(0)
 -1250> 2019-01-15 10:38:47.846 7f61d0235700 10 osd.6 pg_epoch: 567 pg[4.3s0( v 564'3104 lc 490'2889 (434'100,564'3104] local-lis/les=565/566 n=1028 ec=431/431 lis/c 565/444 les/c/f 566/451/0 532/565/532) [6,5,7]p6(0) r=0 lpr=565 pi=[444,565)/4 rops=1 crt=564'3104 lcod 490'2896 mlcod 0'0 active+degraded m=61 mbc={0={(0+1)=61,(1+0)=460,(1+1)=153},1={(1+0)=674},2={(0+0)=599,(1+0)=75}}] get_all_avail_shards: checking missing_loc 5(1)
 -1247> 2019-01-15 10:38:47.846 7f61d0235700 10 osd.6 pg_epoch: 567 pg[4.3s0( v 564'3104 lc 490'2889 (434'100,564'3104] local-lis/les=565/566 n=1028 ec=431/431 lis/c 565/444 les/c/f 566/451/0 532/565/532) [6,5,7]p6(0) r=0 lpr=565 pi=[444,565)/4 rops=1 crt=564'3104 lcod 490'2896 mlcod 0'0 active+degraded m=61 mbc={0={(0+1)=61,(1+0)=460,(1+1)=153},1={(1+0)=674},2={(0+0)=599,(1+0)=75}}] get_all_avail_shards: checking missing_loc 6(0)
 -1245> 2019-01-15 10:38:47.847 7f61d0235700 10 osd.6 pg_epoch: 567 pg[4.3s0( v 564'3104 lc 490'2889 (434'100,564'3104] local-lis/les=565/566 n=1028 ec=431/431 lis/c 565/444 les/c/f 566/451/0 532/565/532) [6,5,7]p6(0) r=0 lpr=565 pi=[444,565)/4 rops=1 crt=564'3104 lcod 490'2896 mlcod 0'0 active+degraded m=61 mbc={0={(0+1)=61,(1+0)=460,(1+1)=153},1={(1+0)=674},2={(0+0)=599,(1+0)=75}}] get_all_avail_shards: checking missing_loc 7(2)
  -182> 2019-01-15 10:38:47.906 7f61d0235700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-2590-gea1c8ca/rpm/el7/BUILD/ceph-14.0.1-2590-gea1c8ca/src/osd/ECBackend.cc: In function 'void ECBackend::get_all_avail_shards(const hobject_t&, const std::set<pg_shard_t>&, std::set<int>&, std::map<shard_id_t, pg_shard_t>&, bool)' thread 7f61d0235700 time 2019-01-15 10:38:47.878089
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-2590-gea1c8ca/rpm/el7/BUILD/ceph-14.0.1-2590-gea1c8ca/src/osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid))

 ceph version 14.0.1-2590-gea1c8ca (ea1c8caf95758ce122b97d7d708086b9eff3187f) nautilus (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55bc80abc070]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55bc80abc23e]
 3: (ECBackend::get_all_avail_shards(hobject_t const&, std::set<pg_shard_t, std::less<pg_shard_t>, std::allocator<pg_shard_t> > const&, std::set<int, std::less<int>, std::allocator<int> >&, std::map<shard_id_t, pg_shard_t, std::less<shard_id_t>, std::allocator<std::pair<shard_id_t const, pg_shard_t> > >&, bool)+0xc1c) [0x55bc80f0a5cc]
 4: (ECBackend::get_min_avail_to_read_shards(hobject_t const&, std::set<int, std::less<int>, std::allocator<int> > const&, bool, bool, std::map<pg_shard_t, std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > >, std::less<pg_shard_t>, std::allocator<std::pair<pg_shard_t const, std::vector<std::pair<int, int>, std::allocator<std::pair<int, int> > > > > >*)+0x104) [0x55bc80f0a734]
 5: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)+0x313) [0x55bc80f0f433]
 6: (ECBackend::run_recovery_op(PGBackend::RecoveryHandle*, int)+0x1457) [0x55bc80f137e7]
 7: (PrimaryLogPG::maybe_kick_recovery(hobject_t const&)+0x27d) [0x55bc80d66fad]
 8: (PrimaryLogPG::wait_for_degraded_object(hobject_t const&, boost::intrusive_ptr<OpRequest>)+0x48) [0x55bc80d673a8]
 9: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x1949) [0x55bc80da7169]
 10: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xbd4) [0x55bc80daac14]
 11: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1a9) [0x55bc80bf61d9]
 12: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x55bc80e80832]
 13: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa0c) [0x55bc80c0f7cc]
 14: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55bc8120ab53]
 15: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55bc8120dbf0]
 16: (()+0x7e25) [0x7f61fea2ee25]
 17: (clone()+0x6d) [0x7f61fd8f7bad]

/a/sage-2019-01-15_05:14:22-rados-wip-sage-testing-2019-01-14-2051-distro-basic-smithi/3464772
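
For context, here is a hedged, self-contained sketch of the invariant the failed assert appears to enforce, based only on the get_all_avail_shards signature and assert text in the backtrace and the "checking missing_loc" log lines above. The types and names used (Object, Shard, MissingSet, missing_loc, shard_missing) are illustrative stand-ins, not the actual Ceph structures (hobject_t, pg_shard_t, pg_missing_t), and this is an approximation rather than the real implementation:

// Hedged approximation: when selecting recovery/read sources, any shard that
// missing_loc advertises as a location for an object must not itself list
// that object as missing; otherwise the assert fires.
#include <cassert>
#include <map>
#include <set>
#include <string>

using Object = std::string;           // stand-in for hobject_t
using Shard = int;                    // stand-in for pg_shard_t
using MissingSet = std::set<Object>;  // stand-in for pg_missing_t

// Shards believed to hold a usable copy of each object the acting set lacks.
std::map<Object, std::set<Shard>> missing_loc;
// Per-shard missing sets tracked by the primary.
std::map<Shard, MissingSet> shard_missing;

// Simplified analogue of ECBackend::get_all_avail_shards: gather shards that
// can supply data for hoid.
std::set<Shard> get_all_avail_shards(const Object& hoid) {
  std::set<Shard> have;
  auto miter = missing_loc.find(hoid);
  if (miter != missing_loc.end()) {
    for (Shard s : miter->second) {
      auto m = shard_missing.find(s);
      if (m != shard_missing.end()) {
        // Mirrors the failed check: a recorded location for hoid must not
        // also report hoid as missing.
        assert(m->second.count(hoid) == 0);
      }
      have.insert(s);
    }
  }
  return have;
}

int main() {
  // Inconsistent bookkeeping of this shape would trip the assert during
  // recovery source selection: shard 0 is listed as a location for "obj"
  // while also marking "obj" missing on that shard.
  missing_loc["obj"] = {0, 5};
  shard_missing[0] = {"obj"};
  get_all_avail_shards("obj");  // aborts on the assert
  return 0;
}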

Related issues: 2 (0 open, 2 closed)

Copied to RADOS - Backport #38105: luminous: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid)) (Resolved, Prashant D)
Copied to RADOS - Backport #38106: mimic: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid)) (Resolved, Prashant D)
#1

Updated by Neha Ojha over 5 years ago

Looks like we are testing with leveldb here. Not sure that matters for the purpose of this bug, but we could get rid of that fragment.

rados/thrash-erasure-code/{ceph.yaml clusters/{fixed-2.yaml openstack.yaml} fast/fast.yaml leveldb.yaml msgr-failures/osd-delay.yaml objectstore/filestore-xfs.yaml rados.yaml recovery-overrides/{default.yaml} supported-random-distro$/{centos_latest.yaml} thrashers/morepggrow.yaml thrashosds-health.yaml workloads/ec-radosbench.yaml}

#2

Updated by Neha Ojha about 5 years ago

  • Assignee set to Neha Ojha
#3

Updated by Neha Ojha about 5 years ago

  • Status changed from 12 to Fix Under Review
  • Pull request ID set to 26175
#4

Updated by Neha Ojha about 5 years ago

  • Backport set to luminous,mimic
#5

Updated by Neha Ojha about 5 years ago

  • Status changed from Fix Under Review to Pending Backport
#6

Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38105: luminous: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid)) added
#7

Updated by Nathan Cutler about 5 years ago

  • Copied to Backport #38106: mimic: osd/ECBackend.cc: 1547: FAILED ceph_assert(!(*m).is_missing(hoid)) added
#8

Updated by Nathan Cutler about 5 years ago

  • Status changed from Pending Backport to Resolved