Bug #46705
Negative peer_num_objects crashes osd
Status: Resolved
Priority: Normal
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Community (dev)
Tags:
Backport: octopus,nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite: rados
Component(RADOS): OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Full stack:
ceph version 14.2.10-211-g951f3f726d7 (951f3f726d7ae38e9407dc144a36197de856fa80) nautilus (stable)
1: (()+0xf630) [0x7f3548c4b630]
2: (gsignal()+0x37) [0x7f3547e6f387]
3: (abort()+0x148) [0x7f3547e70a78]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x5619811261f8]
5: (()+0x4ce3d0) [0x5619811263d0]
6: (PG::choose_acting(pg_shard_t&, bool, bool*, bool)+0x148a) [0x5619812bcf5a]
7: (PG::RecoveryState::Recovered::Recovered(boost::statechart::state<PG::RecoveryState::Recovered, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::my_context)+0x193) [0x5619812eecd3]
8: (PG::RecoveryState::Backfilling::react(PG::Backfilled const&)+0x94) [0x5619812fa794]
9: (boost::statechart::simple_state<PG::RecoveryState::Backfilling, PG::RecoveryState::Active, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0xbd) [0x56198133ef7d]
10: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<boost::statechart::none>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x5a) [0x56198131989a]
11: (PG::do_peering_event(std::shared_ptr<PGPeeringEvent>, PG::RecoveryCtx*)+0x2ca) [0x5619813058ea]
12: (OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x20c) [0x5619812372ac]
13: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x51) [0x5619814bb9a1]
14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x891) [0x56198122b321]
15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c6) [0x56198182bed6]
16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x56198182f450]
17: (()+0x7ea5) [0x7f3548c43ea5]
18: (clone()+0x6d) [0x7f3547f378dd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Log:
-5645> 2020-07-20 04:34:32.067 7f351e329700 5 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] exit Started/Primary/Active/Backfilling 0.090218 5 0.000236
-5642> 2020-07-20 04:34:32.067 7f351e329700 5 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] enter Started/Primary/Active/Recovered
-5640> 2020-07-20 04:34:32.067 7f351e329700 10 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] needs_recovery is recovered
-5637> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats actingset 5,6 upset 3,6 acting_recovery_backfill 3,5,6
-5636> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats acting [5,6] up [3,6]
-5633> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats shard 5 primary objects 0 missing 0
-5632> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats shard 3 objects -1 missing 1
-5631> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats shard 6 objects 0 missing 0
-5630> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats object_location_counts {}
-5629> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats missing shard 6 missing= 0
-5628> 2020-07-20 04:34:32.067 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats missing shard 3 missing= 1
-5627> 2020-07-20 04:34:32.068 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats acting shard 5 missing= 0
-5626> 2020-07-20 04:34:32.068 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats degraded 0
-5625> 2020-07-20 04:34:32.068 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] _update_calc_stats misplaced 1
-5624> 2020-07-20 04:34:32.068 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] publish_stats_to_osd reporting purged_snaps []
-5623> 2020-07-20 04:34:32.068 7f351e329700 15 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] publish_stats_to_osd 667:15701
-5622> 2020-07-20 04:34:32.068 7f351e329700 10 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] choose_acting all_info osd.3 6.1f( v 667'8335 (583'5325,667'8335] local-lis/les=665/666 n=-1 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665)
-5621> 2020-07-20 04:34:32.068 7f351e329700 10 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] choose_acting all_info osd.5 6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665)
-5620> 2020-07-20 04:34:32.068 7f351e329700 10 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] choose_acting all_info osd.6 6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665)
-5619> 2020-07-20 04:34:32.068 7f351e329700 10 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] calc_replicated_acting newest update on osd.5 with 6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) restrict_to_up_acting
-5618> 2020-07-20 04:34:32.068 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] choose_async_recovery_replicated candidates by cost are: 1,3
-5617> 2020-07-20 04:34:32.068 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] choose_async_recovery_replicated final candidates by cost are: 1,3
-5616> 2020-07-20 04:34:32.068 7f351e329700 20 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] choose_async_recovery_replicated result want=[5,6] async_recovery=3
-5615> 2020-07-20 04:34:32.068 7f351e329700 10 osd.5 pg_epoch: 667 pg[6.1f( v 667'8335 (577'5259,667'8335] local-lis/les=665/666 n=0 ec=546/498 lis/c 665/619 les/c/f 666/620/0 664/665/665) [3,6]/[5,6] backfill=[3] r=0 lpr=665 pi=[619,665)/1 luod=666'8334 crt=667'8335 lcod 666'8333 mlcod 666'8333 active+remapped mbc={255={}}] acting_recovery_backfill is 3,5,6
-10> 2020-07-20 04:34:33.298 7f351e329700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10-211-g951f3f726d7/rpm/el7/BUILD/ceph-14.2.10-211-g951f3f726d7/src/osd/PG.cc: In function 'bool PG::choose_acting(pg_shard_t&, bool, bool*, bool)' thread 7f351e329700 time 2020-07-20 04:34:33.175080
-10> 2020-07-20 04:34:33.298 7f351e329700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.10-211-g951f3f726d7/rpm/el7/BUILD/ceph-14.2.10-211-g951f3f726d7/src/osd/PG.cc: In function 'bool PG::choose_acting(pg_shard_t&, bool, bool*, bool)' thread 7f351e329700 time 2020-07-20 04:34:33.175080
-9> 2020-07-20 04:34:33.325 7f351e329700 -1 *** Caught signal (Aborted) **
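For triage, the relevant signal in the log above is the per-shard summary from _update_calc_stats, where shard 3 reports "objects -1" (matching n=-1 in the choose_acting all_info line for osd.3). The following is a minimal sketch, not part of Ceph, for scanning an OSD debug log for such entries. The function name negative_object_shards and the regular expression are assumptions for illustration, based only on the line format visible in this report.

```python
import re

# Matches the per-shard summary lines seen in the log above, e.g.
#   "_update_calc_stats shard 3 objects -1 missing 1"
# The primary's line carries an extra "primary" token. The pattern is an
# assumption derived from this report, not a documented Ceph log format.
SHARD_RE = re.compile(
    r"_update_calc_stats shard (\d+)(?: primary)? objects (-?\d+) missing (-?\d+)"
)


def negative_object_shards(log_text):
    """Return (shard, objects, missing) tuples whose object count is negative."""
    hits = []
    for match in SHARD_RE.finditer(log_text):
        shard, objects, missing = (int(group) for group in match.groups())
        if objects < 0:
            hits.append((shard, objects, missing))
    return hits


# Message tails copied from entries -5633>, -5632> and -5631> above.
sample = (
    "_update_calc_stats shard 5 primary objects 0 missing 0\n"
    "_update_calc_stats shard 3 objects -1 missing 1\n"
    "_update_calc_stats shard 6 objects 0 missing 0\n"
)
print(negative_object_shards(sample))
```

Run against the three shard lines from this report, the sketch flags only shard 3, the backfill target. The "missing shard N missing= M" and "acting shard N missing= M" lines are deliberately not matched, since they do not carry an object count.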
Updated by Kefu Chai almost 4 years ago
- Status changed from New to Pending Backport
- Pull request ID set to 36274
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46709: octopus: Negative peer_num_objects crashes osd added
Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #46710: nautilus: Negative peer_num_objects crashes osd added
Updated by Nathan Cutler over 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".