Bug #38483
FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
Status: closed
% Done: 0%
Backport: nautilus
Regression: No
Severity: 3 - minor
Description
2019-02-26 03:04:15.748 7fefccfc0700 10 osd.4:7._attach_pg 1.7 0x55aec8689000
2019-02-26 03:04:15.748 7fefccfc0700 20 osd.4:7._wake_pg_slot _wake_pg_slot 1.7 to_process <> waiting <> waiting_peering {}
...
2019-02-26 03:04:15.749 7fefbf7a5700 20 osd.4 572 advance_pg 1.7 is merge target, sources are 1.f
2019-02-26 03:04:15.749 7fefbf7a5700 1 osd.4 572 advance_pg 1.f is merge source, target is 1.7
2019-02-26 03:04:15.750 7fefbf7a5700 10 osd.4 572 add_merge_waiter added merge_waiter 1.f for 1.7, have 1/1
...
2019-02-26 03:04:15.750 7fefbf7a5700 10 osd.4 pg_epoch: 547 pg[1.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0/0/0 0/0/0) [6,3,7] r=-1 lpr=547 crt=0'0 unknown mbc={}] merge_from from {1.f=0x55aec9916000} split_bits 3
2019-02-26 03:04:15.750 7fefbf7a5700 10 osd.4 pg_epoch: 547 pg[1.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0/0/0 0/0/0) [6,3,7] r=-1 lpr=547 crt=0'0 unknown mbc={}] merge_from target incomplete
2019-02-26 03:04:15.750 7fefbf7a5700 10 osd.4 pg_epoch: 547 pg[1.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c 0/0 les/c/f 0/0/0 0/0/0) [6,3,7] r=-1 lpr=547 crt=0'0 unknown mbc={}] merge_from taking source's past_intervals
...
2019-02-26 03:04:15.751 7fefbf7a5700 10 osd.4 572 split_pgs splitting pg[1.7( empty lb MIN (NIBBLEWISE) local-lis/les=0/547 n=0 ec=33/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=560 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] into 1.f
2019-02-26 03:04:15.751 7fefbf7a5700 10 osd.4 pg_epoch: 560 pg[1.7( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=33/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=560 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] release_backoffs [MIN,MAX)
2019-02-26 03:04:15.751 7fefbf7a5700 10 osd.4 572 split_pgs splitting pg[1.7( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=33/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=560 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] into 1.17
...
2019-02-26 03:04:15.752 7fefbf7a5700 10 osd.4 572 _finish_splits pg[1.17( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=560/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=0 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}]
2019-02-26 03:04:15.752 7fefbf7a5700 10 osd.4 pg_epoch: 560 pg[1.17( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=560/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=0 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] handle_initialize
2019-02-26 03:04:15.752 7fefbf7a5700 5 osd.4 pg_epoch: 560 pg[1.17( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=560/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=0 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] exit Initial 0.001060 0 0.000000
2019-02-26 03:04:15.752 7fefbf7a5700 5 osd.4 pg_epoch: 560 pg[1.17( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=560/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=0 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] enter Reset
2019-02-26 03:04:15.752 7fefbf7a5700 20 osd.4 pg_epoch: 560 pg[1.17( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=560/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=0 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] set_last_peering_reset 560
2019-02-26 03:04:15.752 7fefbf7a5700 10 osd.4 pg_epoch: 560 pg[1.17( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=560/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=560 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] Clearing blocked outgoing recovery messages
2019-02-26 03:04:15.752 7fefbf7a5700 10 osd.4 pg_epoch: 560 pg[1.17( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=560/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=560 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] Not blocking outgoing recovery messages
2019-02-26 03:04:15.752 7fefbf7a5700 10 osd.4 pg_epoch: 560 pg[1.17( empty lb MIN (bitwise) local-lis/les=0/547 n=0 ec=560/14 lis/c 374/374 les/c/f 547/547/0 536/560/374) [6,3,7] r=-1 lpr=560 pi=[374,560)/2 crt=0'0 unknown NOTIFY mbc={}] null
2019-02-26 03:04:15.752 7fefbf7a5700 15 osd.4 572 enqueue_peering_evt 1.17 epoch_sent: 560 epoch_requested: 560 NullEvt
2019-02-26 03:04:15.752 7fefbf7a5700 20 osd.4 op_wq(7) _enqueue OpQueueItem(1.17 PGPeeringEvent(epoch_sent: 560 epoch_requested: 560 NullEvt) prio 255 cost 10 e560)
2019-02-26 03:04:15.752 7fefbf7a5700 10 osd.4:7.register_and_wake_split_child 1.17 0x55aec8234000
0> 2019-02-26 03:04:15.756 7fefbf7a5700 -1 *** Caught signal (Aborted) **
 in thread 7fefbf7a5700 thread_name:tp_osd_tp
 ceph version 14.1.0-125-g8b98d22 (8b98d22533def4c768359c2efe9496780b036d22) nautilus (dev)
 1: (()+0xf5d0) [0x7fefe558e5d0]
 2: (gsignal()+0x37) [0x7fefe4385207]
 3: (abort()+0x148) [0x7fefe43868f8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x55aebaf52a1b]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55aebaf52b9a]
 6: (OSDShard::register_and_wake_split_child(PG*)+0x7e3) [0x55aebb0bf503]
 7: (OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x121) [0x55aebb0bf671]
 8: (Context::complete(int)+0x9) [0x55aebb0c6349]
 9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x67c) [0x55aebb0abd7c]
 10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) [0x55aebb69f003]
 11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55aebb6a20a0]
/a/sage-2019-02-26_00:43:29-rados-wip-sage-testing-2019-02-25-1642-distro-basic-smithi/3638207
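Looks like the merge -> split sequence doesn't prime the merge target, so when the later split finishes, the child (1.17 in the log above) has no entry in pg_slots and register_and_wake_split_child hits the assert.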
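To make the suspected failure mode concrete, here is a toy model of it. This is only a sketch under assumptions, not the Ceph code and not the fix: `ToyShard`, `prime_splits`, `register_split_child` and `split_after_merge` are hypothetical names invented for illustration, and the slot table is reduced to a map from a pgid string to a flag. It only shows why a lookup for a split child fails when nothing created the child's slot beforehand.

```cpp
#include <map>
#include <set>
#include <string>

// Toy stand-in for a shard's slot table. Hypothetical; NOT the real OSDShard.
struct ToyShard {
    // pgid -> "slot is waiting for a split child" flag
    std::map<std::string, bool> pg_slots;

    // Create placeholder slots for the children a pending split will produce.
    void prime_splits(const std::set<std::string>& children) {
        for (const auto& child : children) {
            pg_slots[child] = true;
        }
    }

    // Look up the child's slot. Returns false in the situation where the
    // real code would trip ceph_assert(p != pg_slots.end()).
    bool register_split_child(const std::string& child) {
        auto p = pg_slots.find(child);
        if (p == pg_slots.end()) {
            return false;  // no slot was primed for this child
        }
        p->second = false;  // child attached, slot no longer waiting
        return true;
    }
};

// Replays the sequence from the log in miniature: 1.7 is attached, 1.f is
// merged into it, then 1.7 is split again and child 1.17 is registered.
// prime_merge_target controls whether the children's slots are primed
// after the merge (an assumption about what the missing step would be).
bool split_after_merge(bool prime_merge_target) {
    ToyShard shard;
    shard.pg_slots["1.7"] = false;  // merge target is attached
    if (prime_merge_target) {
        shard.prime_splits({"1.f", "1.17"});
    }
    return shard.register_split_child("1.17");
}
```

In this model `split_after_merge(false)` returns false, which corresponds to the assert in the backtrace, while `split_after_merge(true)` finds the slot and returns true.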