https://tracker.ceph.com/
https://tracker.ceph.com/favicon.ico
2018-11-19T22:48:11Z
Ceph
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=124785
2018-11-19T22:48:11Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Can't reproduce</i></li></ul><p>I'm guessing this was fixed by 450f337d6fd048c8c95a0ec0dec0d97f5474922e</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=129529
2019-02-17T14:16:33Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Can't reproduce</i> to <i>12</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul><p>Reproduced, but without logs.</p>
<pre>
2019-02-16T22:22:14.214 INFO:tasks.ceph.osd.2.smithi107.stderr:/build/ceph-14.0.1-3819-g2bd9523/src/osd/OSD.cc: In function 'void OSDShard::register_and_wake_split_child(PG*)' thread 7f7e8557f700 time 2019-02-16 22:22:14.218197
2019-02-16T22:22:14.214 INFO:tasks.ceph.osd.2.smithi107.stderr:/build/ceph-14.0.1-3819-g2bd9523/src/osd/OSD.cc: 10561: FAILED ceph_assert(p != pg_slots.end())
2019-02-16T22:22:14.217 INFO:tasks.ceph.osd.2.smithi107.stderr: ceph version 14.0.1-3819-g2bd9523 (2bd9523756002a98bc3cea0c687fc99c4b8b988a) nautilus (dev)
2019-02-16T22:22:14.217 INFO:tasks.ceph.osd.2.smithi107.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x84d12c]
2019-02-16T22:22:14.217 INFO:tasks.ceph.osd.2.smithi107.stderr: 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x84d307]
2019-02-16T22:22:14.218 INFO:tasks.ceph.osd.2.smithi107.stderr: 3: (OSDShard::register_and_wake_split_child(PG*)+0x7f3) [0x9af683]
2019-02-16T22:22:14.218 INFO:tasks.ceph.osd.2.smithi107.stderr: 4: (OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x124) [0x9af7f4]
2019-02-16T22:22:14.218 INFO:tasks.ceph.osd.2.smithi107.stderr: 5: (Context::complete(int)+0x9) [0x9b6889]
2019-02-16T22:22:14.218 INFO:tasks.ceph.osd.2.smithi107.stderr: 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x6a4) [0x99c064]
2019-02-16T22:22:14.218 INFO:tasks.ceph.osd.2.smithi107.stderr: 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0xfb775c]
2019-02-16T22:22:14.218 INFO:tasks.ceph.osd.2.smithi107.stderr: 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xfba910]
2019-02-16T22:22:14.219 INFO:tasks.ceph.osd.2.smithi107.stderr: 9: (()+0x76ba) [0x7f7ea4c426ba]
2019-02-16T22:22:14.219 INFO:tasks.ceph.osd.2.smithi107.stderr: 10: (clone()+0x6d) [0x7f7ea424941d]
</pre><br />/a/sage-2019-02-16_18:46:49-rados-wip-sage-testing-2019-02-16-0946-distro-basic-smithi/3601996
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=129596
2019-02-18T17:32:04Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>We have logs here: /a/nojha-2019-02-11_18:58:45-rados:thrash-erasure-code-wip-test-revert-distro-basic-smithi/3575122</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=129608
2019-02-18T21:30:26Z
Sage Weil
sage@newdream.net
<ul></ul><p>aha:<br />during startup, we load pg 2.fs1, but fail to prime it from init():<br /><pre>
2019-02-11 19:51:05.982 7f4735a22c00 0 osd.1 230 load_pgs opened 53 pgs
...
2019-02-11 19:51:05.983 7f4735a22c00 20 osd.1 230 identify_splits_and_merges 1.0 e199 to e230 pg_nums {14=8,28=18,33=28,58=38,160=48,190=58,198=68,207=78}
...
2019-02-11 19:51:05.983 7f4735a22c00 20 osd.1 230 identify_splits_and_merges 3.1fs1 e229 to e230 pg_nums {151=16,167=26,216=36,223=46,230=56}
</pre><br />2.fs1 isn't listed there: only 51 pgs are mentioned, even though we loaded 53. We did, however, prime splits for 2 other PGs, which modified pg_slots and invalidated our iterator.</p>
<p>see /a/nojha-2019-02-11_18:58:45-rados:thrash-erasure-code-wip-test-revert-distro-basic-smithi/3575122 osd.1</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=129609
2019-02-18T21:31:08Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>12</i> to <i>Fix Under Review</i></li></ul><p><a class="external" href="https://github.com/ceph/ceph/pull/26492">https://github.com/ceph/ceph/pull/26492</a></p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=129724
2019-02-20T13:02:56Z
Sage Weil
sage@newdream.net
<ul><li><strong>Status</strong> changed from <i>Fix Under Review</i> to <i>Resolved</i></li></ul>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=142655
2019-08-08T15:25:02Z
Kefu Chai
tchaikov@gmail.com
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>New</i></li></ul><pre>
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/15.0.0-3600-ge2c05a7/rpm/el
7/BUILD/ceph-15.0.0-3600-ge2c05a7/src/osd/OSD.cc: 10451: FAILED ceph_assert(p != pg_slots.end())
ceph version 15.0.0-3600-ge2c05a7 (e2c05a7110b8a24787480b43d6293549ed2a42f1) octopus (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x560085e6afb1]
2: (()+0x4ee179) [0x560085e6b179]
3: (OSDShard::register_and_wake_split_child(PG*)+0x7ad) [0x560085f8190d]
4: (OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x1f3) [0x560085f81b63]
5: (Context::complete(int)+0x9) [0x560085f8c549]
6: (void finish_contexts<std::list<Context*, std::allocator<Context*> > >(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x7d) [0x5600863032bd]
7: (C_ContextsBase<Context, Context, std::list<Context*, std::allocator<Context*> > >::complete(int)+0x29) [0x560086303529]
8: (Finisher::finisher_thread_entry()+0x19d) [0x5600864ed20d]
9: (()+0x7dd5) [0x7fbe4154edd5]
10: (clone()+0x6d) [0x7fbe4041502d]
</pre>
<p>/a/kchai-2019-08-08_08:04:33-rados-wip-29537-kefu-distro-basic-smithi/4198166</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=143546
2019-08-20T20:14:38Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Normal</i></li></ul><p>We can bump the priority back up if it reappears.</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=143559
2019-08-20T20:29:42Z
Greg Farnum
gfarnum@redhat.com
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-3 priority-4 priority-default closed" href="/issues/38483">Bug #38483</a>: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)</i> added</li></ul>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=164288
2020-04-28T01:31:17Z
Brad Hubbard
bhubbard@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>Urgent</i></li></ul><p>/a/teuthology-2020-04-26_07:01:02-rados-master-distro-basic-smithi/4986119</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=176531
2020-10-02T21:15:51Z
Neha Ojha
nojha@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Urgent</i> to <i>Normal</i></li></ul><p>Haven't seen this in a while.</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=179341
2020-11-17T19:20:53Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>/a/ksirivad-2020-11-16_07:16:50-rados-wip-mgr-progress-turn-off-option-distro-basic-smithi/5630402 - no logs</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=183226
2021-01-25T23:43:07Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>/a/teuthology-2021-01-23_07:01:02-rados-master-distro-basic-gibba/5819503</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=187094
2021-03-09T22:23:19Z
Neha Ojha
nojha@redhat.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li><li><strong>Backport</strong> set to <i>pacific, octopus, nautilus</i></li></ul><p>/a/yuriw-2021-03-08_21:03:18-rados-wip-yuri5-testing-2021-03-08-1049-pacific-distro-basic-smithi/5947439</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=188334
2021-03-22T21:40:49Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>/a/yuriw-2021-03-19_00:00:55-rados-wip-yuri8-testing-2021-03-18-1502-pacific-distro-basic-smithi/5978982</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=188336
2021-03-22T22:21:47Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>relevant osd.3 logs from yuriw-2021-03-19_00:00:55-rados-wip-yuri8-testing-2021-03-18-1502-pacific-distro-basic-smithi/5978982</p>
<pre>
2021-03-19T16:15:42.602+0000 7f1e7d4ea700 10 osd.3 452 split_pgs splitting pg[1.18( empty local-lis/les=379/380 n=0 ec=134/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [4] r=-1 lpr=452 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] into 1.58
2021-03-19T16:15:42.602+0000 7f1e7d4ea700 10 osd.3 452 _make_pg 1.58
2021-03-19T16:15:42.602+0000 7f1e7d4ea700 5 osd.3 pg_epoch: 452 pg[1.58(unlocked)] enter Initial
...
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 10 osd.3 453 _finish_splits pg[1.58( empty local-lis/les=379/380 n=0 ec=452/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [6,5] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}]
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 10 osd.3 pg_epoch: 452 pg[1.58( empty local-lis/les=379/380 n=0 ec=452/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [6,5] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] handle_initialize
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 5 osd.3 pg_epoch: 452 pg[1.58( empty local-lis/les=379/380 n=0 ec=452/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [6,5] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] exit Initial 0.006560 0 0.000000
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 5 osd.3 pg_epoch: 452 pg[1.58( empty local-lis/les=379/380 n=0 ec=452/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [6,5] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] enter Reset
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 20 osd.3 pg_epoch: 452 pg[1.58( empty local-lis/les=379/380 n=0 ec=452/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [6,5] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] set_last_peering_reset 452
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 10 osd.3 pg_epoch: 452 pg[1.58( empty local-lis/les=379/380 n=0 ec=452/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [6,5] r=-1 lpr=452 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] Clearing blocked outgoing recovery messages
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 10 osd.3 pg_epoch: 452 pg[1.58( empty local-lis/les=379/380 n=0 ec=452/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [6,5] r=-1 lpr=452 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] Not blocking outgoing recovery messages
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 10 osd.3 pg_epoch: 452 pg[1.58( empty local-lis/les=379/380 n=0 ec=452/14 lis/c=379/379 les/c/f=380/380/0 sis=452) [6,5] r=-1 lpr=452 crt=0'0 mlcod 0'0 unknown NOTIFY mbc={}] null
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 15 osd.3 453 enqueue_peering_evt 1.58 epoch_sent: 452 epoch_requested: 452 NullEvt
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 20 osd.3 op_wq(0) _enqueue OpSchedulerItem(1.58 PGPeeringEvent(epoch_sent: 452 epoch_requested: 452 NullEvt) prio 255 cost 10 e452)
2021-03-19T16:15:42.609+0000 7f1e7d4ea700 10 osd.3:0.register_and_wake_split_child 1.58 0x561e34601000
...
2021-03-19T16:15:42.611+0000 7f1e7d4ea700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-945-g14d80dbf/rpm/el8/BUILD/ceph-16.1.0-945-g14d80dbf/src/osd/OSD.cc: In function 'void OSDShard::register_and_wake_split_child(PG*)' thread 7f1e7d4ea700 time 2021-03-19T16:15:42.610408+0000
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.1.0-945-g14d80dbf/rpm/el8/BUILD/ceph-16.1.0-945-g14d80dbf/src/osd/OSD.cc: 10551: FAILED ceph_assert(p != pg_slots.end())
ceph version 16.1.0-945-g14d80dbf (14d80dbf937b38fba622cec4b41998d0bd128816) pacific (rc)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x561e1b13475a]
2: ceph-osd(+0x568974) [0x561e1b134974]
3: (OSDShard::register_and_wake_split_child(PG*)+0x810) [0x561e1b26f670]
4: (OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x2ad) [0x561e1b26f97d]
5: (Context::complete(int)+0xd) [0x561e1b273e0d]
6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xadc) [0x561e1b258d7c]
7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x561e1b8bf764]
8: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x561e1b8c2404]
9: /lib64/libpthread.so.0(+0x82de) [0x7f1ea1b2c2de]
10: clone()
</pre>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=188783
2021-03-26T15:04:44Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>/a/yuriw-2021-03-25_20:03:40-rados-wip-yuri8-testing-2021-03-25-1042-pacific-distro-basic-smithi/5999016</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=196534
2021-06-04T21:41:37Z
Greg Farnum
gfarnum@redhat.com
<ul></ul><p><a class="external" href="https://pulpito.ceph.com/gregf-2021-06-03_20:03:04-rados-pacific-mmonjoin-leader-testing-distro-basic-smithi/6150351/">https://pulpito.ceph.com/gregf-2021-06-03_20:03:04-rados-pacific-mmonjoin-leader-testing-distro-basic-smithi/6150351/</a></p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=199864
2021-07-28T10:01:34Z
Kefu Chai
tchaikov@gmail.com
<ul></ul><pre>
2021-07-28T09:04:05.163+0000 7fa0aef14700 20 bluestore(/var/lib/ceph/osd/ceph-1).collection(meta 0x5595b8d4f680) r -2 v.len 0
2021-07-28T09:04:05.163+0000 7fa0aef14700 10 bluestore(/var/lib/ceph/osd/ceph-1) read meta #-1:afd56935:::osdmap.368:0# 0x0~0 = -2
2021-07-28T09:04:05.163+0000 7fa0aef14700 -1 osd.1 403 failed to load OSD map for epoch 368, got 0 bytes
2021-07-28T09:04:05.163+0000 7fa0aef14700 20 osd.1 403 advance_pg missing map 368
2021-07-28T09:04:05.163+0000 7fa0aef14700 20 osd.1 403 get_map 369 - loading and decoding 0x5595b9761700
2021-07-28T09:04:05.163+0000 7fa0ae713700 -1 /build/ceph-17.0.0-6461-g3cfb73ec/src/osd/OSD.cc: In function 'void OSDShard::register_and_wake_split_child(PG*)' thread 7fa0ae713700 time 2021-07-28T09:04:05.
157796+0000
/build/ceph-17.0.0-6461-g3cfb73ec/src/osd/OSD.cc: 10748: FAILED ceph_assert(p != pg_slots.end())
ceph version 17.0.0-6461-g3cfb73ec (3cfb73ec2f3bd8e1d6621a767f6ab72f21d849ae) quincy (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x5595b37992c0]
2: ceph-osd(+0xc184d2) [0x5595b37994d2]
3: (OSDShard::register_and_wake_split_child(PG*)+0x7d0) [0x5595b388c110]
4: (OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x3af) [0x5595b38ad36f]
5: (Context::complete(int)+0xd) [0x5595b38b20ed]
6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x953) [0x5595b3891323]
7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x403) [0x5595b3fe0d73]
8: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x5595b3fe3b94]
9: (Thread::_entry_func(void*)+0xd) [0x5595b3fd351d]
10: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7fa0cb93b609]
11: clone()
</pre>
<p>/a/kchai-2021-07-28_08:37:00-rados-wip-kefu-testing-2021-07-28-1257-distro-basic-smithi/6297932/</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=200799
2021-08-09T17:50:56Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>/a/yuriw-2021-08-06_16:31:19-rados-wip-yuri-master-8.6.21-distro-basic-smithi/6324576</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=202279
2021-09-01T15:55:36Z
Ronen Friedman
rfriedma@redhat.com
<ul></ul><p>Some possibly helpful hints:<br />1. In "my" specific instance, the pg address handed over to register_and_wake_split_child() was null.<br />2. The locking in the calling function (_finish_splits()) took a long time.</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=202345
2021-09-02T20:34:30Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>more useful debug logging being added in <a class="external" href="https://github.com/ceph/ceph/pull/42965">https://github.com/ceph/ceph/pull/42965</a></p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=202346
2021-09-02T20:34:58Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>Ronen Friedman wrote:</p>
<blockquote>
<p>Some possibly helpful hints:<br />1. In "my" specific instance, the pg address handed over to register_and_wake_split_child() was null.<br />2. The locking in the calling function (_finish_splits()) took a long time.</p>
</blockquote>
<p>Can you provide a link to the failed run?</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=202870
2021-09-09T21:47:37Z
Neha Ojha
nojha@redhat.com
<ul><li><strong>Duplicated by</strong> <i><a class="issue tracker-1 status-10 priority-4 priority-default closed" href="/issues/52149">Bug #52149</a>: crash: void OSDShard::register_and_wake_split_child(PG*): assert(p != pg_slots.end())</i> added</li></ul>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=203554
2021-09-26T05:00:35Z
Ronen Friedman
rfriedma@redhat.com
<ul></ul><p>Neha Ojha wrote:</p>
<blockquote>
<p>Can you provide a link to the failed run?</p>
</blockquote>
<p>Trying to reproduce.</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=208916
2022-01-21T18:39:31Z
Neha Ojha
nojha@redhat.com
<ul><li><strong>Priority</strong> changed from <i>High</i> to <i>Normal</i></li></ul>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=212110
2022-03-17T00:14:45Z
Telemetry Bot
<ul><li><strong>Crash signature (v1)</strong> updated (<a title="View differences" href="/journals/212110/diff?detail_id=223218">diff</a>)</li><li><strong>Crash signature (v2)</strong> updated (<a title="View differences" href="/journals/212110/diff?detail_id=223219">diff</a>)</li><li><strong>Affected Versions</strong> <i>v14.2.11, v14.2.7, v15.2.1, v15.2.11, v15.2.13, v15.2.5, v15.2.6, v15.2.8, v15.2.9</i> added</li></ul><p><a class="external" href="http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=86176dad44ae51d3e7de7eac892f695acedb065563b1a18490481db3635c017a">http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=86176dad44ae51d3e7de7eac892f695acedb065563b1a18490481db3635c017a</a></p>
<p>Assert condition: p != pg_slots.end()<br />Assert function: void OSDShard::register_and_wake_split_child(PG*)</p>
<p>Sanitized backtrace:<br /><pre> OSDShard::register_and_wake_split_child(PG*)
OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)
Context::complete(int)
OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)
ShardedThreadPool::shardedthreadpool_worker(unsigned int)
ShardedThreadPool::WorkThreadSharded::entry()
clone()
</pre><br />Crash dump sample:<br /><pre>{
"archived": "2021-07-17 11:49:45.087615",
"assert_condition": "p != pg_slots.end()",
"assert_file": "osd/OSD.cc",
"assert_func": "void OSDShard::register_and_wake_split_child(PG*)",
"assert_line": 10321,
"assert_msg": "osd/OSD.cc: In function 'void OSDShard::register_and_wake_split_child(PG*)' thread 7ff20c836700 time 2021-07-17T04:26:15.494347-0700\nosd/OSD.cc: 10321: FAILED ceph_assert(p != pg_slots.end())",
"assert_thread_name": "tp_osd_tp",
"backtrace": [
"(()+0xf630) [0x7ff22dabd630]",
"(gsignal()+0x37) [0x7ff22c8ab387]",
"(abort()+0x148) [0x7ff22c8aca78]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x19b) [0x55f9736589f2]",
"(()+0x4deb6b) [0x55f973658b6b]",
"(OSDShard::register_and_wake_split_child(PG*)+0x781) [0x55f97376c7b1]",
"(OSD::_finish_splits(std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >&)+0x296) [0x55f97376caa6]",
"(Context::complete(int)+0x9) [0x55f973771c29]",
"(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x14ec) [0x55f97375864c]",
"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55f973d44c46]",
"(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55f973d47790]",
"(()+0x7ea5) [0x7ff22dab5ea5]",
"(clone()+0x6d) [0x7ff22c9739fd]"
],
"ceph_version": "15.2.13",
"crash_id": "2021-07-17T11:26:15.505225Z_3117824e-c38e-48a5-b593-adaebe8106a8",
"entity_name": "osd.79ecd90a2b778d5c739dd4b7af798dda206d7c70",
"os_id": "centos",
"os_name": "CentOS Linux",
"os_version": "7 (Core)",
"os_version_id": "7",
"process_name": "ceph-osd",
"stack_sig": "b34e9d76dd14e64b9a2fe50f844fd40ec4279a5d92b937e331cd3cfa8a4f7d41",
"timestamp": "2021-07-17T11:26:15.505225Z",
"utsname_machine": "x86_64",
"utsname_release": "3.10.0-1160.31.1.el7.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Thu Jun 10 13:32:12 UTC 2021"
}</pre></p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=212268
2022-03-17T15:52:15Z
Telemetry Bot
<ul><li><strong>Crash signature (v1)</strong> updated (<a title="View differences" href="/journals/212268/diff?detail_id=223572">diff</a>)</li><li><strong>Crash signature (v2)</strong> updated (<a title="View differences" href="/journals/212268/diff?detail_id=223573">diff</a>)</li></ul>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=212521
2022-03-19T01:18:04Z
Telemetry Bot
<ul><li><strong>Crash signature (v1)</strong> updated (<a title="View differences" href="/journals/212521/diff?detail_id=224215">diff</a>)</li><li><strong>Affected Versions</strong> <i>v14.2.15, v15.2.12, v16.2.1, v16.2.4, v16.2.5, v16.2.7</i> added</li></ul>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=214014
2022-04-04T17:33:33Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Need More Info</i></li><li><strong>Crash signature (v1)</strong> updated (<a title="View differences" href="/journals/214014/diff?detail_id=226595">diff</a>)</li></ul><p>Waiting for the issue to reproduce. See comment #25.</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=221498
2022-07-28T02:14:17Z
Telemetry Bot
<ul><li><strong>Crash signature (v1)</strong> updated (<a title="View differences" href="/journals/221498/diff?detail_id=234685">diff</a>)</li><li><strong>Affected Versions</strong> <i>v15.2.15, v15.2.16, v15.2.7, v17.2.0</i> added</li></ul>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=223698
2022-08-17T19:01:30Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul><li><strong>Backport</strong> changed from <i>pacific, octopus, nautilus</i> to <i>pacific,quincy</i></li><li><strong>Crash signature (v1)</strong> updated (<a title="View differences" href="/journals/223698/diff?detail_id=237492">diff</a>)</li></ul>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=223699
2022-08-17T19:05:58Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul></ul><p>Although there was a report from Telemetry, we still need more logs (read: a recurrence at Sepia), which will hopefully shed some extra light, since Ronen introduced more logging into <code>register_and_wake_split_child()</code>.</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=225810
2022-09-16T15:29:57Z
Neha Ojha
nojha@redhat.com
<ul></ul><p>/a/yuriw-2022-09-15_17:53:16-rados-quincy-release-distro-default-smithi/7034166</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=233759
2023-03-28T11:57:43Z
Nathan Gardiner
<ul></ul><p>I'm hitting this assertion on 16.2.11 on one of my OSDs; here are my logs:</p>
<p>2023-03-28T20:42:28.660+1100 7f5c81aa4700 -1 ./src/osd/OSD.cc: In function 'void OSDShard::register_and_wake_split_child(PG*)' thread 7f5c81aa4700 time 2023-03-28T20:42:28.653968+1100<br />./src/osd/OSD.cc: 10862: FAILED ceph_assert(slot->waiting_for_split.count(epoch))</p>
<pre><code>ceph version 16.2.11 (578f8e68e41b0a98523d0045ef6db90ce6f2e5ab) pacific (stable)<br /> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x558c594172ea]<br /> 2: /usr/bin/ceph-osd(+0xac3475) [0x558c59417475]<br /> 3: (OSDShard::register_and_wake_split_child(PG*)+0x354) [0x558c594de204]<br /> 4: (OSD::_finish_splits(std::set&lt;boost::intrusive_ptr&lt;PG&gt;, std::less&lt;boost::intrusive_ptr&lt;PG&gt; >, std::allocator&lt;boost::intrusive_ptr&lt;PG&gt; > >&)+0xe6) [0x558c594e1ac6]<br /> 5: (Context::complete(int)+0x9) [0x558c5951bf19]<br /> 6: (OSD::ShardedOpWQ::handle_oncommits(std::__cxx11::list&lt;Context*, std::allocator&lt;Context*&gt; >&)+0x24) [0x558c5952d4f4]<br /> 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x256b) [0x558c59501ceb]<br /> 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x558c59bafaba]<br /> 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x558c59bb2090]<br /> 10: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f5c9e430ea7]<br /> 11: clone()</code></pre>
<p>2023-03-28T20:42:28.668+1100 7f5c81aa4700 -1 *** Caught signal (Aborted) ***<br /> in thread 7f5c81aa4700 thread_name:tp_osd_tp</p>
<pre><code>ceph version 16.2.11 (578f8e68e41b0a98523d0045ef6db90ce6f2e5ab) pacific (stable)<br /> 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f5c9e43c140]<br /> 2: gsignal()<br /> 3: abort()<br /> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16e) [0x558c59417334]<br /> 5: /usr/bin/ceph-osd(+0xac3475) [0x558c59417475]<br /> 6: (OSDShard::register_and_wake_split_child(PG*)+0x354) [0x558c594de204]<br /> 7: (OSD::_finish_splits(std::set&lt;boost::intrusive_ptr&lt;PG&gt;, std::less&lt;boost::intrusive_ptr&lt;PG&gt; >, std::allocator&lt;boost::intrusive_ptr&lt;PG&gt; > >&)+0xe6) [0x558c594e1ac6]<br /> 8: (Context::complete(int)+0x9) [0x558c5951bf19]<br /> 9: (OSD::ShardedOpWQ::handle_oncommits(std::__cxx11::list&lt;Context*, std::allocator&lt;Context*&gt; >&)+0x24) [0x558c5952d4f4]<br /> 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x256b) [0x558c59501ceb]<br /> 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x558c59bafaba]<br /> 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x558c59bb2090]<br /> 13: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f5c9e430ea7]<br /> 14: clone()<br /> NOTE: a copy of the executable, or `objdump -rdS &lt;executable&gt;` is needed to interpret this.</code></pre>
<p>--- begin dump of recent events ---</p>
RADOS - Bug #36304: FAILED ceph_assert(p != pg_slots.end()) in OSDShard::register_and_wake_split_child(PG*)
https://tracker.ceph.com/issues/36304?journal_id=233990
2023-03-30T18:02:46Z
Radoslaw Zarzynski
rzarzyns@redhat.com
<ul></ul><p>Hello Nathan!</p>
<p>Do you have a log or a coredump by any chance?</p>