Bug #36473
hung osd_repop, bluestore committed but failed to trigger repop_commit
Description
/a/sage-2018-10-16_18:31:27-rados-wip-sage-testing-2018-10-16-0724-distro-basic-smithi/3148851
Usually, after the bluestore commit succeeds, the op wq wakes up and calls something like
2018-10-16 20:57:10.468 7f26725c0700 10 osd.2 pg_epoch: 45 pg[1.6( v 45'14646 (45'14623,45'14646] local-lis/les=29/30 n=12628 ec=9/9 lis/c 29/29 les/c/f 30/30/0 26/29/29) [1,2] r=1 lpr=29 luod=0'0 lua=25'12628 crt=45'14646 lcod 45'14632 active mbc={}] repop_commit on op osd_repop(osd.1.0:64467 1.6 e45/29 1:64f1278e:::benchmark_data_smithi176_34325_object9600:head v 45'14634) v2, sending commit to osd.1
2018-10-16 20:57:10.468 7f26725c0700 20 osd.2 45 share_map_peer 0x55f95b2c5080 already has epoch 45
2018-10-16 20:57:10.468 7f26725c0700  1 -- 172.21.15.176:6806/31292 --> 172.21.15.176:6802/39730 -- osd_repop_reply(osd.1.0:64467 1.6 e45/29 ondisk, result = 0) v2 -- ?+0 0x55f97436a780 con 0x55f95b2c5080
but for this op it didn't:
2018-10-16 20:57:41.245 7f268c5f4700 20 slow op osd_repop(osd.1.0:64485 1.6 e45/29 1:64f950b4:::benchmark_data_smithi176_38726_object47030:head v 45'14652) initiated 2018-10-16 20:57:10.470116
2018-10-16 20:57:41.245 7f268c5f4700 -1 osd.2 45 get_health_metrics reporting 1 slow ops, oldest is osd_repop(osd.1.0:64485 1.6 e45/29 1:64f950b4:::benchmark_data_smithi176_38726_object47030:head v 45'14652)
maybe a race condition on the queue wakeup?
History
#1 Updated by Sage Weil over 5 years ago
2018-10-16 20:57:10.472 7f26815de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _kv_finalize_thread kv_committed <0x55f976e3d200>
2018-10-16 20:57:10.472 7f26815de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _kv_finalize_thread deferred_stable <>
2018-10-16 20:57:10.472 7f26815de700 10 bluestore(/var/lib/ceph/osd/ceph-2) _txc_state_proc txc 0x55f976e3d200 kv_submitted
2018-10-16 20:57:10.472 7f26815de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _txc_committed_kv txc 0x55f976e3d200
2018-10-16 20:57:10.472 7f26815de700 10 bluestore(/var/lib/ceph/osd/ceph-2) _txc_state_proc txc 0x55f976e3d200 finishing
2018-10-16 20:57:10.472 7f26815de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _txc_finish 0x55f976e3d200 onodes 0x55f975723440
2018-10-16 20:57:10.472 7f26815de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _txc_finish txc 0x55f976e3d200 done
2018-10-16 20:57:10.472 7f26815de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _txc_finish osr 0x55f95b3ab8c0 q now empty
2018-10-16 20:57:10.472 7f26815de700 10 bluestore(/var/lib/ceph/osd/ceph-2) _txc_release_alloc(sync) 0x55f976e3d200 []
2018-10-16 20:57:10.472 7f26815de700 10 fbmap_alloc 0x55f95a18ea00 release done
2018-10-16 20:57:10.472 7f26815de700 20 bluestore(/var/lib/ceph/osd/ceph-2) _kv_finalize_thread sleep
2018-10-16 20:57:10.472 7f26725c0700 20 osd.2 op_wq(6) _process empty q, waiting
2018-10-16 20:57:10.472 7f266e5b8700 20 osd.2 op_wq(6) _process empty q, waiting
So.. how could the _process threads not notice that bluestore queued something? Here is the OSD code:
  sdata->shard_lock.Lock();
  if (sdata->pqueue->empty() &&
      !(is_smallest_thread_index && !sdata->context_queue.empty())) {
    sdata->sdata_wait_lock.Lock();
    if (!sdata->stop_waiting) {
      dout(20) << __func__ << " empty q, waiting" << dendl;
what if
- thread A: above code checks for empty, and it is
- thread B: bluestore queues the finisher, takes wait_lock, signals
- thread A: takes wait_lock, waits
Maybe we need to retest for empty after taking wait_lock above?
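The interleaving above is the classic lost-wakeup pattern with a two-lock work queue: the emptiness check happens under one lock, the wait under another, and a signal delivered in the gap is dropped. A minimal, self-contained model of the fix is sketched below; the names `shard_lock` and `sdata_wait_lock` mirror the OSD code, but everything else is a hypothetical standalone illustration, not the actual Ceph implementation.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Two-lock queue model: shard_lock guards the queue itself,
// sdata_wait_lock guards the condition variable the workers sleep on.
std::mutex shard_lock;
std::deque<int> work_queue;

std::mutex sdata_wait_lock;
std::condition_variable sdata_cond;

void enqueue(int v) {
  {
    std::lock_guard<std::mutex> l(shard_lock);
    work_queue.push_back(v);
  }
  // Signal under the wait lock. This notify can still fire in the window
  // after a worker saw "empty" but before it started waiting.
  std::lock_guard<std::mutex> wl(sdata_wait_lock);
  sdata_cond.notify_one();
}

int process_one() {
  for (;;) {
    std::unique_lock<std::mutex> l(shard_lock);
    if (work_queue.empty()) {
      l.unlock();
      std::unique_lock<std::mutex> wl(sdata_wait_lock);
      // FIX: re-test emptiness after taking the wait lock. An enqueue that
      // raced in between the unlock() above and this point has already
      // pushed its item, so we must not go to sleep on it.
      {
        std::lock_guard<std::mutex> l2(shard_lock);
        if (!work_queue.empty())
          continue;  // item raced in; loop back and dequeue it
      }
      // Once we hold sdata_wait_lock and the queue is confirmed empty,
      // any subsequent enqueue blocks on the wait lock until we are
      // actually waiting, so its notify cannot be lost. The timeout is
      // only belt-and-braces for this demo.
      sdata_cond.wait_for(wl, std::chrono::milliseconds(50));
      continue;
    }
    int v = work_queue.front();
    work_queue.pop_front();
    return v;
  }
}
```

The key design point is that re-checking the queue while holding the wait lock closes the window: an enqueuer must acquire `sdata_wait_lock` to signal, so it can only do so either before the re-check (which then sees the item) or after the worker has released the lock by waiting (so the notify is delivered).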
#2 Updated by xie xingguo over 5 years ago
See https://github.com/ceph/ceph/pull/23317#issuecomment-423432234 (should be the same issue):
I also took a close look at the failure case at http://qa-proxy.ceph.com/teuthology/xxg-2018-09-21_02:02:01-rados-wip-fix-polog-overtrim-distro-basic-smithi/3047956/, which revealed that bluestore was dropping callbacks (e.g., on_commit) from higher-level callers.
#3 Updated by Sage Weil over 5 years ago
- Status changed from 12 to Resolved
This is presumably fixed by https://github.com/ceph/ceph/pull/24761
#4 Updated by Sage Weil over 5 years ago
- Duplicated by Bug #36284: Bluestore might be hanging OSD added
#5 Updated by Dan Hill almost 4 years ago
Mimic (pr#22739 + pr#24269) introduced this race condition, which was fixed in Nautilus (pr#24761).
Was this evaluated for a backport?
#6 Updated by Dan Hill almost 4 years ago
- Status changed from Resolved to Pending Backport
- Backport set to mimic
Tagging for Mimic backport consideration.
#8 Updated by Nathan Cutler almost 4 years ago
- Copied to Backport #45025: mimic: hung osd_repop, bluestore committed but failed to trigger repop_commit added
#9 Updated by Nathan Cutler about 3 years ago
- Status changed from Pending Backport to Resolved
While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".