Bug #21171
closed
bluestore: aio submission deadlock
Added by Sage Weil over 6 years ago.
Updated over 6 years ago.
Description
- thread a holds deferred_submit_lock, blocks on aio submission (queue is full)
- thread b holds deferred_lock, blocks taking deferred_submit_lock
- aio completion handler blocks on deferred_lock, cannot drain aio queue.
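The three conditions above form a circular wait. A minimal sketch, not taken from the BlueStore source, models them as a wait-for graph and checks for a cycle. The node names (`thread_a`, `thread_b`, `aio_completion`) and both helper functions are hypothetical labels for illustration. The second graph is an assumption about one way such a cycle could be broken (not holding deferred_submit_lock while blocked on a full queue), not a description of the actual fix.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Wait-for graph: an edge A -> B means "A cannot proceed until B does".
using WaitForGraph = std::map<std::string, std::set<std::string>>;

// Depth-first search: can `target` be reached starting from `from`?
static bool reaches(const WaitForGraph& g, const std::string& from,
                    const std::string& target, std::set<std::string>& seen) {
  auto it = g.find(from);
  if (it == g.end()) return false;
  for (const auto& next : it->second) {
    if (next == target) return true;
    if (seen.insert(next).second && reaches(g, next, target, seen)) return true;
  }
  return false;
}

// A deadlock exists if any node can reach itself.
bool has_cycle(const WaitForGraph& g) {
  for (const auto& entry : g) {
    std::set<std::string> seen;
    if (reaches(g, entry.first, entry.first, seen)) return true;
  }
  return false;
}

// The situation from the description (node names are illustrative).
WaitForGraph deadlock_graph() {
  return {
      // a holds deferred_submit_lock and waits for queue space,
      // which only the aio completion handler can free.
      {"thread_a", {"aio_completion"}},
      // b holds deferred_lock and waits for deferred_submit_lock (held by a).
      {"thread_b", {"thread_a"}},
      // The completion handler waits for deferred_lock (held by b).
      {"aio_completion", {"thread_b"}},
  };
}

// Hypothetical variant: a does not hold deferred_submit_lock while blocked
// on submission, so b no longer waits on a. This is an assumption made
// for illustration only.
WaitForGraph graph_without_lock_held_during_submit() {
  return {
      {"thread_a", {"aio_completion"}},
      {"aio_completion", {"thread_b"}},
  };
}
```

Calling `has_cycle(deadlock_graph())` reports the circular wait, while the hypothetical variant has no cycle because thread b can take deferred_submit_lock, finish, and release deferred_lock for the completion handler.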
- Description updated (diff)
- Status changed from 12 to Fix Under Review
Sage, is there an identifiable behavior when this happens? Do the osds die, or is IO simply forever blocked?
- Related to Bug #21246: bluestore: hang while replaying deferred ios from journal added
- Related to Bug #21180: Bluestore throttler causes down OSD added
There was also an aio submission bug that dropped ios on the floor. It was consistently reproducible with
make ceph_test_objectstore && rm -rf bluestore*test*dir c && CEPH_ARGS="--log-file c --no-log-to-stderr --debug-bluestore 20 --debug-bdev 20 --bdev-debug-aio --bdev-aio-max-queue-depth 16 --bluestore-cache-trim-interval .05" bin/ceph_test_objectstore --gtest_filter=*Syn*/2 --gtest_filter=ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2
on an NVMe device. That bug is also fixed by the PR.
- Status changed from Fix Under Review to Pending Backport
- Related to Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC added
- Copied to Backport #21325: luminous: bluestore: aio submission deadlock added
- Status changed from Pending Backport to Resolved
- Related to Bug #19511: bluestore overwhelms aio queue added
Since my issue (http://tracker.ceph.com/issues/21314) was marked as a dupe of this and I haven't received a response to the updates on that issue in a week, I thought I'd add here as well: the fixes given haven't led to any improvement for me. I still consistently see the same problems.
I've tried applying this fix as well as adding some of the workarounds suggested but my OSDs still crash with the same messages.
- Related to Bug #20222: v12.0.3 Luminous bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60 added
- Related to Bug #21475: 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping request added