Bug #21171

bluestore: aio submission deadlock

Added by Sage Weil almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
Start date: 08/29/2017
Due date:
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:

Description

- thread a holds deferred_submit_lock, blocks on aio submission (queue is full)
- thread b holds deferred_lock, blocks taking deferred_submit_lock
- aio completion handler blocks on deferred_lock, cannot drain aio queue
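
The three bullets above form a circular wait. A minimal sketch of that wait-for graph follows, written as a standalone Python model rather than anything taken from BlueStore itself. The node names (thread_a, thread_b, aio_queue, aio_completion) and the find_cycle helper are hypothetical labels chosen for illustration. Only the lock names and the blocking relationships come from the description.

```python
# Wait-for graph for the parties named in the description.
# An edge X -> Y means "X cannot make progress until Y does".
# Node names are illustrative labels, not BlueStore identifiers.
WAITS_FOR = {
    # thread a holds deferred_submit_lock and blocks on a full aio queue
    "thread_a": ["aio_queue"],
    # the aio queue only drains when the completion handler runs
    "aio_queue": ["aio_completion"],
    # the completion handler needs deferred_lock, held by thread b
    "aio_completion": ["thread_b"],
    # thread b needs deferred_submit_lock, held by thread a
    "thread_b": ["thread_a"],
}


def find_cycle(graph, start):
    """Follow wait-for edges from start; return the cycle as a list, or None.

    Assumes each node waits on at most one other node, which is true
    for the scenario modelled here.
    """
    path = []
    position = {}
    node = start
    while node is not None:
        if node in position:
            # We came back to a node already on the path: that is the cycle.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        successors = graph.get(node, [])
        node = successors[0] if successors else None
    return None


cycle = find_cycle(WAITS_FOR, "thread_a")
print(" -> ".join(cycle))
```

Removing any one edge breaks the cycle in this model. For example, if thread b no longer had to wait on thread a, find_cycle would return None. This only illustrates why the three conditions together deadlock; it does not describe how the actual fix was implemented.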


Related issues

Related to RADOS - Bug #21246: bluestore: hang while replaying deferred ios from journal Resolved 09/05/2017
Related to RADOS - Bug #21180: Bluestore throttler causes down OSD Resolved 08/30/2017
Related to RADOS - Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC Duplicate 09/08/2017
Related to bluestore - Bug #19511: bluestore overwhelms aio queue Resolved 04/06/2017
Related to mgr - Bug #20222: v12.0.3 Luminous bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60 Duplicate 06/08/2017
Related to RADOS - Bug #21475: 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping request Duplicate 09/20/2017
Copied to RADOS - Backport #21325: luminous: bluestore: aio submission deadlock Resolved

History

#1 Updated by Sage Weil almost 2 years ago

  • Description updated (diff)

#2 Updated by Sage Weil almost 2 years ago

  • Status changed from Verified to Need Review

#3 Updated by Joao Eduardo Luis almost 2 years ago

Sage, is there an identifiable behavior when this happens? Do the osds die, or is IO simply forever blocked?

#4 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21246: bluestore: hang while replaying deferred ios from journal added

#5 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21180: Bluestore throttler causes down OSD added

#6 Updated by Sage Weil almost 2 years ago

There was also an aio submission bug that dropped ios on the floor. It was consistently reproducible with

make ceph_test_objectstore && rm -rf bluestore*test*dir c && CEPH_ARGS="--log-file c --no-log-to-stderr --debug-bluestore 20 --debug-bdev 20 --bdev-debug-aio --bdev-aio-max-queue-depth 16 --bluestore-cache-trim-interval .05" bin/ceph_test_objectstore  --gtest_filter=*Syn*/2 --gtest_filter=ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2

on an NVMe device. That bug is also fixed by the PR.

#7 Updated by Sage Weil almost 2 years ago

  • Backport set to luminous

#8 Updated by Sage Weil almost 2 years ago

  • Status changed from Need Review to Pending Backport

#9 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC added

#10 Updated by Nathan Cutler almost 2 years ago

  • Copied to Backport #21325: luminous: bluestore: aio submission deadlock added

#11 Updated by Sage Weil almost 2 years ago

  • Status changed from Pending Backport to Resolved

#12 Updated by Sage Weil almost 2 years ago

  • Related to Bug #19511: bluestore overwhelms aio queue added

#13 Updated by Bob Bobington almost 2 years ago

Since my issue (http://tracker.ceph.com/issues/21314) was marked as a dupe of this one and I haven't received a response to the updates on that issue in a week, I thought I'd add a note here as well: the fixes given haven't led to any improvement for me. I still consistently see the same problems.

I've tried applying this fix as well as adding some of the suggested workarounds, but my OSDs still crash with the same messages.

#14 Updated by Sage Weil almost 2 years ago

  • Related to Bug #20222: v12.0.3 Luminous bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60 added

#15 Updated by Sage Weil almost 2 years ago

  • Related to Bug #21475: 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping request added
