Bug #21171

closed

bluestore: aio submission deadlock

Added by Sage Weil over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport: luminous
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

- thread a holds deferred_submit_lock, blocks on aio submission (queue is full)
- thread b holds deferred_lock, blocks taking deferred_submit_lock
- aio completion handler blocks on deferred_lock, cannot drain aio queue.
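
The three bullets above describe a circular wait. As a minimal sketch (this is not Ceph code; the node names and the `find_cycle` helper are hypothetical and only model the description), the dependency can be written as a wait-for graph and checked for a cycle:

```python
from typing import Dict, List, Optional

# Who is waiting on whom, taken from the three bullets in the description.
# The names are illustrative labels, not identifiers from the Ceph source.
WAITS_FOR: Dict[str, str] = {
    # thread a holds deferred_submit_lock and needs the full aio queue
    # to be drained, which only the completion handler can do.
    "thread_a": "aio_completion",
    # thread b holds deferred_lock and wants deferred_submit_lock (held by a).
    "thread_b": "thread_a",
    # the completion handler wants deferred_lock (held by b).
    "aio_completion": "thread_b",
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle if one is reached."""
    position: Dict[str, int] = {}  # node -> index in path when first visited
    path: List[str] = []
    node: Optional[str] = start
    while node is not None and node not in position:
        position[node] = len(path)
        path.append(node)
        node = waits_for.get(node)  # None means this node is not blocked
    if node is None:
        return None  # the chain ended at a runnable party: no deadlock
    return path[position[node]:]  # nodes from the first repeated one onward


if __name__ == "__main__":
    print(find_cycle(WAITS_FOR, "thread_a"))
    # ['thread_a', 'aio_completion', 'thread_b']
```

Removing any single edge, for example letting the completion handler run without taking deferred_lock, makes `find_cycle` return `None`. Whether the actual fix breaks the cycle at that edge is not stated in this report.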


Related issues 7 (0 open, 7 closed)

Related to RADOS - Bug #21246: bluestore: hang while replaying deferred ios from journal (Resolved, Sage Weil, 09/05/2017)
Related to RADOS - Bug #21180: Bluestore throttler causes down OSD (Resolved, 08/30/2017)
Related to RADOS - Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC (Duplicate, 09/08/2017)
Related to bluestore - Bug #19511: bluestore overwhelms aio queue (Resolved, Sage Weil, 04/06/2017)
Related to mgr - Bug #20222: v12.0.3 Luminous bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60 (Duplicate, 06/08/2017)
Related to RADOS - Bug #21475: 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping request (Duplicate, 09/20/2017)
Copied to RADOS - Backport #21325: luminous: bluestore: aio submission deadlock (Resolved, Sage Weil)
Actions #1

Updated by Sage Weil over 6 years ago

  • Description updated (diff)
Actions #2

Updated by Sage Weil over 6 years ago

  • Status changed from 12 to Fix Under Review
Actions #3

Updated by Joao Eduardo Luis over 6 years ago

Sage, is there an identifiable behavior when this happens? Do the OSDs die, or is IO simply blocked forever?

Actions #4

Updated by Sage Weil over 6 years ago

  • Related to Bug #21246: bluestore: hang while replaying deferred ios from journal added
Actions #5

Updated by Sage Weil over 6 years ago

  • Related to Bug #21180: Bluestore throttler causes down OSD added
Actions #6

Updated by Sage Weil over 6 years ago

There was also an aio submission bug that dropped ios on the floor. It was consistently reproducible with

make ceph_test_objectstore && rm -rf bluestore*test*dir c && CEPH_ARGS="--log-file c --no-log-to-stderr --debug-bluestore 20 --debug-bdev 20 --bdev-debug-aio --bdev-aio-max-queue-depth 16 --bluestore-cache-trim-interval .05" bin/ceph_test_objectstore  --gtest_filter=*Syn*/2 --gtest_filter=ObjectStore/StoreTestSpecificAUSize.SyntheticMatrixCompression/2

on an NVMe device. That bug is also fixed by the PR.

Actions #7

Updated by Sage Weil over 6 years ago

  • Backport set to luminous
Actions #8

Updated by Sage Weil over 6 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #9

Updated by Sage Weil over 6 years ago

  • Related to Bug #21314: Ceph OSDs crashing in BlueStore::queue_transactions() using EC added
Actions #10

Updated by Nathan Cutler over 6 years ago

  • Copied to Backport #21325: luminous: bluestore: aio submission deadlock added
Actions #11

Updated by Sage Weil over 6 years ago

  • Status changed from Pending Backport to Resolved
Actions #12

Updated by Sage Weil over 6 years ago

  • Related to Bug #19511: bluestore overwhelms aio queue added
Actions #13

Updated by Bob Bobington over 6 years ago

Since my issue (http://tracker.ceph.com/issues/21314) was marked as a dupe of this and I haven't received a response to the updates on that issue in a week, I thought I'd add here as well: the fixes given haven't led to any improvement for me. I still consistently see the same problems.

I've tried applying this fix as well as adding some of the suggested workarounds, but my OSDs still crash with the same messages.

Actions #14

Updated by Sage Weil over 6 years ago

  • Related to Bug #20222: v12.0.3 Luminous bluestore 'tp_osd_tp thread tp_osd_tp' had timed out after 60 added
Actions #15

Updated by Sage Weil over 6 years ago

  • Related to Bug #21475: 12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping request added