Bug #20381
Closed
bluestore: deferred aio submission can deadlock with completion
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): BlueStore
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2017-06-21T19:57:57.268 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:57:57.260862 7fe21bf73700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio_submit retries 10
2017-06-21T19:57:57.272 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:57:57.261862 7fe21bf73700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio_submit retries 1
2017-06-21T19:57:57.310 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:57:57.304575 7fe21bf73700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio_submit retries 8
2017-06-21T19:57:59.365 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:57:59.359316 7fe21bf73700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio_submit retries 14
2017-06-21T19:57:59.598 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:57:59.590665 7fe21bf73700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio_submit retries 8
2017-06-21T19:58:00.113 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:58:00.103374 7fe21bf73700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio_submit retries 12
2017-06-21T19:58:03.961 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:58:03.943611 7fe220634700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio_submit retries 7
2017-06-21T19:58:08.308 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:58:08.297819 7fe21bf73700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio_submit retries 16
2017-06-21T19:58:08.308 INFO:tasks.ceph.osd.1.smithi161.stderr:2017-06-21 19:58:08.297822 7fe21bf73700 -1 bdev(0x7fe239a43e00 /var/lib/ceph/osd/ceph-1/block) aio submit got (11) Resource temporarily unavailable
2017-06-21T19:58:09.211 INFO:tasks.ceph.osd.1.smithi161.stderr:/build/ceph-12.0.3-2007-g12a1512/src/os/bluestore/KernelDevice.cc: In function 'virtual void KernelDevice::aio_submit(IOContext*)' thread 7fe21bf73700 time 2017-06-21 19:58:09.205751
2017-06-21T19:58:09.211 INFO:tasks.ceph.osd.1.smithi161.stderr:/build/ceph-12.0.3-2007-g12a1512/src/os/bluestore/KernelDevice.cc: 529: FAILED assert(r == 0)

Assertion: /build/ceph-12.0.3-2007-g12a1512/src/os/bluestore/KernelDevice.cc: 529: FAILED assert(r == 0)
ceph version 12.0.3-2007-g12a1512 (12a15124517d574a84a552ee2354738a066f45e4) luminous (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x10e) [0x7fe22e6bffde]
 2: (KernelDevice::aio_submit(IOContext*)+0x5dd) [0x7fe22e6643dd]
 3: (BlueStore::_deferred_submit(BlueStore::OpSequencer*)+0x5d3) [0x7fe22e53f7b3]
 4: (BlueStore::_deferred_try_submit()+0x1cf) [0x7fe22e53ff8f]
 5: (BlueStore::_kv_finalize_thread()+0x815) [0x7fe22e569715]
 6: (BlueStore::KVFinalizeThread::entry()+0xd) [0x7fe22e5bd02d]
 7: (()+0x8184) [0x7fe22c1dc184]
 8: (clone()+0x6d) [0x7fe22b2cc37d]
Updated by Nathan Cutler almost 7 years ago
- Is duplicate of Bug #20379: bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0)) added
Updated by Nathan Cutler almost 7 years ago
The backtrace looks exactly like the one in #20379 - duplicate?
Updated by John Spray almost 7 years ago
- Status changed from New to Duplicate
This ticket was opened first, but let's close it in favour of 20381 because that one has the integration test logs.
Updated by John Spray almost 7 years ago
- Is duplicate of deleted (Bug #20379: bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0)))
Updated by John Spray almost 7 years ago
- Has duplicate Bug #20379: bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0)) added
Updated by John Spray almost 7 years ago
- Status changed from Duplicate to New
It turns out that when an issue is marked as a duplicate in Redmine, closing the other issue automatically closes this one too! Reopening.
Updated by Sage Weil almost 7 years ago
- Description updated
- Status changed from New to 12
Updated by Sage Weil almost 7 years ago
- Assignee set to Sage Weil
The aio completion thread is blocked on deferred_lock in _deferred_aio_finish():
void BlueStore::_deferred_aio_finish(OpSequencer *osr)
{
  dout(10) << __func__ << " osr " << osr << dendl;
  assert(osr->deferred_running);
  DeferredBatch *b = osr->deferred_running;

  {
    std::lock_guard<std::mutex> l(deferred_lock);
    assert(osr->deferred_running == b);
    osr->deferred_running = nullptr;
    if (!osr->deferred_pending) {
      auto q = deferred_queue.iterator_to(*osr);
      deferred_queue.erase(q);
    } else if (deferred_aggressive) {
      _deferred_submit(osr);
    }
  }
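while another thread (in the backtrace above, the kv finalize thread) holds that lock and is trying to submit deferred aio via _deferred_try_submit() -> _deferred_submit(osr) -> bdev->aio_submit(). The aio queue is full, so aio_submit() gets EAGAIN and keeps retrying; the queue can only drain once the completion thread reaps finished aios, but that thread is waiting for deferred_lock. Neither side makes progress until aio_submit() runs out of retries and fails assert(r == 0).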
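A toy C++ model can make this cycle concrete. Everything in it is hypothetical: ToyAioQueue, submit_holding_lock, complete_one and run_scenario are not BlueStore code. A bounded counter stands in for the kernel aio queue, a plain std::mutex stands in for deferred_lock, and the limit of 16 retries is borrowed from the largest count in the log rather than read from the KernelDevice source. As a simplification, the model frees a queue slot only after the completion callback has taken and released deferred_lock.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical toy model; none of these names exist in BlueStore.
struct ToyAioQueue {
  std::mutex m;
  int in_flight = 0;
  int max_depth;
  explicit ToyAioQueue(int depth) : max_depth(depth) {}

  // Like submitting to a full aio queue: fail with -EAGAIN instead of blocking.
  int try_submit() {
    std::lock_guard<std::mutex> l(m);
    if (in_flight >= max_depth) return -EAGAIN;
    ++in_flight;
    return 0;
  }

  // Reaping a completion frees one queue slot.
  void reap_one() {
    std::lock_guard<std::mutex> l(m);
    if (in_flight > 0) --in_flight;
  }
};

// Submission path (cf. _deferred_try_submit -> _deferred_submit -> aio_submit):
// deferred_lock stays held for the whole retry loop.
int submit_holding_lock(ToyAioQueue& q, std::mutex& deferred_lock,
                        std::atomic<bool>& lock_taken, int max_retries) {
  std::lock_guard<std::mutex> l(deferred_lock);
  lock_taken = true;
  int r = q.try_submit();
  for (int retries = 0; r == -EAGAIN && retries < max_retries; ++retries) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    r = q.try_submit();
  }
  return r;  // KernelDevice::aio_submit() asserts r == 0 at this point
}

// Completion path (cf. _deferred_aio_finish): the callback needs deferred_lock,
// and in this model the slot is only reaped after the callback returns.
void complete_one(ToyAioQueue& q, std::mutex& deferred_lock) {
  { std::lock_guard<std::mutex> l(deferred_lock); /* finish callback */ }
  q.reap_one();
}

// One aio is already in flight; returns what the submitter ends up with.
int run_scenario(int depth) {
  ToyAioQueue q(depth);
  q.try_submit();
  std::mutex deferred_lock;
  std::atomic<bool> lock_taken{false};
  int result = 0;
  std::thread submitter([&] {
    result = submit_holding_lock(q, deferred_lock, lock_taken, 16);
  });
  while (!lock_taken) std::this_thread::yield();  // submitter owns the lock now
  std::thread completer([&] { complete_one(q, deferred_lock); });
  submitter.join();
  completer.join();
  return result;
}
```

With a queue depth of 1, run_scenario(1) gives up with -EAGAIN after its retries, because the completer cannot free a slot while the submitter holds the lock; that is the point where KernelDevice asserts. With a depth of 64 the same submission succeeds on the first try, which is the effect a much larger aio queue buys.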
Updated by Sage Weil almost 7 years ago
- Subject changed from bluestore assertion (KernelDevice.cc: 529: FAILED assert(r == 0)) to bluestore: deferred aio submission can deadlock with completion
Updated by Sage Weil almost 7 years ago
The easy workaround is to make the aio queue really big.
The harder fix is to do some complicated lock juggling, but I worry about making the code even more complex. For now I'm just going to increase the aio queue size (drastically).
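The thread does not show what the "lock juggling" fix would look like. One common shape for it is to detach the pending work while holding the lock and then drop the lock before calling into the retrying submit path, so the completion thread can take deferred_lock and reap. The sketch below is a hypothetical toy (ToyAioQueue, ToyDeferredState, submit_outside_lock and run_scenario are invented names), not what BlueStore does; it deliberately ignores the per-OpSequencer ordering and deferred_running bookkeeping visible in _deferred_aio_finish(), which is where the real complexity lies.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical toy model, not BlueStore code: a bounded aio queue.
struct ToyAioQueue {
  std::mutex m;
  int in_flight = 0;
  int max_depth;
  explicit ToyAioQueue(int depth) : max_depth(depth) {}
  int try_submit() {
    std::lock_guard<std::mutex> l(m);
    if (in_flight >= max_depth) return -EAGAIN;  // full: fail, don't block
    ++in_flight;
    return 0;
  }
  void reap_one() {
    std::lock_guard<std::mutex> l(m);
    if (in_flight > 0) --in_flight;
  }
};

// State that deferred_lock protects in this model.
struct ToyDeferredState {
  std::mutex deferred_lock;
  int pending_ios = 0;  // queued but not yet submitted
};

// "Lock juggling": take the pending work under the lock, then release the
// lock before the (possibly retrying) submit.
int submit_outside_lock(ToyAioQueue& q, ToyDeferredState& s,
                        std::atomic<bool>& detached, int max_retries) {
  int batch;
  {
    std::lock_guard<std::mutex> l(s.deferred_lock);
    batch = s.pending_ios;
    s.pending_ios = 0;
  }
  detached = true;  // lock released: completions can make progress now
  for (int i = 0; i < batch; ++i) {
    int r = q.try_submit();
    for (int retries = 0; r == -EAGAIN && retries < max_retries; ++retries) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
      r = q.try_submit();
    }
    if (r != 0) return r;
  }
  return 0;
}

// Completion path: the callback takes deferred_lock, then the slot is reaped.
void complete_one(ToyAioQueue& q, ToyDeferredState& s) {
  { std::lock_guard<std::mutex> l(s.deferred_lock); /* finish callback */ }
  q.reap_one();
}

// Queue of depth 1 is already full and one io is pending.
int run_scenario() {
  ToyAioQueue q(1);
  q.try_submit();
  ToyDeferredState s;
  s.pending_ios = 1;
  std::atomic<bool> detached{false};
  int result = -1;
  std::thread submitter([&] { result = submit_outside_lock(q, s, detached, 1000); });
  while (!detached) std::this_thread::yield();
  std::thread completer([&] { complete_one(q, s); });
  submitter.join();
  completer.join();
  return result;
}
```

Here run_scenario() returns 0: once the submitter has dropped deferred_lock, the completer can take it and free the slot, and a later retry succeeds. The generous budget of 1000 retries only keeps the toy demo robust on a loaded machine; it is not a value taken from Ceph.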
Updated by Sage Weil almost 7 years ago
- Status changed from 12 to 7
Updated by Sage Weil almost 7 years ago
- Status changed from 7 to Resolved