After investigating the backtrace and logs, we find a deadlock is possible in the following scenario:
1) OPs issued by qemu are queued in the aio_work_queue, followed by an aio_flush. After the aio_flush is processed, the ImageCtx's async_ops looks like:
async_ops --> [aio_flush, aio_op_1, ]
|
|---> flush_contexts [C_AsyncCallback(C_AioWrite(AioCompletion))]
2) A snapshot is then created and more OPs arrive. When a ThreadPool thread tries to process an OP, the newly created snapshot requires the ImageCtx to be refreshed, which calls flush_async_operations(), so the thread blocks waiting for all the OPs in async_ops to complete.
async_ops --> [aio_op_2, ..., aio_flush, aio_op_1, ]
| |
| |---> flush_contexts [C_AsyncCallback(C_AioWrite(AioCompletion))]
|
|-> flush_contexts [C_SafeCond]
3) aio_op_1 completes, triggering C_AsyncCallback's finish(), which then queues C_AioWrite(AioCompletion) on the ImageCtx's op_work_queue, where it waits for the ThreadPool to process it. But the ThreadPool thread is blocked in flush_async_operations(), so neither side can make progress and the deadlock occurs.
To break the deadlock, we queue C_AioWrite(AioCompletion) using the ImageCtx's writeback_handler in C_AsyncCallback's finish(), as https://github.com/ceph/ceph/pull/8011 did.
PR: https://github.com/ceph/ceph/pull/17045
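
The cycle in step 3 can be illustrated with a small standalone model. This is only a sketch and not librbd code: SingleThreadQueue and waiter_released() are hypothetical names, the single-worker queue merely stands in for a ThreadPool with one thread, and the timeout exists only so that the example terminates instead of hanging. It shows that a completion queued behind a blocked worker can never release that worker, while the same completion dispatched elsewhere can.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical single-worker queue; a stand-in for a one-thread ThreadPool,
// not librbd's ThreadPool or ContextWQ.
class SingleThreadQueue {
public:
  SingleThreadQueue() : worker_([this] { run(); }) {}
  ~SingleThreadQueue() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stop_ = true;
    }
    cond_.notify_all();
    worker_.join();  // the worker drains remaining items, then exits
  }
  void queue(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> l(lock_);
      items_.push_back(std::move(fn));
    }
    cond_.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    for (;;) {
      cond_.wait(l, [this] { return stop_ || !items_.empty(); });
      if (items_.empty()) return;  // stop requested and nothing left to run
      auto fn = std::move(items_.front());
      items_.pop_front();
      l.unlock();
      fn();  // items run strictly one at a time, in FIFO order
      l.lock();
    }
  }
  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<std::function<void()>> items_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist first
};

// Returns true if the blocked worker was released before the timeout.
bool waiter_released(bool use_separate_queue) {
  std::promise<void> flushed;  // plays the role of C_SafeCond
  std::shared_future<void> flushed_f = flushed.get_future().share();
  std::promise<bool> result;
  std::future<bool> result_f = result.get_future();

  SingleThreadQueue completions;  // alternative dispatch path (assumption)
  SingleThreadQueue op_queue;     // models op_work_queue with one thread

  // Step 2: the only worker blocks, like flush_async_operations().
  op_queue.queue([&] {
    auto st = flushed_f.wait_for(std::chrono::milliseconds(300));
    result.set_value(st == std::future_status::ready);
  });

  // Step 3: the completion that would release the worker gets queued.
  SingleThreadQueue& target = use_separate_queue ? completions : op_queue;
  target.queue([&] { flushed.set_value(); });

  return result_f.get();
}
```

Under these assumptions, waiter_released(false) models the reported hang (the completion sits behind the blocked task on the same queue) and waiter_released(true) models dispatching the completion somewhere the blocked thread is not needed. Without the timeout, the first case would block forever.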