Bug #63681

Status: open

io_context_pool threads deadlock possibility when journaling is enabled

Added by Joshua Baergen 5 months ago. Updated 5 months ago.


Description

Ilya discovered that the stress test introduced in https://github.com/ceph/ceph/pull/54377 would sometimes hang when run under Valgrind. I was able to reproduce the hang and get a thread dump via gdb (attached). Notes:

Thread 2 (io_context_pool, trying to complete a journal event): Holds Journal::m_event_lock, trying to acquire JournalMetadata::*m_timer_lock
Thread 32 (io_context_pool, submitting a discard I/O): Trying to acquire Journal::m_event_lock (required for all write I/Os when journaling is enabled)
Thread 39 (safe timer thread, handle_watch_reset): Holds JournalMetadata::*m_timer_lock while doing a synchronous ioctx.watch2() call

The key here is that both io_context_pool threads are consumed waiting for locks because handle_watch_reset() is holding m_timer_lock while it waits for its watch2() call to complete. The issue is that watch2() completion depends on one of the io_context_pool threads being free to process it - thus, deadlock.
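To make the dependency cycle concrete, here is a minimal, self-contained model of it (assumption: every name and the tiny work queue below are illustrative stand-ins for io_context_pool, Journal::m_event_lock and JournalMetadata's timer lock, not the actual librbd code). It uses a 2-thread pool to match the default librados_thread_count, and it hangs by design in the same way the test does:

#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::mutex event_lock;  // stands in for Journal::m_event_lock
std::mutex timer_lock;  // stands in for JournalMetadata's timer lock

// Tiny FIFO work queue drained by two workers, standing in for the
// io_context_pool with the default librados_thread_count of 2.
std::mutex q_mtx;
std::condition_variable q_cv;
std::queue<std::function<void()>> work;

void post(std::function<void()> fn) {
  {
    std::lock_guard<std::mutex> l(q_mtx);
    work.push(std::move(fn));
  }
  q_cv.notify_one();
}

void worker() {
  for (;;) {
    std::function<void()> fn;
    {
      std::unique_lock<std::mutex> l(q_mtx);
      q_cv.wait(l, [] { return !work.empty(); });
      fn = std::move(work.front());
      work.pop();
    }
    fn();
  }
}

int main() {
  std::vector<std::thread> pool;
  for (int i = 0; i < 2; ++i) pool.emplace_back(worker);

  std::atomic<bool> timer_locked{false}, event_locked{false}, io_queued{false};

  // "Thread 39": handle_watch_reset() takes the timer lock, then waits
  // synchronously for a watch2()-style completion that must run on the pool.
  std::thread timer([&] {
    std::lock_guard<std::mutex> t(timer_lock);
    timer_locked = true;
    while (!io_queued) std::this_thread::yield();  // let both workers jam first
    std::mutex done_mtx;
    std::condition_variable done_cv;
    bool done = false;
    post([&] {  // the completion: no pool thread is ever free to run it
      std::lock_guard<std::mutex> l(done_mtx);
      done = true;
      done_cv.notify_one();
    });
    std::unique_lock<std::mutex> l(done_mtx);
    done_cv.wait(l, [&] { return done; });  // blocks forever, timer lock held
  });
  while (!timer_locked) std::this_thread::yield();

  // "Thread 2": completing a journal event holds the event lock and then
  // needs the timer lock.
  post([&] {
    std::lock_guard<std::mutex> e(event_lock);
    event_locked = true;
    std::lock_guard<std::mutex> t(timer_lock);  // waits on the timer thread
  });
  while (!event_locked) std::this_thread::yield();

  // "Thread 32": a new write I/O needs the event lock while journaling is on.
  post([&] {
    io_queued = true;
    std::lock_guard<std::mutex> e(event_lock);  // waits on "Thread 2" above
  });

  timer.join();  // never returns: all three parties are mutually stuck
  for (auto& t : pool) t.join();
  return 0;
}

The cycle only breaks if the completion can run somewhere else, or if the timer-side code stops blocking while it holds its lock, which is what the observations below get at.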

I'm not familiar enough with this code to know what the right solution is, but some observations:
1. It's not clear to me that the event lock needs to be held for the entirety of completing an event, since upon completion there should be nothing else referring to that event.
2. Doing a synchronous I/O call in a timer thread seems like a bad idea, especially since the I/O path needs to take the timer lock (see the sketch below).
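For observation (2), here is a hedged sketch of the general shape of a fix; the struct and the names async_rewatch / handle_rewatch_complete are made up for illustration and are not the real librbd code or librados API. The only point is that the timer lock is never held across the watch re-registration, so a pool thread is always free to deliver its completion:

#include <mutex>
#include <thread>

struct WatchReset {
  std::mutex m_timer_lock;
  std::thread m_io_thread;  // stands in for an io_context_pool thread

  ~WatchReset() {
    if (m_io_thread.joinable()) m_io_thread.join();
  }

  // Stand-in for an aio-style watch re-registration: the completion fires
  // later on another thread instead of blocking the caller.
  template <typename Cb>
  void async_rewatch(Cb cb) {
    m_io_thread = std::thread([cb]() mutable { cb(0); });
  }

  void handle_watch_reset() {
    {
      std::lock_guard<std::mutex> l(m_timer_lock);
      // ... timer bookkeeping only; no blocking I/O while the lock is held ...
    }
    // Nothing held here, so the completion can always make progress.
    async_rewatch([this](int r) { handle_rewatch_complete(r); });
  }

  void handle_rewatch_complete(int r) {
    std::lock_guard<std::mutex> l(m_timer_lock);
    // ... re-arm timers / schedule a retry based on r ...
    (void)r;
  }
};

int main() {
  WatchReset w;
  w.handle_watch_reset();
  return 0;  // ~WatchReset joins the completion thread
}

Observation (1) would help from the other side: if the event lock were dropped before the completion path needs the timer lock, the Thread 2 analogue wouldn't be parked either.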

This can be reproduced via: RBD_FEATURES=125 valgrind --tool=memcheck --leak-check=full /ceph_test_librbd --gtest_filter=TestJournalStress.DiscardWithPruneWriteOverlap

It seems that the test frequently pauses with one of the io_context_pool threads at 100% when run under Valgrind, which likely causes a watch timeout and thus the watch reset.


Files

allthreads.txt (48.1 KB) - gdb thread dump from valgrind run - Joshua Baergen, 11/29/2023 02:50 PM
#1

Updated by Joshua Baergen 5 months ago

Oh, I totally forgot to mention in the description - by increasing librados_thread_count to something high, like 15, the test no longer runs into this deadlock because there's always at least one io_context_pool thread available.
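For reference, a minimal way to apply that workaround (assumption: set in the client's ceph.conf; any of the usual client config mechanisms should work equally):

[client]
    librados_thread_count = 15

This only makes pool exhaustion unlikely; it doesn't remove the underlying lock-ordering problem.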

