Bug #63633
openclient: fix copying bufferlist to iovec structures in Client::_read
0%
Description
ceph/src/common/mutex_debug.h: In function 'void ceph::mutex_debug_detail::mutex_debug_impl<<anonymous> >::_pre_unlock() [with bool Recursive = false]' thread 7ffa30c2e9c0 time 2023-11-23T18:40:12.995739+0530
/home/dparmar/CephRepoForRunningTestsLocally/ceph/src/common/mutex_debug.h: 163: FAILED ceph_assert(nlock == 1)
ceph version 18.0.0-7275-g8bc4cf1939c (8bc4cf1939ce76918542dbb5433652de295e5916) reef (dev)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x125) [0x7ffa3162127d]
2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7ffa31621484]
3: (ceph::mutex_debug_detail::mutex_debug_impl<false>::_pre_unlock()+0x32) [0x559ce274d4d8]
4: (ceph::mutex_debug_detail::mutex_debug_impl<false>::unlock(bool)+0x10) [0x559ce274d520]
5: (Client::ll_preadv_pwritev(Fh*, iovec const*, int, long, bool, Context*, ceph::buffer::v15_2_0::list*, bool, bool)+0x112) [0x559ce27b32c4]
6: (TestClient_LlreadvLlwritev_Test::TestBody()+0x125e) [0x559ce2749c80]
7: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x1b) [0x559ce2839a76]
8: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x80) [0x559ce2841a0d]
9: (testing::Test::Run()+0xb4) [0x559ce2833588]
10: (testing::TestInfo::Run()+0x104) [0x559ce283368e]
11: (testing::TestSuite::Run()+0xb2) [0x559ce2833742]
12: (testing::internal::UnitTestImpl::RunAllTests()+0x36b) [0x559ce2834bcb]
13: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x1b) [0x559ce2839ce6]
14: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x80) [0x559ce2841e53]
15: (testing::UnitTest::Run()+0x63) [0x559ce283385d]
16: (RUN_ALL_TESTS()+0x11) [0x559ce274785f]
17: main()
18: /lib64/libc.so.6(+0x27550) [0x7ffa3024a550]
19: __libc_start_main()
20: _start()
goes into a recursive deadlock. the crash is expected since async I/O with null context makes no sense, we need an assertion that a context is passed along with the async I/O call.
Updated by Venky Shankar 5 months ago
- Assignee set to Dhairya Parmar
- Backport changed from quincy,reef to reef
Updated by Dhairya Parmar 5 months ago
- Status changed from New to In Progress
- Pull request ID set to 54651
Updated by Dhairya Parmar 5 months ago
- Backport changed from reef to quincy,reef
Updated by Dhairya Parmar 5 months ago
we would need this in quincy since all other client trackers (that i created) are having backport set up until quincy
Updated by Dhairya Parmar 5 months ago
- Priority changed from Normal to High
- Pull request ID changed from 54651 to 54808
Updated by Dhairya Parmar 5 months ago
- Subject changed from client: async I/O with null context crashes to client: handle nullptr context in async i/o api
Updated by Dhairya Parmar 5 months ago
- Subject changed from client: handle nullptr context in async i/o api to client: fix copying bufferlist to iovec structures in Client::_read
RCA:
ll_preadv_pwritev should convert to sync call because it uses the same internal interfaces which the sync api uses, I tried testing sync calls and as expected they ran fine which made me to revert back to the test case i wrote in hope of finding some combination of param that might've made things wayward but nothing seemed wrong, however there is a noticeable difference between the sync and async api i.e. bufferlist, the sync api doesn't accept bufferlist as an arg but the async api does, so when i went ahead and removed the bufferlist arg from api call, as i had the intuition, test case passed (yay!), i also found that none of the the apis apart from ll_preadv_pwritev accepts bufferlist (which pretty much explains why we never caught any bugs for this). While re-reading all the async code again i came across this in ceph/src/client/Client.cc:
} else {
bufferlist bl;
int64_t r = _read(fh, offset, totallen, blp ? blp : &bl,
onfinish);
ldout(cct, 3) << "preadv(" << fh << ", " << offset << ") = " << r << dendl;
if (r <= 0) {
if (r < 0 && onfinish != nullptr) {
client_lock.unlock();
onfinish->complete(r);
client_lock.lock();
}
return r;
}
client_lock.unlock();
copy_bufferlist_to_iovec(iov, iovcnt, &bl, r);
client_lock.lock();
return r;
_read is called with blp ? blp : &bl while &bl is being copied to iovec structures, the reason this crash never occured in async api is because we return 0 from here [0]. So, we never reach copy_bufferlist_to_iovec() for an async call and C_Read_Sync_NonBlocking does the magic of handling it nicely but in sync call, it is bound to fail
Updated by Venky Shankar 3 months ago
- Status changed from In Progress to Pending Backport
- Backport changed from quincy,reef to reef
Dhairya, this should be reef only backport, yes?
Updated by Backport Bot 3 months ago
- Copied to Backport #64238: reef: client: fix copying bufferlist to iovec structures in Client::_read added
Updated by Dhairya Parmar 3 months ago
the backport has been rejected since the changes this patch brings is to the code that doesn't exist in reef. good to mark this resolved?