Project

General

Profile

Actions

Bug #63633

open

client: fix copying bufferlist to iovec structures in Client::_read

Added by Dhairya Parmar 5 months ago. Updated 3 months ago.

Status:
Pending Backport
Priority:
High
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Development
Tags:
backport_processed
Backport:
reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

ceph/src/common/mutex_debug.h: In function 'void ceph::mutex_debug_detail::mutex_debug_impl<<anonymous> >::_pre_unlock() [with bool Recursive = false]' thread 7ffa30c2e9c0 time 2023-11-23T18:40:12.995739+0530
/home/dparmar/CephRepoForRunningTestsLocally/ceph/src/common/mutex_debug.h: 163: FAILED ceph_assert(nlock == 1)

 ceph version 18.0.0-7275-g8bc4cf1939c (8bc4cf1939ce76918542dbb5433652de295e5916) reef (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x125) [0x7ffa3162127d]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7ffa31621484]
 3: (ceph::mutex_debug_detail::mutex_debug_impl<false>::_pre_unlock()+0x32) [0x559ce274d4d8]
 4: (ceph::mutex_debug_detail::mutex_debug_impl<false>::unlock(bool)+0x10) [0x559ce274d520]
 5: (Client::ll_preadv_pwritev(Fh*, iovec const*, int, long, bool, Context*, ceph::buffer::v15_2_0::list*, bool, bool)+0x112) [0x559ce27b32c4]
 6: (TestClient_LlreadvLlwritev_Test::TestBody()+0x125e) [0x559ce2749c80]
 7: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x1b) [0x559ce2839a76]
 8: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x80) [0x559ce2841a0d]
 9: (testing::Test::Run()+0xb4) [0x559ce2833588]
 10: (testing::TestInfo::Run()+0x104) [0x559ce283368e]
 11: (testing::TestSuite::Run()+0xb2) [0x559ce2833742]
 12: (testing::internal::UnitTestImpl::RunAllTests()+0x36b) [0x559ce2834bcb]
 13: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x1b) [0x559ce2839ce6]
 14: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x80) [0x559ce2841e53]
 15: (testing::UnitTest::Run()+0x63) [0x559ce283385d]
 16: (RUN_ALL_TESTS()+0x11) [0x559ce274785f]
 17: main()
 18: /lib64/libc.so.6(+0x27550) [0x7ffa3024a550]
 19: __libc_start_main()
 20: _start()

goes into a recursive deadlock. the crash is expected since async I/O with null context makes no sense, we need an assertion that a context is passed along with the async I/O call.


Related issues 1 (0 open1 closed)

Copied to CephFS - Backport #64238: reef: client: fix copying bufferlist to iovec structures in Client::_readRejectedDhairya ParmarActions
Actions #1

Updated by Venky Shankar 5 months ago

  • Assignee set to Dhairya Parmar
  • Backport changed from quincy,reef to reef
Actions #2

Updated by Dhairya Parmar 5 months ago

  • Status changed from New to In Progress
  • Pull request ID set to 54651
Actions #3

Updated by Dhairya Parmar 5 months ago

  • Backport changed from reef to quincy,reef
Actions #4

Updated by Dhairya Parmar 5 months ago

we would need this in quincy since all other client trackers (that i created) are having backport set up until quincy

Actions #5

Updated by Dhairya Parmar 5 months ago

  • Priority changed from Normal to High
  • Pull request ID changed from 54651 to 54808
Actions #6

Updated by Dhairya Parmar 5 months ago

  • Subject changed from client: async I/O with null context crashes to client: handle nullptr context in async i/o api
Actions #7

Updated by Dhairya Parmar 5 months ago

  • Subject changed from client: handle nullptr context in async i/o api to client: fix copying bufferlist to iovec structures in Client::_read

RCA:

ll_preadv_pwritev should convert to sync call because it uses the same internal interfaces which the sync api uses, I tried testing sync calls and as expected they ran fine which made me to revert back to the test case i wrote in hope of finding some combination of param that might've made things wayward but nothing seemed wrong, however there is a noticeable difference between the sync and async api i.e. bufferlist, the sync api doesn't accept bufferlist as an arg but the async api does, so when i went ahead and removed the bufferlist arg from api call, as i had the intuition, test case passed (yay!), i also found that none of the the apis apart from ll_preadv_pwritev accepts bufferlist (which pretty much explains why we never caught any bugs for this). While re-reading all the async code again i came across this in ceph/src/client/Client.cc:

 } else { 
     bufferlist bl; 
     int64_t r = _read(fh, offset, totallen, blp ? blp : &bl, 
                       onfinish); 
     ldout(cct, 3) << "preadv(" << fh << ", " <<  offset << ") = " << r << dendl; 
     if (r <= 0) { 
       if (r < 0 && onfinish != nullptr) { 
         client_lock.unlock(); 
         onfinish->complete(r); 
         client_lock.lock(); 
       } 
       return r; 
     } 

     client_lock.unlock(); 
     copy_bufferlist_to_iovec(iov, iovcnt, &bl, r); 
     client_lock.lock(); 
     return r; 

_read is called with blp ? blp : &bl while &bl is being copied to iovec structures, the reason this crash never occured in async api is because we return 0 from here [0]. So, we never reach copy_bufferlist_to_iovec() for an async call and C_Read_Sync_NonBlocking does the magic of handling it nicely but in sync call, it is bound to fail

[0] https://github.com/ceph/ceph/blob/b4102be9e0e9a6421f3bab12ee1286d56a0154ec/src/client/Client.cc#L10913

Actions #8

Updated by Venky Shankar 3 months ago

  • Status changed from In Progress to Pending Backport
  • Backport changed from quincy,reef to reef

Dhairya, this should be reef only backport, yes?

Actions #9

Updated by Backport Bot 3 months ago

  • Copied to Backport #64238: reef: client: fix copying bufferlist to iovec structures in Client::_read added
Actions #10

Updated by Backport Bot 3 months ago

  • Tags set to backport_processed
Actions #11

Updated by Dhairya Parmar 3 months ago

the backport has been rejected since the changes this patch brings is to the code that doesn't exist in reef. good to mark this resolved?

Actions

Also available in: Atom PDF