Bug #63896

client: contiguous read fails for non-contiguous write (in async I/O api)

Added by Dhairya Parmar 4 months ago. Updated 3 months ago.

Status: In Progress
Priority: Normal
Category: Correctness/Safety
Target version: v19.0.0
% Done: 0%
Source:
Tags:
Backport: reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID: 55144
Crash signature (v1):
Crash signature (v2):

Description

When waiting for onfinish, with this test case (trivial before and after parts removed):

  for(int i = 0; i < NUM_BUF; ++i) {
    writefinish.reset(new C_SaferCond("test-nonblocking-writefinish-non-contiguous"));
    rc = client->ll_preadv_pwritev(fh, current_iov++, 1, i * NUM_BUF * 10,
                                   true, writefinish.get(), nullptr);
    ASSERT_EQ(rc, 0);
    total_bytes_written += writefinish->wait();
  }
  ASSERT_EQ(total_bytes_written, bytes_to_write);

  readfinish.reset(new C_SaferCond("test-nonblocking-readfinish-contiguous"));
  rc = client->ll_preadv_pwritev(fh, iov_in_contiguous, NUM_BUF, 0, false,
                                 readfinish.get(), &bl);
  ASSERT_EQ(rc, 0);
  total_bytes_read = readfinish->wait();
  // should be less since the data written is non-contiguous but the read
  // was contiguous
  ASSERT_LE(total_bytes_read, bytes_to_write);

the execution stalls; see 1_execution_stalled_contiguous_read (attached below).

But if the code that waits for the context to complete is removed, i.e.:

  for(int i = 0; i < NUM_BUF; ++i) {
    writefinish.reset(new C_SaferCond("test-nonblocking-writefinish-non-contiguous"));
    rc = client->ll_preadv_pwritev(fh, current_iov++, 1, i * NUM_BUF * 10,
                                   true, writefinish.get(), nullptr);
    ASSERT_EQ(rc, 0);
    total_bytes_written += writefinish->wait();
  }
  ASSERT_EQ(total_bytes_written, bytes_to_write);

  readfinish.reset(new C_SaferCond("test-nonblocking-readfinish-contiguous"));
  rc = client->ll_preadv_pwritev(fh, iov_in_contiguous, NUM_BUF, 0, false,
                                 readfinish.get(), &bl);
  ASSERT_EQ(rc, 0);

the execution aborts with a core dump:

2023-12-28T19:53:08.780+0530 7f0f47a4a9c0 20 client.4423 awaiting reply|forward|kick on 0x7fff7814eaa0
terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument
*** Caught signal (Aborted) **

See 2_core_dumped (attached below) for more info.

Ideally, these reads should have just returned fewer bytes than the total bytes written, since the data between the offsets must be 0, but this case is not handled.


Files

1_execution_stalled_contiguous_read (102 KB): log for the case where the contiguous read stalls (for a non-contiguous write) waiting on context completion. Dhairya Parmar, 12/28/2023 02:17 PM
2_core_dumped (268 KB): log for the case where we do not wait for context completion when performing a contiguous read after a non-contiguous write. Dhairya Parmar, 12/28/2023 02:25 PM
Actions #1

Updated by Dhairya Parmar 4 months ago

  • Description updated (diff)
Actions #2

Updated by Dhairya Parmar 4 months ago

  • Subject changed from client: contiguous read for non-contiguous write to client: contiguous read fails for non-contiguous write
Actions #3

Updated by Dhairya Parmar 4 months ago

From my findings, the read operation needs to follow the same layout as the write operation, i.e. the read must take place at the locations where the data was written, otherwise the read call just stalls. This contradicts my expectation: return whatever data is read (which in this case is less than what was written).

So my question is: is the existing behaviour (of not allowing a contiguous read for a non-contiguous write) anticipated, or is this actually a bug and the behaviour should be what I described above?

Actions #4

Updated by Dhairya Parmar 4 months ago

This fails with a backtrace when done synchronously:

    -1> 2024-01-02T17:15:56.824+0530 7f0c392299c0 -1 /home/dparmar/CephRepoForRunningTestsLocally/ceph/src/common/mutex_debug.h: In function 'void ceph::mutex_debug_detail::mutex_debug_impl<<anonymous> >::_pre_unlock() [with bool Recursive = false]' thread 7f0c392299c0 time 2024-01-02T17:15:56.822549+0530
/home/dparmar/CephRepoForRunningTestsLocally/ceph/src/common/mutex_debug.h: 163: FAILED ceph_assert(nlock == 1)

 ceph version 18.0.0-7275-g8bc4cf1939c (8bc4cf1939ce76918542dbb5433652de295e5916) reef (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x125) [0x7f0c39c4b726]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7f0c39c4b948]
 3: (ceph::mutex_debug_detail::mutex_debug_impl<false>::_pre_unlock()+0x32) [0x55ab11a6448a]
 4: (ceph::mutex_debug_detail::mutex_debug_impl<false>::unlock(bool)+0x10) [0x55ab11a644d2]
 5: (Client::ll_preadv_pwritev(Fh*, iovec const*, int, long, bool, Context*, ceph::buffer::v15_2_0::list*, bool, bool)+0x112) [0x55ab11aced2c]
 6: (TestClient_LlreadvLlwritevNonContiguous_Test::TestBody()+0x677) [0x55ab11a6105b]
 7: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x1b) [0x55ab11b5b3fb]
 8: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x80) [0x55ab11b6393c]
 9: (testing::Test::Run()+0xb4) [0x55ab11b548aa]
 10: (testing::TestInfo::Run()+0x104) [0x55ab11b549b0]
 11: (testing::TestSuite::Run()+0xb2) [0x55ab11b54a64]
 12: (testing::internal::UnitTestImpl::RunAllTests()+0x36b) [0x55ab11b5611b]
 13: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x1b) [0x55ab11b5b6a7]
 14: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x80) [0x55ab11b63e99]
 15: (testing::UnitTest::Run()+0x63) [0x55ab11b54b7f]
 16: (RUN_ALL_TESTS()+0x11) [0x55ab11a5f80b]
 17: main()
 18: /lib64/libc.so.6(+0x27b8a) [0x7f0c38849b8a]
 19: __libc_start_main()
 20: _start()

     0> 2024-01-02T17:15:56.826+0530 7f0c392299c0 -1 *** Caught signal (Aborted) **
 in thread 7f0c392299c0 thread_name:ceph_test_clien
Actions #5

Updated by Dhairya Parmar 4 months ago

  • Subject changed from client: contiguous read fails for non-contiguous write to client: contiguous read fails for non-contiguous write (in async I/O api)
Actions #6

Updated by Dhairya Parmar 4 months ago

Update: it will not return fewer bytes but the same number of bytes, since RADOS fills up the holes with 0s; the question still remains the same, i.e. the stalling of the read call.
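
In test terms (a minimal sketch reusing the fixture from the description above; bytes_requested is a hypothetical variable holding the total length covered by iov_in_contiguous, everything else is from the existing test code), the contiguous read would then be expected to complete with the full requested length, holes included:

  readfinish.reset(new C_SaferCond("test-nonblocking-readfinish-contiguous"));
  rc = client->ll_preadv_pwritev(fh, iov_in_contiguous, NUM_BUF, 0, false,
                                 readfinish.get(), &bl);
  ASSERT_EQ(rc, 0);
  total_bytes_read = readfinish->wait();
  // RADOS zero-fills the holes, so the full requested length comes back
  ASSERT_EQ(total_bytes_read, bytes_requested);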

Actions #7

Updated by Dhairya Parmar 4 months ago

So it seems like the caps ref count is still non-zero after the test case teardown:

2024-01-08T17:26:42.594+0530 7f746a27d9c0 10 client.4342 _put_inode on 0x10000000003.head(faked_ino=0 nref=19 ll_ref=0 cap_refs={4=0,1024=1,2048=0,4096=0,8192=0} open={3=0} mode=100666 size=2016/4194304 nlink=0 btime=2024-01-08T17:26:42.522136+0530 mtime=2024-01-08T17:26:42.568853+0530 ctime=2024-01-08T17:26:42.569355+0530 change_attr=5 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000003 ts 0/0 objects 1 dirty_or_tx 0] 0x7f743800ce00) n = 17
2024-01-08T17:26:42.594+0530 7f746a27d9c0  2 client.4342 cache still has 0+1 items, waiting (for caps to release?)

The interesting part is "cap_refs={4=0,1024=1,2048=0,4096=0,8192=0}": 1024 is the file cache cap, which apparently is never put, and that has stalled the read call.
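
For reference, a small self-contained sketch of how that 1024 decomposes into the file cache (Fc) cap; this follows my reading of the cap encoding in src/include/ceph_fs.h, and the constants below are local stand-ins rather than the real macro names:

  #include <cstdint>

  // Generic cap bits get shifted into a per-slot position; the file slot
  // shift is 8 and the generic "cache" bit is 4, so the file cache cap (Fc)
  // lands on bit value 1024 -- the key seen in cap_refs above.
  constexpr uint32_t generic_cache_bit = 4;   // stand-in for CEPH_CAP_GCACHE
  constexpr uint32_t file_slot_shift   = 8;   // stand-in for CEPH_CAP_SFILE
  constexpr uint32_t file_cache_cap    = generic_cache_bit << file_slot_shift;
  static_assert(file_cache_cap == 1024, "cap_refs key 1024 is the Fc cap");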

Actions #8

Updated by Venky Shankar 4 months ago

Dhairya Parmar wrote:

So it seems like the caps ref count is still non-zero after the test case teardown:

[...]

The interesting part is "cap_refs={4=0,1024=1,2048=0,4096=0,8192=0}": 1024 is the file cache cap, which apparently is never put, and that has stalled the read call.

So, this is the bug we spoke about after stand-up a couple of days ago. This would require putting a cap ref when the async call finishes.

Also, just FYI, the read call returning zeros (0s) isn't an issue since that's the expected behaviour, correct?

Actions #9

Updated by Venky Shankar 4 months ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Assignee set to Dhairya Parmar
  • Target version set to v19.0.0
  • Backport set to reef
  • Component(FS) Client added
Actions #10

Updated by Dhairya Parmar 4 months ago

Venky Shankar wrote:

Dhairya Parmar wrote:

So it seems like the caps ref count is still non-zero after the test case teardown:

[...]

The interesting part is "cap_refs={4=0,1024=1,2048=0,4096=0,8192=0}": 1024 is the file cache cap, which apparently is never put, and that has stalled the read call.

So, this is the bug we spoke about after stand-up a couple of days ago. This would require putting a cap ref when the async call finishes.

Yes; since the managed pointer is of type C_Read_Async_Finisher, the file cache cap needs to be put in C_Read_Async_Finisher::finish before finishing the read call.
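
A rough sketch of that idea (not the actual patch; the members clnt, in, and onfinish are assumptions about what Client::C_Read_Async_Finisher holds, and locking around the call is deliberately elided):

  void Client::C_Read_Async_Finisher::finish(int r)
  {
    // Drop the Fc (file cache) cap reference taken when the async read was
    // issued, mirroring what the synchronous read path does after its wait.
    // Without this, cap_refs keeps 1024=1 and teardown waits forever.
    clnt->put_cap_ref(in, CEPH_CAP_FILE_CACHE);
    // Then hand the result to the caller's Context as before.
    onfinish->complete(r);
  }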

Also, just FYI, the read call returning zeros (0s) isn't an issue since that's the expected behaviour, correct?

Yup, it's RADOS that memsets all the holes with 0s, so the read call (sync or async) will always return the number of bytes you requested. Long story short: the contents will not be the same (compared with the buffers that were written) but the number of bytes will be.

Actions #11

Updated by Dhairya Parmar 4 months ago

RCA:

When doing a read like this, where ObjectCacher::file_read cannot find the amount of data requested at the given offset, it returns zero. For sync calls the code is straightforward: the inode gets a file cache cap ref, the client waits for the I/O to finish, puts the cap ref, and then does readahead. For the async call, however, the code only takes a file cache cap ref and leaves the rest of the process to the async call finisher (i.e. Client::C_Read_Async_Finisher). The issue is that this code path never decrements the ref count, so we have a cap leak, which ultimately prevents the I/O call from returning successfully: all acquired cap refs must be released, otherwise we keep waiting indefinitely for the caps to be released (a toy model of this asymmetry follows the log excerpt below):

2024-01-08T17:26:42.594+0530 7f746a27d9c0 10 client.4342 _put_inode on 0x10000000003.head(faked_ino=0 nref=19 ll_ref=0 cap_refs={4=0,1024=1,2048=0,4096=0,8192=0} open={3=0} mode=100666 size=2016/4194304 nlink=0 btime=2024-01-08T17:26:42.522136+0530 mtime=2024-01-08T17:26:42.568853+0530 ctime=2024-01-08T17:26:42.569355+0530 change_attr=5 caps=pAsLsXsFsxcrwb(0=pAsLsXsFsxcrwb) objectset[0x10000000003 ts 0/0 objects 1 dirty_or_tx 0] 0x7f743800ce00) n = 17
2024-01-08T17:26:42.594+0530 7f746a27d9c0  2 client.4342 cache still has 0+1 items, waiting (for caps to release?)
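
To make the asymmetry concrete, here is a self-contained toy model of the control flow described above; none of it is real Ceph code, it only mirrors the take-ref/drop-ref pattern of the two paths:

  #include <cassert>
  #include <functional>
  #include <iostream>

  // fc_refs stands in for cap_refs[Fc] on the inode.
  struct ToyInode { int fc_refs = 0; };

  static void get_cap_ref(ToyInode &in) { ++in.fc_refs; }
  static void put_cap_ref(ToyInode &in) { --in.fc_refs; }

  // Synchronous path: take the ref, wait for the read, drop the ref.
  static void sync_read(ToyInode &in) {
    get_cap_ref(in);
    /* ... wait for ObjectCacher::file_read to complete ... */
    put_cap_ref(in);                       // balanced
  }

  // Async path before the fix: the finisher completes the caller's Context,
  // but nobody drops the ref, so Fc can never be released at teardown.
  static void async_read_buggy(ToyInode &in, std::function<void(int)> onfinish) {
    get_cap_ref(in);
    /* ... ObjectCacher::file_read(..., C_Read_Async_Finisher) ... */
    onfinish(0);                           // the read itself completes fine
    // missing: put_cap_ref(in);
  }

  int main() {
    ToyInode in;
    sync_read(in);
    assert(in.fc_refs == 0);               // sync path is balanced
    async_read_buggy(in, [](int){});
    std::cout << "leaked Fc refs: " << in.fc_refs << "\n";  // prints 1
  }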
Actions #12

Updated by Dhairya Parmar 4 months ago

  • Pull request ID set to 55144
Actions #13

Updated by Dhairya Parmar 4 months ago

  • Regression changed from No to Yes
Actions #14

Updated by Dhairya Parmar 4 months ago

  • Regression changed from Yes to No
Actions #15

Updated by Dhairya Parmar 3 months ago

  • Status changed from Triaged to In Progress