Bug #63679: client: handle zero byte sync/async write cases

Added by Dhairya Parmar 5 months ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Category: Correctness/Safety
Target version: v19.0.0
% Done: 0%
Source: Development
Tags:
Backport: quincy,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

A bit of background: libcephfs is now capable of performing async I/O, and a new API, `Client::ll_preadv_pwritev`, is available for it. I'm writing test cases in [0] and trying to cover as many edge cases as I can; while running them locally, one of the test cases, aimed at testing zero-byte async I/O, failed at the write call and returned this backtrace:

/home/dparmar/ceph/src/osdc/Striper.cc: 186: FAILED ceph_assert(len > 0)

 ceph version 18.0.0-7275-g8bc4cf1939c (8bc4cf1939ce76918542dbb5433652de295e5916) reef (dev)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x125) [0x7f7f5602127d]
 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7f7f56021484]
 3: (Striper::file_to_extents(ceph::common::CephContext*, file_layout_t const*, unsigned long, unsigned long, unsigned long, unsigned long, boost::container::small_vector<striper::LightweightObjectExtent, 4ul, void, void>*)+0x31d) [0x55e76c4a0be7]
 4: (Striper::file_to_extents(ceph::common::CephContext*, char const*, file_layout_t const*, unsigned long, unsigned long, unsigned long, std::vector<ObjectExtent, std::allocator<ObjectExtent> >&, unsigned long)+0x79) [0x55e76c4a17e7]
 5: (Striper::file_to_extents(ceph::common::CephContext*, inodeno_t, file_layout_t const*, unsigned long, unsigned long, unsigned long, std::vector<ObjectExtent, std::allocator<ObjectExtent> >&)+0x75) [0x55e76c3bc083]
 6: (ObjectCacher::file_write(ObjectCacher::ObjectSet*, file_layout_t*, SnapContext const&, long, unsigned long, ceph::buffer::v15_2_0::list&, std::chrono::time_point<ceph::real_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, int, Context*, bool)+0x65) [0x55e76c3db4a1]
 7: (Client::_write(Fh*, long, unsigned long, char const*, iovec const*, int, Context*, bool, bool)+0x99b) [0x55e76c3a9639]
 8: (Client::_preadv_pwritev_locked(Fh*, iovec const*, unsigned int, long, bool, bool, Context*, ceph::buffer::v15_2_0::list*, bool, bool)+0x170) [0x55e76c3aa098]
 9: (Client::ll_preadv_pwritev(Fh*, iovec const*, int, long, bool, Context*, ceph::buffer::v15_2_0::list*, bool, bool)+0xbb) [0x55e76c3aa879]
 10: (TestClient_LlreadvLlwritevZeroBytes_Test::TestBody()+0x452) [0x55e76c340e74]
 11: (void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x1b) [0x55e76c431082]
 12: (void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x80) [0x55e76c439019]
 13: (testing::Test::Run()+0xb4) [0x55e76c42ab94]
 14: (testing::TestInfo::Run()+0x104) [0x55e76c42ac9a]
 15: (testing::TestSuite::Run()+0xb2) [0x55e76c42ad4e]
 16: (testing::internal::UnitTestImpl::RunAllTests()+0x36b) [0x55e76c42c1d7]
 17: (bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x1b) [0x55e76c4312f2]
 18: (bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x80) [0x55e76c43945f]
 19: (testing::UnitTest::Run()+0x63) [0x55e76c42ae69]
 20: (RUN_ALL_TESTS()+0x11) [0x55e76c33f85f]
 21: main()
 22: /lib64/libc.so.6(+0x27550) [0x7f7f54c4a550]
 23: __libc_start_main()
 24: _start()

It points to `Striper::file_to_extents`. The assertion that failed was added in 7bfde0a78e2a81ed9dac80b04ed17bfd426fed45. The only use I can find for `len` in that function is seeding the loop variable `left`, so the loop is simply a no-op when the value is zero. So do we really need this assertion?
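For reference, here is a simplified paraphrase (not the actual Ceph source) of the pattern in `Striper::file_to_extents` being described; the stripe size and all names below are made up for illustration:

#include <algorithm>
#include <cassert>
#include <cstdint>

void file_to_extents_sketch(uint64_t offset, uint64_t len) {
  assert(len > 0);              // the check from 7bfde0a78e2a that fires here
  uint64_t cur = offset;
  uint64_t left = len;          // len's only consumer: seeding the loop counter
  const uint64_t su = 4u << 20; // hypothetical 4 MiB stripe unit
  while (left > 0) {            // with len == 0 this body never runs
    uint64_t block = std::min(left, su - (cur % su));
    // ... record the object extent covering [cur, cur + block) ...
    cur += block;
    left -= block;
  }
}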

[0] https://github.com/ceph/ceph/pull/54435


Related issues (1 open, 1 closed)

Related to CephFS - Bug #63697: client: zero byte sync write fails (In Progress, Dhairya Parmar)
Related to CephFS - Bug #63734: client: handle callback when async io fails (Resolved, Dhairya Parmar)
Actions #1

Updated by Dhairya Parmar 5 months ago

I experimented by removing the assertion; the test case passed, but its execution could not complete: it stalled during teardown at:

2023-11-29T18:25:36.044+0530 7fcb8f4309c0  2 client.4526 cache still has 0+1 items, waiting (for caps to release?)
2023-11-29T18:25:36.968+0530 7fcb6affd6c0 20 client.4526 tick
2023-11-29T18:25:36.968+0530 7fcb6affd6c0 10 client.4526 renew_caps()
2023-11-29T18:25:36.968+0530 7fcb6affd6c0 15 client.4526 renew_caps requesting from mds.0
2023-11-29T18:25:36.968+0530 7fcb6affd6c0 10 client.4526 renew_caps mds.0
2023-11-29T18:25:36.968+0530 7fcb6affd6c0 20 client.4526 collect_and_send_metrics
2023-11-29T18:25:36.968+0530 7fcb6affd6c0 20 client.4526 collect_and_send_global_metrics
2023-11-29T18:25:36.968+0530 7fcb6affd6c0 20 client.4526 trim_cache size 0 max 16384
2023-11-29T18:25:36.968+0530 7fcb6affd6c0 20 client.4526 upkeep thread waiting interval 1.000000000s
2023-11-29T18:25:36.968+0530 7fcb817fa6c0 10 client.4526 handle_client_session client_session(renewcaps seq 2) v1 from mds.0
2023-11-29T18:25:36.968+0530 7fcb817fa6c0 10 client.4526 unmounting: trim pass, size was 0+1
2023-11-29T18:25:36.968+0530 7fcb817fa6c0 20 client.4526 trim_cache size 0 max 16384
2023-11-29T18:25:36.968+0530 7fcb817fa6c0 10 client.4526 unmounting: trim pass, size still 0+1
2023-11-29T18:25:37.968+0530 7fcb6affd6c0 20 client.4526 tick
2023-11-29T18:25:37.968+0530 7fcb6affd6c0 20 client.4526 collect_and_send_metrics
2023-11-29T18:25:37.968+0530 7fcb6affd6c0 20 client.4526 collect_and_send_global_metrics
2023-11-29T18:25:37.968+0530 7fcb6affd6c0 20 client.4526 trim_cache size 0 max 16384
2023-11-29T18:25:37.968+0530 7fcb6affd6c0 20 client.4526 upkeep thread waiting interval 1.000000000s
Actions #2

Updated by Venky Shankar 5 months ago

Does this fail in the sync path? If not, then it could be due to the async changes.

Actions #3

Updated by Dhairya Parmar 5 months ago

Venky Shankar wrote:

Does this fail in the sync path? If not, then it could be due to the async changes.

I see the sync paths using the same helper function, so this might have always existed, but since there is no test case for it, I can add one and test it out. PR coming soon. A rough shape of such a test is sketched below.
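As a rough illustration only: the fixture, helpers, and exact argument order below are assumptions pieced together from the backtrace frames above, not the final test from [0]:

TEST_F(TestClient, LlwritevZeroBytesSync) {
  Fh *fh = open_testfile();           // hypothetical fixture helper
  struct iovec iov = { nullptr, 0 };  // zero-byte write buffer
  bufferlist bl;
  // write = true, onfinish = nullptr should take the synchronous path
  int64_t rc = client->ll_preadv_pwritev(fh, &iov, 1, 0, true,
                                         nullptr, &bl, false, false);
  ASSERT_EQ(rc, 0);                   // expect "0 bytes written", not an assert
}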

Actions #4

Updated by Xiubo Li 5 months ago

Dhairya Parmar wrote:

I experimented by removing the assertion; the test case passed, but its execution could not complete: it stalled during teardown at:

[...]

It seems the client is waiting for RADOS to report back and wake it up, but since the len is zero, that wakeup just never comes, I guess.
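A minimal standalone sketch of that failure mode; every name below is invented for illustration, this is not the client's actual completion machinery:

#include <condition_variable>
#include <mutex>

// Stand-in for Ceph's Context completion interface.
struct Context {
  virtual void complete(int r) = 0;
  virtual ~Context() = default;
};

// A waiter that only unblocks once something calls complete() on it.
struct C_Waiter : Context {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;
  void complete(int) override {
    std::lock_guard<std::mutex> l(m);
    done = true;
    cv.notify_all();
  }
  void wait() {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return done; });  // hangs forever if complete() is skipped
  }
};

// If a zero-byte write is dropped before ever being submitted to RADOS,
// nothing calls complete(0) on the waiter, and wait() stalls exactly like
// the teardown in the log above.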

Actions #5

Updated by Dhairya Parmar 5 months ago

Xiubo Li wrote:

Dhairya Parmar wrote:

I experimented by removing the assertion; the test case passed, but its execution could not complete: it stalled during teardown at:

[...]

It seems the client is waiting for RADOS to report back and wake it up, but since the len is zero, that wakeup just never comes, I guess.

Yeah, looks like that.

Actions #6

Updated by Dhairya Parmar 5 months ago

From yesterday's conversation with Venky during standup: fixing this on the client side. A sketch of the direction is below.
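A minimal sketch of that direction, assuming the guard lands in the client write path; the function and helper names are invented, and this is not the merged patch:

#include <cstdint>

struct Context {                      // stand-in for Ceph's Context interface
  virtual void complete(int r) = 0;
  virtual ~Context() = default;
};

// Hypothetical stand-in for handing the write to the osdc layer.
int64_t submit_to_osdc(uint64_t size, Context *onfinish) {
  if (onfinish)
    onfinish->complete(0);            // pretend the I/O completed
  return static_cast<int64_t>(size);
}

// Guard both sync and async paths before striping/submission: report
// "0 bytes written" and still fire the async callback, so neither the
// caller nor unmount waits on a completion that can never arrive.
int64_t client_write_sketch(uint64_t size, Context *onfinish /* null = sync */) {
  if (size == 0) {
    if (onfinish)
      onfinish->complete(0);          // async caller still gets its completion
    return 0;
  }
  return submit_to_osdc(size, onfinish);
}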

Actions #7

Updated by Dhairya Parmar 5 months ago

  • Project changed from RADOS to CephFS
  • Subject changed from osdc: cannot perform zero byte async write to client: handle zero byte sync/async write cases
  • Category changed from Correctness/Safety to Correctness/Safety
  • Assignee set to Dhairya Parmar
  • Target version set to v19.0.0
  • Backport set to quincy,reef
  • Component(FS) Client added
Actions #8

Updated by Dhairya Parmar 5 months ago

  • Related to Bug #63697: client: zero byte sync write fails added
Actions #9

Updated by Dhairya Parmar 5 months ago

  • Related to Bug #63734: client: handle callback when async io fails added
Actions #10

Updated by Dhairya Parmar 5 months ago

  • Status changed from New to Closed