Bug #55971
LibRadosMiscConnectFailure.ConnectFailure test failure
Description
All rados_api_tests in the run failed for the same reason, which points to a regression.
One of the recent runs (http://pulpito.front.sepia.ceph.com/yuriw-2022-06-03_14:09:08-rados-wip-yuri7-testing-2022-06-02-1633-distro-default-smithi/) didn't have this failure.
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867183
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867203
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867215
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867253
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867284
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867374
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867409
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867412
Failure reason:
Log snippet from /a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867183
2022-06-08T01:26:00.304 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: [ RUN ] LibRadosMiscConnectFailure.ConnectFailure
2022-06-08T01:26:00.305 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: /build/ceph-17.0.0-12873-g425f1005/src/test/librados/misc.cc:61: Failure
2022-06-08T01:26:00.305 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: Expected equality of these values:
2022-06-08T01:26:00.305 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: 0
2022-06-08T01:26:00.306 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: rados_conf_set(cluster, "client_mount_timeout", "0.000000001")
2022-06-08T01:26:00.306 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: Which is: -22
2022-06-08T01:26:00.307 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (34 ms)
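For readers unfamiliar with negative return codes: the `-22` in the log is a negated errno value, and errno 22 is `EINVAL` ("Invalid argument") on Linux. The snippet below is a minimal sketch for decoding such codes. The helper name `describe_return_code` is hypothetical and not part of librados, and treating every non-negative value as success is an assumption made for illustration.

```python
import errno
import os


def describe_return_code(rc: int) -> str:
    """Translate a negated-errno return code into a readable string.

    Hypothetical helper for illustration only; not a librados API.
    """
    if rc >= 0:
        # Assumption: non-negative means the call succeeded.
        return "success"
    code = -rc
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


# Decode the value reported by the failing assertion.
print(describe_return_code(-22))
```

In other words, `rados_conf_set` rejected the value `"0.000000001"` as invalid rather than failing to reach the cluster.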
Updated by Neha Ojha almost 2 years ago
- Assignee set to Laura Flores
Laura, can you please triage this bug?
Updated by Laura Flores almost 2 years ago
- [BAD] 3bf1e368cf1b1780854250309eb051cea308c327: http://pulpito.front.sepia.ceph.com/lflores-2022-06-08_19:32:10-rados:monthrash-main-distro-default-smithi/
- [GOOD] 2469ae8b315b4d112fe4cb3bb7b6589f69c7ee5c: http://pulpito.front.sepia.ceph.com/lflores-2022-06-08_19:56:02-rados:monthrash-main-distro-default-smithi/
- [UNRELATED FAIL] a74fa9a66fe2f0aceffb9b297fc1560a1da19ca5: http://pulpito.front.sepia.ceph.com/lflores-2022-06-08_19:58:07-rados:monthrash-main-distro-default-smithi/
- [GOOD] 52f341b00a5dc4ff54351a75605d2f7dfe9b6033: http://pulpito.front.sepia.ceph.com/lflores-2022-06-08_19:59:38-rados:monthrash-main-distro-default-smithi/
Currently sifting through the commits between the first ([BAD]) and second ([GOOD]) entries above, as that seems to be where the potential regression was introduced.
Updated by Laura Flores almost 2 years ago
The Tracker reports a problem with `client_mount_timeout`. Here are all of the changes that were made to the Client code between the good and bad commits:
[lflores@fedora ceph]$ git log --pretty=oneline --no-merges 2469ae8b315b4d112fe4cb3bb7b6589f69c7ee5c..3bf1e368cf1b1780854250309eb051cea308c327 src/client
c6cb986a2eea10b8ef15d1c6539873a88c5a69aa mds, client: remove useless feature required code
983b10506dc8466a0e47ff0d320d480dd09999ec client: Inode::hold_caps_until is time from monotonic clock now.
2e1f43c99b1818c2ffde64f5b01083c1907a9f87 (ci/wip-khiremat-46078-fuse-directory-dacs-override-1) client/fuse: Fix directory DACs overriding for root
a451a3670b7bb783ca6dcb8b2a31a8e6ec396899 client: allow overwrites to files with size greater than the max_file_size cfg
aabd5e9c578c2c6da9542bcb935bc36678503359 client: fix possible inifinite loop when getting an ESTALE from MDS
The only commit that changes `client_mount_timeout` is https://github.com/ceph/ceph/pull/44247/commits/983b10506dc8466a0e47ff0d320d480dd09999ec.
Updated by Laura Flores almost 2 years ago
The type of `client_mount_timeout` was changed from float to seconds in that commit, so the LibRadosMiscConnectFailure.ConnectFailure test should be modified to reflect this change.
Updated by Laura Flores almost 2 years ago
- Assignee changed from Laura Flores to Venky Shankar
@Venky please reassign as needed
Updated by Laura Flores almost 2 years ago
What I've found is that the seconds type accepts only integers. So with the change from `float` -> `secs`, it is no longer possible to set `client_mount_timeout` to anything other than an integer.
[lflores@folio01 build]$ ./bin/ceph config set client client_mount_timeout 0.1
Error EINVAL: error parsing value: unexpected trailing '.1'
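To illustrate why a value like `0.1` is rejected, here is a minimal sketch of a strict whole-seconds parser that reads leading digits and then refuses any leftover characters. This is not Ceph's actual option parser. The function name `parse_whole_seconds` is hypothetical, and the error text only imitates the message shown above.

```python
import re


def parse_whole_seconds(text: str) -> int:
    """Parse a string as a whole number of seconds.

    Hypothetical illustration, not Ceph's implementation: leading digits
    are consumed, and anything left over is treated as an error.
    """
    match = re.match(r"[+-]?\d+", text)
    if match is None:
        raise ValueError(f"error parsing value: no digits in {text!r}")
    rest = text[match.end():]
    if rest:
        # A fractional part such as ".1" ends up here.
        raise ValueError(f"error parsing value: unexpected trailing '{rest}'")
    return int(match.group())


# Try an integer value and the two fractional values from this report.
for value in ("1", "0.1", "0.000000001"):
    try:
        print(value, "->", parse_whole_seconds(value))
    except ValueError as err:
        print(value, "->", err)
```

Under this model, the test's original value `"0.000000001"` fails the same way `0.1` does, while `1` parses cleanly.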
Updated by Laura Flores almost 2 years ago
This can be reproduced locally on the most up-to-date version of main with:
cd ceph/build
ninja ceph_test_rados_api_misc
./bin/ceph_test_rados_api_misc --gtest_filter='*LibRadosMiscConnectFailure*'
Updated by Laura Flores almost 2 years ago
- Pull request ID set to 46604
I have opened a possible fix, but I don't think we can achieve the same equality with the `client_mount_timeout` values, since the new seconds type does not allow float values. Setting `client_mount_timeout` to 1 second passed all of my local tests, but this fix should be reviewed by the Core and CephFS teams to see if it makes sense.
Updated by Patrick Donnelly almost 2 years ago
- Status changed from New to Fix Under Review
- Assignee changed from Venky Shankar to Laura Flores
- Target version set to v18.0.0
- Source set to Q/A
- Backport set to quincy,pacific
Updated by Laura Flores almost 2 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56004: quincy: LibRadosMiscConnectFailure.ConnectFailure test failure added
Updated by Backport Bot almost 2 years ago
- Copied to Backport #56005: pacific: LibRadosMiscConnectFailure.ConnectFailure test failure added
Updated by Laura Flores almost 2 years ago
- Status changed from Pending Backport to Resolved