Bug #55971
LibRadosMiscConnectFailure.ConnectFailure test failure
Description
All rados_api_tests jobs in the run failed for the same reason, which points to a regression.
One of the recent runs (http://pulpito.front.sepia.ceph.com/yuriw-2022-06-03_14:09:08-rados-wip-yuri7-testing-2022-06-02-1633-distro-default-smithi/) didn't have this failure.
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867183
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867203
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867215
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867253
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867284
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867374
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867409
/a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867412
Failure reason:
Log snippet from /a/yuriw-2022-06-07_23:12:59-rados-wip-yuri7-testing-2022-06-07-1325-distro-default-smithi/6867183
2022-06-08T01:26:00.304 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: [ RUN ] LibRadosMiscConnectFailure.ConnectFailure
2022-06-08T01:26:00.305 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: /build/ceph-17.0.0-12873-g425f1005/src/test/librados/misc.cc:61: Failure
2022-06-08T01:26:00.305 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: Expected equality of these values:
2022-06-08T01:26:00.305 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: 0
2022-06-08T01:26:00.306 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: rados_conf_set(cluster, "client_mount_timeout", "0.000000001")
2022-06-08T01:26:00.306 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: Which is: -22
2022-06-08T01:26:00.307 INFO:tasks.workunit.client.0.smithi191.stdout: api_misc: [ FAILED ] LibRadosMiscConnectFailure.ConnectFailure (34 ms)
Related issues
History
#1 Updated by Neha Ojha almost 2 years ago
- Assignee set to Laura Flores
Laura, can you please triage this bug?
#2 Updated by Laura Flores almost 2 years ago
- [BAD] 3bf1e368cf1b1780854250309eb051cea308c327: http://pulpito.front.sepia.ceph.com/lflores-2022-06-08_19:32:10-rados:monthrash-main-distro-default-smithi/
- [GOOD] 2469ae8b315b4d112fe4cb3bb7b6589f69c7ee5c: http://pulpito.front.sepia.ceph.com/lflores-2022-06-08_19:56:02-rados:monthrash-main-distro-default-smithi/
- [UNRELATED FAIL] a74fa9a66fe2f0aceffb9b297fc1560a1da19ca5: http://pulpito.front.sepia.ceph.com/lflores-2022-06-08_19:58:07-rados:monthrash-main-distro-default-smithi/
- [GOOD] 52f341b00a5dc4ff54351a75605d2f7dfe9b6033: http://pulpito.front.sepia.ceph.com/lflores-2022-06-08_19:59:38-rados:monthrash-main-distro-default-smithi/
Currently sifting through the commits between the first two entries above (the bad commit 3bf1e368 and the good commit 2469ae8b), as that seems to be where the regression was introduced.
#3 Updated by Laura Flores almost 2 years ago
The Tracker reports a problem with `client_mount_timeout`. Here are all of the changes that were made to the Client code between the good and bad commits:
[lflores@fedora ceph]$ git log --pretty=oneline --no-merges 2469ae8b315b4d112fe4cb3bb7b6589f69c7ee5c..3bf1e368cf1b1780854250309eb051cea308c327 src/client
c6cb986a2eea10b8ef15d1c6539873a88c5a69aa mds, client: remove useless feature required code
983b10506dc8466a0e47ff0d320d480dd09999ec client: Inode::hold_caps_until is time from monotonic clock now.
2e1f43c99b1818c2ffde64f5b01083c1907a9f87 (ci/wip-khiremat-46078-fuse-directory-dacs-override-1) client/fuse: Fix directory DACs overriding for root
a451a3670b7bb783ca6dcb8b2a31a8e6ec396899 client: allow overwrites to files with size greater than the max_file_size cfg
aabd5e9c578c2c6da9542bcb935bc36678503359 client: fix possible inifinite loop when getting an ESTALE from MDS
The only commit that changes `client_mount_timeout` is https://github.com/ceph/ceph/pull/44247/commits/983b10506dc8466a0e47ff0d320d480dd09999ec.
#4 Updated by Laura Flores almost 2 years ago
- Regression changed from No to Yes
#5 Updated by Laura Flores almost 2 years ago
The type of `client_mount_timeout` was changed from float to seconds in that commit, so the LibRadosMiscConnectFailure.ConnectFailure test should be modified to reflect this change.
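To illustrate what that change implies for the test value, here is a minimal sketch. It assumes the seconds type is backed by a whole-second `std::chrono` duration; the alias `whole_secs` and the helper `to_whole_seconds` are hypothetical names used only for this example and are not taken from the Ceph source.

```cpp
#include <chrono>

// Assumption: the option's seconds type holds whole seconds only.
using whole_secs = std::chrono::seconds;

// Hypothetical helper: convert a fractional timeout (in seconds) to whole
// seconds. duration_cast truncates toward zero, so any sub-second value
// collapses to 0.
constexpr whole_secs to_whole_seconds(double timeout_s) {
  return std::chrono::duration_cast<whole_secs>(
      std::chrono::duration<double>(timeout_s));
}
```

Under that assumption, the old test value 0.000000001 has no whole-second representation other than 0, so the test cannot keep a tiny non-zero timeout once the option stops being a float.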
#6 Updated by Laura Flores almost 2 years ago
- Project changed from RADOS to CephFS
#7 Updated by Laura Flores almost 2 years ago
- Assignee changed from Laura Flores to Venky Shankar
@Venky please reassign as needed
#8 Updated by Laura Flores almost 2 years ago
What I've found is that the seconds type accepts only integers. So with the change from `float` -> `secs`, it is no longer possible to set `client_mount_timeout` to anything other than an integer.
[lflores@folio01 build]$ ./bin/ceph config set client client_mount_timeout 0.1
Error EINVAL: error parsing value: unexpected trailing '.1'
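The behavior above can be sketched with a small integer-only parser. This is a hypothetical illustration, not Ceph's actual option parser: the function name `parse_whole_seconds` and its signature are assumptions, and only the "unexpected trailing" message is modeled on the output shown above.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper (not the Ceph implementation): accept only a whole
// number of seconds. Returns 0 on success and -EINVAL on failure, filling
// *err with a short message.
int parse_whole_seconds(const std::string& val, long long* out,
                        std::string* err) {
  errno = 0;
  char* end = nullptr;
  const long long n = std::strtoll(val.c_str(), &end, 10);
  if (end == val.c_str()) {
    // strtoll consumed nothing, so there was no leading integer at all.
    *err = "no digits in '" + val + "'";
    return -EINVAL;
  }
  if (errno == ERANGE) {
    *err = "value out of range";
    return -EINVAL;
  }
  if (*end != '\0') {
    // Anything left after the integer part (such as ".1") is rejected.
    *err = "unexpected trailing '" + std::string(end) + "'";
    return -EINVAL;
  }
  *out = n;
  return 0;
}
```

With this sketch, "1" parses to 1, while "0.1" and the test's "0.000000001" are both rejected with -EINVAL, which is -22 on Linux and matches the value reported in the failing test log.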
#10 Updated by Laura Flores almost 2 years ago
This can be reproduced locally on the most up-to-date version of main with:
cd ceph/build
ninja ceph_test_rados_api_misc
./bin/ceph_test_rados_api_misc --gtest_filter=*LibRadosMiscConnectFailure*
#11 Updated by Laura Flores almost 2 years ago
- Pull request ID set to 46604
I have opened a possible fix, but I don't think we can keep the original `client_mount_timeout` value in the test, since the new seconds type does not allow float values. Setting `client_mount_timeout` to 1 second passed all of my local tests, but this fix should be reviewed by the Core and CephFS teams to see if it makes sense.
#12 Updated by Patrick Donnelly almost 2 years ago
- Status changed from New to Fix Under Review
- Assignee changed from Venky Shankar to Laura Flores
- Target version set to v18.0.0
- Source set to Q/A
- Backport set to quincy,pacific
#14 Updated by Laura Flores almost 2 years ago
- Status changed from Fix Under Review to Pending Backport
#15 Updated by Backport Bot almost 2 years ago
- Copied to Backport #56004: quincy: LibRadosMiscConnectFailure.ConnectFailure test failure added
#16 Updated by Backport Bot almost 2 years ago
- Copied to Backport #56005: pacific: LibRadosMiscConnectFailure.ConnectFailure test failure added
#17 Updated by Laura Flores over 1 year ago
- Status changed from Pending Backport to Resolved