Bug #55853
closed
test_cls_rgw.sh: failures in 'cls_rgw.index_list' and 'cls_rgw.index_list_delimited'
Description
/a/yuriw-2022-06-03_14:09:08-rados-wip-yuri7-testing-2022-06-02-1633-distro-default-smithi/6862540
2022-06-03T15:45:00.767 INFO:tasks.workunit.client.0.smithi033.stdout:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-84-g40883f98/rpm/el8/BUILD/ceph-16.2.9-84-g40883f98/src/test/cls_rgw/test_cls_rgw.cc:439: Failure
2022-06-03T15:45:00.832 INFO:tasks.workunit.client.0.smithi033.stdout:Expected equality of these values:
2022-06-03T15:45:00.833 INFO:tasks.workunit.client.0.smithi033.stdout: 4u
2022-06-03T15:45:00.833 INFO:tasks.workunit.client.0.smithi033.stdout: Which is: 4
2022-06-03T15:45:00.833 INFO:tasks.workunit.client.0.smithi033.stdout: m.size()
2022-06-03T15:45:00.833 INFO:tasks.workunit.client.0.smithi033.stdout: Which is: 0
2022-06-03T15:45:00.833 INFO:tasks.workunit.client.0.smithi033.stdout:[ FAILED ] cls_rgw.index_list (27 ms)
2022-06-03T15:45:00.833 INFO:tasks.workunit.client.0.smithi033.stdout:[ RUN ] cls_rgw.index_list_delimited
...
2022-06-03T15:45:41.035 INFO:tasks.workunit.client.0.smithi033.stdout:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-84-g40883f98/rpm/el8/BUILD/ceph-16.2.9-84-g40883f98/src/test/cls_rgw/test_cls_rgw.cc:524: Failure
2022-06-03T15:45:41.036 INFO:tasks.workunit.client.0.smithi033.stdout:Expected equality of these values:
2022-06-03T15:45:41.036 INFO:tasks.workunit.client.0.smithi033.stdout: 48u
2022-06-03T15:45:41.036 INFO:tasks.workunit.client.0.smithi033.stdout: Which is: 48
2022-06-03T15:45:41.036 INFO:tasks.workunit.client.0.smithi033.stdout: id_entry_map.size()
2022-06-03T15:45:41.037 INFO:tasks.workunit.client.0.smithi033.stdout: Which is: 0
2022-06-03T15:45:41.037 INFO:tasks.workunit.client.0.smithi033.stdout:We should get 40 top-level entries and the tops of 8 "subdirectories".
2022-06-03T15:45:41.037 INFO:tasks.workunit.client.0.smithi033.stdout:[ FAILED ] cls_rgw.index_list_delimited (40269 ms)
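When comparing several runs, it can help to pull the failed test names out of the teuthology log rather than reading it by eye. The following is a minimal sketch, assuming gtest result lines have the `[ FAILED ] <name> (<n> ms)` shape shown above. The `failed_tests` helper and the regular expression are illustrative and are not part of teuthology or Ceph.

```python
import re

# Matches gtest result lines as they appear in the log above, e.g.
# "...stdout:[ FAILED ] cls_rgw.index_list (27 ms)".
# Whitespace inside the brackets is matched loosely, since the amount of
# padding may differ between raw and copied logs.
FAILED_RE = re.compile(r"\[\s*FAILED\s*\]\s+(\S+)\s+\((\d+) ms\)")


def failed_tests(log_lines):
    """Return {test_name: duration_ms} for every failed gtest case (hypothetical helper)."""
    results = {}
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            results[match.group(1)] = int(match.group(2))
    return results


# Sample lines copied from the log in this report.
sample = [
    "2022-06-03T15:45:00.833 INFO:tasks.workunit.client.0.smithi033.stdout:[ FAILED ] cls_rgw.index_list (27 ms)",
    "2022-06-03T15:45:00.833 INFO:tasks.workunit.client.0.smithi033.stdout:[ RUN ] cls_rgw.index_list_delimited",
    "2022-06-03T15:45:41.037 INFO:tasks.workunit.client.0.smithi033.stdout:[ FAILED ] cls_rgw.index_list_delimited (40269 ms)",
]

for name, duration_ms in failed_tests(sample).items():
    print(f"{name}: failed after {duration_ms} ms")
```

Feeding it the lines of a full teuthology.log would give one entry per failed case, which makes it easier to see whether index_suggest_complete failed alongside the two index_list tests.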
Updated by Casey Bodley almost 2 years ago
- Status changed from New to Need More Info
was this run based on a recent main branch, or something else?
we're not seeing failures in the rgw suite. is it possible that the branch being tested had omap listing regressions in it?
Updated by Laura Flores almost 2 years ago
Casey Bodley wrote:
was this run based on a recent main branch, or something else?
we're not seeing failures in the rgw suite. is it possible that the branch being tested had omap listing regressions in it?
Yes, this was based on a recent main branch. The testing trello card is here: https://trello.com/c/MaWPkMXi/1544-wip-yuri7-testing-2022-06-02-1633
- Edited to add that this was one of Yuri's test branches based on main. There was only one PR added to it though (linked in the Trello card) that would have had no influence on this failure, as it was a change to the telemetry module.
Updated by Laura Flores almost 2 years ago
/a/yuriw-2022-06-09_22:06:32-rados-wip-yuri3-testing-2022-06-09-1314-distro-default-smithi/6871566
/a/yuriw-2022-06-09_22:06:32-rados-wip-yuri3-testing-2022-06-09-1314-distro-default-smithi/6871409
Updated by Laura Flores almost 2 years ago
earliest bad run: http://pulpito.front.sepia.ceph.com/yuriw-2022-06-03_14:09:08-rados-wip-yuri7-testing-2022-06-02-1633-distro-default-smithi/
last good run: http://pulpito.front.sepia.ceph.com/yuriw-2022-06-01_23:19:00-rados-wip-yuri8-testing-2022-06-01-1114-distro-default-smithi/
Was not able to reproduce this locally (this was done on the most updated version of main):
ninja vstart -j$(nproc)
ninja -j$(nproc) ceph_test_cls_rgw
RGW=2 ../src/vstart.sh --debug --new -x --localhost --bluestore
./bin/ceph_test_cls_rgw --gtest_filter=*index_list*
Running main() from gmock_main.cc
Note: Google Test filter = *index_list*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from cls_rgw
[ RUN ] cls_rgw.index_list
[ OK ] cls_rgw.index_list (32 ms)
[ RUN ] cls_rgw.index_list_delimited
[ OK ] cls_rgw.index_list_delimited (35440 ms)
[----------] 2 tests from cls_rgw (35472 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (37563 ms total)
[ PASSED ] 2 tests.
Updated by Laura Flores almost 2 years ago
/a/yuriw-2022-06-13_16:36:31-rados-wip-yuri7-testing-2022-06-13-0706-distro-default-smithi/6876615
Updated by Laura Flores almost 2 years ago
Seeing if I can reproduce this, and/or if it happens every time: http://pulpito.front.sepia.ceph.com/lflores-2022-06-14_23:16:51-rados:upgrade:parallel-wip-yuri7-testing-2022-06-13-0706-distro-default-smithi/
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-06-23_14:17:25-rados-wip-yuri6-testing-2022-06-22-1419-distro-default-smithi/6894628
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-06-23_14:17:25-rados-wip-yuri6-testing-2022-06-22-1419-distro-default-smithi/6894633
Updated by Casey Bodley almost 2 years ago
it looks like this is coming from an upgrade test. can someone please identify the ceph versions of both this ceph_test_cls_rgw test and the osd(s) it's talking to?
Updated by Laura Flores almost 2 years ago
Kamoltat Sirivadhna wrote:
/a/yuriw-2022-06-23_14:17:25-rados-wip-yuri6-testing-2022-06-22-1419-distro-default-smithi/6894633
Based on this failed test:
2022-06-23T14:58:47.343 INFO:tasks.workunit.client.0.smithi162.stdout:[ RUN ] cls_rgw.index_suggest_complete
2022-06-23T14:58:47.346 INFO:tasks.workunit.client.0.smithi162.stdout:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-336-g3515edfe/rpm/el8/BUILD/ceph-16.2.9-336-g3515edfe/src/test/cls_rgw/test_cls_rgw.cc:406: Failure
2022-06-23T14:58:47.346 INFO:tasks.workunit.client.0.smithi162.stdout:Expected equality of these values:
2022-06-23T14:58:47.347 INFO:tasks.workunit.client.0.smithi162.stdout: 1
2022-06-23T14:58:47.347 INFO:tasks.workunit.client.0.smithi162.stdout: entries.size()
2022-06-23T14:58:47.347 INFO:tasks.workunit.client.0.smithi162.stdout: Which is: 0
2022-06-23T14:58:47.347 INFO:tasks.workunit.client.0.smithi162.stdout:[ FAILED ] cls_rgw.index_suggest_complete (4 ms)
Specifically this line:
2022-06-23T14:58:47.346 INFO:tasks.workunit.client.0.smithi162.stdout:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-336-g3515edfe/rpm/el8/BUILD/ceph-16.2.9-336-g3515edfe/src/test/cls_rgw/test_cls_rgw.cc:406: Failure
It looks like the tests are version 16.2.9.
A bit earlier in the teuthology log, we can see that the OSDs are already upgraded to 17.0.0:
2022-06-23T14:48:52.795 INFO:teuthology.orchestra.run.smithi162.stdout:{
2022-06-23T14:48:52.795 INFO:teuthology.orchestra.run.smithi162.stdout: "mon": {
2022-06-23T14:48:52.795 INFO:teuthology.orchestra.run.smithi162.stdout: "ceph version 17.0.0-13216-gfad4b1c2 (fad4b1c200ee6a758bd948f031903dd98c630b4c) quincy (dev)": 3
2022-06-23T14:48:52.795 INFO:teuthology.orchestra.run.smithi162.stdout: },
2022-06-23T14:48:52.796 INFO:teuthology.orchestra.run.smithi162.stdout: "mgr": {
2022-06-23T14:48:52.796 INFO:teuthology.orchestra.run.smithi162.stdout: "ceph version 17.0.0-13216-gfad4b1c2 (fad4b1c200ee6a758bd948f031903dd98c630b4c) quincy (dev)": 2
2022-06-23T14:48:52.796 INFO:teuthology.orchestra.run.smithi162.stdout: },
2022-06-23T14:48:52.796 INFO:teuthology.orchestra.run.smithi162.stdout: "osd": {
2022-06-23T14:48:52.797 INFO:teuthology.orchestra.run.smithi162.stdout: "ceph version 17.0.0-13216-gfad4b1c2 (fad4b1c200ee6a758bd948f031903dd98c630b4c) quincy (dev)": 8
2022-06-23T14:48:52.797 INFO:teuthology.orchestra.run.smithi162.stdout: },
2022-06-23T14:48:52.797 INFO:teuthology.orchestra.run.smithi162.stdout: "mds": {
2022-06-23T14:48:52.797 INFO:teuthology.orchestra.run.smithi162.stdout: "ceph version 17.0.0-13216-gfad4b1c2 (fad4b1c200ee6a758bd948f031903dd98c630b4c) quincy (dev)": 2
2022-06-23T14:48:52.797 INFO:teuthology.orchestra.run.smithi162.stdout: },
2022-06-23T14:48:52.798 INFO:teuthology.orchestra.run.smithi162.stdout: "overall": {
2022-06-23T14:48:52.798 INFO:teuthology.orchestra.run.smithi162.stdout: "ceph version 17.0.0-13216-gfad4b1c2 (fad4b1c200ee6a758bd948f031903dd98c630b4c) quincy (dev)": 15
2022-06-23T14:48:52.798 INFO:teuthology.orchestra.run.smithi162.stdout: }
2022-06-23T14:48:52.798 INFO:teuthology.orchestra.run.smithi162.stdout:}
The mismatched versions could be the issue.
The other recorded instances seem to be following the same pattern.
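The check described above (test binary version taken from the build path, daemon versions taken from the `ceph versions` output) could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, the regular expressions assume the path and version-string shapes seen in this log, and comparing major version numbers is a heuristic chosen for illustration, not a compatibility rule defined by Ceph.

```python
import json
import re

# Build paths in the log embed the release of the test binary, e.g.
# ".../BUILD/ceph-16.2.9-336-g3515edfe/src/test/cls_rgw/test_cls_rgw.cc:406"
BUILD_RE = re.compile(r"ceph-(\d+)\.(\d+)\.(\d+)-\d+-g[0-9a-f]+/src/test/")
# Keys in the `ceph versions` JSON look like
# "ceph version 17.0.0-13216-gfad4b1c2 (...) quincy (dev)"
DAEMON_RE = re.compile(r"ceph version (\d+)\.(\d+)\.(\d+)")


def binary_version_from_log(log_line):
    """Return (major, minor, patch) of the test binary, or None if not found."""
    match = BUILD_RE.search(log_line)
    return tuple(int(part) for part in match.groups()) if match else None


def daemon_versions(versions_json, daemon="osd"):
    """Return the set of (major, minor, patch) versions reported for one daemon type."""
    versions = set()
    for key in json.loads(versions_json).get(daemon, {}):
        match = DAEMON_RE.match(key)
        if match:
            versions.add(tuple(int(part) for part in match.groups()))
    return versions


def major_mismatch(test_version, daemons):
    """True if any daemon runs a different major release than the test binary."""
    return any(version[0] != test_version[0] for version in daemons)


# Inputs taken from the log excerpts in this comment.
failure_line = (
    "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/"
    "AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/"
    "gigantic/release/16.2.9-336-g3515edfe/rpm/el8/BUILD/"
    "ceph-16.2.9-336-g3515edfe/src/test/cls_rgw/test_cls_rgw.cc:406: Failure"
)
versions_output = json.dumps({
    "osd": {
        "ceph version 17.0.0-13216-gfad4b1c2 "
        "(fad4b1c200ee6a758bd948f031903dd98c630b4c) quincy (dev)": 8
    }
})

test_version = binary_version_from_log(failure_line)
osd_versions = daemon_versions(versions_output)
print("test binary:", test_version)
print("osd versions:", sorted(osd_versions))
print("major mismatch:", major_mismatch(test_version, osd_versions))
```

Run against the other recorded instances, the same comparison would show whether they all pair a 16.2.9 test binary with already-upgraded OSDs.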
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-06-30_14:20:05-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/[6907404, 6907413]
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-06-29_13:30:16-rados-wip-yuri3-testing-2022-06-28-1737-distro-default-smithi/6905612
Updated by Sridhar Seshasayee almost 2 years ago
/a/yuriw-2022-06-29_18:22:37-rados-wip-yuri2-testing-2022-06-29-0820-distro-default-smithi/6906109
/a/yuriw-2022-06-29_18:22:37-rados-wip-yuri2-testing-2022-06-29-0820-distro-default-smithi/6906268
Updated by Casey Bodley almost 2 years ago
- Status changed from Need More Info to New
- Assignee set to J. Eric Ivancich
Updated by Laura Flores almost 2 years ago
This time, index_suggest_complete failed in addition to the other two.
/a/nojha-2022-07-14_21:55:41-upgrade:pacific-x-snapshot_key_conversion-distro-default-smithi/6931111
2022-07-14T23:55:55.783 INFO:tasks.workunit.client.0.smithi186.stdout:/build/ceph-16.2.9-490-ge27cc18f/src/test/cls_rgw/test_cls_rgw.cc:406: Failure
2022-07-14T23:55:55.785 INFO:tasks.workunit.client.0.smithi186.stdout:Expected equality of these values:
2022-07-14T23:55:55.785 INFO:tasks.workunit.client.0.smithi186.stdout: 1
2022-07-14T23:55:55.785 INFO:tasks.workunit.client.0.smithi186.stdout: entries.size()
2022-07-14T23:55:55.785 INFO:tasks.workunit.client.0.smithi186.stdout: Which is: 0
2022-07-14T23:55:55.786 INFO:tasks.workunit.client.0.smithi186.stdout:[ FAILED ] cls_rgw.index_suggest_complete (6 ms)
Updated by Aishwarya Mathuria almost 2 years ago
/a/yuriw-2022-07-13_19:41:18-rados-wip-yuri7-testing-2022-07-11-1631-distro-default-smithi/6929404
index_suggest_complete failed here too
2022-07-14T04:31:35.586 INFO:tasks.workunit.client.0.smithi163.stdout:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-488-gc5e2739f/rpm/el8/BUILD/ceph-16.2.9-488-gc5e2739f/src/test/cls_rgw/test_cls_rgw.cc:406: Failure
2022-07-14T04:31:35.587 INFO:tasks.workunit.client.0.smithi163.stdout:Expected equality of these values:
2022-07-14T04:31:35.587 INFO:tasks.workunit.client.0.smithi163.stdout: 1
2022-07-14T04:31:35.587 INFO:tasks.workunit.client.0.smithi163.stdout: entries.size()
2022-07-14T04:31:35.588 INFO:tasks.workunit.client.0.smithi163.stdout: Which is: 0
2022-07-14T04:31:35.588 INFO:tasks.workunit.client.0.smithi163.stdout:[ FAILED ] cls_rgw.index_suggest_complete (3 ms)
2022-07-14T04:31:35.588 INFO:tasks.workunit.client.0.smithi163.stdout:[ RUN ] cls_rgw.index_list
2022-07-14T04:31:35.623 INFO:tasks.workunit.client.0.smithi163.stdout:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-488-gc5e2739f/rpm/el8/BUILD/ceph-16.2.9-488-gc5e2739f/src/test/cls_rgw/test_cls_rgw.cc:504: Failure
2022-07-14T04:31:35.623 INFO:tasks.workunit.client.0.smithi163.stdout:Expected equality of these values:
2022-07-14T04:31:35.624 INFO:tasks.workunit.client.0.smithi163.stdout: 4u
2022-07-14T04:31:35.624 INFO:tasks.workunit.client.0.smithi163.stdout: Which is: 4
2022-07-14T04:31:35.625 INFO:tasks.workunit.client.0.smithi163.stdout: m.size()
2022-07-14T04:31:35.625 INFO:tasks.workunit.client.0.smithi163.stdout: Which is: 0
2022-07-14T04:31:35.626 INFO:tasks.workunit.client.0.smithi163.stdout:[ FAILED ] cls_rgw.index_list (21 ms)
2022-07-14T04:32:16.633 INFO:tasks.workunit.client.0.smithi163.stdout:/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.9-488-gc5e2739f/rpm/el8/BUILD/ceph-16.2.9-488-gc5e2739f/src/test/cls_rgw/test_cls_rgw.cc:589: Failure
2022-07-14T04:32:16.634 INFO:tasks.workunit.client.0.smithi163.stdout:Expected equality of these values:
2022-07-14T04:32:16.635 INFO:tasks.workunit.client.0.smithi163.stdout: 48u
2022-07-14T04:32:16.635 INFO:tasks.workunit.client.0.smithi163.stdout: Which is: 48
2022-07-14T04:32:16.636 INFO:tasks.workunit.client.0.smithi163.stdout: id_entry_map.size()
2022-07-14T04:32:16.636 INFO:tasks.workunit.client.0.smithi163.stdout: Which is: 0
2022-07-14T04:32:16.636 INFO:tasks.workunit.client.0.smithi163.stdout:We should get 40 top-level entries and the tops of 8 "subdirectories".
2022-07-14T04:32:16.637 INFO:tasks.workunit.client.0.smithi163.stdout:[ FAILED ] cls_rgw.index_list_delimited (41026 ms)
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6943905
Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago
/a/yuriw-2022-07-22_03:30:40-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6944371/
Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
/a/yuriw-2022-07-24_15:38:21-rados-wip-yuri3-testing-2022-07-21-1604-distro-default-smithi/6946367/
Updated by Casey Bodley over 1 year ago
Laura Flores wrote:
It looks like the tests are version 16.2.9.
A bit earlier in the teuthology log, we can see that the OSDs are already upgraded to 17.0.0:
[...]The mismatched versions could be the issue.
thanks Laura. when we fix bugs in cls/rgw, we update ceph_test_cls_rgw to test that updated behavior. so in general, we just can't expect test cases from one release to pass against another release
i looked into these upgrade suites to see how they're running ceph_test_cls_rgw, and it seems to be packaged with other tests in the 'cls' workunit:
we shouldn't be trying to run these during the upgrade; instead, we might run them before the upgrade, then again after
Updated by Laura Flores over 1 year ago
@Casey, I see what you mean. The issue here does seem to be that the RGW workload is running during the upgrade, which is causing the version mismatch problem. I checked all upgrade/parallel tests though, even back on stable branches, and it seems like we have always run workloads in parallel with the upgrade sequence. This pattern never changes among upgrade/parallel tests:
https://github.com/ceph/ceph/blob/main/qa/suites/upgrade/pacific-x/parallel/1-tasks.yaml
- print: "**** done start parallel"
- parallel:
- workload
- upgrade-sequence
- print: "**** done end parallel"
This implies to me that the workloads are supposed to work when run during the upgrade sequence. The only way around it that I can see is by setting up a sequential task, such as:
- print: "**** done start parallel"
- sequential:
- workload
- upgrade-sequence
- print: "**** done end parallel"
But that would defeat the purpose of the parallel test.
Updated by Casey Bodley over 1 year ago
sharing what i could find about the history here:
the cls_rgw.index_suggest_complete test was added for https://tracker.ceph.com/issues/54528, which has been backported to octopus. it isn't clear why that would fail, unless we were running a pacific version of ceph_test_cls_rgw against an octopus osd before that octopus backport was applied
cls_rgw.index_list_delimited was added for https://tracker.ceph.com/issues/41051, which i believe merged before octopus. i don't see any significant changes to cls_rgw.index_list there. unclear why either test would fail in upgrade suites
Updated by Laura Flores over 1 year ago
@Casey if it would help, here is the last good run I could find, and the earliest bad run:
last good run:
- http://pulpito.front.sepia.ceph.com/yuriw-2022-06-01_23:19:00-rados-wip-yuri8-testing-2022-06-01-1114-distro-default-smithi/
- SHA: 513a3ce033e61b54e2727a6a27915fd798082922
earliest bad run:
- http://pulpito.front.sepia.ceph.com/yuriw-2022-06-03_14:09:08-rados-wip-yuri7-testing-2022-06-02-1633-distro-default-smithi/
- SHA: 9c982c6b65fc320a11d31aced63ff0af50067d91
Running `git log --pretty=oneline --no-merges 513a3ce033e61b54e2727a6a27915fd798082922..9c982c6b65fc320a11d31aced63ff0af50067d91 src/rgw` shows a lot of commits, but most (if not all) are coming from https://github.com/ceph/ceph/pull/39002. You'd know best, but this seems like it was maybe a big feature that was just intended for Quincy?
Point being, if this would be a minor change we could backport, it may make more sense to go that route. But if the change is coming from a large Quincy feature, it definitely wouldn't make sense to run the workload during the upgrade.
Updated by Ernesto Puerta over 1 year ago
- Tags set to test-failure
Updated by Laura Flores over 1 year ago
/a/lflores-2022-08-25_17:56:48-rados-wip-yuri11-testing-2022-08-24-0658-distro-default-smithi/6993001
Updated by Casey Bodley about 1 year ago
- Status changed from New to Can't reproduce