Bug #59599
osd: cls_refcount unit test failures during upgrade sequence
Description
/a/sseshasa-2023-05-01_18:57:15-rados-wip-sseshasa2-testing-2023-05-01-2153-quincy-distro-default-smithi/7259891
Historically also seen in:
/a/teuthology-2023-04-21_14:23:02-upgrade:pacific-x-quincy-distro-default-smithi/7247734
2023-05-01T19:56:01.877 INFO:tasks.workunit.client.0.smithi125.stdout:/build/ceph-16.2.12-83-g2b02306b/src/test/cls_refcount/test_cls_refcount.cc:417: Failure
2023-05-01T19:56:01.877 INFO:tasks.workunit.client.0.smithi125.stdout:Expected equality of these values:
2023-05-01T19:56:01.877 INFO:tasks.workunit.client.0.smithi125.stdout: -2
2023-05-01T19:56:01.877 INFO:tasks.workunit.client.0.smithi125.stdout: ioctx.operate(oid, op)
2023-05-01T19:56:01.878 INFO:tasks.workunit.client.0.smithi125.stdout: Which is: 0
2023-05-01T19:56:01.879 INFO:tasks.workunit.client.0.smithi125.stdout:[ FAILED ] cls_rgw.test_implicit_ec (4304 ms)
...
2023-05-01T19:56:05.632 INFO:tasks.workunit.client.0.smithi125.stdout:/build/ceph-16.2.12-83-g2b02306b/src/test/cls_refcount/test_cls_refcount.cc:518: Failure
2023-05-01T19:56:05.633 INFO:tasks.workunit.client.0.smithi125.stdout:Expected equality of these values:
2023-05-01T19:56:05.633 INFO:tasks.workunit.client.0.smithi125.stdout: -2
2023-05-01T19:56:05.633 INFO:tasks.workunit.client.0.smithi125.stdout: ioctx.operate(oid, op)
2023-05-01T19:56:05.633 INFO:tasks.workunit.client.0.smithi125.stdout: Which is: 0
2023-05-01T19:56:05.634 INFO:tasks.workunit.client.0.smithi125.stdout:[ FAILED ] cls_rgw.test_implicit_idempotent_ec (3755 ms)
...
2023-05-01T19:56:32.081 INFO:tasks.workunit.client.0.smithi125.stdout:[ FAILED ] 2 tests, listed below:
2023-05-01T19:56:32.081 INFO:tasks.workunit.client.0.smithi125.stdout:[ FAILED ] cls_rgw.test_implicit_ec
2023-05-01T19:56:32.081 INFO:tasks.workunit.client.0.smithi125.stdout:[ FAILED ] cls_rgw.test_implicit_idempotent_ec
2023-05-01T19:56:32.081 INFO:tasks.workunit.client.0.smithi125.stdout:
2023-05-01T19:56:32.081 INFO:tasks.workunit.client.0.smithi125.stdout: 2 FAILED TESTS
History
#1 Updated by Sridhar Seshasayee 11 months ago
/a/sseshasa-2023-05-01_18:57:15-rados-wip-sseshasa2-testing-2023-05-01-2153-quincy-distro-default-smithi/7259884
#2 Updated by Yuri Weinstein 11 months ago
See also here:
2023-05-05T15:30:03.114 INFO:tasks.workunit.client.0.smithi046.stdout:[----------] Global test environment tear-down
2023-05-05T15:30:03.114 INFO:tasks.workunit.client.0.smithi046.stdout:[==========] 10 tests from 1 test suite ran. (55944 ms total)
2023-05-05T15:30:03.115 INFO:tasks.workunit.client.0.smithi046.stdout:[ PASSED ] 8 tests.
2023-05-05T15:30:03.115 INFO:tasks.workunit.client.0.smithi046.stdout:[ FAILED ] 2 tests, listed below:
2023-05-05T15:30:03.115 INFO:tasks.workunit.client.0.smithi046.stdout:[ FAILED ] cls_rgw.test_implicit_ec
2023-05-05T15:30:03.115 INFO:tasks.workunit.client.0.smithi046.stdout:[ FAILED ] cls_rgw.test_implicit_idempotent_ec
2023-05-05T15:30:03.115 INFO:tasks.workunit.client.0.smithi046.stdout:
2023-05-05T15:30:03.115 INFO:tasks.workunit.client.0.smithi046.stdout: 2 FAILED TESTS
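For comparing runs like the ones quoted in this ticket, a small script can pull the failed test names out of the workunit output. This is only a sketch: the helper name failed_tests and the regular expression are illustrative, and it assumes the gtest "[ FAILED ] suite.test" marker appears as in the excerpts above.

```python
import re

# Matches gtest failure markers such as "[ FAILED ] cls_rgw.test_implicit_ec".
# The summary line "[ FAILED ] 2 tests, listed below:" carries no "suite.test"
# token directly after the marker, so it is not matched.
FAILED_RE = re.compile(r"\[\s*FAILED\s*\]\s+(\w+\.\w+)")


def failed_tests(log_text):
    """Return failed gtest names in first-seen order, without duplicates."""
    seen = []
    for name in FAILED_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Lines copied from the log excerpt in comment #2.
prefix = "2023-05-05T15:30:03.115 INFO:tasks.workunit.client.0.smithi046.stdout:"
sample = "\n".join([
    prefix + "[ PASSED ] 8 tests.",
    prefix + "[ FAILED ] 2 tests, listed below:",
    prefix + "[ FAILED ] cls_rgw.test_implicit_ec",
    prefix + "[ FAILED ] cls_rgw.test_implicit_idempotent_ec",
    prefix + " 2 FAILED TESTS",
])

print(failed_tests(sample))
```

Because the pattern does not depend on line breaks, it should behave the same on a log that has been flattened onto one line.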
#3 Updated by Laura Flores 10 months ago
/a/yuriw-2023-05-22_15:26:04-rados-wip-yuri10-testing-2023-05-18-0815-quincy-distro-default-smithi/7282680
#4 Updated by Laura Flores 10 months ago
- Project changed from Ceph to rgw
#5 Updated by Laura Flores 10 months ago
- Tags set to test-failure
- Backport set to quincy
#6 Updated by Casey Bodley 10 months ago
- Project changed from rgw to RADOS
i don't see any significant changes to this refcount object class in a long time. the test_implicit_ec test case does the same thing as the passing test_implicit test case does, except against an erasure-coded pool. cls_refcount_get() returns the expected -ENOENT against a replicated pool, but returns 0 against an erasure-coded pool
this difference in behavior must be at the rados level, not in rgw or cls_refcount
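To read the gtest output against this comment: the "-2" in the log is the -ENOENT the test expects, and the "Which is: 0" is the success code that was actually returned. A minimal sketch for decoding such values, assuming the usual convention that 0 means success and negative values are negated errno codes (the helper name describe_rc is illustrative):

```python
import errno


def describe_rc(rc):
    """Render a return code: 0 is success, negative values are -errno."""
    if rc == 0:
        return "0 (success)"
    if rc > 0:
        return str(rc)  # positive values are not errno codes; print as-is
    name = errno.errorcode.get(-rc, "unknown")
    return f"{rc} (-{name})"


# Values copied from the "Expected equality of these values" block in the log.
expected, actual = -2, 0
print("expected:", describe_rc(expected))
print("actual:  ", describe_rc(actual))
```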
#7 Updated by Radoslaw Zarzynski 10 months ago
- Assignee set to Nitzan Mordechai
Hello Nitzan! Could it be related to https://github.com/ceph/ceph/pull/47332?
#8 Updated by Laura Flores 10 months ago
/a/yuriw-2023-05-25_14:52:58-rados-wip-yuri3-testing-2023-05-24-1136-quincy-distro-default-smithi/7286576
#9 Updated by Nitzan Mordechai 10 months ago
This behavior only happens during an upgrade; I'm looking into it. The error only occurs when the code that I added in PrimaryLogPG is not invoked (on the other hand, if the code is not there, the test shouldn't be there either).
#10 Updated by Nitzan Mordechai 10 months ago
- Status changed from New to In Progress
#11 Updated by Nitzan Mordechai 10 months ago
I think I've got it. It's pretty odd (to me), but this is what I found:
All the runs that failed in this bug report are upgrade tests. The failing test, cls_rgw.test_implicit_idempotent_ec, is related to these PRs:
main - https://github.com/ceph/ceph/pull/47332 (Merged)
quincy - https://github.com/ceph/ceph/pull/49936 (Open, not merged yet)
pacific - https://github.com/ceph/ceph/pull/49937 (Merged)
The upgrade test first installs pacific and upgrades to quincy, then runs some tests. One of them is the cls workunit, which contains the failing test:
- workunit:
    branch: pacific
    clients:
      client.0:
      - cls
We are running the workunit from pacific, which has test_implicit_idempotent_ec, but the cluster is running quincy, which doesn't yet have the PrimaryLogPG code that handles that failure, and that causes the test to fail.
I wonder why we are running pacific tests on quincy. Don't we want to run the quincy workunit, to make sure that new features and tests work after the upgrade? (In this case it's backwards, but in the normal situation the upgraded version will probably have the newer tests and features.)
Let's wait for https://github.com/ceph/ceph/pull/49936 to be merged and then re-run that upgrade test.
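The skew described in this comment can be written down as a tiny model. This is a hypothetical sketch, not anything read from Ceph or teuthology: the HAS_FIX table only restates the PR states listed above, and it assumes that a branch ships the new EC tests if and only if it also carries the PrimaryLogPG change.

```python
# Hypothetical model of the workunit/cluster skew; names and flags are
# illustrative and do not come from any Ceph or teuthology API.

# Whether each branch carried the PrimaryLogPG change when this comment was
# written, per the PR states listed above (the quincy backport was still open).
HAS_FIX = {"main": True, "quincy": False, "pacific": True}


def predict(workunit_branch, cluster_branch, has_fix=HAS_FIX):
    """Predict the outcome of the EC idempotency tests for one combination."""
    if not has_fix[workunit_branch]:
        # Assumption: the tests are added together with the fix, so a
        # workunit branch without the fix does not contain them.
        return "not run"
    return "pass" if has_fix[cluster_branch] else "fail"


# The failing combination from this ticket: pacific workunit, quincy cluster.
print(predict("pacific", "quincy"))

# The same combination once the quincy backport is merged.
after_backport = dict(HAS_FIX, quincy=True)
print(predict("pacific", "quincy", after_backport))
```

Under these assumptions the first call predicts a failure and the second a pass, which is why re-running the upgrade after the quincy backport lands is the suggested check.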
#12 Updated by Radoslaw Zarzynski 10 months ago
- Status changed from In Progress to Resolved
- Pull request ID set to 47332
Backporting has been done manually, without tracker tickets.