Bug #53219
openLibRadosTwoPoolsPP.ManifestRollbackRefcount failure
Description
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: [ RUN      ] LibRadosTwoPoolsPP.ManifestRollbackRefcount
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: require_osd_release = quincy
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8935-ged18f0aa/rpm/el8/BUILD/ceph-17.0.0-8935-ged18f0aa/src/test/librados/tier_cxx.cc:193: Failure
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: Value of: src_refcount == expected_refcount
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp:   Actual: false
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: Expected: true
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout: api_tier_pp: [  FAILED  ] LibRadosTwoPoolsPP.ManifestRollbackRefcount (425014 ms)
/a/sage-2021-11-10_16:44:53-orch:cephadm-master-distro-basic-smithi/6496428
description: orch:cephadm/thrash/{0-distro/centos_8.2_container_tools_3.0 1-start
2-thrash 3-tasks/rados_api_tests fixed-2 msgr/async root}
Updated by Sage Weil over 2 years ago
- Project changed from Ceph to RADOS
- Category set to Tiering
Updated by Myoungwon Oh over 2 years ago
I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes, even though the current retry duration is 5 minutes.
Updated by Neha Ojha over 2 years ago
Myoungwon Oh wrote:
I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes, even though the current retry duration is 5 minutes.
Any idea why the recovery time keeps increasing? In https://github.com/ceph/ceph/pull/43493 you mentioned 6 minutes. And how do we make sure that the increase in retry duration is enough?
Updated by Neha Ojha over 2 years ago
- Related to Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure. added
Updated by Myoungwon Oh over 2 years ago
Calculating the reference count on a manifest snapshotted object requires correct refcount information, so the current unit tests wait for recovery to complete if the object is degraded. I don't see a way to remove the retry timeout here, because recovery time varies with the cluster state. Let me think about it.