Bug #53219

LibRadosTwoPoolsPP.ManifestRollbackRefcount failure

Added by Sage Weil 3 months ago. Updated about 19 hours ago.

Status:
New
Priority:
Normal
Assignee:
Category:
Tiering
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: [ RUN      ] LibRadosTwoPoolsPP.ManifestRollbackRefcount
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: require_osd_release = quincy
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8935-ged18f0aa/rpm/el8/BUILD/ceph-17.0.0-8935-ged18f0aa/src/test/librados/tier_cxx.cc:193: Failure
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: Value of: src_refcount == expected_refcount
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp:   Actual: false
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: Expected: true
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: [  FAILED  ] LibRadosTwoPoolsPP.ManifestRollbackRefcount (425014 ms)

/a/sage-2021-11-10_16:44:53-orch:cephadm-master-distro-basic-smithi/6496428
description: orch:cephadm/thrash/{0-distro/centos_8.2_container_tools_3.0 1-start
2-thrash 3-tasks/rados_api_tests fixed-2 msgr/async root}

Related issues

Related to RADOS - Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure. Pending Backport

History

#1 Updated by Sage Weil 3 months ago

  • Project changed from Ceph to RADOS
  • Category set to Tiering

#2 Updated by Myoungwon Oh 3 months ago

I'll take a look

#3 Updated by Neha Ojha 3 months ago

  • Assignee set to Myoungwon Oh

#4 Updated by Myoungwon Oh 2 months ago

I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes, even though the current retry duration is 5 minutes.

#5 Updated by Neha Ojha 2 months ago

Myoungwon Oh wrote:

I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes, even though the current retry duration is 5 minutes.

Any idea why the recovery time keeps increasing? In https://github.com/ceph/ceph/pull/43493 you mentioned 6 minutes. And how do we make sure that the increase in retry duration is enough?

#6 Updated by Neha Ojha 2 months ago

  • Related to Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure. added

#7 Updated by Myoungwon Oh 2 months ago

Calculating the reference count on a manifest snapshotted object requires correct refcount information, so the current unit tests wait for recovery if the object is degraded. I don't have an idea for removing the retry timeout here, because recovery time varies with the cluster state. Let me think about it.
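The retry pattern described above (poll until the object has recovered, giving up after a fixed timeout) can be sketched as follows. This is a minimal illustration only; `wait_for_recovery` and its parameters are hypothetical names, not the actual helpers in tier_cxx.cc, and the real tests query the cluster state via librados rather than a callback:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hedged sketch: poll an "is the object recovered?" predicate until it
// returns true or the retry window expires. If it expires, the refcount
// check that follows (src_refcount == expected_refcount) may compare
// against stale data, which is the failure mode seen in this test.
bool wait_for_recovery(const std::function<bool()>& is_recovered,
                       std::chrono::seconds timeout,
                       std::chrono::seconds interval = std::chrono::seconds(1)) {
  auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (is_recovered())
      return true;  // refcount information is now trustworthy
    std::this_thread::sleep_for(interval);
  }
  return false;  // recovery outlasted the retry window -> test failure
}
```

The difficulty noted in the comment is visible in the signature: any fixed `timeout` is a guess, because recovery time depends on cluster state at the moment the test runs.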

#8 Updated by Neha Ojha about 19 hours ago

  • Priority changed from High to Normal
