Bug #53219

LibRadosTwoPoolsPP.ManifestRollbackRefcount failure

Added by Sage Weil over 2 years ago. Updated about 2 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
Tiering
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: [ RUN      ] LibRadosTwoPoolsPP.ManifestRollbackRefcount
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: require_osd_release = quincy
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8935-ged18f0aa/rpm/el8/BUILD/ceph-17.0.0-8935-ged18f0aa/src/test/librados/tier_cxx.cc:193: Failure
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: Value of: src_refcount == expected_refcount
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp:   Actual: false
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: Expected: true
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: [  FAILED  ] LibRadosTwoPoolsPP.ManifestRollbackRefcount (425014 ms)

/a/sage-2021-11-10_16:44:53-orch:cephadm-master-distro-basic-smithi/6496428
description: orch:cephadm/thrash/{0-distro/centos_8.2_container_tools_3.0 1-start
2-thrash 3-tasks/rados_api_tests fixed-2 msgr/async root}

Related issues 1 (1 open, 0 closed)

Related to RADOS - Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure. (Status: Pending Backport, Assignee: Myoungwon Oh)

Actions #1

Updated by Sage Weil over 2 years ago

  • Project changed from Ceph to RADOS
  • Category set to Tiering
Actions #2

Updated by Myoungwon Oh over 2 years ago

I'll take a look

Actions #3

Updated by Neha Ojha over 2 years ago

  • Assignee set to Myoungwon Oh
Actions #4

Updated by Myoungwon Oh over 2 years ago

I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes even if current retry duration is 5 minutes.

Actions #5

Updated by Neha Ojha over 2 years ago

Myoungwon Oh wrote:

I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes even if current retry duration is 5 minutes.

Any idea why the recovery time keeps increasing? In https://github.com/ceph/ceph/pull/43493 you mentioned 6 minutes. And how do we make sure that an increase in the retry duration is enough?

Actions #6

Updated by Neha Ojha over 2 years ago

  • Related to Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure. added
Actions #7

Updated by Myoungwon Oh over 2 years ago

Calculating the reference count of a snapshotted manifest object requires correct refcount information, so the current unit tests wait for recovery to finish if the object is degraded. I don't have an idea for how to remove the retry timeout here, because recovery time varies with the cluster state. Let me think about it.
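
To illustrate the wait-for-recovery pattern described above, here is a minimal Python sketch of polling with a deadline. It is not the code in tier_cxx.cc. The names `wait_until` and `FakeCluster` are hypothetical, and the simulated 480-second recovery only mirrors the "almost 8 minutes" figure mentioned in this thread. It shows why a fixed timeout shorter than the actual recovery time makes the check give up, no matter how the polling is done.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` seconds elapse.

    Returns True if the predicate succeeded before the deadline, else False.
    `clock` and `sleep` are injectable so the helper can be tested without
    real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeCluster:
    """Hypothetical stand-in for a cluster whose object stays degraded
    until `recovery_s` simulated seconds have passed."""

    def __init__(self, recovery_s: float) -> None:
        self.now = 0.0
        self.recovery_s = recovery_s

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        # Advance simulated time instead of blocking.
        self.now += seconds

    def is_clean(self) -> bool:
        return self.now >= self.recovery_s


# Recovery needs 480 simulated seconds; a 300-second timeout gives up early.
short = FakeCluster(recovery_s=480)
gave_up = not wait_until(short.is_clean, 300, 5, short.clock, short.sleep)

# The same recovery fits inside a 600-second timeout.
longer = FakeCluster(recovery_s=480)
recovered = wait_until(longer.is_clean, 600, 5, longer.clock, longer.sleep)
```

Raising the timeout only moves the threshold. As noted above, recovery time depends on the cluster state, so any fixed value can still be exceeded.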

Actions #8

Updated by Neha Ojha about 2 years ago

  • Priority changed from High to Normal