Bug #53219

LibRadosTwoPoolsPP.ManifestRollbackRefcount failure

Added by Sage Weil 3 months ago. Updated about 19 hours ago.

Status:
New
Priority:
Normal
Assignee:
Category:
Tiering
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: [ RUN      ] LibRadosTwoPoolsPP.ManifestRollbackRefcount
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: require_osd_release = quincy
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: /home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8935-ged18f0aa/rpm/el8/BUILD/ceph-17.0.0-8935-ged18f0aa/src/test/librados/tier_cxx.cc:193: Failure
2021-11-10T17:58:05.520 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: Value of: src_refcount == expected_refcount
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp:   Actual: false
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: Expected: true
2021-11-10T17:58:05.521 INFO:tasks.workunit.client.0.smithi053.stdout:              api_tier_pp: [  FAILED  ] LibRadosTwoPoolsPP.ManifestRollbackRefcount (425014 ms)

/a/sage-2021-11-10_16:44:53-orch:cephadm-master-distro-basic-smithi/6496428
description: orch:cephadm/thrash/{0-distro/centos_8.2_container_tools_3.0 1-start
2-thrash 3-tasks/rados_api_tests fixed-2 msgr/async root}

Related issues

Related to RADOS - Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure. Pending Backport

History

#1 Updated by Sage Weil 3 months ago

  • Project changed from Ceph to RADOS
  • Category set to Tiering

#2 Updated by Myoungwon Oh 3 months ago

I'll take a look

#3 Updated by Neha Ojha 3 months ago

  • Assignee set to Myoungwon Oh

#4 Updated by Myoungwon Oh 2 months ago

I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes, even though the current retry duration is 5 minutes.

#5 Updated by Neha Ojha 2 months ago

Myoungwon Oh wrote:

I think this is the same issue as https://tracker.ceph.com/issues/52872.
Recovery takes almost 8 minutes, even though the current retry duration is 5 minutes.

Any idea why the recovery time keeps increasing? In https://github.com/ceph/ceph/pull/43493 you mentioned 6 minutes. And how do we make sure that the increase in retry duration is enough?

#6 Updated by Neha Ojha 2 months ago

  • Related to Bug #52872: LibRadosTwoPoolsPP.ManifestSnapRefcount Failure. added

#7 Updated by Myoungwon Oh 2 months ago

Calculating the reference count on a manifest snapshotted object requires correct refcount information, so the current unit tests wait for recovery if the object is degraded. I don't have an idea for removing the retry timeout here, because recovery time varies with the cluster state. Let me think about it.
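The retry pattern described above (poll until the object has recovered, giving up after a fixed timeout) can be sketched as follows. This is a minimal illustration only; `wait_for_recovery` and its parameters are hypothetical names, not the actual helpers in tier_cxx.cc, and the real tests query the cluster state via librados rather than a callback:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hedged sketch: poll an "is the object recovered?" predicate until it
// returns true or the retry window expires. If it expires, the refcount
// check that follows (src_refcount == expected_refcount) may compare
// against stale data, which is the failure mode seen in this test.
bool wait_for_recovery(const std::function<bool()>& is_recovered,
                       std::chrono::seconds timeout,
                       std::chrono::seconds interval = std::chrono::seconds(1)) {
  auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (is_recovered())
      return true;  // refcount information is now trustworthy
    std::this_thread::sleep_for(interval);
  }
  return false;  // recovery outlasted the retry window -> test failure
}
```

The difficulty noted in the comment is visible in the signature: any fixed `timeout` is a guess, because recovery time depends on cluster state at the moment the test runs.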

#8 Updated by Neha Ojha about 19 hours ago

  • Priority changed from High to Normal
