Bug #58837: mgr/test_progress.py: test_osd_healthy_recovery fails after timeout - RADOS - Ceph

Bug #58837

open

mgr/test_progress.py: test_osd_healthy_recovery fails after timeout

Added by Laura Flores about 1 year ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee: Kamoltat (Junior) Sirivadhna
Category:
Target version:
% Done:
Source:
Tags:
Backport: quincy
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description

/a/yuriw-2023-02-22_20:55:15-rados-wip-yuri4-testing-2023-02-22-0817-quincy-distro-default-smithi/7184746

2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:======================================================================
2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:ERROR: test_osd_healthy_recovery (tasks.mgr.test_progress.TestProgress)
2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_cbccb547f47ec697c2e2ecf23392cc636ea19450/qa/tasks/mgr/test_progress.py", line 303, in test_osd_healthy_recovery
2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:    self.wait_until_true(lambda: self._is_complete(ev['id']),
2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_cbccb547f47ec697c2e2ecf23392cc636ea19450/qa/tasks/ceph_test_case.py", line 212, in wait_until_true
2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:    raise TestTimeoutError("Timed out after {0}s and {1} retries".format(elapsed, retry_count))
2023-02-23T08:16:35.335 INFO:tasks.cephfs_test_runner:tasks.ceph_test_case.TestTimeoutError: Timed out after 120s and 0 retries
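For readers unfamiliar with the helper in the traceback, here is a minimal sketch of a polling loop that would produce the same message. It is not the code in qa/tasks/ceph_test_case.py: the `period` default, the injectable `sleep` argument, and the absence of any retry logic are assumptions made for illustration. Only the function name, the exception name, and the message format come from the log above.

```python
import time


class TestTimeoutError(RuntimeError):
    """Raised when the condition never becomes true (name taken from the traceback)."""


def wait_until_true(condition, timeout, period=5, sleep=time.sleep):
    """Poll condition() every `period` seconds until it returns True.

    Simplified sketch: there is no retry logic here, so the retry
    count reported in the error message is always 0.
    """
    elapsed = 0
    retry_count = 0
    while not condition():
        if elapsed >= timeout:
            raise TestTimeoutError(
                "Timed out after {0}s and {1} retries".format(elapsed, retry_count))
        sleep(period)
        elapsed += period
```

Under these assumptions, a condition that stays false for the whole window (here, the progress event never being reported complete) ends with "Timed out after 120s and 0 retries", matching the failure above.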

Also seen in a main wip branch:
/a/lflores-2023-01-27_15:39:50-rados-wip-lflores-testing-2023-01-26-2227-distro-default-smithi/7141897

Updated by Laura Flores about 1 year ago

Seen in the mgr logs: 2 PGs are stuck in recovery (both undersized, with only osd.2 in the acting set):

{
    "PG_DEGRADED": {
        "severity": "HEALTH_WARN",
        "summary": {
            "message": "Degraded data redundancy: 2 pgs undersized",
            "count": 2
        },
        "detail": [
            {
                "message": "pg 7.3 is stuck undersized for 2m, current state active+recovering+undersized+remapped, last acting [2]" 
            },
            {
                "message": "pg 7.b is stuck undersized for 2m, current state active+recovering+undersized+remapped, last acting [2]" 
            }
        ]
    }
}
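When triaging similar runs, it can help to pull the stuck PGs out of a health dump like the one above. The following is a rough sketch only: the function name `stuck_pgs` and the regular expression are hypothetical, written against the two message strings shown here, and other message wordings would not match.

```python
import json
import re

# Hypothetical pattern, written against the detail messages quoted above.
PG_MSG = re.compile(
    r"pg (?P<pgid>[^\s,]+) is stuck (?P<reason>\w+) for (?P<age>[^\s,]+), "
    r"current state (?P<state>[^\s,]+), last acting \[(?P<acting>[^\]]*)\]"
)


def stuck_pgs(health_json, check="PG_DEGRADED"):
    """Return one dict per PG listed in the given health check's detail."""
    checks = json.loads(health_json)
    result = []
    for item in checks.get(check, {}).get("detail", []):
        match = PG_MSG.search(item["message"])
        if match is None:
            continue  # unknown message format, skip rather than guess
        acting = [int(x) for x in match.group("acting").split(",") if x.strip()]
        result.append({
            "pgid": match.group("pgid"),
            "reason": match.group("reason"),
            "states": match.group("state").split("+"),
            "acting": acting,
        })
    return result


# Sample input copied from the health dump above.
SAMPLE = json.dumps({"PG_DEGRADED": {"detail": [
    {"message": "pg 7.3 is stuck undersized for 2m, current state "
                "active+recovering+undersized+remapped, last acting [2]"},
    {"message": "pg 7.b is stuck undersized for 2m, current state "
                "active+recovering+undersized+remapped, last acting [2]"},
]}})

for pg in stuck_pgs(SAMPLE):
    print(pg["pgid"], pg["states"], pg["acting"])
```

For the dump in this comment, both PGs come back with `recovering` and `undersized` among their states and a single-OSD acting set.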
