
Bug #46211

qa: pools stuck in creating

Added by Patrick Donnelly 7 months ago. Updated 7 months ago.

Status:
Duplicate
Priority:
Urgent
Assignee:
-
Category:
-
% Done:

0%

Source:
Q/A
Regression:
No
Severity:
3 - minor

Description

During cluster setup for the CephFS suites, we see this failure:

"2020-06-25T00:41:20.858154+0000 mon.b (mon.0) 208 : cluster [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)" in cluster log

The actual failure is:

2020-06-25T01:13:05.675 INFO:tasks.ceph:Waiting for all PGs to be active+clean and split+merged, waiting on ['4.5', '2.7'] to go clean and/or [] to split/merge
...
2020-06-25T01:13:25.676 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph.py", line 1829, in task
    healthy(ctx=ctx, config=dict(cluster=config['cluster']))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph.py", line 1419, in healthy
    manager.wait_for_clean()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph_manager.py", line 2516, in wait_for_clean
    'wait_for_clean: failed before timeout expired'
AssertionError: wait_for_clean: failed before timeout expired

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph.py", line 1838, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph.py", line 1209, in osd_scrub_pgs
    raise RuntimeError("Scrubbing terminated -- not all pgs were active and clean.")
RuntimeError: Scrubbing terminated -- not all pgs were active and clean.

From: /ceph/teuthology-archive/pdonnell-2020-06-24_22:47:27-fs-wip-pdonnell-testing-20200624.212840-distro-basic-smithi/5176751/teuthology.log
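The `wait_for_clean` call in the traceback polls PG states until every PG reports `active+clean`, and raises the `AssertionError` seen above when the timeout expires first. A minimal sketch of that polling loop (not the actual `qa/tasks/ceph_manager.py` implementation; `get_pg_states` is an injected callable assumed to return a `{pgid: state}` mapping):

```python
import time

def wait_for_clean(get_pg_states, timeout=300, interval=5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until all PGs are active+clean, loosely mirroring the
    teuthology helper; raises AssertionError on timeout, as in this
    report."""
    deadline = clock() + timeout
    while True:
        states = get_pg_states()
        stuck = [pg for pg, st in states.items() if st != "active+clean"]
        if not stuck:
            return  # cluster is clean
        if clock() >= deadline:
            raise AssertionError(
                "wait_for_clean: failed before timeout expired; "
                "stuck PGs: %s" % stuck)
        sleep(interval)
```

In the failing runs, PGs 4.5 and 2.7 stay in a non-clean state (peering) for the whole window, so the loop never returns.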

This had appeared to be related to EC pools, but the failure above also involves a replicated RBD pool (PG 2.7). Here's another instance with only replicated pools: /ceph/teuthology-archive/pdonnell-2020-06-24_22:47:27-fs-wip-pdonnell-testing-20200624.212840-distro-basic-smithi/5176615/teuthology.log

The regression appears to have been introduced between June 19th and June 23rd: http://pulpito.ceph.com/?suite=fs


Related issues

Duplicates RADOS - Bug #46180: qa: Scrubbing terminated -- not all pgs were active and clean. Resolved

History

#1 Updated by Patrick Donnelly 7 months ago

  • Duplicates Bug #46180: qa: Scrubbing terminated -- not all pgs were active and clean. added

#2 Updated by Patrick Donnelly 7 months ago

  • Status changed from New to Duplicate
