Bug #46211
qa: pools stuck in creating
% Done: 0%
Source: Q/A
Regression: No
Severity: 3 - minor
Description
During cluster setup for the CephFS suites, we see this failure:
"2020-06-25T00:41:20.858154+0000 mon.b (mon.0) 208 : cluster [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)" in cluster log
The actual failure is:
2020-06-25T01:13:05.675 INFO:tasks.ceph:Waiting for all PGs to be active+clean and split+merged, waiting on ['4.5', '2.7'] to go clean and/or [] to split/merge
...
2020-06-25T01:13:25.676 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph.py", line 1829, in task
    healthy(ctx=ctx, config=dict(cluster=config['cluster']))
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph.py", line 1419, in healthy
    manager.wait_for_clean()
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph_manager.py", line 2516, in wait_for_clean
    'wait_for_clean: failed before timeout expired'
AssertionError: wait_for_clean: failed before timeout expired

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", line 33, in nested
    yield vars
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph.py", line 1838, in task
    osd_scrub_pgs(ctx, config)
  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20200624.212840/qa/tasks/ceph.py", line 1209, in osd_scrub_pgs
    raise RuntimeError("Scrubbing terminated -- not all pgs were active and clean.")
RuntimeError: Scrubbing terminated -- not all pgs were active and clean.
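The failing check is teuthology's wait_for_clean(), which polls PG states until every PG reports active+clean or a timeout expires. A minimal sketch of that polling pattern is below; the function and the get_pg_states callable are hypothetical stand-ins, not the actual qa/tasks/ceph_manager.py code:

```python
import time

def wait_for_clean(get_pg_states, timeout=60, interval=5):
    """Poll PG states until every PG is active+clean, or time out.

    get_pg_states: callable returning a {pgid: state} dict, e.g.
    {'4.5': 'peering', '2.7': 'active+clean'} -- a hypothetical
    stand-in for querying the cluster's PG dump.
    """
    deadline = time.monotonic() + timeout
    while True:
        unclean = [pgid for pgid, state in get_pg_states().items()
                   if state != 'active+clean']
        if not unclean:
            return
        if time.monotonic() >= deadline:
            # Mirrors the assertion seen in the log above.
            raise AssertionError('wait_for_clean: failed before timeout expired')
        time.sleep(interval)
```

In the failure above, PGs 4.5 and 2.7 never left their stuck states, so the poll loop hit its deadline and raised, after which the scrub step failed for the same reason.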
From: /ceph/teuthology-archive/pdonnell-2020-06-24_22:47:27-fs-wip-pdonnell-testing-20200624.212840-distro-basic-smithi/5176751/teuthology.log
This had appeared to be related to EC pools, but the failure above also involves a replicated RBD pool (PG 2.7). Here's another instance with only replicated pools: /ceph/teuthology-archive/pdonnell-2020-06-24_22:47:27-fs-wip-pdonnell-testing-20200624.212840-distro-basic-smithi/5176615/teuthology.log
The break appears to have occurred between June 19th and 23rd: http://pulpito.ceph.com/?suite=fs
History
#1 Updated by Patrick Donnelly almost 4 years ago
- Duplicates Bug #46180: qa: Scrubbing terminated -- not all pgs were active and clean. added
#2 Updated by Patrick Donnelly almost 4 years ago
- Status changed from New to Duplicate