Bug #43656
AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags: backport_processed
Backport: nautilus
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
2020-01-17T22:19:28.631 ERROR:tasks.thrashosds.thrasher:exception: Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_liewegas_ceph_wip-cephadm-cot/qa/tasks/ceph_manager.py", line 1040, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_liewegas_ceph_wip-cephadm-cot/qa/tasks/ceph_manager.py", line 1052, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_liewegas_ceph_wip-cephadm-cot/qa/tasks/ceph_manager.py", line 1182, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_liewegas_ceph_wip-cephadm-cot/qa/tasks/ceph_manager.py", line 847, in test_pool_min_size
    'not all PGs are active or peered 15 seconds after marking out OSDs'
AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs
/a/sage-2020-01-17_21:45:24-rados:thrash-erasure-code-master-distro-basic-smithi/4679221
History
#1 Updated by Sage Weil about 4 years ago
In this case, the workload happened to delete the old pool and its PGs and create a new pool right before the check, so the new pool's PGs were all in state 'unknown'. That was not because of the out OSDs, but because the PGs were new.
I.e., the test is buggy.
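To illustrate the failure mode described above, here is a minimal sketch of a check that ignores PGs from pools created after the OSDs were marked out. It is not the change made in the linked pull request; the function names and the `pools_at_start` parameter are hypothetical, and the input is assumed to be a list of dicts with `pgid` and `state` keys, similar to the per-PG entries in a JSON PG dump.

```python
def is_active_or_peered(state):
    """Return True if a '+'-joined PG state string contains 'active' or 'peered'."""
    flags = state.split("+")
    return "active" in flags or "peered" in flags


def all_preexisting_pgs_ok(pgs, pools_at_start):
    """Check only PGs whose pool already existed when the OSDs were marked out.

    pgs: iterable of dicts with 'pgid' (e.g. '2.1f') and 'state' keys (assumed shape).
    pools_at_start: set of integer pool ids recorded before marking OSDs out
                    (hypothetical parameter, not part of the original test).
    """
    for pg in pgs:
        # The part of a pgid before the dot is the pool id.
        pool_id = int(pg["pgid"].split(".")[0])
        if pool_id not in pools_at_start:
            # A brand-new pool: 'unknown' here says nothing about the out OSDs.
            continue
        if not is_active_or_peered(pg["state"]):
            return False
    return True


if __name__ == "__main__":
    pgs = [
        {"pgid": "1.0", "state": "active+clean"},
        {"pgid": "2.0", "state": "unknown"},  # PG of a pool created mid-test
    ]
    print(all_preexisting_pgs_ok(pgs, pools_at_start={1}))
    print(all_preexisting_pgs_ok(pgs, pools_at_start={1, 2}))
```

With only pool 1 recorded at the start, the 'unknown' PG of the newly created pool 2 is skipped; if pool 2 is treated as pre-existing, the same PG fails the check, which mirrors the false assertion in this report.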
#2 Updated by Sage Weil about 4 years ago
/a/sage-2020-01-20_14:10:17-rados:thrash-erasure-code-wip-sage-testing-2020-01-19-1713-distro-basic-smithi/4688160
#3 Updated by Sage Weil about 4 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 32737
#4 Updated by Sage Weil about 4 years ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to nautilus
#5 Updated by Nathan Cutler about 4 years ago
- Copied to Backport #43776: nautilus: AssertionError: not all PGs are active or peered 15 seconds after marking out OSDs added
#6 Updated by Nathan Cutler about 4 years ago
Hi Sage:
This issue appears to have been introduced by https://github.com/ceph/ceph/pull/17619 - a major octopus feature which is not being backported to nautilus. So I'm not sure if the backport to nautilus is valid.
Marked #43776 "Need More Info" for now.
Thanks,
Nathan
#7 Updated by Backport Bot over 1 year ago
- Tags set to backport_processed
#8 Updated by Konstantin Shalygin over 1 year ago
- Status changed from Pending Backport to Resolved