Bug #54511
test_pool_min_size: AssertionError: not clean before minsize thrashing starts
Description
/a/yuriw-2022-03-04_00:56:58-rados-wip-yuri4-testing-2022-03-03-1448-distro-default-smithi/6719015
2022-03-04T03:06:27.624 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 189, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 896, in test_pool_min_size
    'not clean before minsize thrashing starts'
AssertionError: not clean before minsize thrashing starts
2022-03-04T03:06:27.625 ERROR:tasks.thrashosds.thrasher:exception:
Traceback (most recent call last):
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 1280, in do_thrash
    self._do_thrash()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 189, in wrapper
    return func(self)
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 1412, in _do_thrash
    self.choose_action()()
  File "/home/teuthworker/src/github.com_ceph_ceph-c_c8f79f870e0d6a996c92d420e6256d312bac1c7c/qa/tasks/ceph_manager.py", line 896, in test_pool_min_size
    'not clean before minsize thrashing starts'
AssertionError: not clean before minsize thrashing starts
This error occurs at an early stage of `test_pool_min_size`, where the test checks that all PGs are active+clean after waiting at most 60 seconds for them to reach that state.
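The check described above amounts to polling the cluster state with a deadline and asserting afterwards. A minimal sketch of that pattern (the function name and parameters here are illustrative, not the actual `ceph_manager.py` API):

```python
import time

def wait_until_clean(is_clean, timeout=60, interval=3,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_clean() for up to `timeout` seconds.

    Returns True as soon as the predicate passes, otherwise
    returns the result of a final check after the deadline.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if is_clean():
            return True
        sleep(interval)
    return is_clean()

# The thrasher then asserts on the outcome, producing the error above:
# assert wait_until_clean(manager.is_clean), \
#     'not clean before minsize thrashing starts'
```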
Related issues
History
#1 Updated by Aishwarya Mathuria almost 2 years ago
/a/yuriw-2022-03-29_21:35:32-rados-wip-yuri5-testing-2022-03-29-1152-quincy-distro-default-smithi/6767633
#2 Updated by Radoslaw Zarzynski almost 2 years ago
Need to observe more thrashers/minsize_recovery runs where this issue happens.
#3 Updated by Radoslaw Zarzynski almost 2 years ago
- Related to Bug #49777: test_pool_min_size: 'check for active or peered' reached maximum tries (5) after waiting for 25 seconds added
#4 Updated by Laura Flores almost 2 years ago
- Related to Bug #51904: test_pool_min_size:AssertionError:wait_for_clean:failed before timeout expired due to down PGs added
#5 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
/a/ksirivad-2022-07-01_21:00:49-rados:thrash-erasure-code-main-distro-default-smithi/6910103/
#6 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
- Description updated (diff)
#7 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
- Description updated (diff)
#8 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
- Status changed from New to Fix Under Review
- Pull request ID set to 47138
#9 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
I was able to reproduce the problem after modifying qa/tasks/ceph_manager.py: https://github.com/ceph/ceph/pull/46931/commits/1f6bcbb3d680d8589e498b993d2cf566480e2c3e.
Runs where I was able to reproduce the problem:
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921351
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921372
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921374
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921382
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921383
/a/ksirivad-2022-07-09_05:39:52-rados:thrash-erasure-code-main-distro-default-smithi/6921385
Problem
We didn’t give enough buffer between bringing an OSD back up and actually checking for active+clean. The PGs passed ceph_manager.wait_for_recovery and ceph_manager.wait_for_clean because recovery hadn’t started yet, and they eventually failed at ceph_manager.is_clean(). My analysis can be found here:
https://docs.google.com/document/d/1HKQc5kO-A9c7ThYTGtUlgTliYyfy__0tFXXs2KHLsZg/edit
Solution
Sleep for 60 seconds before ceph_manager.wait_for_recovery + ceph_manager.wait_for_clean.
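The proposed fix can be sketched as follows. This is an illustrative outline, not the actual patch from PR 47138; the `manager` object and `settle_secs` parameter are assumptions standing in for the real `ceph_manager` interface:

```python
import time

def settle_then_wait(manager, settle_secs=60, sleep=time.sleep):
    """Sketch of the fix: after an OSD is brought back up, give the
    cluster a fixed grace period so recovery has actually started
    before polling, then run the usual waits."""
    sleep(settle_secs)           # let peering/recovery kick in
    manager.wait_for_recovery()  # PGs finish recovering
    manager.wait_for_clean()     # all PGs reach active+clean
```

Without the initial sleep, both waits can pass trivially because recovery has not yet begun, only for `is_clean()` to fail later, which is exactly the failure mode described above.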
#10 Updated by Neha Ojha over 1 year ago
- Assignee set to Kamoltat (Junior) Sirivadhna
#11 Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to quincy, pacific
#12 Updated by Backport Bot over 1 year ago
- Copied to Backport #57019: quincy: test_pool_min_size: AssertionError: not clean before minsize thrashing starts added
#13 Updated by Backport Bot over 1 year ago
- Copied to Backport #57020: pacific: test_pool_min_size: AssertionError: not clean before minsize thrashing starts added
#14 Updated by Backport Bot over 1 year ago
- Tags set to backport_processed
#15 Updated by Kamoltat (Junior) Sirivadhna 11 months ago
- Status changed from Pending Backport to Resolved