Bug #23662
osd: regression causes SLOW_OPS warnings in multimds suite
Description
See: [1], first instance of the problem at [0].
The last run which did not cause most multimds jobs to fail with SLOW_OPS warnings was [2]. That branch was [3].
To check that this is some kind of osd regression and not an issue with the testing infrastructure, I reran multimds on the basis for that branch [4]. The results (filtering for filestore) are nearly complete at [5]. These indicate that the testing infrastructure is not to blame.
I'm planning to run the basis [6] for the [0] test run next [9] for another comparison, but the issue also exists for parallel tests with the same basis [7,8].
My current suspicion is that there is an OSD regression, but it's possible that another CephFS change caused this. (I need to wrap up this issue for a meeting but will double-check that next.)
[0] http://pulpito.ceph.com/pdonnell-2018-03-30_04:03:50-multimds-wip-pdonnell-testing-20180329.205607-testing-basic-smithi/
[1] http://pulpito.ceph.com/?suite=multimds
[2] http://pulpito.ceph.com/pdonnell-2018-03-17_22:31:23-multimds-wip-pdonnell-testing-20180317.202121-testing-basic-smithi/
[3] https://github.com/ceph/ceph-ci/tree/wip-pdonnell-testing-20180317.202121
[4] https://github.com/ceph/ceph-ci/tree/wip-multimds-regression
[5] http://pulpito.ceph.com/pdonnell-2018-04-11_18:38:52-multimds-wip-multimds-regression-testing-basic-smithi/
[6] https://github.com/ceph/ceph-ci/tree/wip-multimds-regression2
[7] http://pulpito.ceph.com/pdonnell-2018-03-30_21:27:51-multimds-wip-pdonnell-testing-20180329.205635-testing-basic-smithi
[8] http://pulpito.ceph.com/pdonnell-2018-03-30_21:45:07-multimds-wip-pdonnell-testing-20180329.211514-testing-basic-smithi
[9] http://pulpito.ceph.com/pdonnell-2018-04-11_21:31:18-multimds-wip-multimds-regression2-testing-basic-smithi/
[10] 9b46b9723bcc468cf60554a7fb23d5092b1dfed3..wip-pdonnell-testing-20180329.205607
Updated by Patrick Donnelly about 6 years ago
Looks like the obvious cause: https://github.com/ceph/ceph/pull/20660
Updated by Patrick Donnelly about 6 years ago
- Status changed from New to Fix Under Review
- Assignee set to Patrick Donnelly
- ceph-qa-suite multimds added
- Component(RADOS) Manager (RADOS bits) added
Updated by Sage Weil about 6 years ago
- Status changed from Fix Under Review to Resolved