Bug #23662

closed

osd: regression causes SLOW_OPS warnings in multimds suite

Added by Patrick Donnelly about 6 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
Urgent
Category:
-
Target version:
% Done:
0%

Source:
Development
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
multimds
Component(RADOS):
Manager (RADOS bits), OSD
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

See [1]; the first instance of the problem is at [0].

The last run which did not cause most multimds jobs to fail with SLOW_OPS warnings was [2]. That branch was [3].

To check that this is an OSD regression and not an issue with the testing infrastructure, I reran multimds on the basis for that branch [4]. The results (filtering for filestore) are nearly complete at [5] and indicate that the testing infrastructure is not to blame.

I'm planning to run the basis [6] for the [0] test run next [9] for another comparison, but the issue also exists for parallel tests with the same basis [7,8].

My current suspicion is that there is an OSD regression, but it's possible that another CephFS change caused this. (I need to wrap up this issue for a meeting but will double-check that next.)

[0] http://pulpito.ceph.com/pdonnell-2018-03-30_04:03:50-multimds-wip-pdonnell-testing-20180329.205607-testing-basic-smithi/
[1] http://pulpito.ceph.com/?suite=multimds
[2] http://pulpito.ceph.com/pdonnell-2018-03-17_22:31:23-multimds-wip-pdonnell-testing-20180317.202121-testing-basic-smithi/
[3] https://github.com/ceph/ceph-ci/tree/wip-pdonnell-testing-20180317.202121
[4] https://github.com/ceph/ceph-ci/tree/wip-multimds-regression
[5] http://pulpito.ceph.com/pdonnell-2018-04-11_18:38:52-multimds-wip-multimds-regression-testing-basic-smithi/
[6] https://github.com/ceph/ceph-ci/tree/wip-multimds-regression2
[7] http://pulpito.ceph.com/pdonnell-2018-03-30_21:27:51-multimds-wip-pdonnell-testing-20180329.205635-testing-basic-smithi
[8] http://pulpito.ceph.com/pdonnell-2018-03-30_21:45:07-multimds-wip-pdonnell-testing-20180329.211514-testing-basic-smithi
[9] http://pulpito.ceph.com/pdonnell-2018-04-11_21:31:18-multimds-wip-multimds-regression2-testing-basic-smithi/
[10] 9b46b9723bcc468cf60554a7fb23d5092b1dfed3..wip-pdonnell-testing-20180329.205607

Actions #1

Updated by Patrick Donnelly about 6 years ago

  • Description updated
Actions #2

Updated by Patrick Donnelly about 6 years ago

Looks like the obvious cause: https://github.com/ceph/ceph/pull/20660

Actions #3

Updated by Patrick Donnelly about 6 years ago

  • Status changed from New to Fix Under Review
  • Assignee set to Patrick Donnelly
  • ceph-qa-suite multimds added
  • Component(RADOS) Manager (RADOS bits) added
Actions #4

Updated by Sage Weil about 6 years ago

  • Status changed from Fix Under Review to Resolved