Bug #16288

closed

mds: `session evict` tell command blocks forever with async messenger (TestVolumeClient.test_evict_client failure)

Added by John Spray almost 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
High
Category:
Code Hygiene
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
jewel
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I'm assuming for the moment that this is an MDS bug rather than something getting dropped in the new messenger code.

MDSRankDispatcher::evict_sessions blocks on journal flush. It seems that by blocking there we might be preventing the OSD op reply from being serviced.


Related issues: 1 (0 open, 1 closed)

Copied to CephFS - Backport #16621: jewel: mds: `session evict` tell command blocks forever with async messenger (TestVolumeClient.test_evict_client failure) - Resolved - Abhishek Varshney
Actions #1

Updated by John Spray almost 8 years ago

  • Subject changed from mds: `session evict` tell command blocks forever with async messenger to mds: `session evict` tell command blocks forever with async messenger (TestVolumeClient.test_evict_client failure)
Actions #2

Updated by Greg Farnum almost 8 years ago

  • Priority changed from Normal to High

This deadlocks and lockdep makes it crash in our nightlies; we should fix it quickly! :)

Actions #3

Updated by John Spray almost 8 years ago

NB: back out part of https://github.com/ceph/ceph-qa-suite/pull/1054 when fixing this; it's switched back to simple messenger for the moment.

Actions #4

Updated by Douglas Fuller almost 8 years ago

  • Status changed from New to In Progress
  • Assignee set to Douglas Fuller
Actions #5

Updated by Greg Farnum almost 8 years ago

John, do you have any logs? The only failure of this test I can find is http://qa-proxy.ceph.com/teuthology/teuthology-2016-05-07_18:04:02-fs-master---basic-smithi/178451, but that's complaining about client counts, not stuck asok requests.

Actions #6

Updated by Greg Farnum almost 8 years ago

  • Status changed from In Progress to Need More Info
Actions #7

Updated by John Spray almost 8 years ago

  • Status changed from Need More Info to New

Oops, I meant to paste it to begin with. I think it was this one:
/a/jspray-2016-06-13_14:56:46-fs-wip-jcsp-testing-quota-2-distro-basic-mira/257054

Actions #8

Updated by Greg Farnum almost 8 years ago

Not to take away Doug's thunder, but I gather he's been unable to reproduce it. The AsyncMessenger may have already been "fixed" so that this isn't a problem, but we should also change the way evict_sessions() works to not block where it does. We should discuss.
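
To illustrate the concern about evict_sessions() blocking, here is a toy sketch in Python. It is not Ceph code and does not describe what any patch does. It assumes a single dispatch thread that also delivers the journal flush acknowledgement, so a handler that waited inline for the ack would never see it. All names here (Dispatcher, evict_session_nonblocking, handle_flush_ack, the session id) are hypothetical.

```python
from collections import deque


class Dispatcher:
    """Toy single-threaded dispatcher: handlers run one at a time."""

    def __init__(self):
        self.queue = deque()      # pending (handler, args) messages
        self.flush_waiters = []   # callbacks to run once the flush is acked
        self.log = []             # record of what happened, in order

    def submit(self, handler, *args):
        self.queue.append((handler, args))

    def run(self):
        # Handlers must return promptly: anything that waited here for a
        # later message would stall the only loop able to deliver it.
        while self.queue:
            handler, args = self.queue.popleft()
            handler(*args)


def handle_flush_ack(d):
    """Hypothetical stand-in for the storage reply that completes a flush."""
    waiters, d.flush_waiters = d.flush_waiters, []
    for waiter in waiters:
        waiter()


def evict_session_nonblocking(d, session_id, on_done):
    """Register a completion callback instead of waiting for the flush."""
    d.log.append("evict requested: " + session_id)
    d.flush_waiters.append(lambda: on_done(session_id))
    # In this toy model the ack arrives later as an ordinary queued message.
    d.submit(handle_flush_ack, d)
    d.log.append("handler returned: " + session_id)
```

Under these assumptions the evict handler returns before the flush completes, and the command's reply is sent from the callback once the ack has been dispatched.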

Actions #9

Updated by Douglas Fuller almost 8 years ago

  • Status changed from New to In Progress

Still no reproducer, but https://github.com/ceph/ceph/pull/9971 may help.

Actions #10

Updated by John Spray almost 8 years ago

  • Status changed from In Progress to Pending Backport
  • Backport set to jewel
Actions #11

Updated by Loïc Dachary almost 8 years ago

  • Copied to Backport #16621: jewel: mds: `session evict` tell command blocks forever with async messenger (TestVolumeClient.test_evict_client failure) added
Actions #12

Updated by Greg Farnum almost 8 years ago

  • Category set to Code Hygiene
Actions #13

Updated by Loïc Dachary over 7 years ago

  • Status changed from Pending Backport to Resolved