Bug #11783

closed

protocol: flushing caps on MDS restart can go bad

Added by John Spray almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
% Done: 0%
Source: other
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

The failure is intermittent and has not been seen on master.

http://pulpito.ceph.com/teuthology-2015-05-16_23:04:02-fs-next-testing-basic-multi/896420/

2015-05-17T15:48:28.604 INFO:tasks.workunit.client.0.plana46.stdout:-------------------
2015-05-17T15:48:28.604 INFO:tasks.workunit.client.0.plana46.stdout:../pjd-fstest-20090130-RC/tests/rename/09.t  (Wstat: 0 Tests: 56 Failed: 3)
2015-05-17T15:48:28.605 INFO:tasks.workunit.client.0.plana46.stdout:  Failed tests:  12-13, 15
2015-05-17T15:48:28.605 INFO:tasks.workunit.client.0.plana46.stdout:Files=191, Tests=1964, 315 wallclock secs ( 3.70 usr  3.48 sys +  5.08 cusr  7.35 csys = 19.61 CPU)
2015-05-17T15:48:28.605 INFO:tasks.workunit.client.0.plana46.stdout:Result: FAIL

#1

Updated by Greg Farnum almost 9 years ago

Yep, this one looks unfamiliar to me. :( Do we have client logs from when it happened that we can reference?

#2

Updated by Zheng Yan almost 9 years ago

  • Status changed from New to In Progress

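This is a message-ordering issue during MDS failover:

chown marks Ax dirty
client flushes and releases the Ax cap
chown sends a setattr request to the MDS
MDS fails over
client re-sends the setattr request
client sends cap_reconnect
MDS recovers
client re-sends the cap message to flush the Ax cap

The MDS therefore receives the re-sent setattr before the re-sent cap flush, the reverse of the order in which the client issued them.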
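The reordering can be illustrated with a small toy model. This is a hedged sketch, not Ceph code: it assumes the two chown steps are two successive chown calls with different owners (the first carried by the Ax cap flush, the second by the setattr request) and that the MDS simply applies each message as it arrives, last writer wins. `Op`, `ToyMDS`, `resend_naive` and `resend_ordered` are hypothetical names. `resend_naive` follows the resend order listed in the report; `resend_ordered` replays by client issue order, which is one possible way to avoid the problem, not necessarily the fix that was merged.

```python
from dataclasses import dataclass, field

# Toy model of the failover reordering. All names are hypothetical and do
# not correspond to Ceph's actual classes or wire protocol.

@dataclass
class Op:
    seq: int   # order in which the client issued the change
    kind: str  # "cap_flush" or "setattr"
    uid: int   # owner the client wants the inode to have

@dataclass
class ToyMDS:
    uid: int = 0
    applied: list = field(default_factory=list)

    def apply(self, op: Op) -> None:
        # Assumption: the toy MDS applies whatever arrives, last writer wins.
        self.uid = op.uid
        self.applied.append(op.kind)

def unacked_ops():
    # Assumed scenario: chown(uid=1000) is buffered under Ax and flushed with
    # the cap release; a later chown(uid=2000) goes out as a setattr request.
    # Neither is acknowledged before the MDS fails over.
    return [Op(1, "cap_flush", 1000), Op(2, "setattr", 2000)]

def resend_naive(ops, mds):
    # Resend order from the report: requests first, then (after the MDS
    # has recovered) the cap flushes.
    for op in ops:
        if op.kind == "setattr":
            mds.apply(op)
    for op in ops:
        if op.kind == "cap_flush":
            mds.apply(op)

def resend_ordered(ops, mds):
    # Alternative: replay everything in the order the client issued it.
    for op in sorted(ops, key=lambda o: o.seq):
        mds.apply(op)

if __name__ == "__main__":
    naive, ordered = ToyMDS(), ToyMDS()
    resend_naive(unacked_ops(), naive)
    resend_ordered(unacked_ops(), ordered)
    print(naive.uid, naive.applied)      # 1000 ['setattr', 'cap_flush']
    print(ordered.uid, ordered.applied)  # 2000 ['cap_flush', 'setattr']
```

With the report's resend order the stale flush lands last and the inode ends up with the older owner (1000); replaying in issue order leaves the newer owner (2000).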

#3

Updated by Greg Farnum almost 9 years ago

  • Subject changed from cfuse_workunit_suites_pjd failure on next to protocol: flushing caps on MDS restart can go bad
  • Assignee set to Zheng Yan
#5

Updated by Greg Farnum over 8 years ago

  • Status changed from In Progress to Fix Under Review
#6

Updated by Greg Farnum over 8 years ago

  • Status changed from Fix Under Review to 7
  • Priority changed from Normal to High

I merged this after manually checking that it works, but the test isn't behaving properly, so I haven't merged that yet. In http://pulpito.ceph.com/ubuntu-2015-09-22_15:56:36-fs-greg-fs-testing---basic-multi/1064720/ we have an example of the failure injection either not being set or not being picked up by the client.

However, I'm not super-comfortable without that test coverage in-tree, so let's try to figure it out quickly.

#7

Updated by Greg Farnum over 8 years ago

I guess I should note that I only saw this once, and that run also included a (slightly outdated) version of the vstart runner branch, so there could be some interplay there if that branch broke conf file handling.

#8

Updated by Greg Farnum over 8 years ago

  • Status changed from 7 to Resolved