Bug #14254 (closed): failed pjd chown test 117

Added by Greg Farnum over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: High
Assignee: Zheng Yan
Category: -
Target version: -
% Done: 0%
Source: other
Regression: No
Severity: 3 - minor

Description

http://pulpito.ceph.com/gregf-2016-01-04_11:47:54-fs-master---basic-mira/13222/

2016-01-04T13:28:49.073 INFO:tasks.workunit.client.0.mira105.stdout:Test Summary Report
2016-01-04T13:28:49.074 INFO:tasks.workunit.client.0.mira105.stdout:-------------------
2016-01-04T13:28:49.074 INFO:tasks.workunit.client.0.mira105.stdout:../pjd-fstest-20090130-RC/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 1)
2016-01-04T13:28:49.075 INFO:tasks.workunit.client.0.mira105.stdout:  Failed test:  117
2016-01-04T13:28:49.075 INFO:tasks.workunit.client.0.mira105.stdout:Files=191, Tests=1964, 377 wallclock secs ( 2.58 usr  1.86 sys +  4.54 cusr  8.47 csys = 17.45 CPU)
2016-01-04T13:28:49.076 INFO:tasks.workunit.client.0.mira105.stdout:Result: FAIL
2016-01-04T13:28:49.076 INFO:tasks.workunit:Stopping ['suites/pjd.sh'] on client.0...
Actions #1

Updated by Greg Farnum over 8 years ago

Okay, I believe the important sequence is:

  • client creates inode, mds replies unsafe
  • client requests inode change to gid 65533, mds replies unsafe
  • client requests inode change to gid 65532, mds crashes
  • mds goes into replay
  • client sends off caps, which include the inode with gid 65533
  • mds does not recognize the inode number
  • mds drops the client cap update in clientreplay_start because it can't find the inode
  • client replays the create and the two setattrs
  • client does not update from the trace on replies because it already has the same seq (1)
  • client sends the cap update, which still has gid 65533, and the mds accepts it

So I'm not quite sure what to do here. We have tids set by the client, and we have seqs set by the server, but seqs get reset when the MDS restarts, and cap tids aren't related to regular op tids. I think for any missing inodes we ought to try to apply the caps after the messages get replayed, rather than just dropping them, but that doesn't actually help us here because we'd still be going backwards.

...actually, I don't see what keeps a client cap flush from overwriting a setattr change even in non-failure cases, if for some reason the response takes long enough for caps to get flushed in the middle.
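
To make the race concrete, here is a toy C++ sketch (all names and structures are made up for illustration; this is not Ceph code). The setattr path and the cap flush path carry independent counters, and the cap seq is reset by the MDS restart, so the recovered MDS has no way to tell that the re-sent cap flush predates the last setattr:

// Toy model, not Ceph code: two independent update streams racing across
// an MDS restart. All names here are made up for illustration.
#include <cstdint>
#include <iostream>

struct MdsInodeView {
  uint32_t gid = 0;
  uint64_t last_op_tid = 0;   // advanced by replayed/handled MDS requests
  uint64_t last_cap_seq = 0;  // cap sequence; resets when the MDS restarts
};

int main() {
  MdsInodeView mds;

  // After replay the MDS has applied the create and both gid changes,
  // so its view ends at gid 65532.
  mds.gid = 65532;
  mds.last_op_tid = 3;

  // The client then re-sends its cap flush, which still carries the
  // older gid 65533. The flush is ordered by cap seq, not by op tid,
  // and the seq counter was reset by the restart, so the MDS cannot
  // see that this flush logically precedes the last setattr.
  uint32_t flushed_gid = 65533;
  uint64_t flush_seq = 1;
  if (flush_seq >= mds.last_cap_seq) {  // always true after the reset
    mds.gid = flushed_gid;              // stale value overwrites 65532
    mds.last_cap_seq = flush_seq;
  }

  std::cout << "final gid = " << mds.gid << std::endl;  // prints 65533
  return 0;
}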

Actions #2

Updated by Greg Farnum over 8 years ago

Okay, this is running into code from the uid/gid enforcement work, https://github.com/ceph/ceph/commit/1957aeddbf05f2ecf3be0a760ff5b5c313370eea in particular, which is justified by the comment in https://github.com/ceph/ceph/commit/ac031443. But I don't think that's a true statement: many of our other update functions invoke check_caps, but _setattr doesn't, and even if it did (prior to making its changes) there's no reason to think the dirtied cap bits would force an immediate flush. So I think we need to force a flush if we're doing a sync setattr while we have dirty caps.

But that's not a complete fix. We still need a way of ordering cap updates with respect to requested setattr operations.
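
As a minimal sketch of the flush-before-sync-setattr direction (hypothetical names and logic, not the actual Client.cc code paths and not necessarily what the eventual fix implements):

// Hypothetical sketch only; not the real Ceph client code. It shows the
// ordering argued for above: drain dirty cap state before issuing a
// synchronous setattr, so a later cap flush cannot carry an older
// attribute value past the setattr.
#include <cstdint>
#include <iostream>

struct ToyInode {
  uint32_t gid = 0;
  bool dirty_caps = false;  // stand-in for locally dirtied Ax state
};

// Pretend we sent the dirty attrs to the MDS and waited for the flush ack.
static void flush_caps_and_wait(ToyInode &in) {
  in.dirty_caps = false;
}

// Pretend the MDS applied and journaled the setattr.
static void send_setattr(ToyInode &in, uint32_t new_gid) {
  in.gid = new_gid;
}

static void sync_setattr(ToyInode &in, uint32_t new_gid) {
  // The key ordering step: force the flush of the older value before
  // the setattr request goes out.
  if (in.dirty_caps)
    flush_caps_and_wait(in);
  send_setattr(in, new_gid);
}

int main() {
  ToyInode in;
  in.gid = 65533;           // first gid change buffered locally via caps
  in.dirty_caps = true;
  sync_setattr(in, 65532);  // second gid change goes through the MDS
  std::cout << "gid = " << in.gid << std::endl;  // prints 65532
  return 0;
}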

Actions #3

Updated by Greg Farnum over 8 years ago

  • Status changed from New to 12

https://github.com/ceph/ceph/pull/7136 addresses the need to flush

Actions #4

Updated by Greg Farnum over 8 years ago

  • Priority changed from Urgent to High

Okay, so the only way we can get into this trouble is if:
1) the inode isn't found prior to replay (i.e., it got created but not journaled), and
2) we have competing local dirty cap state and outstanding setattr requests.

And we can't get those competing states outside of the weird allowed-by-the-capabilities-but-not-by-our-security-checks case we're looking at now. So we just need to handle that.
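
Restated as a predicate, with made-up names rather than real MDS or client state, the argument is that the overwrite is only reachable when all of these hold:

// Made-up names, not MDS or client code; just the shape of the argument.
struct ReplayContext {
  bool inode_known_before_replay;  // false: created but never journaled
  bool has_dirty_cap_state;        // locally dirtied attrs (e.g. Ax)
  bool has_outstanding_setattr;    // an in-flight synchronous setattr
};

// The stale-cap-flush overwrite needs all three conditions at once, which
// outside the uid/gid enforcement case shouldn't happen.
static bool can_hit_stale_cap_flush(const ReplayContext &c) {
  return !c.inode_known_before_replay &&
         c.has_dirty_cap_state &&
         c.has_outstanding_setattr;
}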

Actions #5

Updated by Zheng Yan over 8 years ago

  • Assignee set to Zheng Yan
Actions #6

Updated by Zheng Yan over 8 years ago

I interpret this differently.

  • client creates inode, mds replies unsafe
  • client requests inode change to gid 65533, client marks Ax dirty
  • client requests inode change to gid 65532, client sends a setattr request to the MDS
  • mds crashes and goes into the reconnect stage
  • client re-sends the setattr request (the client did not get an unsafe reply; see Client::resend_unsafe_requests)
  • mds decides not to handle the setattr request in the clientreplay stage (see Server::dispatch)
  • client re-sends the cap flush with gid 65533 (see Client::early_kick_flushing_caps)
  • mds goes into the clientreplay stage
  • mds drops the client cap update in clientreplay_start because it can't find the inode
  • mds replays the create
  • mds goes active
  • mds handles the setattr request and sends the reply to the client
  • client re-sends the cap flush with gid 65533 again (see Client::kick_flushing_caps)
  • client does not update from the trace on replies because it already has the same seq (1)
  • client sends the cap update, which still has gid 65533, and the mds accepts it
Actions #7

Updated by Zheng Yan over 8 years ago

  • Status changed from 12 to Fix Under Review
Actions #8

Updated by Greg Farnum over 8 years ago

  • Status changed from Fix Under Review to Resolved