Bug #14254 (closed): failed pjd chown test 117

Added by Greg Farnum over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: High
Assignee: Zheng Yan
Category: -
Target version: -
% Done: 0%
Source: other
Regression: No
Severity: 3 - minor

Description

http://pulpito.ceph.com/gregf-2016-01-04_11:47:54-fs-master---basic-mira/13222/

2016-01-04T13:28:49.073 INFO:tasks.workunit.client.0.mira105.stdout:Test Summary Report
2016-01-04T13:28:49.074 INFO:tasks.workunit.client.0.mira105.stdout:-------------------
2016-01-04T13:28:49.074 INFO:tasks.workunit.client.0.mira105.stdout:../pjd-fstest-20090130-RC/tests/chown/00.t   (Wstat: 0 Tests: 171 Failed: 1)
2016-01-04T13:28:49.075 INFO:tasks.workunit.client.0.mira105.stdout:  Failed test:  117
2016-01-04T13:28:49.075 INFO:tasks.workunit.client.0.mira105.stdout:Files=191, Tests=1964, 377 wallclock secs ( 2.58 usr  1.86 sys +  4.54 cusr  8.47 csys = 17.45 CPU)
2016-01-04T13:28:49.076 INFO:tasks.workunit.client.0.mira105.stdout:Result: FAIL
2016-01-04T13:28:49.076 INFO:tasks.workunit:Stopping ['suites/pjd.sh'] on client.0...
Actions #1

Updated by Greg Farnum over 8 years ago

Okay, I believe the important sequence is:

  • client creates inode, mds replies unsafe
  • client requests inode change to gid 65533, mds replies unsafe
  • client requests inode change to gid 65532, mds crashes
  • mds goes into replay
  • client sends off caps, which include the inode with gid 65533
  • mds does not recognize the inode number
  • mds drops the client cap update in clientreplay_start because it can't find the inode
  • client replays the create and the two setattrs
  • client does not update from the trace on replies because it already has the same seq (1)
  • client sends the cap update, which still has gid 65533, and the mds accepts it

So I'm not quite sure what to do here. We have tids set by the client, and we have seqs set by the server, but seqs get reset when the MDS restarts, and cap tids aren't related to regular op tids. I think for any missing inodes we ought to try to apply the caps after the messages get replayed, rather than just dropping them, but that doesn't actually help us here because we'd still be going backwards.

...actually, I don't see what keeps a client cap flush from overwriting a setattr change even in non-failure cases, if for some reason the response takes long enough for caps to get flushed in the middle.
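
To make the race concrete, here is a toy C++ sketch (all names and structures are made up for illustration; this is not Ceph code). The setattr path and the cap flush path carry independent counters, and the cap seq is reset by the MDS restart, so the recovered MDS has no way to tell that the re-sent cap flush predates the last setattr:

// Toy model, not Ceph code: two independent update streams racing across
// an MDS restart. All names here are made up for illustration.
#include <cstdint>
#include <iostream>

struct MdsInodeView {
  uint32_t gid = 0;
  uint64_t last_op_tid = 0;   // advanced by replayed/handled MDS requests
  uint64_t last_cap_seq = 0;  // cap sequence; resets when the MDS restarts
};

int main() {
  MdsInodeView mds;

  // After replay the MDS has applied the create and both gid changes,
  // so its view ends at gid 65532.
  mds.gid = 65532;
  mds.last_op_tid = 3;

  // The client then re-sends its cap flush, which still carries the
  // older gid 65533. The flush is ordered by cap seq, not by op tid,
  // and the seq counter was reset by the restart, so the MDS cannot
  // see that this flush logically precedes the last setattr.
  uint32_t flushed_gid = 65533;
  uint64_t flush_seq = 1;
  if (flush_seq >= mds.last_cap_seq) {  // always true after the reset
    mds.gid = flushed_gid;              // stale value overwrites 65532
    mds.last_cap_seq = flush_seq;
  }

  std::cout << "final gid = " << mds.gid << std::endl;  // prints 65533
  return 0;
}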

Actions #2

Updated by Greg Farnum over 8 years ago

Okay, this is running into code from the uid/gid enforcement work, https://github.com/ceph/ceph/commit/1957aeddbf05f2ecf3be0a760ff5b5c313370eea in particular, which is justified by the comment in https://github.com/ceph/ceph/commit/ac031443. But I don't think that's a true statement: many of our other update functions invoke check_caps, but _setattr doesn't, and even if it did (prior to making its changes) there's no reason to think the dirtied cap bits would force an immediate flush. So I think we need to force a flush if we're doing a sync setattr while we have dirty caps.

But that's not a complete fix. We still need a way of ordering cap updates with respect to requested setattr operations.
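
As a minimal sketch of the flush-before-sync-setattr direction (hypothetical names and logic, not the actual Client.cc code paths and not necessarily what the eventual fix implements):

// Hypothetical sketch only; not the real Ceph client code. It shows the
// ordering argued for above: drain dirty cap state before issuing a
// synchronous setattr, so a later cap flush cannot carry an older
// attribute value past the setattr.
#include <cstdint>
#include <iostream>

struct ToyInode {
  uint32_t gid = 0;
  bool dirty_caps = false;  // stand-in for locally dirtied Ax state
};

// Pretend we sent the dirty attrs to the MDS and waited for the flush ack.
static void flush_caps_and_wait(ToyInode &in) {
  in.dirty_caps = false;
}

// Pretend the MDS applied and journaled the setattr.
static void send_setattr(ToyInode &in, uint32_t new_gid) {
  in.gid = new_gid;
}

static void sync_setattr(ToyInode &in, uint32_t new_gid) {
  // The key ordering step: force the flush of the older value before
  // the setattr request goes out.
  if (in.dirty_caps)
    flush_caps_and_wait(in);
  send_setattr(in, new_gid);
}

int main() {
  ToyInode in;
  in.gid = 65533;           // first gid change buffered locally via caps
  in.dirty_caps = true;
  sync_setattr(in, 65532);  // second gid change goes through the MDS
  std::cout << "gid = " << in.gid << std::endl;  // prints 65532
  return 0;
}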

Actions #3

Updated by Greg Farnum over 8 years ago

  • Status changed from New to 12

https://github.com/ceph/ceph/pull/7136 addresses the need to flush

Actions #4

Updated by Greg Farnum over 8 years ago

  • Priority changed from Urgent to High

Okay, so the only way we can get into this trouble is if:
1) the inode isn't found prior to replay (i.e., it got created but not journaled), and
2) we have competing local dirty cap state and outstanding setattr requests.

And we can't get those competing states outside of the weird allowed-by-the-capabilities-but-not-by-our-security-checks case we're looking at now. So we just need to handle that.
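
Restated as a predicate, with made-up names rather than real MDS or client state, the argument is that the overwrite is only reachable when all of these hold:

// Made-up names, not MDS or client code; just the shape of the argument.
struct ReplayContext {
  bool inode_known_before_replay;  // false: created but never journaled
  bool has_dirty_cap_state;        // locally dirtied attrs (e.g. Ax)
  bool has_outstanding_setattr;    // an in-flight synchronous setattr
};

// The stale-cap-flush overwrite needs all three conditions at once, which
// outside the uid/gid enforcement case shouldn't happen.
static bool can_hit_stale_cap_flush(const ReplayContext &c) {
  return !c.inode_known_before_replay &&
         c.has_dirty_cap_state &&
         c.has_outstanding_setattr;
}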

Actions #5

Updated by Zheng Yan over 8 years ago

  • Assignee set to Zheng Yan
Actions #6

Updated by Zheng Yan over 8 years ago

I interpret this differently.

  • client creates inode, mds replies unsafe
  • client requests inode change to gid 65533, client marks Ax dirty
  • client requests inode change to gid 65532, client sends a setattr request to the MDS
  • mds crashes and goes into the reconnect stage
  • client re-sends the setattr request (the client did not get an unsafe reply; see Client::resend_unsafe_requests)
  • mds decides not to handle the setattr request in the clientreplay stage (see Server::dispatch)
  • client re-sends the cap flush with gid 65533 (see Client::early_kick_flushing_caps)
  • mds goes into the clientreplay stage
  • mds drops the client cap update in clientreplay_start because it can't find the inode
  • mds replays the create
  • mds goes active
  • mds handles the setattr request and sends the reply to the client
  • client re-sends the cap flush with gid 65533 again (see Client::kick_flushing_caps)
  • client does not update from the trace on replies because it already has the same seq (1)
  • client sends the cap update, which still has gid 65533, and the mds accepts it
Actions #7

Updated by Zheng Yan over 8 years ago

  • Status changed from 12 to Fix Under Review
Actions #8

Updated by Greg Farnum over 8 years ago

  • Status changed from Fix Under Review to Resolved