Bug #14254
closed
failed pjd chown test 117
Added by Greg Farnum over 8 years ago.
Updated over 8 years ago.
Description
http://pulpito.ceph.com/gregf-2016-01-04_11:47:54-fs-master---basic-mira/13222/
2016-01-04T13:28:49.073 INFO:tasks.workunit.client.0.mira105.stdout:Test Summary Report
2016-01-04T13:28:49.074 INFO:tasks.workunit.client.0.mira105.stdout:-------------------
2016-01-04T13:28:49.074 INFO:tasks.workunit.client.0.mira105.stdout:../pjd-fstest-20090130-RC/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 1)
2016-01-04T13:28:49.075 INFO:tasks.workunit.client.0.mira105.stdout: Failed test: 117
2016-01-04T13:28:49.075 INFO:tasks.workunit.client.0.mira105.stdout:Files=191, Tests=1964, 377 wallclock secs ( 2.58 usr 1.86 sys + 4.54 cusr 8.47 csys = 17.45 CPU)
2016-01-04T13:28:49.076 INFO:tasks.workunit.client.0.mira105.stdout:Result: FAIL
2016-01-04T13:28:49.076 INFO:tasks.workunit:Stopping ['suites/pjd.sh'] on client.0...
Okay, I believe the important sequence is:
- client creates inode, mds replies unsafe
- client requests inode change gid 65533, mds replies unsafe
- client requests inode change gid 65532, mds crashes
- mds goes into replay
- client sends off caps, which include inode on gid 65533
- mds does not recognize inode number
- mds drops client cap update on clientreplay_start, because it can't find inode
- client replays create and two setattrs
- client does not update trace on replies because it already has the same seq (1)
- client sends cap update, which still has gid 65533, mds accepts it
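To make the ordering above easier to follow, here is a minimal toy model of it. It is not Ceph code: `ToyMDS`, its methods, and `replay_scenario` are hypothetical names invented for illustration, and the model assumes that a crash loses everything that was only replied to as unsafe.

```python
# Toy model only: ToyMDS and replay_scenario are hypothetical, not Ceph APIs.
class ToyMDS:
    def __init__(self):
        self.inodes = {}  # ino -> gid

    def create(self, ino, gid=0):
        self.inodes[ino] = gid

    def setattr_gid(self, ino, gid):
        self.inodes[ino] = gid

    def cap_update(self, ino, gid):
        """Apply a client cap update; drop it if the inode is unknown."""
        if ino not in self.inodes:
            return False  # dropped
        self.inodes[ino] = gid
        return True

    def crash(self):
        # Assumption: nothing was journaled, so all unsafe state is lost.
        self.inodes.clear()


def replay_scenario():
    mds = ToyMDS()
    ino = 1
    # Before the crash: create and first setattr are replied to as unsafe.
    mds.create(ino)
    mds.setattr_gid(ino, 65533)
    client_gid = 65533  # what the client's cap state carries
    mds.crash()  # crashes while handling the setattr to 65532

    events = []
    # Caps arrive before the inode exists again, so they are dropped.
    events.append(("early cap update", mds.cap_update(ino, client_gid)))
    # Client replays the create and both setattrs.
    mds.create(ino)
    mds.setattr_gid(ino, 65533)
    mds.setattr_gid(ino, 65532)
    # Client ignores the replies (same seq), so client_gid stays stale.
    events.append(("late cap update", mds.cap_update(ino, client_gid)))
    return mds.inodes[ino], events


final_gid, events = replay_scenario()
print(final_gid, events)
```

In this model the inode ends with gid 65533 rather than 65532, which would match the chown failure if test 117 expects the later value.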
So I'm not quite sure what to do here. We have tids set by the client, and we have seqs set by the server, but seqs get reset when the MDS does and cap tids aren't related to regular op tids. I think for any missing inodes we ought to be trying to apply the caps after the messages get replayed, rather than just dropping them, but that actually doesn't help us here because we'd be going backwards anyway.
...actually, I don't see what keeps a client cap flush from overwriting a setattr change under non-failure cases, if for some reason the response manages to take long enough for caps to get flushed in the middle.
Okay, this is running into code from the uid/gid enforcement stuff. From https://github.com/ceph/ceph/commit/1957aeddbf05f2ecf3be0a760ff5b5c313370eea in particular, which is justified by the comment in https://github.com/ceph/ceph/commit/ac031443 — but I don't think that's a true statement. Many of our other update functions invoke check_caps, but _setattr doesn't and even if it did (prior to making changes) there's no reason to think the dirtied cap bits would force an immediate flush. So I think we need to force a flush if we're doing a sync setattr while we have dirty caps.
But that's not a complete fix. We still need a way of ordering cap updates with respect to requested setattr operations.
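The "flush before a sync setattr" idea can be sketched roughly as follows. This is a toy client, not the real `Client::_setattr`: `ToyClient`, `local_setattr`, `flush_caps`, and `sync_setattr` are hypothetical names, and the sketch only shows the send-side ordering. It does not address ordering on replay, which is the part that remains unsolved.

```python
# Toy sketch only: ToyClient and its methods are hypothetical, not Ceph code.
class ToyClient:
    def __init__(self):
        self.dirty_caps = {}  # ino -> locally dirtied gid
        self.sent = []        # messages in the order they go to the MDS

    def local_setattr(self, ino, gid):
        # Change applied locally under caps; only marks state dirty.
        self.dirty_caps[ino] = gid

    def flush_caps(self, ino):
        # Send any dirty cap state for this inode and clear it.
        if ino in self.dirty_caps:
            self.sent.append(("cap_flush", ino, self.dirty_caps.pop(ino)))

    def sync_setattr(self, ino, gid):
        # Force the flush first so the older value cannot trail the request.
        self.flush_caps(ino)
        self.sent.append(("setattr", ino, gid))


client = ToyClient()
client.local_setattr(1, 65533)
client.sync_setattr(1, 65532)
print(client.sent)
```

With the forced flush, the cap flush carrying 65533 is sent ahead of the setattr for 65532 instead of at some later, unordered point.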
- Status changed from New to 12
- Priority changed from Urgent to High
Okay, so the only way we can get into this trouble is if:
1) the inode isn't found prior to replay (ie, it got created but not journaled)
2) we have competing local dirty cap state and outstanding setattr requests.
And we can't get those competing states outside of the weird allowed-by-the-capabilities-but-not-our-security state case we're looking at now. So we just need to handle that.
- Assignee set to Zheng Yan
I interpret this differently.
- client creates inode, mds replies unsafe
- client requests inode change gid 65533, client marks Ax dirty
- client requests inode change gid 65532, client sends setattr request to MDS
- mds crashes and goes into reconnect stage
- client re-sends the setattr request (the client did not get an unsafe reply for it, see Client::resend_unsafe_requests)
- mds decides not to handle the setattr request in the client_replay stage (see Server::dispatch)
- client re-sends the cap flush with gid 65533 (see Client::early_kick_flushing_caps)
- mds goes to client_replay stage
- mds drops client cap update on clientreplay_start, because it can't find inode
- mds replays the create
- mds goes to active
- mds handles the setattr request and sends the reply to the client
- client re-sends the cap flush with gid 65533 again (see Client::kick_flushing_caps)
- client does not update trace on replies because it already has the same seq (1)
- client sends cap update, which still has gid 65533, mds accepts it
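The second reading of the timeline can be walked through with a small toy script. The stage names in the comments come from the list above; the function name, the event tuples, and the log strings are hypothetical, and the script simplifies the two cap-flush resends into one dropped and one accepted message.

```python
# Toy walk-through only: run_timeline and its event tuples are hypothetical.
def run_timeline():
    inodes = {}    # MDS view: ino -> gid
    deferred = []  # requests the MDS postpones until it is active
    log = []
    ino = 1

    # reconnect: client re-sends setattr(65532); the MDS defers it.
    deferred.append(("setattr", ino, 65532))
    # reconnect: client re-sends the cap flush carrying the dirty gid 65533.
    cap_flush = ("cap_flush", ino, 65533)

    # clientreplay_start: inode is unknown, so the cap update is dropped.
    if cap_flush[1] not in inodes:
        log.append("dropped early cap flush")

    # client_replay: the create is replayed.
    inodes[ino] = 0

    # active: the deferred setattr is handled.
    for _, target, gid in deferred:
        inodes[target] = gid
    log.append(f"after setattr gid={inodes[ino]}")

    # active: client re-sends the cap flush again and the MDS accepts it.
    inodes[cap_flush[1]] = cap_flush[2]
    log.append(f"after cap flush gid={inodes[ino]}")
    return inodes[ino], log


final_gid, log = run_timeline()
print(final_gid)
for line in log:
    print(line)
```

In this model, too, the stale cap flush lands after the setattr, leaving gid 65533 on the inode.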
- Status changed from 12 to Fix Under Review
- Status changed from Fix Under Review to Resolved