Bug #1302: mds: mds_caps_wanted vs migration - CephFS - Ceph

Bug #1302

closed

mds: mds_caps_wanted vs migration

Added by Sage Weil almost 13 years ago. Updated over 7 years ago.

Status: Resolved

Priority: Normal

Assignee: Greg Farnum

Updated by Greg Farnum almost 13 years ago

More detail, please?

Updated by Sage Weil almost 13 years ago

When a client opens a file via an MDS replica, the replica sends the auth a message telling it which caps are wanted (this goes into map<int,int> CInode::mds_caps_wanted). When that set of wanted caps changes on the replica (e.g. the client says it no longer wants caps), the replica sends another message.
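
As a rough illustration of the bookkeeping described above, here is a minimal sketch of an auth-side map keyed by replica MDS rank. Only the `mds_caps_wanted` name and its `map<int,int>` type come from this report; `InodeSketch`, `set_mds_caps_wanted`, and `get_caps_wanted_by_replicas` are hypothetical names, and treating the value as a bitmask where 0 means "nothing wanted" is an assumption.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the auth-side inode state. Only the member name
// mds_caps_wanted and its map<int,int> type are taken from the bug report.
struct InodeSketch {
    std::map<int, int> mds_caps_wanted;  // replica MDS rank -> wanted cap bits

    // Apply a "caps wanted" message from a replica. Assumption: a value of 0
    // means clients of that replica no longer want caps, so the entry is erased.
    void set_mds_caps_wanted(int replica_rank, int wanted) {
        if (wanted)
            mds_caps_wanted[replica_rank] = wanted;
        else
            mds_caps_wanted.erase(replica_rank);
    }

    // Union of the caps wanted through all replicas.
    int get_caps_wanted_by_replicas() const {
        int all = 0;
        for (const auto& entry : mds_caps_wanted)
            all |= entry.second;
        return all;
    }
};
```

The second message mentioned above corresponds to calling `set_mds_caps_wanted` again with the new value for the same rank.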

This breaks when we race with migration:

- client opens file via replica mds0
- mds0 sends message to mds1 [auth] to update wanted
- mds1 exports inode to mds2
- mds1 receives update, ignores it
-> mds2 doesn't learn of mds0's wanted update
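
The sequence above can be replayed in a toy model to show where the update is lost. This is a sketch under stated assumptions, not the MDS code: `MdsSketch`, `export_inode`, and `new_auth_learns` are hypothetical names, a single inode is modeled, and "ignores it" is modeled as a non-auth MDS dropping the message.

```cpp
#include <cassert>
#include <map>

// Toy model of one MDS's view of a single inode (all names hypothetical).
struct MdsSketch {
    int rank;
    bool is_auth;
    std::map<int, int> mds_caps_wanted;  // replica rank -> wanted cap bits

    MdsSketch(int r, bool auth) : rank(r), is_auth(auth) {}

    // Returns false when the update is dropped because we are not auth.
    bool handle_caps_wanted(int from, int wanted) {
        if (!is_auth)
            return false;
        mds_caps_wanted[from] = wanted;
        return true;
    }
};

// Move authority, and whatever wanted state is known so far, to the importer.
void export_inode(MdsSketch& from, MdsSketch& to) {
    to.mds_caps_wanted = from.mds_caps_wanted;
    to.is_auth = true;
    from.mds_caps_wanted.clear();
    from.is_auth = false;
}

// Replays the scenario; returns whether mds2 ends up knowing mds0's wanted caps.
bool new_auth_learns(bool update_arrives_before_export) {
    MdsSketch mds1(1, true);
    MdsSketch mds2(2, false);
    const int wanted = 0x3;  // arbitrary cap bits sent by replica mds0

    if (update_arrives_before_export) {
        mds1.handle_caps_wanted(0, wanted);
        export_inode(mds1, mds2);
    } else {
        export_inode(mds1, mds2);
        mds1.handle_caps_wanted(0, wanted);  // old auth drops it
    }
    return mds2.mds_caps_wanted.count(0) > 0;
}
```

In this model `new_auth_learns(true)` succeeds and `new_auth_learns(false)` does not, which is the lost update described in the last step.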

We need one of:
- some (semi-)intelligent retry when we race (probably similar to cache expire: send to both old and new auth, wait if ambigauth)
- resend on any migration (high overhead!)
- ack wanted updates (high overhead)

Pretty sure we should mimic whatever cache expire is doing. That works very well, and the migration protocol already does a bunch of work (in the form of notify messages) to facilitate it.
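
To make the "wait if ambigauth" part of the first option concrete, here is a minimal sketch of a replica that defers the wanted update while the inode's auth is ambiguous and sends it once the new auth is known. All names (`ReplicaInodeSketch`, `start_migration`, `finish_migration`) are hypothetical, the real cache expire protocol is not reproduced, and the "send to both old and new auth" variant is not modeled.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Replica-side view of one inode (hypothetical; must not be copied or moved
// while waiters are queued, because the waiters capture `this`).
struct ReplicaInodeSketch {
    int auth_rank;                               // who we believe is auth
    bool ambiguous_auth;                         // true while a migration is in flight
    std::vector<std::function<void()>> waiters;  // work deferred until auth is known
    std::vector<std::pair<int, int>> sent;       // (destination rank, wanted bits)

    void send_caps_wanted(int wanted) {
        if (ambiguous_auth) {
            // Defer: retry this same call once the auth is unambiguous again.
            waiters.push_back([this, wanted] { send_caps_wanted(wanted); });
            return;
        }
        sent.push_back(std::make_pair(auth_rank, wanted));
    }

    // Assumption: the replica is told about the migration before it completes.
    void start_migration() { ambiguous_auth = true; }

    void finish_migration(int new_auth) {
        auth_rank = new_auth;
        ambiguous_auth = false;
        std::vector<std::function<void()>> ready;
        ready.swap(waiters);  // take the queue so retries cannot re-enter it
        for (auto& retry : ready)
            retry();
    }
};
```

The sketch relies on the replica learning of the migration in time, which is the role the notify messages mentioned above would play; it does not by itself cover a message that was already in flight before the replica knew.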

Updated by Sage Weil almost 13 years ago

Category deleted (1)

Updated by Sage Weil almost 13 years ago

Assignee deleted (~~Sage Weil~~)

Updated by Greg Farnum almost 13 years ago

Assignee set to Greg Farnum

Updated by Greg Farnum almost 13 years ago

Category set to 1
Status changed from New to Resolved

Okay, so this is actually already implemented: the replica will put the message on a waiter if the inode has an ambiguous auth. There's a minor mismatch in that the replica will send the message while the auth is in rejoin and the auth will drop it until it's past rejoin, but that's apparently not what happened here.

However, there were a few problems in the sender function itself that I've resolved:
1) The test to drop all caps seems to have been backwards since it was written; it would only drop all caps if the time-to-drop was AFTER the current time.
2) The replica would record itself as having sent a new set of wanted caps even if it didn't actually send the message (because the auth hadn't yet reached or passed the REJOIN state).

Both of these have been fixed and pushed as of commit:a2c761e62acdb3cff941867c224ae295cf6337b3.
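
For readers without the commit at hand, the two fixes can be illustrated with a small sketch of the corrected logic. This is not the code from the commit: `WantedSenderSketch`, `maybe_send`, `drop_all_at`, and `auth_reachable` are hypothetical names, and timestamps are modeled as plain doubles.

```cpp
#include <cassert>

// Hypothetical replica-side sender state for one inode.
struct WantedSenderSketch {
    int last_sent_wanted;  // what the auth was last actually told
    int sends;             // number of messages really sent

    WantedSenderSketch() : last_sent_wanted(0), sends(0) {}

    // Returns true if a message was sent.
    //   drop_all_at    : time after which all wanted caps should be dropped
    //   auth_reachable : stands in for "the auth is in or past REJOIN"
    bool maybe_send(int wanted, double now, double drop_all_at, bool auth_reachable) {
        // Fix 1: drop everything once the deadline has passed. The buggy test
        // was the reverse: it dropped only while drop_all_at was still after now.
        if (drop_all_at <= now)
            wanted = 0;

        if (wanted == last_sent_wanted)
            return false;  // nothing new to tell the auth

        if (!auth_reachable)
            return false;  // Fix 2: nothing was sent, so do not record it

        ++sends;
        last_sent_wanted = wanted;  // record only after a real send
        return true;
    }
};
```

With the second fix, a call made while the auth is unreachable leaves `last_sent_wanted` untouched, so a later call with the same value still sends the message.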
