Bug #17294

open

mds client didn't update directory content after short network break

Added by ren li over 7 years ago. Updated over 7 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, ceph-fuse
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Client: ceph-fuse 0.8.5, 10.2.2
Server: ceph 10.2.1, single active mds

Symptom:
1. mount same cephfs on 2 nodes ( A, B )
2. list content of dir1 on A
3. break the network connection between A and the mds for longer than 60s (the default mds_session_timeout), for example by using iptables to drop packets
4. add new file in dir1 on B
5. list content of dir1 on A, cannot see the new file

Updating the content of dir1 on A recovers it.

Cause:
The client's and the MDS's views of the capabilities become inconsistent: once the client's session goes stale, the MDS stops revoking the client's caps, so the client keeps trusting caps the MDS no longer honors.

#1

Updated by Zheng Yan over 7 years ago

Anyone interested in this? It is easy to reproduce, and it's a good opportunity to learn how capabilities work.

#2

Updated by ren li over 7 years ago

Zheng Yan wrote:

Anyone interested in this? It is easy to reproduce, and it's a good opportunity to learn how capabilities work.

I was trying to figure out a patch. Is there any documentation of the capability design other than the code and the test cases (which are not very helpful)?

#3

Updated by Zheng Yan over 7 years ago

I checked this again. The issue only happens when the MDS's local session_timeout config is non-default (specifically, smaller than the mds_session_timeout value in the mdsmap).

#4

Updated by Daniel Oliveira over 7 years ago

Nathan / Zheng,
I will take a look at it.

#5

Updated by John Spray over 7 years ago

So we should change this to use the same value everywhere (from the mdsmap!)

#6

Updated by Jeff Layton over 7 years ago

Ok, so to summarize (based on speculation here -- I haven't reproduced this).

After the first ls -l, client A has enough caps to cache the directory contents. We have a network partition, and then create a file in the dir on client B. MDS tries to recall the caps from client A, but it's unresponsive. At that point, it kicks out the client's session.

Network partition is then healed, and application does another ls -l. Client thinks it has all the caps it needs and that it doesn't need to contact the MDS, and so satisfies the ls -l out of cache.

How is the client expected to discover that its session has been killed off, if it doesn't need to contact the MDS after the network partition? Is there some sort of periodic session renewal that must occur?

#7

Updated by Greg Farnum over 7 years ago

Yes, there's session renewal, and the MDS shouldn't unilaterally revoke caps until that time has passed.

I think we do have two different timeout periods, one for caps and one for sessions; I've never really understood the reason, and maybe that's what is causing the failures here? But Zheng's comment implies we just have the session timeout represented in two (here, inconsistent) places and need to unify them.

#8

Updated by John Spray over 7 years ago

Oh dear, the MDS takes mds_session_timeout from its local config, but the client uses mdsmap->get_session_timeout(). They'll only match if mds_session_timeout was set the same on the MDS and the mon at the time the map was created.

So it's safe to increase the MDS's local session timeout (the client will be more pessimistic and not use stale data), but if one decreased the MDS's local session timeout then the client would keep the longer optimistic value and would exhibit this bad behaviour.

I think we should use the MDSMap value everywhere, especially as that's what's used in the kclient too. We will need to add getter/setter calls for it in MDSMonitor and retire the config value.

#9

Updated by Greg Farnum over 7 years ago

That sounds good to me.

It'll actually be a little finicky though — if we generate a new MDSMap that cuts the timeout in half, we don't want to boot out clients who are still on the previous epoch until the timeout we believe they saw has passed!
