Bug #10915

client: hangs on umount if it had an MDS session evicted

Added by John Spray almost 4 years ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Administration/Usability
Target version:
Start date: 02/19/2015
Due date:
% Done: 0%
Source: other
Tags:
Backport: luminous,jewel
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS): task(intern)
Pull request ID:

Description

Seen like this with fuse client:

  • Start 2 active MDSs
  • Do some activity such that sessions are open with both MDSs
  • ceph daemon mds.b session evict <client id>
  • Umount client

Client does this:

2015-02-19 10:20:38.580641 7f0e2b7c87c0  2 client.4121 _close_mds_session mds.0 seq 4
2015-02-19 10:20:38.580657 7f0e2b7c87c0  2 client.4121 _close_mds_session mds.1 seq 3
2015-02-19 10:20:38.580664 7f0e2b7c87c0  2 client.4121 waiting for 2 mds sessions to close
2015-02-19 10:20:38.591696 7f0e227fc700 10 client.4121 handle_client_session client_session(close) v1 from mds.0
2015-02-19 10:20:38.591771 7f0e227fc700 10 client.4121 remove_session_caps mds.0
2015-02-19 10:20:38.591783 7f0e227fc700 10 client.4121 kick_requests_closed for mds.0
2015-02-19 10:20:38.591792 7f0e227fc700 10 client.4121 unmounting: trim pass, size was 0+0
2015-02-19 10:20:38.591794 7f0e227fc700 20 client.4121 trim_cache size 0 max 0
2015-02-19 10:20:38.591796 7f0e227fc700 10 client.4121 unmounting: trim pass, size still 0+0
2015-02-19 10:20:38.591801 7f0e2b7c87c0  2 client.4121 waiting for 1 mds sessions to close

But the MDS where we evicted the session ignored it:

2015-02-19 10:20:36.922299 7f2266057700  1 -- 172.16.79.251:6813/35691 <== client.4120 172.16.79.251:0/35776 2093931643 ==== client_session(request_close seq 2) v1 ==== 28+0+0 (817140596 0 0) 0x4680000 con 0x4a27e80
2015-02-19 10:20:36.922319 7f2266057700 20 mds.1.server get_session have 0x4a7a700 client.4120 172.16.79.251:0/35776 state closed
2015-02-19 10:20:36.922323 7f2266057700  3 mds.1.server handle_client_session client_session(request_close seq 2) v1 from client.4120
2015-02-19 10:20:36.922326 7f2266057700 10 mds.1.server already closed|closing|killing, dropping this req

I suppose we should always acknowledge client request_close messages, even for sessions that are already closed, so that the client can finish unmounting.
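
To make the suggestion concrete, here is a minimal sketch of the two behaviours. It is a toy model in Python, not the actual MDS code: `Session`, `SessionState`, `handle_request_close` and the `always_ack` switch are hypothetical names invented for illustration, and the real server also journals the close before replying.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class SessionState(Enum):
    OPEN = auto()
    CLOSING = auto()
    CLOSED = auto()
    KILLING = auto()


@dataclass
class Session:
    # Hypothetical stand-in for the MDS-side session record.
    client_id: int
    state: SessionState


def handle_request_close(session: Session, always_ack: bool) -> Optional[str]:
    """Return the reply to send to the client, or None if the request is dropped."""
    already_going_away = session.state in (
        SessionState.CLOSED,
        SessionState.CLOSING,
        SessionState.KILLING,
    )
    if already_going_away:
        if always_ack:
            # Proposed behaviour: still answer, so the client stops waiting.
            return "client_session(close)"
        # Behaviour seen in the MDS log above:
        # "already closed|closing|killing, dropping this req"
        return None
    # Normal path (simplified): start closing and acknowledge.
    session.state = SessionState.CLOSING
    return "client_session(close)"


evicted = Session(client_id=4120, state=SessionState.CLOSED)
print(handle_request_close(evicted, always_ack=False))
print(handle_request_close(evicted, always_ack=True))
```

With `always_ack=False` the evicted client never gets a reply, which matches the "waiting for 1 mds sessions to close" line in the client log.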


Related issues

Related to fs - Bug #23975: qa: TestVolumeClient.test_lifecycle needs updated for new eviction behavior Pending Backport 05/02/2018
Related to fs - Bug #24053: qa: kernel_mount.py umount must handle timeout arg Resolved 05/08/2018
Precedes fs - Bug #24054: kceph: umount on evicted client blocks forever Resolved 02/20/2015 02/20/2015
Copied to fs - Backport #23990: jewel: client: hangs on umount if it had an MDS session evicted Rejected
Copied to fs - Backport #23991: luminous: client: hangs on umount if it had an MDS session evicted Resolved

History

#1 Updated by Greg Farnum almost 4 years ago

Mmmm, that should be a pretty easy change MDS-side; I'm trying to figure out if it could get us in trouble though. And do we really want the client to be clean if we evicted it? There's probably going to be dirty data... Actually, I think if there is dirty data it will block on that rather than on not getting a close.

On the other hand, we might also want the client to be able to shut down happily if a server or the network goes away but it has no dirty data. I don't think there's much harm cluster-side to the client disappearing in that case, so maybe it should time out the close session request and just exit?
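
A rough sketch of the client-side timeout idea, assuming the unmount path waits on a condition variable until every session has been acknowledged as closed. `SessionCloseWaiter` and its methods are hypothetical names for illustration only, not the real Client code, and the timeout value is arbitrary.

```python
import threading
import time


class SessionCloseWaiter:
    """Toy model: wait for MDS sessions to close, but give up after a timeout."""

    def __init__(self, open_sessions):
        self._open = set(open_sessions)  # MDS ranks we still expect a close from
        self._cond = threading.Condition()

    def session_closed(self, mds_rank):
        # Called when a client_session(close) arrives from the given MDS.
        with self._cond:
            self._open.discard(mds_rank)
            self._cond.notify_all()

    def wait_for_close(self, timeout):
        # Returns the set of sessions that never acknowledged the close.
        deadline = time.monotonic() + timeout
        with self._cond:
            while self._open:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break  # stop waiting and let unmount proceed
                self._cond.wait(remaining)
            return set(self._open)


waiter = SessionCloseWaiter(open_sessions=[0, 1])
waiter.session_closed(0)  # mds.0 replies; mds.1 evicted us and stays silent
unacked = waiter.wait_for_close(timeout=0.1)
print("gave up waiting on:", sorted(unacked))
```

Here the wait returns after the timeout with mds.1 still unacknowledged, instead of blocking forever. Whether it is safe to exit at that point depends on there being no dirty data, as discussed above.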

#2 Updated by Greg Farnum over 2 years ago

  • Category set to Administration/Usability

#3 Updated by Patrick Donnelly 11 months ago

  • Subject changed from client hangs on umount if it had an MDS session evicted to client: hangs on umount if it had an MDS session evicted
  • Target version set to v13.0.0
  • Tags set to intern
  • Component(FS) Client, MDS added

#4 Updated by Rishabh Dave 11 months ago

The issue is also reproducible with the kernel client.

Patrick, can you assign this issue to me?

#5 Updated by Patrick Donnelly 11 months ago

  • Assignee set to Rishabh Dave

#6 Updated by Patrick Donnelly 11 months ago

  • Status changed from New to In Progress

#7 Updated by Patrick Donnelly 10 months ago

  • Status changed from In Progress to Need Review

#8 Updated by Patrick Donnelly 9 months ago

  • Status changed from Need Review to Pending Backport
  • Tags deleted (intern)
  • Backport set to luminous,jewel
  • Component(FS) deleted (MDS)
  • Labels (FS) task(intern) added

#9 Updated by Patrick Donnelly 9 months ago

  • Related to Bug #23975: qa: TestVolumeClient.test_lifecycle needs updated for new eviction behavior added

#10 Updated by Nathan Cutler 9 months ago

  • Copied to Backport #23990: jewel: client: hangs on umount if it had an MDS session evicted added

#11 Updated by Nathan Cutler 9 months ago

  • Copied to Backport #23991: luminous: client: hangs on umount if it had an MDS session evicted added

#12 Updated by Patrick Donnelly 9 months ago

  • Related to Bug #24053: qa: kernel_mount.py umount must handle timeout arg added

#13 Updated by Patrick Donnelly 9 months ago

  • Precedes Bug #24054: kceph: umount on evicted client blocks forever added

#14 Updated by Patrick Donnelly 3 months ago

  • Status changed from Pending Backport to Resolved
