Bug #10915

closed

client: hangs on umount if it had an MDS session evicted

Added by John Spray about 9 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Administration/Usability
Target version:
% Done: 0%
Source: other
Tags:
Backport: luminous,jewel
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): Client
Labels (FS): task(intern)
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Seen like this with fuse client:

  • Start 2 active MDSs
  • Do some activity such that sessions are open with both MDSs
  • ceph daemon mds.b session evict <client id>
  • Umount client

Client does this:

2015-02-19 10:20:38.580641 7f0e2b7c87c0  2 client.4121 _close_mds_session mds.0 seq 4
2015-02-19 10:20:38.580657 7f0e2b7c87c0  2 client.4121 _close_mds_session mds.1 seq 3
2015-02-19 10:20:38.580664 7f0e2b7c87c0  2 client.4121 waiting for 2 mds sessions to close
2015-02-19 10:20:38.591696 7f0e227fc700 10 client.4121 handle_client_session client_session(close) v1 from mds.0
2015-02-19 10:20:38.591771 7f0e227fc700 10 client.4121 remove_session_caps mds.0
2015-02-19 10:20:38.591783 7f0e227fc700 10 client.4121 kick_requests_closed for mds.0
2015-02-19 10:20:38.591792 7f0e227fc700 10 client.4121 unmounting: trim pass, size was 0+0
2015-02-19 10:20:38.591794 7f0e227fc700 20 client.4121 trim_cache size 0 max 0
2015-02-19 10:20:38.591796 7f0e227fc700 10 client.4121 unmounting: trim pass, size still 0+0
2015-02-19 10:20:38.591801 7f0e2b7c87c0  2 client.4121 waiting for 1 mds sessions to close

But the MDS where we evicted the session ignored the close request:

2015-02-19 10:20:36.922299 7f2266057700  1 -- 172.16.79.251:6813/35691 <== client.4120 172.16.79.251:0/35776 2093931643 ==== client_session(request_close seq 2) v1 ==== 28+0+0 (817140596 0 0) 0x4680000 con 0x4a27e80
2015-02-19 10:20:36.922319 7f2266057700 20 mds.1.server get_session have 0x4a7a700 client.4120 172.16.79.251:0/35776 state closed
2015-02-19 10:20:36.922323 7f2266057700  3 mds.1.server handle_client_session client_session(request_close seq 2) v1 from client.4120
2015-02-19 10:20:36.922326 7f2266057700 10 mds.1.server already closed|closing|killing, dropping this req

I suppose we should always acknowledge client request_close messages, so that the client can terminate itself.
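To make the proposal concrete, here is a toy model of the two behaviours. It is only a sketch: `ToyMDS`, `SessionState`, and `sessions_still_pending` are hypothetical names invented for illustration, not Ceph code, and it assumes an MDS can safely reply with a close for any session that is already closed, closing, or being killed.

```python
from enum import Enum, auto


class SessionState(Enum):
    OPEN = auto()
    CLOSING = auto()
    CLOSED = auto()
    KILLING = auto()


class ToyMDS:
    """Hypothetical stand-in for MDS session handling; not the real server."""

    def __init__(self, always_ack):
        self.always_ack = always_ack
        self.sessions = {}  # client id -> SessionState

    def open_session(self, client_id):
        self.sessions[client_id] = SessionState.OPEN

    def evict(self, client_id):
        # Models "session evict": the session ends up closed on the MDS side.
        self.sessions[client_id] = SessionState.CLOSED

    def handle_request_close(self, client_id):
        """Return the reply sent to the client, or None if the request is dropped."""
        state = self.sessions.get(client_id, SessionState.CLOSED)
        if state is SessionState.OPEN:
            self.sessions[client_id] = SessionState.CLOSED
            return "close"
        # Session is already closed, closing, or being killed.
        if self.always_ack:
            return "close"  # proposed behaviour: acknowledge anyway
        return None         # logged behaviour: "dropping this req"


def sessions_still_pending(mds_list, client_id):
    """Count the sessions the client would still be waiting on after umount."""
    replies = [mds.handle_request_close(client_id) for mds in mds_list]
    return sum(1 for reply in replies if reply != "close")
```

With `always_ack=False` and one evicted session, the count stays at 1, which matches the "waiting for 1 mds sessions to close" line in the client log. With `always_ack=True` it drops to 0.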


Related issues 5 (0 open, 5 closed)

Related to CephFS - Bug #23975: qa: TestVolumeClient.test_lifecycle needs updated for new eviction behavior (Resolved, Patrick Donnelly, 05/02/2018)
Related to CephFS - Bug #24053: qa: kernel_mount.py umount must handle timeout arg (Resolved, Zheng Yan, 05/08/2018)
Precedes CephFS - Bug #24054: kceph: umount on evicted client blocks forever (Resolved, Zheng Yan, 02/20/2015, 02/20/2015)
Copied to CephFS - Backport #23990: jewel: client: hangs on umount if it had an MDS session evicted (Rejected)
Copied to CephFS - Backport #23991: luminous: client: hangs on umount if it had an MDS session evicted (Resolved, Patrick Donnelly)

Actions #1

Updated by Greg Farnum about 9 years ago

Mmmm, that should be a pretty easy change MDS-side; I'm trying to figure out if it could get us in trouble though. And do we really want the client to be clean if we evicted it? There's probably going to be dirty data... Actually, I think if there is dirty data it will block on that rather than on not getting a close.

On the other hand, we might also want the client to be able to shut down happily if a server or the network goes away but it has no dirty data. I don't think there's much harm cluster-side to the client disappearing in that case, so maybe it should time out the close session request and just exit?
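The timeout idea could look roughly like the following. This is a sketch, not the client's actual code: `CloseWaiter` and its methods are hypothetical names, and it deliberately ignores dirty data, which as noted above would have to be handled before giving up on a session.

```python
import threading


class CloseWaiter:
    """Hypothetical helper: wait for MDS session closes, but only up to a timeout."""

    def __init__(self, mds_ranks):
        self._pending = set(mds_ranks)
        self._cond = threading.Condition()

    def on_session_close(self, rank):
        # Called when a client_session(close) arrives from the given MDS rank.
        with self._cond:
            self._pending.discard(rank)
            self._cond.notify_all()

    def wait_for_close(self, timeout):
        """Block until every session has closed or `timeout` seconds elapse.

        Returns the set of ranks that never acknowledged the close.
        """
        with self._cond:
            self._cond.wait_for(lambda: not self._pending, timeout=timeout)
            return set(self._pending)
```

An unmount path built this way would call `wait_for_close` with some chosen timeout and proceed with shutdown even if the returned set is non-empty, instead of waiting forever on an MDS that will never reply.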

Actions #2

Updated by Greg Farnum almost 8 years ago

  • Category set to Administration/Usability
Actions #3

Updated by Patrick Donnelly about 6 years ago

  • Subject changed from client hangs on umount if it had an MDS session evicted to client: hangs on umount if it had an MDS session evicted
  • Target version set to v13.0.0
  • Tags set to intern
  • Component(FS) Client, MDS added
Actions #4

Updated by Rishabh Dave about 6 years ago

The issue is also reproducible with the kernel client.

Patrick, can you assign this issue to me?

Actions #5

Updated by Patrick Donnelly about 6 years ago

  • Assignee set to Rishabh Dave
Actions #6

Updated by Patrick Donnelly about 6 years ago

  • Status changed from New to In Progress
Actions #7

Updated by Patrick Donnelly about 6 years ago

  • Status changed from In Progress to Fix Under Review
Actions #8

Updated by Patrick Donnelly almost 6 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Tags deleted (intern)
  • Backport set to luminous,jewel
  • Component(FS) deleted (MDS)
  • Labels (FS) task(intern) added
Actions #9

Updated by Patrick Donnelly almost 6 years ago

  • Related to Bug #23975: qa: TestVolumeClient.test_lifecycle needs updated for new eviction behavior added
Actions #10

Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #23990: jewel: client: hangs on umount if it had an MDS session evicted added
Actions #11

Updated by Nathan Cutler almost 6 years ago

  • Copied to Backport #23991: luminous: client: hangs on umount if it had an MDS session evicted added
Actions #12

Updated by Patrick Donnelly almost 6 years ago

  • Related to Bug #24053: qa: kernel_mount.py umount must handle timeout arg added
Actions #13

Updated by Patrick Donnelly almost 6 years ago

  • Precedes Bug #24054: kceph: umount on evicted client blocks forever added
Actions #14

Updated by Patrick Donnelly over 5 years ago

  • Status changed from Pending Backport to Resolved