Bug #4564

closed

client: Close session doesn't wait for outstanding requests

Added by Sam Lang about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Ran into another failure on the client while testing #4451. The sequence of events is:

client sends create/unlink requests
gets back unsafe replies
kill mds on openc
mds restarts
client does unmount request
client sends replay requests for unsafe requests
mds queues replay requests for later
client sends reconnect
mds enters clientreplay state
mds starts replaying queued requests
mds waits for root inode (mydir) -- again queuing replayed requests
once mydir is populated, starts first replay request
session enters closing state
replay requests are dropped because session is closing

This uncovers two problems:

1. The client now fails an assertion because requests are still outstanding when the session is closed:

../../src/include/xlist.h: In function 'xlist<T>::~xlist() [with T = MetaRequest*]' thread 7f84ea7fc700 time 2013-03-26 17:34:04.329235
../../src/include/xlist.h: 69: FAILED assert(size == 0)
ceph version 0.59-524-g0dcb897 (0dcb897e680de5119b77a216ca5133a2265bc446)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x9b) [0x7f84f338a791]
2: (xlist<MetaRequest*>::~xlist()+0x3c) [0x7f84f31a7ff2]
3: (MetaSession::~MetaSession()+0x65) [0x7f84f31f29f5]
4: (Client::_closed_mds_session(MetaSession*)+0xbf) [0x7f84f314dd5b]
5: (Client::handle_client_session(MClientSession*)+0x456) [0x7f84f314e1c0]
6: (Client::ms_dispatch(Message*)+0x2d9) [0x7f84f3150785]
7: (Messenger::ms_deliver_dispatch(Message*)+0xa1) [0x7f84f3261a75]
8: (DispatchQueue::entry()+0x54f) [0x7f84f326103f]
9: (DispatchQueue::DispatchThread::entry()+0x22) [0x7f84f336f96a]

2. The mds drops replay requests that the client has already received unsafe replies to.
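The assertion in problem 1 can be illustrated with a small model. This is only a sketch: `ToyRequest`, `CheckedList`, and `ToySession` are hypothetical stand-ins, not Ceph's `MetaRequest`, `xlist`, or `MetaSession`. It shows the general pattern seen in the trace, where a list that asserts it is empty in its destructor is owned by a session that gets destroyed while requests are still linked into it.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for a client request (not Ceph's MetaRequest).
struct ToyRequest {
  int tid;
};

// Toy list that, like the list in the trace, asserts in its destructor
// that no items remain.
class CheckedList {
public:
  ~CheckedList() { assert(items_.empty()); }

  void push_back(ToyRequest* r) { items_.push_back(r); }
  void remove(ToyRequest* r) { items_.remove(r); }
  std::size_t size() const { return items_.size(); }

private:
  std::list<ToyRequest*> items_;
};

// Toy session that owns the list of in-flight requests.
struct ToySession {
  CheckedList requests;

  // True only when destroying the session would not trip the assertion.
  bool safe_to_destroy() const { return requests.size() == 0; }
};
```

In this model, destroying a `ToySession` while `safe_to_destroy()` is false aborts the process in a debug build, which mirrors the failure in the trace above.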

Proposed fix at the client: wait for all outstanding requests on a session to complete. The mds shouldn't return session_close until all outstanding requests have completed, but for mds restart scenarios like this one, we might want to wait instead of asserting.
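A minimal sketch of the client-side idea, assuming a simplified session that tracks outstanding requests by id. `ToyClientSession` and its methods are hypothetical names for illustration and do not correspond to the code in wip-4564. The close notification is recorded, and teardown happens only once the last outstanding request finishes.

```cpp
#include <cassert>
#include <set>

// Hypothetical client-side session; names are illustrative, not Ceph's.
class ToyClientSession {
public:
  void request_started(int tid) { outstanding_.insert(tid); }

  // Called when a request completes.
  // Returns true if this completion let a deferred close finish.
  bool request_finished(int tid) {
    outstanding_.erase(tid);
    return try_finish_close();
  }

  // Called when the close notification arrives.
  // Returns true if the session could be torn down immediately.
  bool handle_close() {
    close_pending_ = true;
    return try_finish_close();
  }

  bool closed() const { return closed_; }

private:
  // Tear down only if a close is pending and nothing is outstanding.
  bool try_finish_close() {
    if (!close_pending_ || closed_ || !outstanding_.empty()) {
      return false;
    }
    closed_ = true;
    return true;
  }

  std::set<int> outstanding_;
  bool close_pending_ = false;
  bool closed_ = false;
};
```

With two requests in flight, `handle_close()` returns false and the session stays open until both `request_finished()` calls have run. The sketch does not cover the case where a reply never arrives, which a real fix would also have to handle.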

Proposed fix at the mds: delay a session close request until clientreplay is complete.
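The mds-side idea can be sketched the same way. `ToyMds` is a hypothetical model, not the actual MDS code: a close request that arrives while replay requests are still queued is parked, and the session is closed only after the queue drains, so no replayed request is dropped.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical model of an mds replaying client requests.
class ToyMds {
public:
  void queue_replay(int tid) { replay_queue_.push_back(tid); }

  // A close request arriving while replays remain is deferred.
  // Returns true if the session was closed immediately.
  bool handle_close_request() {
    if (!replay_queue_.empty()) {
      close_deferred_ = true;
      return false;
    }
    session_open_ = false;
    return true;
  }

  // Replay one queued request; once the queue is empty, apply any
  // deferred close.
  void replay_next() {
    if (replay_queue_.empty()) {
      return;
    }
    replayed_.push_back(replay_queue_.front());
    replay_queue_.pop_front();
    if (replay_queue_.empty() && close_deferred_) {
      close_deferred_ = false;
      session_open_ = false;
    }
  }

  bool session_open() const { return session_open_; }
  const std::vector<int>& replayed() const { return replayed_; }

private:
  std::deque<int> replay_queue_;
  std::vector<int> replayed_;
  bool close_deferred_ = false;
  bool session_open_ = true;
};
```

In this model, a close request received with two replays queued leaves the session open until both have been replayed, in contrast to the sequence in the description where the session enters the closing state first and the replays are dropped.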

Logs attached.


Files

mds.a.log (6.43 MB), Sam Lang, 03/27/2013 07:13 AM
client.admin.log (581 KB), Sam Lang, 03/27/2013 07:13 AM
Actions #1

Updated by Sam Lang about 11 years ago

  • Status changed from New to Fix Under Review

Pushed a fix to wip-4564.

Actions #2

Updated by Ian Colle about 11 years ago

  • Assignee set to Sam Lang
Actions #3

Updated by Sage Weil about 11 years ago

  • Status changed from Fix Under Review to Resolved