Bug #634: Kernel client takes too long to recover after a MDS restart - Linux kernel client - Ceph

Bug #634

closed

Kernel client takes too long to recover after a MDS restart

Added by Ravi Pinjala over 13 years ago. Updated over 13 years ago.

Status: Can't reproduce

Priority: Normal

Description

[208292.940934] libceph: mds0 192.168.1.11:6800 socket closed
[208293.050282] libceph: mds0 192.168.1.11:6800 connection failed
[208343.050057] ceph: mds0 caps stale
[208358.050075] ceph: mds0 caps stale
[208545.126700] ceph: mds0 reconnect start
[208545.280853] ceph: mds0 reconnect success
[208546.581244] ceph: mds0 recovery completed

This log was captured after restarting the MDS; the daemon itself came back up within a few seconds. Note the timestamps: the socket closed at 208292 and the kernel client did not start to reconnect until 208545, roughly 252 seconds (over four minutes) later. All I/O operations hung during that time, and everything worked again once the client reconnected.
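To make the delay concrete, here is a small sketch that reads the timestamps from the log excerpt above and computes how long the client waited. The helper names (`parse`, `first_time`) are made up for this illustration and are not part of any Ceph tooling; the only assumption is the usual `[seconds.microseconds] message` layout of kernel log lines.

```python
import re

# Log excerpt copied verbatim from this report.
LOG = """\
[208292.940934] libceph: mds0 192.168.1.11:6800 socket closed
[208293.050282] libceph: mds0 192.168.1.11:6800 connection failed
[208343.050057] ceph: mds0 caps stale
[208358.050075] ceph: mds0 caps stale
[208545.126700] ceph: mds0 reconnect start
[208545.280853] ceph: mds0 reconnect success
[208546.581244] ceph: mds0 recovery completed
"""

# Matches "[<seconds>.<fraction>] <message>".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return a list of (timestamp_seconds, message) tuples."""
    events = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def first_time(events, needle):
    """Timestamp of the first message containing `needle`."""
    for timestamp, message in events:
        if needle in message:
            return timestamp
    raise ValueError(f"no log line contains {needle!r}")


events = parse(LOG)
closed = first_time(events, "socket closed")
reconnect_start = first_time(events, "reconnect start")
recovered = first_time(events, "recovery completed")

# Time spent waiting before the client even tried to reconnect.
wait_before_reconnect = reconnect_start - closed
# Time the reconnect and recovery took once it started.
recovery_duration = recovered - reconnect_start

print(f"waited {wait_before_reconnect:.1f} s before reconnecting")
print(f"reconnect and recovery took {recovery_duration:.1f} s")
```

Almost all of the outage is the wait before the reconnect attempt; the reconnect and recovery themselves finish in under two seconds.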

I guess we don't want to barrage the server with connections if something more permanent happens to the MDS, so some kind of bounded exponential backoff might be appropriate here.
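As a rough illustration of what I mean by bounded exponential backoff, here is a minimal sketch. It is not how the kernel client actually schedules reconnects; the function names, the 1 second base delay, and the 60 second cap are all hypothetical values chosen only for the example.

```python
def backoff_delays(base=1.0, cap=60.0, factor=2.0, attempts=8):
    """Yield `attempts` retry delays that grow by `factor` and never exceed `cap`.

    All defaults are illustrative assumptions, not Ceph settings.
    """
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def reconnect_with_backoff(try_connect, sleep, base=1.0, cap=60.0, max_attempts=10):
    """Call `try_connect` until it returns True, sleeping between failures.

    `try_connect` and `sleep` are caller-supplied callables (hypothetical
    stand-ins for the real connection attempt and timer). Returns the number
    of the attempt that succeeded, or None if every attempt failed.
    """
    delays = backoff_delays(base=base, cap=cap, attempts=max_attempts)
    for attempt, delay in enumerate(delays, start=1):
        if try_connect():
            return attempt
        if attempt < max_attempts:
            # Wait before the next attempt; the delay is bounded by `cap`.
            sleep(delay)
    return None


if __name__ == "__main__":
    # Delays grow quickly at first, then stay at the cap.
    print(list(backoff_delays()))
```

With a scheme like this, an MDS that comes back within a few seconds is picked up on one of the early, short retries, while an MDS that stays down is polled at most once per `cap` interval.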

Updated by Sage Weil over 13 years ago

Updated by Greg Farnum over 13 years ago

Updated by Sage Weil over 13 years ago