Bug #1001 (closed): dead mds remains up, won't let others take over

Added by Alexandre Oliva about 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Normal
Category: Monitor
% Done: 0%

Description

3-node cluster, with 3 mons, 3 mdses (all configured for standby-replay), and 3 osds (but the osd on node 0 is down, because I'm using a kernel ceph mount on that node, and that tends to deadlock when uploading lots of data).
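
A minimal sketch of how one of the mdses in such a standby-replay setup might be configured in ceph.conf (the daemon name and host are assumptions, not taken from this cluster; the other two mdses would get analogous sections):

    [mds.node0]
        host = node0
        mds standby replay = true
        mds standby for rank = 0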

While loading up data from node 0, I started a highly parallel build on node 1, on the btrfs filesystem that also holds the data for mon1 (but not osd1). Shortly thereafter, the ceph filesystem came to a halt, and mdses started to disappear from the mds dump output, although ceph -w didn't report any changes for a few minutes.

The mdses eventually came back into standby or standby-replay, but they wouldn't be activated.
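
For clarity, the daemon states referred to above were being watched with the standard monitor commands, roughly as follows (exact syntax and output vary with the Ceph version):

    ceph mds dump    # print the current mdsmap: active, standby, and standby-replay daemons
    ceph mds stat    # one-line summary of MDS states
    ceph -w          # watch the cluster log for state and map changes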

The mdsmap history for mon0, and the mon.0.log starting some 90 minutes before the pause, are attached. The problem occurred between 15:35 and 15:41.


Files

mon0-mdsmap.tar.xz (1.83 MB) - mon0 logs and mdsmap history - Alexandre Oliva, 04/12/2011 01:09 PM