Bug #1001 (closed): dead mds remains up, won't let others take over

Added by Alexandre Oliva about 13 years ago. Updated about 13 years ago.

Status: Resolved
Priority: Normal
Category: Monitor
% Done: 0%

Description

3-node cluster, with 3 mons, 3 mdses (all configured for standby-replay), and 3 osds (but the osd on node 0 is down, because I'm using a kernel ceph mount on that node, and that tends to deadlock when uploading lots of data).
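
A minimal sketch of how one of the mdses in such a standby-replay setup might be configured in ceph.conf (the daemon name and host are assumptions, not taken from this cluster; the other two mdses would get analogous sections):

    [mds.node0]
        host = node0
        mds standby replay = true
        mds standby for rank = 0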

While loading up data from node 0, I started a highly parallel build on node 1, on the btrfs filesystem that also holds the data for mon1 (but not osd1). Shortly thereafter, the ceph filesystem came to a halt, and mdses started to disappear from the mds dump output, although ceph -w didn't report any changes for a few minutes.

The mdses eventually came back into standby or standby-replay, but they wouldn't be activated.
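
For clarity, the daemon states referred to above were being watched with the standard monitor commands, roughly as follows (exact syntax and output vary with the Ceph version):

    ceph mds dump    # print the current mdsmap: active, standby, and standby-replay daemons
    ceph mds stat    # one-line summary of MDS states
    ceph -w          # watch the cluster log for state and map changes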

The mdsmap history for mon0, and the mon.0.log starting some 90 minutes before the pause, are attached. The problem occurred between 15:35 and 15:41.


Files

mon0-mdsmap.tar.xz (1.83 MB) - mon0 logs and mdsmap history - Alexandre Oliva, 04/12/2011 01:09 PM