Bug #2299

closed

all MDS commit suicide on startup

Added by Martin Scheffler about 12 years ago. Updated over 7 years ago.

Status: Rejected
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development

Description

My setup is 1 MON, 2 MDS and 4 OSDs.
The Ceph version is commit:1e76a8713feac6883c648512dcdc28c83f7ff69e.

After copying about 300 GB into the cluster and some reboots, the MDS daemons choke on startup.
Output of "ceph -s":
2012-04-14 18:25:18.838752 pg v49910: 594 pgs: 594 active+clean; 294 GB data, 584 GB used, 2426 GB / 3052 GB avail
2012-04-14 18:25:18.844659 mds e9201: 1/1/1 up {0=1=up:reconnect(laggy or crashed)}
2012-04-14 18:25:18.845061 osd e302: 4 osds: 4 up, 4 in
2012-04-14 18:25:18.845514 log 2012-04-14 18:13:57.753223 osd.0 192.168.32.177:6801/1505 163 : [WRN] mds.0 192.168.32.185:6800/6108 misdirected mds.0.63:45 1.b9 to osd.0 not [1,0] in e302/302
2012-04-14 18:25:18.853380 mon e2: 1 mons at {0=192.168.32.177:6789/0}
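
The two lines of interest in the output above are the "mds" summary (rank 0 is stuck in up:reconnect and flagged "laggy or crashed") and the "misdirected" warning logged by osd.0. As a rough aid for spotting both in a saved status dump, here is a minimal Python sketch. The regular expressions are tailored to the exact line format shown in this report, and the function names are made up for illustration; other Ceph versions print status differently, so treat this as an assumption-laden helper rather than a general parser.

```python
import re

# Matches the "mds e<epoch>: ... {rank=name=state,...}" summary line.
# Assumption: the format is exactly the one shown in this report.
MDS_LINE = re.compile(r"\bmds e(?P<epoch>\d+): .*?\{(?P<ranks>[^}]*)\}")

# Matches the "misdirected" warning that an OSD wrote to the cluster log.
MISDIRECTED = re.compile(
    r"misdirected (?P<op>\S+) (?P<pg>\S+) to (?P<osd>osd\.\d+) "
    r"not \[(?P<expected>[\d,]+)\]"
)


def unhealthy_mds_ranks(status_text):
    """Return (rank, name, state) for every MDS rank not in state up:active."""
    found = []
    for line in status_text.splitlines():
        match = MDS_LINE.search(line)
        if match is None:
            continue
        for entry in match.group("ranks").split(","):
            if not entry:
                continue
            # An entry looks like "0=1=up:reconnect(laggy or crashed)".
            rank, name, state = entry.split("=", 2)
            if state != "up:active":
                found.append((int(rank), name, state))
    return found


def misdirected_ops(status_text):
    """Return (pg, receiving_osd, bracketed_osd_list) for each warning."""
    return [
        (
            m.group("pg"),
            m.group("osd"),
            [int(x) for x in m.group("expected").split(",")],
        )
        for m in MISDIRECTED.finditer(status_text)
    ]


SAMPLE = """\
2012-04-14 18:25:18.844659 mds e9201: 1/1/1 up {0=1=up:reconnect(laggy or crashed)}
2012-04-14 18:25:18.845514 log 2012-04-14 18:13:57.753223 osd.0 192.168.32.177:6801/1505 163 : [WRN] mds.0 192.168.32.185:6800/6108 misdirected mds.0.63:45 1.b9 to osd.0 not [1,0] in e302/302
"""

print(unhealthy_mds_ranks(SAMPLE))
print(misdirected_ops(SAMPLE))
```

Run against the two sample lines copied from this report, the first call reports rank 0 as unhealthy and the second extracts PG 1.b9, the receiving OSD (osd.0) and the bracketed OSD list from the warning.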

Attached is a (short) log from starting one of the MDS daemons.
If you need more detailed logs, I have an exhaustive log with debug level 99999999, but it is >250 MB uncompressed (and 6 MB compressed).

My question is: how do I repair this?

Also, the MDS should be changed to cope with this error condition instead of bailing out.


Files

mds.1-short.log (29.6 KB), Martin Scheffler, 04/14/2012 09:39 AM

Actions #1

Updated by Martin Scheffler about 12 years ago

After I told osd.0 to get lost and reformatted it, the cluster started resyncing.
Then (magically) mds.0 started up OK.
But the underlying problem with the MDS still needs to be fixed.
IMHO, the MDS could probe other OSDs for the blob in question.
I am trying to understand the source code right now.
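
The suggestion above (probing other OSDs for the object when one copy cannot be read) can be sketched in a few lines of Python. This only illustrates the idea and is not how the Ceph MDS or its object client is implemented: Ceph clients normally send I/O to a placement group's primary OSD, and every name here (ReplicaReadError, read_with_fallback, the in-memory FAKE_OSDS table, the object name) is hypothetical.

```python
class ReplicaReadError(Exception):
    """Raised when an object cannot be read from one OSD (hypothetical)."""


def read_with_fallback(object_name, osd_ids, read_from_osd):
    """Try each OSD in order and return (osd_id, data) from the first success.

    read_from_osd is a caller-supplied function (osd_id, object_name) -> bytes
    that raises ReplicaReadError on failure.
    """
    failures = {}
    for osd_id in osd_ids:
        try:
            return osd_id, read_from_osd(osd_id, object_name)
        except ReplicaReadError as exc:
            # Remember why this copy was unusable and move on to the next one.
            failures[osd_id] = str(exc)
    raise ReplicaReadError(f"{object_name!r} unreadable on all OSDs: {failures}")


# Toy in-memory stand-in for two OSDs; OSD 0 has lost the object.
FAKE_OSDS = {0: {}, 1: {"example_object": b"journal data"}}


def fake_read(osd_id, object_name):
    """Read from the toy table, reporting a missing object as a read error."""
    try:
        return FAKE_OSDS[osd_id][object_name]
    except KeyError:
        raise ReplicaReadError(f"osd.{osd_id} has no {object_name}") from None


print(read_with_fallback("example_object", [0, 1], fake_read))
```

In the toy run, the read from OSD 0 fails and the data comes back from OSD 1. A real implementation would also have to decide whether a non-primary copy can be trusted, which this sketch does not address.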

Actions #2

Updated by Martin Scheffler about 12 years ago

This issue can be closed; there was an error in the underlying filesystem of osd.0 :)

Actions #3

Updated by Sage Weil about 12 years ago

  • Status changed from New to Rejected

Actions #4

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
