Bug #2299
closed
all MDS commit suicide on startup
Description
My setup is 1 MON, 2 MDS and 4 OSDs.
The Ceph version is commit:1e76a8713feac6883c648512dcdc28c83f7ff69e.
After copying about 300 GB into the cluster and rebooting several times, the MDS servers fail on startup.
Output of "ceph -s":
2012-04-14 18:25:18.838752 pg v49910: 594 pgs: 594 active+clean; 294 GB data, 584 GB used, 2426 GB / 3052 GB avail
2012-04-14 18:25:18.844659 mds e9201: 1/1/1 up {0=1=up:reconnect(laggy or crashed)}
2012-04-14 18:25:18.845061 osd e302: 4 osds: 4 up, 4 in
2012-04-14 18:25:18.845514 log 2012-04-14 18:13:57.753223 osd.0 192.168.32.177:6801/1505 163 : [WRN] mds.0 192.168.32.185:6800/6108 misdirected mds.0.63:45 1.b9 to osd.0 not [1,0] in e302/302
2012-04-14 18:25:18.853380 mon e2: 1 mons at {0=192.168.32.177:6789/0}
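For anyone triaging similar output, the mds line above can be checked mechanically. The following is only a minimal sketch: it assumes the part between the braces reads as rank=name=state with an optional note in parentheses, which is inferred from this single sample line, and the names parse_mds_line and unhealthy are hypothetical helpers, not part of Ceph.

```python
import re

# Sample line copied from the "ceph -s" output in this report.
MDS_LINE = ("2012-04-14 18:25:18.844659 mds e9201: 1/1/1 up "
            "{0=1=up:reconnect(laggy or crashed)}")

# Assumed entry layout: rank=name=state, optionally followed by "(note)".
MDS_ENTRY = re.compile(r"(\d+)=([^=,{}]+)=([a-z:]+)(?:\(([^)]*)\))?")


def parse_mds_line(line):
    """Return one dict per MDS entry found between the braces."""
    start = line.find("{")
    end = line.rfind("}")
    if start == -1 or end == -1:
        return []
    entries = []
    for rank, name, state, note in MDS_ENTRY.findall(line[start + 1:end]):
        entries.append({
            "rank": int(rank),
            "name": name,
            "state": state,
            "note": note or None,  # an empty match means no note was present
        })
    return entries


def unhealthy(entries):
    """Return entries that are not up:active or that carry a note."""
    return [e for e in entries if e["state"] != "up:active" or e["note"]]


if __name__ == "__main__":
    for entry in unhealthy(parse_mds_line(MDS_LINE)):
        print(entry)
```

With the line from this report, the sketch flags rank 0, which is stuck in up:reconnect and marked "laggy or crashed".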
Attached is a (short) log from starting one of the MDS servers.
If you need more detailed logs, I have an exhaustive log with debug level 99999999, but it is >250MB uncompressed (and 6MB compressed).
My question is: how do I repair this?
Also, the MDS should be changed to cope with this error condition instead of bailing out.
Files