Bug #1001
closed
dead mds remains up, won't let others take over
Description
3-node cluster, with 3 mons, 3 mdses (all configured for standby-replay), and 3 osds (but the one on node 0 is down, because I'm using a kernel ceph mount on that node, and that tends to deadlock when uploading lots of data).
While loading up data from node 0, I started a highly parallel build on node 1, on the btrfs that also holds data for mon1 (but not osd1). Shortly thereafter, the ceph filesystem came to a halt, and mdses started to disappear from the mds dump output, although ceph -w didn't report any changes for a few minutes.
The mdses eventually came back into standby or standby-replay, but they wouldn't be activated.
The mdsmap history for mon0, and the mon.0.log starting some 90 minutes before the pause, are attached. The problem occurred between 15:35 and 15:41.
Files