Bug #5920
mon daemon crashes
Status: Closed
Description
Hello!
I'd like to report a problem. Maybe it will help you improve Ceph.
I've created a Ceph cluster on a test farm to evaluate its ability to recover from serious failures.
Three identical servers were used in my cluster. Each server is equipped with 2 SATA hard drives, and each hard drive has 3 partitions.
The /dev/sda1 and /dev/sdb1 partitions form a RAID 1 array for the swap area; /dev/sda2 and /dev/sdb2 form a RAID 1 array for the root filesystem.
I was going to simulate the failure of a hard drive. I pulled out hard drive /dev/sdb on node sn2, and nothing serious happened. As soon as the cluster returned to working condition, I repeated the test by pulling out hard drive /dev/sda on node sn2. The root filesystem was immediately remounted read-only, and the system became poorly responsive and unstable. After further experiments, I was unable to keep the OS working when the primary hard drive (/dev/sda) was lost, despite the fact that RAID 1 was used.
The Ceph cluster was unable to sync with node sn2, so I decided to reformat partitions /dev/sda3 and /dev/sdb3 on node sn2. My idea was to clean the disk space and re-sync the node as if it were a new node. Of course, I was wrong. I then followed this tutorial: [[http://ceph.com/w/index.php?title=Replacing_a_failed_disk/OSD&redirect=no]]. I didn't follow it exactly, as I used the XFS filesystem instead of btrfs. Finally, the osd daemons were able to run and successfully synchronized with the other nodes, sn1 and sn3.
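For reference, the re-creation steps from that tutorial might look roughly like this. This is a hypothetical sketch with cuttlefish-era commands; the device name, OSD id, mount point, and keyring path are assumptions, and the script only writes out the plan rather than executing anything.

```shell
# Hypothetical plan for re-creating an OSD on a reformatted partition
# (assumed: /dev/sda3, osd.2, sysvinit-style service names).
cat > osd-recreate-plan.txt <<'EOF'
mkfs.xfs -f /dev/sda3                       # wipe the old OSD partition
mount /dev/sda3 /var/lib/ceph/osd/ceph-2    # mount the fresh filesystem
ceph-osd -i 2 --mkfs --mkkey                # initialize the OSD data dir
ceph auth add osd.2 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-2/keyring
service ceph start osd.2                    # let it backfill from sn1/sn3
EOF
cat osd-recreate-plan.txt
```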
The only problem is that the mon daemon crashes during startup.
Please see the attached files.
Best regards,
Dmitry
Files
Updated by Dmitry Panov over 10 years ago
OS - Ubuntu 12.04.2 x64.
Ceph packages were downloaded from the Ceph repository.
Updated by Sage Weil over 10 years ago
- Status changed from New to Need More Info
- Source changed from other to Community (user)
Hi Dmitry,
Can you create a tar.gz of your mon data directory (/var/lib/ceph/mon/*) and post it somewhere? (Or contact us on IRC to get login info for a secure place to upload it.) Thanks!
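A minimal sketch of creating that tarball (the directory path is the one Sage names; here it defaults to a demo directory so the commands are self-contained, and on the real node you would stop ceph-mon first so the store is quiescent):

```shell
# Archive a mon data directory for upload. MON_DIR defaults to a demo
# directory; on the real node it would be /var/lib/ceph/mon.
MON_DIR="${MON_DIR:-./demo-mon}"
mkdir -p "$MON_DIR"          # demo only; the real directory already exists
tar czf mon-data.tgz -C "$(dirname "$MON_DIR")" "$(basename "$MON_DIR")"
ls -l mon-data.tgz
```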
Updated by Dmitry Panov over 10 years ago
Hi Sage,
Sure. You can download it here: [[https://dl.dropboxusercontent.com/u/22489421/mon.tgz]]
Updated by Sage Weil over 10 years ago
- Status changed from Need More Info to Duplicate
Hi Dmitry,
This looks like fallout from a rare paxos bug (#5750), fixed in 17aa2d6d16c77028bae1d2a77903cdfd81efa096 (post v0.61.7). There is a pgmap entry that is simply missing, which is the same thing we saw with #5750. I'm surprised a real user was able to hit this (lucky you!). If the other mons are working, nuking the failed one and re-adding it should work. Otherwise, just repeat this process with the latest cuttlefish branch and you should not be able to hit this (chances are you won't be able to hit it with 0.61.7 without a lot of effort either).
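The "nuke and re-add" step could look like the following. This is a hypothetical sketch with cuttlefish-era commands; MON_ID, MON_ADDR, and the keyring path are assumptions, and the script only prints the plan rather than executing it.

```shell
# Dry-run plan for removing a corrupt mon and re-adding it (assumed values).
MON_ID="b"
MON_ADDR="192.168.1.12:6789"
{
  echo "ceph mon remove $MON_ID"                 # drop it from the monmap
  echo "rm -rf /var/lib/ceph/mon/ceph-$MON_ID"   # wipe the corrupt store
  echo "ceph mon getmap -o /tmp/monmap"          # fetch the current monmap
  echo "ceph-mon -i $MON_ID --mkfs --monmap /tmp/monmap --keyring /etc/ceph/keyring"
  echo "ceph mon add $MON_ID $MON_ADDR"          # re-add, then start the mon
} | tee mon-readd-plan.txt
```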
Thanks for the debug info and report!
Updated by Dmitry Panov over 10 years ago
Hi Sage, please see my replies embedded.
> Hi Dmitry,
> This looks like fallout from a rare paxos bug (#5750), fixed in 17aa2d6d16c77028bae1d2a77903cdfd81efa096 (post v0.61.7).
root@sn2:~# dpkg -p ceph
Package: ceph
Priority: optional
Section: admin
Installed-Size: 25777
Maintainer: Laszlo Boszormenyi (GCS) <gcs@debian.hu>
Architecture: amd64
Version: 0.61.7-1precise
Depends: binutils, ceph-common, cryptsetup-bin | cryptsetup, gdisk, parted, python, python-argparse, python-lockfile, sdparm | hdparm, uuid-runtime, xfsprogs, libaio1 (>= 0.3.93), libboost-thread1.46.1 (>= 1.46.1-1), libc6 (>= 2.15), libgcc1 (>= 1:4.1.1), libgoogle-perftools0, libnspr4 (>= 1.8.0.10), libnss3 (>= 3.12.0~1.9b1), libsnappy1, libstdc++6 (>= 4.6), libuuid1 (>= 2.16)
Recommends: btrfs-tools, ceph-mds, librados2, librbd1
Size: 7792542
Description: distributed storage and file system
Ceph is a distributed storage system designed to provide excellent
performance, reliability, and scalability.
.
This package contains all server daemons and management tools for creating,
running, and administering a Ceph storage cluster, with the exception of the
metadata server, which is necessary for using the distributed file system and is
provided by the ceph-mds package.
Homepage: http://ceph.com/
> There is a pgmap entry that is simply missing, which is the same thing we saw from 5750. I'm surprised a real user was able to hit this (lucky you!).
You're welcome! :))
> If the other mons are working, nuking the failed one and re-adding it should work.
I've already guessed it.
> Otherwise, just repeat this process with the latest cuttlefish branch
It's already the latest version.
> and you should not be able to hit this (chances are you won't be able to hit it with 0.61.7 without a lot of effort either).
> Thanks for the debug info and report!
Updated by Dmitry Panov over 10 years ago
- File ceph.conf added
- File ceph.log added
- File ceph-mon.b.log added
- File ceph-osd.2.log added
- File ceph-osd.3.log added
- File ceph-status.txt added
- File core.bz2 added
Hi Sage!
I've cleaned everything up and re-created the cluster. This time I replaced node sn2 with node sn4.
Fortunately for you, the hard drive failed right in the middle of the creation process. :))
It caused daemon osd.2 to crash. Ah, a crash again... :))
A 300 MB dump was created. The current status of the cluster is HEALTH_OK.
Please see the attachments.
Updated by Sage Weil over 10 years ago
Dmitry Panov wrote:
> Hi Sage!
> I've cleaned everything up and re-created the cluster. This time I replaced node sn2 with node sn4.
> Fortunately for you, the hard drive failed right in the middle of the creation process. :))
> It caused daemon osd.2 to crash. Ah, a crash again... :))
> A 300 MB dump was created. The current status of the cluster is HEALTH_OK.
> Please see the attachments.
The osd.2 log contains:
-15> 2013-08-14 11:09:28.555683 7f31d9ed6700 -1 journal FileJournal::do_write: pwrite(fd=27, hbp.length=4096) failed :(5) Input/output error
That is, we got EIO back from the filesystem. This is a bad disk or a corrupt filesystem; check the dmesg output on that node.
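A quick check along those lines might be (the grep patterns are assumptions about typical libata/block-layer messages; run it on the affected node):

```shell
# Look for the kernel-level disk error behind the EIO; findings are
# written to disk-errors.txt so they can be attached to the ticket.
(dmesg 2>/dev/null || true) \
  | grep -iE 'i/o error|end_request|ata[0-9]' > disk-errors.txt \
  || echo "no disk errors logged" >> disk-errors.txt
cat disk-errors.txt
```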