Bug #11488
closed
2 OSDs segfault after some commit
Added by Andrey Matyashov about 9 years ago.
Updated almost 9 years ago.
Description
I have a cluster with 5 nodes and 14 OSDs. After writing some data to Ceph, 2 nodes crashed: the OSDs segfaulted and the system went down with a kernel panic (my kernel: Linux virt-node-05 3.10.0-7-pve #1 SMP Fri Mar 6 08:37:49 CET 2015 x86_64 GNU/Linux). I disabled Ceph autostart and started the mon and mds manually, then the OSDs one by one. About 5-10 seconds after I start osd.9 or osd.19, the OSD segfaults and the node goes down with a kernel panic.
Is it possible to recover these OSDs without losing data?
P.S.: ceph --version
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)
I have already reported a similar bug (http://tracker.ceph.com/issues/10670).
Thanks.
Files
Logs for the crashed OSDs are attached.
Both this bug and bug 10670 occurred while I was creating a backup of my cluster (during snapshot creation).
I started one of the dead OSDs with debug output enabled:
root@virt-node-06:~# /usr/bin/ceph-osd -i 9 -d --debug_osd 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
2015-04-28 11:47:51.128102 7ffd0c805840 0 ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e), process ceph-osd, pid 62880
starting osd.9 at :/0 osd_data /var/lib/ceph/osd/ceph-9 /var/lib/ceph/osd/ceph-9/journal
2015-04-28 11:47:51.154820 7ffd0c805840 0 filestore(/var/lib/ceph/osd/ceph-9) backend xfs (magic 0x58465342)
2015-04-28 11:47:51.154829 7ffd0c805840 1 filestore(/var/lib/ceph/osd/ceph-9) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-04-28 11:47:51.247797 7ffd0c805840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is supported and appears to work
2015-04-28 11:47:51.247808 7ffd0c805840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-04-28 11:47:51.256126 7ffd0c805840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-04-28 11:47:51.256198 7ffd0c805840 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: extsize is disabled by conf
2015-04-28 11:47:51.533040 7ffd0c805840 0 filestore(/var/lib/ceph/osd/ceph-9) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-04-28 11:47:52.129280 7ffd0c805840 1 journal _open /var/lib/ceph/osd/ceph-9/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
*** Caught signal (Segmentation fault) **
in thread 7ffd0c80584
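When comparing crash logs like the one above, it can help to pull out the last timestamped entry printed before the "Caught signal" line. In this log that entry is the journal `_open`. The following is a minimal sketch that assumes the line layout shown above (timestamp, hex thread id, debug level, message). The function name `last_entry_before_crash` is hypothetical and not part of any Ceph tool.

```python
import re
from typing import Iterable, Optional

# Matches timestamped ceph-osd lines such as:
# "2015-04-28 11:47:52.129280 7ffd0c805840 1 journal _open ..."
# Groups: timestamp, thread id (hex), debug level, message.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-f]+) +(?P<level>-?\d+) (?P<msg>.*)$"
)


def last_entry_before_crash(lines: Iterable[str]) -> Optional[dict]:
    """Return the last timestamped entry seen before a 'Caught signal' line.

    Returns None if no crash marker is found, or if no timestamped
    entry precedes it.
    """
    last = None
    for line in lines:
        if "Caught signal" in line:
            return last
        match = LOG_LINE.match(line.strip())
        if match:
            last = match.groupdict()
    return None


if __name__ == "__main__":
    # Lines copied from the log in this report (some messages shortened).
    sample = [
        "2015-04-28 11:47:51.533040 7ffd0c805840 0 filestore(/var/lib/ceph/osd/ceph-9) mount: enabling WRITEAHEAD journal mode",
        "2015-04-28 11:47:52.129280 7ffd0c805840 1 journal _open /var/lib/ceph/osd/ceph-9/journal fd 20",
        "*** Caught signal (Segmentation fault) **",
    ]
    entry = last_entry_before_crash(sample)
    print(entry["ts"], entry["msg"])
```

Lines without a timestamp (such as "starting osd.9 at ...") are skipped rather than treated as errors, so the script tolerates the mixed output that `ceph-osd -d` prints to the console.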
- Status changed from New to Duplicate
- Regression set to No