Bug #11488
2 OSD segfaults after some commit (closed)
Description
I have a cluster with 5 nodes and 14 OSDs. After writing some data to Ceph, 2 nodes crashed (the OSD segfaults and the system goes down with a kernel panic; my kernel: Linux virt-node-05 3.10.0-7-pve #1 SMP Fri Mar 6 08:37:49 CET 2015 x86_64 GNU/Linux). I disabled Ceph autostart and manually started the mon, the mds, and then the OSDs one by one. 5-10 seconds after starting osd.9 or osd.19, the OSD segfaults and the node goes down with a kernel panic.
Is it possible to recover these OSDs without losing data?
P.S.: ceph --version
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)
I already reported a similar bug (http://tracker.ceph.com/issues/10670).
Thanks.
Files
Updated by Andrey Matyashov about 9 years ago
- File ceph_osd_logs.tar.gz added
Logs for the crashed OSDs are attached.
Updated by Andrey Matyashov about 9 years ago
This bug and bug 10670 both occurred while creating a backup on my cluster (during snapshot creation).
Updated by Andrey Matyashov almost 9 years ago
I started one of the dead OSDs with debug logging:
root@virt-node-06:~# /usr/bin/ceph-osd -i 9 -d --debug_osd 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
2015-04-28 11:47:51.128102 7ffd0c805840 0 ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e), process ceph-osd, pid 62880
starting osd.9 at :/0 osd_data /var/lib/ceph/osd/ceph-9 /var/lib/ceph/osd/ceph-9/journal
2015-04-28 11:47:51.154820 7ffd0c805840 0 filestore(/var/lib/ceph/osd/ceph-9) backend xfs (magic 0x58465342)
2015-04-28 11:47:51.154829 7ffd0c805840 1 filestore(/var/lib/ceph/osd/ceph-9) disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-04-28 11:47:51.247797 7ffd0c805840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is supported and appears to work
2015-04-28 11:47:51.247808 7ffd0c805840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-04-28 11:47:51.256126 7ffd0c805840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-04-28 11:47:51.256198 7ffd0c805840 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: extsize is disabled by conf
2015-04-28 11:47:51.533040 7ffd0c805840 0 filestore(/var/lib/ceph/osd/ceph-9) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-04-28 11:47:52.129280 7ffd0c805840 1 journal _open /var/lib/ceph/osd/ceph-9/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
*** Caught signal (Segmentation fault) **
 in thread 7ffd0c80584
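For longer logs, such as the ones in the attached archive, a small script can pull out the last entry written before the crash. This is only a minimal sketch in Python: it assumes every entry starts with a timestamp in the format seen above and that the crash is marked by the string "*** Caught signal". The names split_entries and last_entry_before_crash are made up for illustration and are not part of Ceph.

```python
import re

# Timestamp format as it appears in the OSD output, e.g. "2015-04-28 11:47:51.128102".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+")
# Assumption: the crash handler output begins with this marker.
CRASH_MARKER = "*** Caught signal"


def split_entries(text):
    """Split log text into entries, starting a new entry at each timestamp."""
    starts = [m.start() for m in TIMESTAMP.finditer(text)]
    if not starts:
        stripped = text.strip()
        return [stripped] if stripped else []
    entries = []
    # Anything before the first timestamp (e.g. the shell prompt) is kept as one entry.
    head = text[:starts[0]].strip()
    if head:
        entries.append(head)
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        entries.append(text[begin:end].strip())
    return entries


def last_entry_before_crash(text):
    """Return (last log entry, crash message), or None if no crash marker is found."""
    for entry in split_entries(text):
        pos = entry.find(CRASH_MARKER)
        if pos != -1:
            return entry[:pos].strip(), entry[pos:].strip()
    return None
```

With the output above, the entry returned before the crash would be the "journal _open" line, which suggests the segfault happens right after the journal is opened.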
Updated by Loïc Dachary almost 9 years ago
- Status changed from New to Duplicate
- Regression set to No