Bug #11488

closed

2 OSD segfaults after some commit

Added by Andrey Matyashov almost 9 years ago. Updated almost 9 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:
0%

Source:
other
Tags:
Backport:
Regression:
No
Severity:
1 - critical
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have a cluster with 5 nodes and 14 OSDs. After writing some data to Ceph, 2 nodes crashed (the OSDs segfaulted and the systems went down with a kernel panic; my kernel: Linux virt-node-05 3.10.0-7-pve #1 SMP Fri Mar 6 08:37:49 CET 2015 x86_64 GNU/Linux). I disabled Ceph autostart and manually started the mon, the mds, and then the OSDs one by one. 5-10 seconds after starting osd.9 or osd.19, the OSD segfaults and the node goes down with a kernel panic.

Is it possible to recover these OSDs without losing data?

P.S.: ceph --version
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)

I have already reported a similar bug ( http://tracker.ceph.com/issues/10670 ).

Thanks.


Files

ceph_osd_logs.tar.gz (38.1 KB), Andrey Matyashov, 04/28/2015 07:40 AM

Related issues 1 (0 open, 1 closed)

Is duplicate of Ceph - Bug #10670: osd segfault (Rejected, 01/28/2015)

Actions #1

Updated by Andrey Matyashov almost 9 years ago

Logs for the crashed OSDs are attached.

Actions #2

Updated by Andrey Matyashov almost 9 years ago

This bug and bug 10670 both occurred while creating a backup on my cluster (during snapshot creation).

Actions #3

Updated by Andrey Matyashov almost 9 years ago

I started one of the dead OSDs with debug output:

root@virt-node-06:~# /usr/bin/ceph-osd -i 9 -d --debug_osd 9 --pid-file /var/run/ceph/osd.9.pid -c /etc/ceph/ceph.conf --cluster ceph
2015-04-28 11:47:51.128102 7ffd0c805840  0 ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e), process ceph-osd, pid 62880
starting osd.9 at :/0 osd_data /var/lib/ceph/osd/ceph-9 /var/lib/ceph/osd/ceph-9/journal
2015-04-28 11:47:51.154820 7ffd0c805840  0 filestore(/var/lib/ceph/osd/ceph-9) backend xfs (magic 0x58465342)
2015-04-28 11:47:51.154829 7ffd0c805840  1 filestore(/var/lib/ceph/osd/ceph-9)  disabling 'filestore replica fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-04-28 11:47:51.247797 7ffd0c805840  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is supported and appears to work
2015-04-28 11:47:51.247808 7ffd0c805840  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-04-28 11:47:51.256126 7ffd0c805840  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_features: syscall(SYS_syncfs, fd) fully supported
2015-04-28 11:47:51.256198 7ffd0c805840  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-9) detect_feature: extsize is disabled by conf
2015-04-28 11:47:51.533040 7ffd0c805840  0 filestore(/var/lib/ceph/osd/ceph-9) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-04-28 11:47:52.129280 7ffd0c805840  1 journal _open /var/lib/ceph/osd/ceph-9/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
*** Caught signal (Segmentation fault) **
 in thread 7ffd0c80584
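
When comparing several OSD logs for the same crash, it can help to pull out each crash marker together with the last timestamped entry before it. The following is a minimal sketch, assuming plain-text logs in the same format as the output above; the function name `find_crashes` and both regular expressions are illustrative assumptions, not part of Ceph.

```python
import re

# Crash marker as it appears in the log above, e.g.
# "*** Caught signal (Segmentation fault) **"
SIGNAL_RE = re.compile(r"\*\*\* Caught signal \((?P<name>[^)]+)\) \*\*")

# Timestamped entries as they appear above, e.g.
# "2015-04-28 11:47:52.129280 7ffd0c805840  1 journal _open ..."
# (assumed layout: timestamp, thread id in hex, debug level, message)
ENTRY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+) (?P<msg>.*)$"
)


def find_crashes(lines):
    """Return a list of (signal_name, last_entry_before_crash) tuples.

    last_entry_before_crash is a dict with keys ts, thread, level, msg,
    or None if no timestamped entry preceded the crash marker.
    """
    crashes = []
    last_entry = None
    for raw in lines:
        line = raw.rstrip("\n")
        marker = SIGNAL_RE.search(line)
        if marker:
            crashes.append((marker.group("name"), last_entry))
            continue
        entry = ENTRY_RE.match(line)
        if entry:
            last_entry = entry.groupdict()
    return crashes


# Two lines copied from the output above, used as sample input.
sample = [
    "2015-04-28 11:47:52.129280 7ffd0c805840  1 journal _open "
    "/var/lib/ceph/osd/ceph-9/journal fd 20: 5367660544 bytes, "
    "block size 4096 bytes, directio = 1, aio = 1",
    "*** Caught signal (Segmentation fault) **",
]

for signal_name, entry in find_crashes(sample):
    print(signal_name, "after:", entry["msg"] if entry else "(no prior entry)")
```

On the sample lines, the last entry before the segfault is the journal `_open` message, which matches where the output above stops. An open log file can be passed in place of `sample`, since the function only iterates over lines.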
Actions #4

Updated by Loïc Dachary almost 9 years ago

  • Status changed from New to Duplicate
  • Regression set to No