Bug #7843

OSD fails to start

Added by gustavo panizzo about 10 years ago. Updated over 9 years ago.

Status: Can't reproduce
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

One of our OSDs suddenly crashed, and since then it no longer starts. The OSD was new to the cluster, so it was still recovering.
We run our OSDs on XFS (/dev/sdb2), with a separate partition (/dev/sda1) for the journal.

I've applied the patch from bug #6101, but it didn't change anything. I've opened a separate bug because our filestore is OK: no crashes, no disk errors, and xfs_repair finishes OK. The problem occurred on 03/19.

ceph-start-log - start error of ceph-osd (4.36 KB) gustavo panizzo, 03/25/2014 07:04 AM

ceph-osd.3.log.7.gz - before the crash (79.9 KB) gustavo panizzo, 03/25/2014 07:04 AM

ceph-client.admin.log.6.gz - during the crash (20 Bytes) gustavo panizzo, 03/25/2014 07:04 AM

ceph.conf - ceph.conf (2.33 KB) gustavo panizzo, 03/25/2014 07:04 AM

ceph-osd.3.log.6.gz - during the crash, correct file (102 KB) gustavo panizzo, 03/25/2014 07:05 AM

ceph-osd.3.log (42.9 KB) gustavo panizzo, 03/25/2014 02:31 PM

History

#2 Updated by Greg Farnum about 10 years ago

  • Tracker changed from Bug to Support

#6101 has nothing to do with this. :)
Looks like something has gone wrong with the OSD classes or some data passed to them. You might get more attention if you move this to the mailing list, though.

#3 Updated by gustavo panizzo about 10 years ago

This is the trace when it fails:

ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
1: /usr/bin/ceph-osd() [0x99b742]
2: (()+0xf030) [0x7f97e1a06030]
3: (gsignal()+0x35) [0x7f97e0122475]
4: (abort()+0x180) [0x7f97e01256f0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f97e097789d]
6: (()+0x63996) [0x7f97e0975996]
7: (()+0x639c3) [0x7f97e09759c3]
8: (()+0x63bee) [0x7f97e0975bee]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0xa5d9e7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x2c) [0xa2f84c]
11: (OSDMap::decode(ceph::buffer::list&)+0x3e) [0xa3015e]
12: (OSDService::try_get_map(unsigned int)+0x38b) [0x722a2b]
13: (OSDService::get_map(unsigned int)+0x16) [0x77a206]
14: (OSD::init()+0x15b2) [0x73bd82]
15: (main()+0x238a) [0x6eaa1a]
16: (__libc_start_main()+0xfd) [0x7f97e010eead]
17: /usr/bin/ceph-osd() [0x6edd39]

The complete log is attached.

I don't think it is a config issue, because all my other OSDs are running fine. I will check the mailing list too.
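
For readers of the trace above: frames 4 and 5 (abort() called from the verbose terminate handler) are what an uncaught C++ exception usually looks like, and frames 9 to 11 suggest it was thrown by a buffer copy while decoding an OSDMap, for example because the stored map held fewer bytes than the decoder asked for. The following is only a minimal sketch of that failure pattern under that assumption. ByteReader and try_decode are hypothetical names for illustration and are not Ceph's classes or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a buffer iterator; NOT Ceph's real class.
class ByteReader {
 public:
  explicit ByteReader(const std::vector<char>& data) : data_(data), pos_(0) {}

  // Copy exactly len bytes into dest, or throw if fewer bytes remain.
  void copy(std::size_t len, char* dest) {
    assert(dest != nullptr);
    if (len > data_.size() - pos_) {
      throw std::out_of_range("end of buffer");
    }
    std::memcpy(dest, data_.data() + pos_, len);
    pos_ += len;
  }

 private:
  const std::vector<char>& data_;
  std::size_t pos_;
};

// Hypothetical decoder: reads a 2-byte header and reports whether the
// blob was long enough. Catching the exception here is what keeps a
// short blob from terminating the whole process.
bool try_decode(const std::vector<char>& blob) {
  ByteReader reader(blob);
  char header[2];
  try {
    reader.copy(sizeof(header), header);
  } catch (const std::out_of_range&) {
    return false;  // truncated or empty input
  }
  return true;
}
```

In this sketch, try_decode returns false for an empty or one-byte blob and true for a blob of two or more bytes. If the exception were left uncaught instead, the C++ runtime would call the terminate handler and abort, which matches the shape of frames 3 to 8 in the trace.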

#4 Updated by Loïc Dachary over 9 years ago

  • Tracker changed from Support to Bug
  • Status changed from New to Can't reproduce

Feel free to re-open if you have steps to reproduce the issue. If you figured out what was wrong, it would be nice if you could add a note for the record ;-)
