Bug #7843
OSD fails to start
Description
One of our OSDs suddenly crashed, and after that it no longer starts. The OSD was new to the cluster, so it was still recovering.
We run our OSDs on XFS (/dev/sdb2), with a partition (/dev/sda1) for the journal.
I've applied the patch from bug #6101, but it didn't change anything. I've opened a separate bug because our filestore is OK: no crashes, no disk errors, and xfs_repair finishes OK. The problem occurred on 03/19.
History
#1 Updated by gustavo panizzo about 10 years ago
- File ceph-osd.3.log.6.gz added
#2 Updated by Greg Farnum about 10 years ago
- Tracker changed from Bug to Support
#6101 has nothing to do with this. :)
Looks like something has gone wrong with the OSD classes or some data passed to them. You might get more attention if you move this to the mailing list, though.
#3 Updated by gustavo panizzo about 10 years ago
- File ceph-osd.3.log added
This is the trace when it fails:
ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
1: /usr/bin/ceph-osd() [0x99b742]
2: (()+0xf030) [0x7f97e1a06030]
3: (gsignal()+0x35) [0x7f97e0122475]
4: (abort()+0x180) [0x7f97e01256f0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f97e097789d]
6: (()+0x63996) [0x7f97e0975996]
7: (()+0x639c3) [0x7f97e09759c3]
8: (()+0x63bee) [0x7f97e0975bee]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0xa5d9e7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x2c) [0xa2f84c]
11: (OSDMap::decode(ceph::buffer::list&)+0x3e) [0xa3015e]
12: (OSDService::try_get_map(unsigned int)+0x38b) [0x722a2b]
13: (OSDService::get_map(unsigned int)+0x16) [0x77a206]
14: (OSD::init()+0x15b2) [0x73bd82]
15: (main()+0x238a) [0x6eaa1a]
16: (__libc_start_main()+0xfd) [0x7f97e010eead]
17: /usr/bin/ceph-osd() [0x6edd39]
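For anyone comparing this trace against other reports, the frames can be split into index, symbol, and address mechanically. The sketch below is a small Python helper written for this note, not something from Ceph; `parse_frames` and `FRAME_RE` are hypothetical names, and the pattern only assumes the frame layout visible in the trace above.

```python
import re

# Frame layouts seen in the trace above:
#   "N: (symbol+0xOFFSET) [0xADDRESS]"   or   "N: /path/to/binary() [0xADDRESS]"
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s+\[(0x[0-9a-f]+)\]\s*$")


def parse_frames(trace: str):
    """Return (index, symbol, address) tuples; lines that are not frames are skipped."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue
        location = match.group(2)
        # Drop the single wrapping "(" and the trailing "+0xOFFSET)" when present.
        if location.startswith("("):
            location = location[1:]
        symbol = re.sub(r"\+0x[0-9a-f]+\)$", "", location)
        frames.append((int(match.group(1)), symbol, match.group(3)))
    return frames


sample = """\
8: (()+0x63bee) [0x7f97e0975bee]
9: (ceph::buffer::list::iterator::copy(unsigned int, char*)+0x127) [0xa5d9e7]
10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x2c) [0xa2f84c]
"""

for index, symbol, address in parse_frames(sample):
    print(index, symbol, address)
```

Frames whose symbol comes out as `()` are the ones the runtime could not name, which is why frames 2 and 6 to 8 carry only an offset.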
The complete log is attached.
I don't think it is a config issue, because all my other OSDs are running fine. I will check the mailing list too.
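A note on reading the trace above: frames 5 to 8 are the C++ runtime's terminate path, which runs when an exception goes uncaught, and frame 9 is the buffer copy that was executing inside `OSDMap::decode`. One plausible reading, not confirmed anywhere in this ticket, is that the stored OSD map being decoded was shorter than the decoder expected, for example empty or truncated. The Python sketch below only illustrates that general failure mode. `EndOfBuffer`, `BoundedReader`, and `decode_header` are hypothetical names, and the two-field layout is made up; it is not Ceph's actual encoding.

```python
import struct


class EndOfBuffer(Exception):
    """Hypothetical stand-in for a 'read past the end of the buffer' error."""


class BoundedReader:
    """Hypothetical cursor over bytes that refuses to read past the end."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def copy(self, length: int) -> bytes:
        remaining = len(self.data) - self.pos
        if length > remaining:
            raise EndOfBuffer(
                f"wanted {length} bytes at offset {self.pos}, only {remaining} left"
            )
        chunk = self.data[self.pos:self.pos + length]
        self.pos += length
        return chunk


def decode_header(data: bytes):
    """Decode a made-up header: little-endian u16 version, then u32 epoch."""
    reader = BoundedReader(data)
    (version,) = struct.unpack("<H", reader.copy(2))
    (epoch,) = struct.unpack("<I", reader.copy(4))
    return version, epoch


complete = struct.pack("<HI", 6, 1234)
print(decode_header(complete))  # a complete blob decodes normally

try:
    decode_header(complete[:3])  # truncated blob: the second read runs off the end
except EndOfBuffer as exc:
    print("decode failed:", exc)
```

If something similar happened here, an exception of this kind raised during `OSD::init()` with nothing catching it would be consistent with the `abort()` in frame 4. Checking the size of the stored map that the OSD tries to load at startup would be one way to test that guess.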
#4 Updated by Loïc Dachary over 9 years ago
- Tracker changed from Support to Bug
- Status changed from New to Can't reproduce
Feel free to re-open if you have a HOWTO for reproducing the issue. If you figured out what was wrong, it would be nice if you could add a note for the record ;-)