
Bug #331

closed

OSD crash: OSDMap::Incremental::decode

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: OSD
Target version: -
% Done: 0%

Description

After upgrading to the latest unstable, I'm seeing OSD crashes across my whole cluster (30 OSDs).

Core was generated by `/usr/bin/cosd -i 24 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007f4969998a75 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007f4969998a75 in raise () from /lib/libc.so.6
#1  0x00007f496999c5c0 in abort () from /lib/libc.so.6
#2  0x00007f496a24d8e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007f496a24bd16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007f496a24bd43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007f496a24be3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x000000000045a957 in ceph::buffer::list::iterator::advance (this=<value optimized out>, len=4, dest=0x7f4962f3e2c4 "")
    at ./include/buffer.h:568
#7  ceph::buffer::list::iterator::copy (this=<value optimized out>, len=4, dest=0x7f4962f3e2c4 "") at ./include/buffer.h:615
#8  0x000000000051420c in OSDMap::Incremental::decode(ceph::buffer::list::iterator&) ()
#9  0x00000000004e78ca in OSD::handle_osd_map (this=0x21c40e0, m=<value optimized out>) at osd/OSD.cc:2343
#10 0x00000000004f0140 in OSD::_dispatch (this=0x21c40e0, m=0x7f49565fc100) at osd/OSD.cc:1996
#11 0x00000000004f0b69 in OSD::ms_dispatch (this=0x21c40e0, m=0x7f49565fc100) at osd/OSD.cc:1878
#12 0x0000000000462799 in Messenger::ms_deliver_dispatch (this=0x21b6a30) at msg/Messenger.h:97
#13 SimpleMessenger::dispatch_entry (this=0x21b6a30) at msg/SimpleMessenger.cc:342
#14 0x000000000045967c in SimpleMessenger::DispatchThread::entry (this=0x21b6eb8) at msg/SimpleMessenger.h:540
#15 0x000000000046d76a in Thread::_entry_func (arg=0x5619) at ./common/Thread.h:39
#16 0x00007f496a82b9ca in start_thread () from /lib/libpthread.so.0
#17 0x00007f4969a4b6cd in clone () from /lib/libc.so.6
#18 0x0000000000000000 in ?? ()
(gdb)
Core was generated by `/usr/bin/cosd -i 27 -c /etc/ceph/ceph.conf'.
Program terminated with signal 6, Aborted.
#0  0x00007ffe34a94a75 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007ffe34a94a75 in raise () from /lib/libc.so.6
#1  0x00007ffe34a985c0 in abort () from /lib/libc.so.6
#2  0x00007ffe353498e5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3  0x00007ffe35347d16 in ?? () from /usr/lib/libstdc++.so.6
#4  0x00007ffe35347d43 in std::terminate() () from /usr/lib/libstdc++.so.6
#5  0x00007ffe35347e3e in __cxa_throw () from /usr/lib/libstdc++.so.6
#6  0x000000000045a957 in ceph::buffer::list::iterator::copy(unsigned int, char*) ()
#7  0x000000000051420c in OSDMap::Incremental::decode(ceph::buffer::list::iterator&) ()
#8  0x00000000004e78ca in OSD::handle_osd_map(MOSDMap*) ()
#9  0x00000000004f0140 in OSD::_dispatch(Message*) ()
#10 0x00000000004f0b69 in OSD::ms_dispatch(Message*) ()
#11 0x0000000000462799 in SimpleMessenger::dispatch_entry() ()
#12 0x000000000045967c in SimpleMessenger::DispatchThread::entry() ()
#13 0x000000000046d76a in Thread::_entry_func(void*) ()
#14 0x00007ffe359279ca in start_thread () from /lib/libpthread.so.0
#15 0x00007ffe34b476cd in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb)

I've uploaded a few core dumps, the binary and the logs to logger.ceph.widodh.nl in /srv/ceph/issues/cosd_iterator_advance.

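The backtraces differ slightly: one shows ceph::buffer::list::iterator::advance and the other only ceph::buffer::list::iterator::copy. Both frames are at the same address (0x000000000045a957), so this is probably the same crash site, with advance() inlined into copy(); the first trace has debug info and lists the inlined frame separately.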

#1

Updated by Wido den Hollander over 13 years ago

Fixed by 5b5c0066f1bbfdc8c03cfacffab8969e23377f90

#2

Updated by Greg Farnum over 13 years ago

  • Status changed from New to Closed

Fixed by 5b5c0066f1bbfdc8c03cfacffab8969e23377f90. This only affected machines that were upgraded in place on an existing filesystem; nobody else should hit it, since the broken code was only in the tree for an hour or so.
