Bug #377

closed

Permission denied and OSD crash when creating or listing snapshots

Added by Wido den Hollander over 13 years ago. Updated over 13 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

I'm trying to create some snapshots from my RBD devices, but this fails.

My whole cluster is running the latest unstable (98cf3c36cb0f040172f5a3cc0fcd3675c20f95a7).

root@node13:~# rbd snap create --snap=alpha001 alpha
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'
Aborted
root@node13:~#

When I tried again a few seconds later:

root@node13:~# rbd snap create --snap=alpha001 alpha
list_snaps failed: Operation not supported
error searching for snapshot: Operation not supported
root@node13:~#

I then found out that two OSDs had crashed while doing so:

Core was generated by `/usr/bin/cosd -i 11 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000000004efe4b in operator<< (this=0x1e0c000, m=0x6a46900) at ./msg/Message.h:413
#2  OSD::_dispatch (this=0x1e0c000, m=0x6a46900) at osd/OSD.cc:1997
#3  0x00000000004f01ec in OSD::do_waiters (this=0x1e0c000) at osd/OSD.cc:1988
#4  0x00000000004f0938 in OSD::ms_dispatch (this=0x1e0c000, m=0x759d840) at osd/OSD.cc:1904
#5  0x0000000000462219 in Messenger::ms_deliver_dispatch (this=0x1e0a000) at msg/Messenger.h:97
#6  SimpleMessenger::dispatch_entry (this=0x1e0a000) at msg/SimpleMessenger.cc:342
#7  0x000000000045910c in SimpleMessenger::DispatchThread::entry (this=0x1e0a488) at msg/SimpleMessenger.h:540
#8  0x000000000046d0ca in Thread::_entry_func (arg=0x6a46900) at ./common/Thread.h:39
#9  0x00007f60b90a29ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#10 0x00007f60b805a6fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
(gdb)
Core was generated by `/usr/bin/cosd -i 9 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0  ptr (this=0x3ca97a8, other=...) at ./include/buffer.h:344
344    ./include/buffer.h: No such file or directory.
    in ./include/buffer.h
(gdb) bt
#0  ptr (this=0x3ca97a8, other=...) at ./include/buffer.h:344
#1  __gnu_cxx::new_allocator<ceph::buffer::ptr>::construct (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/ext/new_allocator.h:105
#2  std::list<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_create_node (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:464
#3  std::list<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_insert (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:1407
#4  std::list<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::push_back (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:920
#5  _M_initialize_dispatch<std::_List_const_iterator<ceph::buffer::ptr> > (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:1361
#6  list (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:533
#7  list (this=0x3ca97a8, other=...) at ./include/buffer.h:712
#8  0x00000000004b324f in OSDOp (__first=<value optimized out>, __last=..., __result=0x3ca9780) at osd/osd_types.h:1375
#9  uninitialized_copy<__gnu_cxx::__normal_iterator<OSDOp const*, std::vector<OSDOp, std::allocator<OSDOp> > >, OSDOp*> (__first=<value optimized out>, __last=..., __result=0x3ca9780) at /usr/include/c++/4.4/bits/stl_uninitialized.h:74
#10 uninitialized_copy<__gnu_cxx::__normal_iterator<OSDOp const*, std::vector<OSDOp, std::allocator<OSDOp> > >, OSDOp*> (__first=<value optimized out>, __last=..., __result=0x3ca9780) at /usr/include/c++/4.4/bits/stl_uninitialized.h:117
#11 std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<OSDOp const*, std::vector<OSDOp, std::allocator<OSDOp> > >, OSDOp*, OSDOp> (__first=<value optimized out>, __last=..., __result=0x3ca9780)
    at /usr/include/c++/4.4/bits/stl_uninitialized.h:257
#12 0x00000000004b34cf in _M_allocate_and_copy<__gnu_cxx::__normal_iterator<OSDOp const*, std::vector<OSDOp, std::allocator<OSDOp> > > > (this=0x41d8fa0, __x=<value optimized out>) at /usr/include/c++/4.4/bits/stl_vector.h:966
#13 std::vector<OSDOp, std::allocator<OSDOp> >::operator= (this=0x41d8fa0, __x=<value optimized out>) at /usr/include/c++/4.4/bits/vector.tcc:165
#14 0x00000000004c1146 in MOSDOpReply (this=0x1e33000, op=0x50fbd80, err=-2) at ./messages/MOSDOpReply.h:70
#15 OSD::reply_op_error (this=0x1e33000, op=0x50fbd80, err=-2) at osd/OSD.cc:4361
#16 0x000000000049ad13 in ReplicatedPG::do_op (this=0x23f1700, op=0x50fbd80) at osd/ReplicatedPG.cc:251
#17 0x00000000004d893c in OSD::dequeue_op (this=0x1e33000, pg=0x23f1700) at osd/OSD.cc:4736
#18 0x00000000005c1a2f in ThreadPool::worker (this=0x1e334e0) at common/WorkQueue.cc:44
#19 0x00000000004f8dbd in ThreadPool::WorkThread::entry() ()
#20 0x000000000046d0ca in Thread::_entry_func (arg=0x20) at ./common/Thread.h:39
#21 0x00007f68243be9ca in start_thread () from /lib/libpthread.so.0
#22 0x00007f68233776cd in clone () from /lib/libc.so.6
#23 0x0000000000000000 in ?? ()
(gdb) 
These seem to be two different bugs:
  • One bug gives the "Operation not supported" error
  • The other bug crashes OSDs

Although both are actually OSD bugs, I'll keep this to one report; we might split it into a second issue in the Ceph/OSD category.

For now I've uploaded the cores, binaries (multiple crashes) and logs to logger.ceph.widodh.nl, in /srv/ceph/issues/osd_crash_rbd_snap

Even listing all the available snapshots crashed the OSD:

root@node13:~# rbd snap ls alpha
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'
Aborted
root@node13:~#

On my cluster I have 3 RBD images:
root@node13:~# rbd ls
alpha
beta
charlie
root@node13:~#

It seems that OSD 9 crashes first; once it is down, OSD 11 crashes when you try a snapshot operation.
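
A note on the client-side abort: the message "terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'" shows that the thrown type is a pointer and that no handler matched it. The sketch below is a minimal illustration of that C++ behaviour only, not the actual Ceph code. The `end_of_buffer` struct, `read_bytes` and `classify` are hypothetical stand-ins, and whether the rbd tool lacks a matching handler for this reason is an assumption.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical stand-in for ceph::buffer::end_of_buffer (not the real class).
struct end_of_buffer : std::exception {
    const char* what() const noexcept override { return "end of buffer"; }
};

// Hypothetical decoder: returns `len` bytes starting at `off`, and throws a
// *pointer* to end_of_buffer when the request runs past the end of `buf`.
std::string read_bytes(const std::string& buf, std::size_t off, std::size_t len) {
    if (off > buf.size() || len > buf.size() - off) {
        throw new end_of_buffer;  // the thrown type is end_of_buffer*
    }
    return buf.substr(off, len);
}

// Reports which handler, if any, catches a short read.
std::string classify(const std::string& buf, std::size_t off, std::size_t len) {
    try {
        read_bytes(buf, off, len);
        return "ok";
    } catch (const end_of_buffer&) {
        // Does not match a thrown pointer, so this branch is skipped.
        return "caught by reference";
    } catch (const end_of_buffer* e) {
        delete e;  // the thrower allocated with new; the handler frees it
        return "caught by pointer";
    }
}
```

With only the by-reference handler in place, a short read would propagate out of `classify`, and an exception that reaches the top of the program uncaught ends in `std::terminate`, which is what the "Aborted" output above looks like from the shell.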

Actions #1

Updated by Wido den Hollander over 13 years ago

I just found out that OSD 11 also crashes when I create a new RBD image:

root@client01:~# rbd create --size 1024 delta
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'
Aborted
root@client01:~#

While this worked this morning, something in the cluster seems to have changed that now causes it to crash. The same goes for creating snapshots: it gave "Operation not supported" this morning, but now it also causes crashes.

Actions #2

Updated by Sage Weil over 13 years ago

  • Status changed from New to Resolved

Both bugs should be resolved by f429dc8aaac1a1e02f50381d16206d501c49656b.

Actions #3

Updated by Sage Weil over 13 years ago

  • Project changed from 3 to Ceph
  • Category deleted (8)