Bug #377
closed
Permission denied and OSD crash when creating or listing snapshots
Description
I'm trying to create some snapshots from my RBD devices, but this fails.
My whole cluster is running the latest unstable (98cf3c36cb0f040172f5a3cc0fcd3675c20f95a7).
root@node13:~# rbd snap create --snap=alpha001 alpha
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'
Aborted
root@node13:~#
When I tried again a few seconds later:
root@node13:~# rbd snap create --snap=alpha001 alpha
list_snaps failed: Operation not supported
error searching for snapshot: Operation not supported
root@node13:~#
I then found out that two OSDs had crashed while doing so:
Core was generated by `/usr/bin/cosd -i 11 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00000000004efe4b in operator<< (this=0x1e0c000, m=0x6a46900) at ./msg/Message.h:413
#2 OSD::_dispatch (this=0x1e0c000, m=0x6a46900) at osd/OSD.cc:1997
#3 0x00000000004f01ec in OSD::do_waiters (this=0x1e0c000) at osd/OSD.cc:1988
#4 0x00000000004f0938 in OSD::ms_dispatch (this=0x1e0c000, m=0x759d840) at osd/OSD.cc:1904
#5 0x0000000000462219 in Messenger::ms_deliver_dispatch (this=0x1e0a000) at msg/Messenger.h:97
#6 SimpleMessenger::dispatch_entry (this=0x1e0a000) at msg/SimpleMessenger.cc:342
#7 0x000000000045910c in SimpleMessenger::DispatchThread::entry (this=0x1e0a488) at msg/SimpleMessenger.h:540
#8 0x000000000046d0ca in Thread::_entry_func (arg=0x6a46900) at ./common/Thread.h:39
#9 0x00007f60b90a29ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#10 0x00007f60b805a6fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#11 0x0000000000000000 in ?? ()
(gdb)
Core was generated by `/usr/bin/cosd -i 9 -c /etc/ceph/ceph.conf'.
Program terminated with signal 11, Segmentation fault.
#0 ptr (this=0x3ca97a8, other=...) at ./include/buffer.h:344
344 ./include/buffer.h: No such file or directory.
in ./include/buffer.h
(gdb) bt
#0 ptr (this=0x3ca97a8, other=...) at ./include/buffer.h:344
#1 __gnu_cxx::new_allocator<ceph::buffer::ptr>::construct (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/ext/new_allocator.h:105
#2 std::list<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_create_node (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:464
#3 std::list<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::_M_insert (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:1407
#4 std::list<ceph::buffer::ptr, std::allocator<ceph::buffer::ptr> >::push_back (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:920
#5 _M_initialize_dispatch<std::_List_const_iterator<ceph::buffer::ptr> > (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:1361
#6 list (this=0x3ca97a8, other=...) at /usr/include/c++/4.4/bits/stl_list.h:533
#7 list (this=0x3ca97a8, other=...) at ./include/buffer.h:712
#8 0x00000000004b324f in OSDOp (__first=<value optimized out>, __last=..., __result=0x3ca9780) at osd/osd_types.h:1375
#9 uninitialized_copy<__gnu_cxx::__normal_iterator<OSDOp const*, std::vector<OSDOp, std::allocator<OSDOp> > >, OSDOp*> (__first=<value optimized out>, __last=..., __result=0x3ca9780) at /usr/include/c++/4.4/bits/stl_uninitialized.h:74
#10 uninitialized_copy<__gnu_cxx::__normal_iterator<OSDOp const*, std::vector<OSDOp, std::allocator<OSDOp> > >, OSDOp*> (__first=<value optimized out>, __last=..., __result=0x3ca9780) at /usr/include/c++/4.4/bits/stl_uninitialized.h:117
#11 std::__uninitialized_copy_a<__gnu_cxx::__normal_iterator<OSDOp const*, std::vector<OSDOp, std::allocator<OSDOp> > >, OSDOp*, OSDOp> (__first=<value optimized out>, __last=..., __result=0x3ca9780) at /usr/include/c++/4.4/bits/stl_uninitialized.h:257
#12 0x00000000004b34cf in _M_allocate_and_copy<__gnu_cxx::__normal_iterator<OSDOp const*, std::vector<OSDOp, std::allocator<OSDOp> > > > (this=0x41d8fa0, __x=<value optimized out>) at /usr/include/c++/4.4/bits/stl_vector.h:966
#13 std::vector<OSDOp, std::allocator<OSDOp> >::operator= (this=0x41d8fa0, __x=<value optimized out>) at /usr/include/c++/4.4/bits/vector.tcc:165
#14 0x00000000004c1146 in MOSDOpReply (this=0x1e33000, op=0x50fbd80, err=-2) at ./messages/MOSDOpReply.h:70
#15 OSD::reply_op_error (this=0x1e33000, op=0x50fbd80, err=-2) at osd/OSD.cc:4361
#16 0x000000000049ad13 in ReplicatedPG::do_op (this=0x23f1700, op=0x50fbd80) at osd/ReplicatedPG.cc:251
#17 0x00000000004d893c in OSD::dequeue_op (this=0x1e33000, pg=0x23f1700) at osd/OSD.cc:4736
#18 0x00000000005c1a2f in ThreadPool::worker (this=0x1e334e0) at common/WorkQueue.cc:44
#19 0x00000000004f8dbd in ThreadPool::WorkThread::entry() ()
#20 0x000000000046d0ca in Thread::_entry_func (arg=0x20) at ./common/Thread.h:39
#21 0x00007f68243be9ca in start_thread () from /lib/libpthread.so.0
#22 0x00007f68233776cd in clone () from /lib/libc.so.6
#23 0x0000000000000000 in ?? ()
(gdb)
These seem to be two different bugs:
- One bug gives the "Operation not supported" error
- The other bug crashes OSDs
Although the two crashes are actually OSD bugs, I'll keep this to one report for now; we might split them out into a second "issue" in the Ceph/OSD category.
For now I've uploaded the cores, binaries (multiple crashes) and logs to logger.ceph.widodh.nl into /srv/ceph/issues/osd_crash_rbd_snap
Even listing all the available snapshots crashed the OSD:
root@node13:~# rbd snap ls alpha
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'
Aborted
root@node13:~#
On my cluster I have 3 RBD images:
root@node13:~# rbd ls
alpha
beta
charlie
root@node13:~#
It seems that OSD 9 crashes first; once that one is down, OSD 11 crashes when you try a snapshot operation.
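For readers unfamiliar with the client-side abort shown above ("terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'"): that message is what a C++ program prints when an exception of that type is thrown and nothing catches it. The exception name suggests a decoder tried to read more bytes than a buffer held. The sketch below illustrates that general pattern in Python. It is not Ceph code, it makes no claim about the actual cause of this bug, and every name in it (EndOfBuffer, encode_string, decode_string) as well as the length-prefixed encoding is an assumption made for illustration.

```python
import struct
from typing import Tuple


class EndOfBuffer(Exception):
    """Hypothetical stand-in for a 'read past end of buffer' error."""


def encode_string(s: str) -> bytes:
    """Encode a string as a little-endian u32 length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def decode_string(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one length-prefixed string starting at offset.

    Returns the string and the offset just past it. Raises EndOfBuffer
    if the buffer is shorter than the encoding claims, instead of
    silently reading garbage.
    """
    if offset + 4 > len(buf):
        raise EndOfBuffer(
            "need 4 bytes for length, have %d" % (len(buf) - offset)
        )
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise EndOfBuffer(
            "need %d payload bytes, have %d" % (length, len(buf) - start)
        )
    return buf[start:end].decode("utf-8"), end


if __name__ == "__main__":
    good = encode_string("alpha001")
    print(decode_string(good))

    # An empty or truncated payload fails the bounds check. A caller that
    # does not handle the exception would abort, much like the rbd tool did.
    for bad in (b"", good[:-3]):
        try:
            decode_string(bad)
        except EndOfBuffer as exc:
            print("decode failed:", exc)
```

A caller that expects a full payload but receives an empty or short one would hit the bounds check on the first decode; catching the error at the call site turns an abort into a reportable failure.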
Updated by Wido den Hollander over 13 years ago
I just found out that OSD 11 actually crashes when I create a new RBD image:
root@client01:~# rbd create --size 1024 delta
terminate called after throwing an instance of 'ceph::buffer::end_of_buffer*'
Aborted
root@client01:~#
While this worked this morning, it seems something in the cluster has changed which causes it to crash. The same goes for creating snapshots: this morning it gave "Operation not supported", but now it also causes crashes.
Updated by Sage Weil over 13 years ago
- Status changed from New to Resolved
Both bugs should be resolved by f429dc8aaac1a1e02f50381d16206d501c49656b.
Updated by Sage Weil over 13 years ago
- Project changed from 3 to Ceph
- Category deleted (8)