Bug #57152
segfault in librados via libcephsqlite
Status:
closed
% Done:
0%
Source:
Community (user)
Tags:
backport_processed
Backport:
quincy,pacific
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
librados
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
We have a post on the ML about a segfault in the mgr:
"[ceph-users] Quincy: Corrupted devicehealth sqlite3 database from MGR crashing bug"
The user kindly shared the core dump, which is available here:
smithi098 in /tmp/sqlite-seg
root@smithi098:/tmp/sqlite-seg# gdb -q /usr/bin/sqlite3
...
(gdb) core CoreDump
[New LWP 481006]
[New LWP 481010]
[New LWP 480968]
[New LWP 481002]
[New LWP 481013]
[New LWP 481003]
[New LWP 481004]
[New LWP 481011]
[New LWP 481014]
[New LWP 481005]
[New LWP 481015]
[New LWP 481012]
[New LWP 481045]
[New LWP 481016]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `sqlite3'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fc1bacfac5b in ceph::buffer::v15_2_0::list::buffers_t::clear_and_dispose (this=0x55d3974d3ed0) at ./src/include/buffer.h:599
599 ./src/include/buffer.h: No such file or directory.
[Current thread is 1 (Thread 0x7fc1b6d7c700 (LWP 481006))]
(gdb) bt
#0 0x00007fc1bacfac5b in ceph::buffer::v15_2_0::list::buffers_t::clear_and_dispose (this=0x55d3974d3ed0) at ./src/include/buffer.h:599
#1 ceph::buffer::v15_2_0::list::buffers_t::operator= (other=..., this=0x55d3974d3ed0) at ./src/include/buffer.h:505
#2 ceph::buffer::v15_2_0::list::operator= (this=0x55d3974d3ed0, other=...) at ./src/include/buffer.h:970
#3 0x00007fc1bad35606 in Message::claim_data (bl=..., this=0x7fc1a001b680) at ./src/msg/Message.h:431
#4 Objecter::handle_osd_op_reply (this=0x55d397435b90, m=0x7fc1a001b680) at ./src/osdc/Objecter.cc:3509
#5 0x00007fc1bad37064 in Objecter::ms_dispatch (this=0x55d397435b90, m=0x7fc1a001b680) at ./src/osdc/Objecter.cc:984
#6 0x00007fc1bad3add6 in non-virtual thunk to Objecter::ms_fast_dispatch(Message*) () at ./src/osdc/Objecter.h:2659
#7 0x00007fc1ba588b00 in Messenger::ms_fast_dispatch (m=..., this=<optimized out>) at ./src/msg/DispatchQueue.cc:75
#8 DispatchQueue::fast_dispatch (this=0x55d3974356a0, m=...) at ./src/msg/DispatchQueue.cc:74
#9 0x00007fc1ba68b794 in DispatchQueue::fast_dispatch (m=0x7fc1a001b680, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/smart_ptr/intrusive_ptr.hpp:67
#10 ProtocolV2::handle_message (this=<optimized out>) at ./src/msg/async/ProtocolV2.cc:1496
#11 0x00007fc1ba6a17f0 in ProtocolV2::handle_read_frame_dispatch (this=0x55d3974ee540) at ./src/msg/async/ProtocolV2.cc:1157
#12 0x00007fc1ba6a1ae5 in ProtocolV2::_handle_read_frame_epilogue_main (this=0x55d3974ee540) at ./src/msg/async/ProtocolV2.cc:1347
#13 0x00007fc1ba6a2e14 in ProtocolV2::handle_read_frame_epilogue_main (this=0x55d3974ee540, buffer=..., r=0) at ./src/msg/async/ProtocolV2.cc:1324
#14 0x00007fc1ba686b79 in ProtocolV2::run_continuation (this=0x55d3974ee540, continuation=...) at ./src/msg/async/ProtocolV2.cc:49
#15 0x00007fc1ba649504 in std::function<void (char*, long)>::operator()(char*, long) const (__args#1=<optimized out>, __args#0=<optimized out>, this=0x55d3973b9c08) at /usr/include/c++/9/bits/std_function.h:683
#16 AsyncConnection::process (this=0x55d3973b9870) at ./src/msg/async/AsyncConnection.cc:454
#17 0x00007fc1ba6ac185 in EventCenter::process_events (this=this@entry=0x55d3974392a0, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7fc1b6d7be28) at /usr/include/c++/9/bits/basic_ios.h:282
#18 0x00007fc1ba6b464b in NetworkStack::<lambda()>::operator() (__closure=<optimized out>, __closure=<optimized out>) at ./src/msg/async/Stack.cc:50
#19 std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
#20 0x00007fc1b9f6bde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#21 0x00007fc1bb404609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#22 0x00007fc1bb323133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fc1bb1cf740 (LWP 480968))]
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55d397944770) at ../sysdeps/nptl/futex-internal.h:183
183 ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0 futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55d397944770) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55d397944720, cond=0x55d397944748) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x55d397944748, mutex=0x55d397944720) at pthread_cond_wait.c:647
#3 0x00007fc1b9f65e30 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007fc1bac9ee5b in std::condition_variable::wait<librados::AioCompletionImpl::wait_for_complete()::{lambda()#1}>(std::unique_lock<std::mutex>&, librados::AioCompletionImpl::wait_for_complete()::{lambda()#1}) (__p=..., __lock=..., this=0x55d397944748) at ./src/librados/AioCompletionImpl.h:63
#5 librados::AioCompletionImpl::wait_for_complete (this=0x55d397944720) at ./src/librados/AioCompletionImpl.h:63
#6 librados::v14_2_0::AioCompletion::wait_for_complete (this=<optimized out>) at ./src/librados/librados_cxx.cc:1050
#7 0x00007fc1bae649c0 in SimpleRADOSStriper::read (this=0x55d3974e6c70, data=data@entry=0x55d3974eee38, len=len@entry=65536, off=off@entry=4129788) at /usr/include/c++/9/bits/unique_ptr.h:360
#8 0x00007fc1bae2d428 in Read (file=0x55d3974dd260, buf=0x55d3974eee38, len=65536, off=4129788) at /usr/include/c++/9/bits/unique_ptr.h:360
#9 0x000055d396b6eccb in sqlite3OsRead (offset=<optimized out>, amt=<optimized out>, pBuf=0x55d3974eee38, id=0x55d3974dd260) at sqlite3.c:53758
#10 pager_playback_one_page (pPager=pPager@entry=0x55d3974dcfd8, pOffset=pOffset@entry=0x55d3974dd038, pDone=pDone@entry=0x0, isMainJrnl=isMainJrnl@entry=1, isSavepnt=isSavepnt@entry=0) at sqlite3.c:53758
#11 0x000055d396b6f7ac in pager_playback (pPager=pPager@entry=0x55d3974dcfd8, isHot=1) at sqlite3.c:54349
#12 0x000055d396b70c2d in sqlite3PagerSharedLock (pPager=0x55d3974dcfd8) at sqlite3.c:56768
#13 0x000055d396b713f8 in lockBtree (pBt=0x55d397343f98) at sqlite3.c:67193
#14 sqlite3BtreeBeginTrans (p=0x55d3974dd698, wrflag=wrflag@entry=0, pSchemaVersion=pSchemaVersion@entry=0x0) at sqlite3.c:2032
#15 0x000055d396ba6c21 in sqlite3InitOne (db=0x55d3973b7eb8, iDb=iDb@entry=0, pzErrMsg=pzErrMsg@entry=0x7ffecd8e4078, mFlags=mFlags@entry=0) at sqlite3.c:127249
#16 0x000055d396ba6dec in sqlite3Init (db=db@entry=0x55d3973b7eb8, pzErrMsg=pzErrMsg@entry=0x7ffecd8e4078) at sqlite3.c:127434
#17 0x000055d396ba6e2f in sqlite3ReadSchema (pParse=pParse@entry=0x7ffecd8e4070) at sqlite3.c:127460
#18 0x000055d396bb489d in sqlite3Pragma (pParse=pParse@entry=0x7ffecd8e4070, pId1=pId1@entry=0x7ffecd8e36b8, pId2=pId2@entry=0x7ffecd8e36d0, pValue=pValue@entry=0x0, minusFlag=minusFlag@entry=0) at sqlite3.c:124887
#19 0x000055d396bb6e79 in yy_reduce (yyLookahead=<optimized out>, pParse=0x7ffecd8e4070, yyLookaheadToken=..., yyruleno=239, yypParser=0x7ffecd8e3670) at sqlite3.c:156709
#20 sqlite3Parser (yyminor=..., yymajor=<optimized out>, yyp=0x7ffecd8e3670) at sqlite3.c:26203
#21 sqlite3RunParser (pParse=0x7ffecd8e4070, zSql=0x55d396bf007e "", pzErrMsg=0x7ffecd8e4068) at sqlite3.c:27477
#22 0x000055d396bbbd33 in sqlite3Prepare (db=db@entry=0x55d3973b7eb8, zSql=zSql@entry=0x55d396bf006a "PRAGMA database_list", nBytes=nBytes@entry=-1, prepFlags=prepFlags@entry=128, pReprepare=pReprepare@entry=0x0, ppStmt=ppStmt@entry=0x7ffecd8e4360, pzTail=0x0) at sqlite3.c:127661
#23 0x000055d396bbc0c7 in sqlite3LockAndPrepare (db=0x55d3973b7eb8, zSql=0x55d396bf006a "PRAGMA database_list", nBytes=-1, prepFlags=prepFlags@entry=128, pOld=pOld@entry=0x0, ppStmt=0x7ffecd8e4360, pzTail=0x0) at sqlite3.c:127733
#24 0x000055d396bbc1da in sqlite3_prepare_v2 (db=<optimized out>, zSql=<optimized out>, nBytes=<optimized out>, ppStmt=<optimized out>, pzTail=<optimized out>) at sqlite3.c:127817
#25 0x000055d396b059e4 in do_meta_command (zLine=<optimized out>, p=<optimized out>) at shell.c:17672
#26 0x000055d396b09ff1 in process_input (p=0x7ffecd8e5c60) at shell.c:18365
#27 0x000055d396ae9c27 in main (argc=<optimized out>, argv=<optimized out>) at shell.c:19156
(gdb) frame 7
#7 0x00007fc1bae649c0 in SimpleRADOSStriper::read (this=0x55d3974e6c70, data=data@entry=0x55d3974eee38, len=len@entry=65536, off=off@entry=4129788) at /usr/include/c++/9/bits/unique_ptr.h:360
360 get() const noexcept
(gdb) print reads
$1 = std::vector of length 2, capacity 2 = {{first = {_buffers = {_root = {next = 0x7fc1a00068b0}, _tail = 0x7fc1a00068b0}, _carriage = 0x7fc1bae0d180 <ceph::buffer::v15_2_0::list::always_empty_bptr>, _len = 0, _num = 0, static always_empty_bptr = {<ceph::buffer::v15_2_0::ptr_hook> = {next = 0x0}, <ceph::buffer::v15_2_0::ptr> = {_raw = 0x0, _off = 0, _len = 0}, <No data fields>}}, second = std::unique_ptr<librados::v14_2_0::AioCompletion> = {get() = 0x7fc1a0000be0}}, {first = {_buffers = {_root = {next = 0x7fc1a00068d8}, _tail = 0x7fc1a00068d8}, _carriage = 0x7fc1bae0d180 <ceph::buffer::v15_2_0::list::always_empty_bptr>, _len = 0, _num = 0, static always_empty_bptr = {<ceph::buffer::v15_2_0::ptr_hook> = {next = 0x0}, <ceph::buffer::v15_2_0::ptr> = {_raw = 0x0, _off = 0, _len = 0}, <No data fields>}}, second = std::unique_ptr<librados::v14_2_0::AioCompletion> = {get() = 0x55d3978dc290}}}
(gdb) thread 1
[Switching to thread 1 (Thread 0x7fc1b6d7c700 (LWP 481006))]
#0 0x00007fc1bacfac5b in ceph::buffer::v15_2_0::list::buffers_t::clear_and_dispose (this=0x55d3974d3ed0) at ./src/include/buffer.h:599
599 ./src/include/buffer.h: No such file or directory.
(gdb) bt
#0 0x00007fc1bacfac5b in ceph::buffer::v15_2_0::list::buffers_t::clear_and_dispose (this=0x55d3974d3ed0) at ./src/include/buffer.h:599
#1 ceph::buffer::v15_2_0::list::buffers_t::operator= (other=..., this=0x55d3974d3ed0) at ./src/include/buffer.h:505
#2 ceph::buffer::v15_2_0::list::operator= (this=0x55d3974d3ed0, other=...) at ./src/include/buffer.h:970
#3 0x00007fc1bad35606 in Message::claim_data (bl=..., this=0x7fc1a001b680) at ./src/msg/Message.h:431
#4 Objecter::handle_osd_op_reply (this=0x55d397435b90, m=0x7fc1a001b680) at ./src/osdc/Objecter.cc:3509
#5 0x00007fc1bad37064 in Objecter::ms_dispatch (this=0x55d397435b90, m=0x7fc1a001b680) at ./src/osdc/Objecter.cc:984
#6 0x00007fc1bad3add6 in non-virtual thunk to Objecter::ms_fast_dispatch(Message*) () at ./src/osdc/Objecter.h:2659
#7 0x00007fc1ba588b00 in Messenger::ms_fast_dispatch (m=..., this=<optimized out>) at ./src/msg/DispatchQueue.cc:75
#8 DispatchQueue::fast_dispatch (this=0x55d3974356a0, m=...) at ./src/msg/DispatchQueue.cc:74
#9 0x00007fc1ba68b794 in DispatchQueue::fast_dispatch (m=0x7fc1a001b680, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/smart_ptr/intrusive_ptr.hpp:67
#10 ProtocolV2::handle_message (this=<optimized out>) at ./src/msg/async/ProtocolV2.cc:1496
#11 0x00007fc1ba6a17f0 in ProtocolV2::handle_read_frame_dispatch (this=0x55d3974ee540) at ./src/msg/async/ProtocolV2.cc:1157
#12 0x00007fc1ba6a1ae5 in ProtocolV2::_handle_read_frame_epilogue_main (this=0x55d3974ee540) at ./src/msg/async/ProtocolV2.cc:1347
#13 0x00007fc1ba6a2e14 in ProtocolV2::handle_read_frame_epilogue_main (this=0x55d3974ee540, buffer=..., r=0) at ./src/msg/async/ProtocolV2.cc:1324
#14 0x00007fc1ba686b79 in ProtocolV2::run_continuation (this=0x55d3974ee540, continuation=...) at ./src/msg/async/ProtocolV2.cc:49
#15 0x00007fc1ba649504 in std::function<void (char*, long)>::operator()(char*, long) const (__args#1=<optimized out>, __args#0=<optimized out>, this=0x55d3973b9c08) at /usr/include/c++/9/bits/std_function.h:683
#16 AsyncConnection::process (this=0x55d3973b9870) at ./src/msg/async/AsyncConnection.cc:454
#17 0x00007fc1ba6ac185 in EventCenter::process_events (this=this@entry=0x55d3974392a0, timeout_microseconds=<optimized out>, timeout_microseconds@entry=30000000, working_dur=working_dur@entry=0x7fc1b6d7be28) at /usr/include/c++/9/bits/basic_ios.h:282
#18 0x00007fc1ba6b464b in NetworkStack::<lambda()>::operator() (__closure=<optimized out>, __closure=<optimized out>) at ./src/msg/async/Stack.cc:50
#19 std::_Function_handler<void(), NetworkStack::add_thread(Worker*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:300
#20 0x00007fc1b9f6bde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#21 0x00007fc1bb404609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#22 0x00007fc1bb323133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) print *this
$2 = {_root = {next = 0x0}, _tail = 0x55d39730c010}
The relevant libcephsqlite code is here:
https://github.com/ceph/ceph/blob/v17.2.3/src/SimpleRADOSStriper.cc#L491-L501
I'm not seeing anything obviously wrong in terms of an API violation. It looks like a bug in librados?
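One thing that may be worth ruling out before blaming librados (this is a hypothesis, not something the core dump establishes): in frame 7, `reads` is a `std::vector` of (bufferlist, completion) pairs with length 2 and capacity 2. If the address of an element's bufferlist is handed to an asynchronous read and the vector later grows past its capacity, the elements are relocated and the in-flight operation is left writing through a stale pointer. The sketch below only demonstrates that general C++ hazard in isolation. `Buffer`, `addr`, `vector_element_moves` and `deque_element_moves` are hypothetical names for illustration and are not part of Ceph.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for an output buffer that an async read fills in later.
struct Buffer {
  std::string data;
};

// Record an element's address as an integer so it can be compared safely
// even after the original storage has been released.
inline std::uintptr_t addr(const Buffer& b) {
  return reinterpret_cast<std::uintptr_t>(&b);
}

// Returns true if the first element changes address once the vector
// grows past its current capacity.
inline bool vector_element_moves() {
  std::vector<Buffer> reads;
  reads.emplace_back();
  const std::uintptr_t before = addr(reads.front());
  // Fill the storage that is already allocated; no reallocation happens here.
  while (reads.size() < reads.capacity()) {
    reads.emplace_back();
  }
  // This append exceeds the capacity, so every element is relocated.
  reads.emplace_back();
  return addr(reads.front()) != before;
}

// Returns true if the first element of a deque changes address while
// more elements are appended at the back.
inline bool deque_element_moves() {
  std::deque<Buffer> reads;
  reads.emplace_back();
  const std::uintptr_t before = addr(reads.front());
  for (int i = 0; i < 1000; ++i) {
    reads.emplace_back();  // references to existing elements remain valid
  }
  return addr(reads.front()) != before;
}
```

Calling `vector_element_moves()` from a small `main` should report that the element moved, while `deque_element_moves()` should not. If the striper code turns out to follow the vector pattern, reserving capacity up front or using a container with stable element addresses would be the usual ways to avoid the relocation.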