Bug #51372

pacific: libcephsqlite: segmentation fault

Added by Patrick Donnelly almost 3 years ago. Updated over 1 year ago.

Status: Duplicate
Priority: Normal
Target version:
% Done: 0%
Source: Q/A
Tags:
Backport: pacific
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2021-06-22T20:58:49.307 INFO:tasks.workunit.client.0.smithi032.stderr:+ kill -CONT -- 24739
2021-06-22T20:58:49.307 INFO:tasks.workunit.client.0.smithi032.stderr:+ sleep 10
2021-06-22T20:58:49.328 INFO:tasks.workunit.client.0.smithi032.stderr:Error: near line 3: disk I/O error
2021-06-22T20:58:59.309 INFO:tasks.workunit.client.0.smithi032.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/rados/test_libcephsqlite.sh: line 129: 24739 Segmentation fault      (core dumped) sqlite3 -cmd '.output /dev/null' -cmd '.load libcephsqlite.so' -cmd 'pragma journal_mode = PERSIST' -cmd ".open file:///$pool:$ns/baz.db?vfs=ceph" -cmd '.output stdout' <<< "$a" 

From: /teuthology/yuriw-2021-06-22_16:47:38-rados-wip-yuri5-testing-2021-06-22-0805-pacific-distro-basic-smithi/6184840/teuthology.log


Related issues

Duplicates cephsqlite - Bug #50503: tasks/libcephsqlite throws "std::out_of_range" Resolved

History

#1 Updated by Patrick Donnelly almost 3 years ago

(gdb) bt
#0  __memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:247
#1  0x00007fbfcf06d0a4 in std::_Rb_tree<std::basic_string_view<char, std::char_traits<char> >, std::pair<std::basic_string_view<char, std::char_traits<char> > const, Option const&>, std::_Select1st<std::pair<std::basic_string_view<char, std::char_traits<char> > const, Option const&> >, std::less<std::basic_string_view<char, std::char_traits<char> > >, std::allocator<std::pair<std::basic_string_view<char, std::char_traits<char> > const, Option const&> > >::find(std::basic_string_view<char, std::char_traits<char> > const&) const ()
   from /usr/lib/ceph/libceph-common.so.2
#2  0x00007fbfcf09b3c1 in md_config_t::find_option(std::basic_string_view<char, std::char_traits<char> >) const () from /usr/lib/ceph/libceph-common.so.2
#3  0x00007fbfcf09edcb in md_config_t::_get_val(ConfigValues const&, std::basic_string_view<char, std::char_traits<char> >, boost::container::small_vector<std::pair<Option const*, boost::variant<boost::blank, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long, long, double, bool, entity_addr_t, entity_addrvec_t, std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000l> >, Option::size_t, uuid_d> const*>, 4ul, void, void>*, std::ostream*) const ()
   from /usr/lib/ceph/libceph-common.so.2
#4  0x00007fbfcf09f0b2 in md_config_t::get_val_generic[abi:cxx11](ConfigValues const&, std::basic_string_view<char, std::char_traits<char> >) const () from /usr/lib/ceph/libceph-common.so.2
#5  0x00007fbfcf2354f6 in AsyncConnection::maybe_start_delay_thread() () from /usr/lib/ceph/libceph-common.so.2
#6  0x00007fbfcf26fc5f in ProtocolV2::ready() () from /usr/lib/ceph/libceph-common.so.2
#7  0x00007fbfcf278570 in ProtocolV2::handle_server_ident(ceph::buffer::v15_2_0::list&) () from /usr/lib/ceph/libceph-common.so.2
#8  0x00007fbfcf2827fb in ProtocolV2::handle_frame_payload() () from /usr/lib/ceph/libceph-common.so.2
#9  0x00007fbfcf282b20 in ProtocolV2::handle_read_frame_dispatch() () from /usr/lib/ceph/libceph-common.so.2
#10 0x00007fbfcf282c89 in ProtocolV2::_handle_read_frame_epilogue_main() () from /usr/lib/ceph/libceph-common.so.2
#11 0x00007fbfcf283041 in ProtocolV2::_handle_read_frame_segment() () from /usr/lib/ceph/libceph-common.so.2
#12 0x00007fbfcf2841fc in ProtocolV2::handle_read_frame_segment(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) ()
   from /usr/lib/ceph/libceph-common.so.2
#13 0x00007fbfcf26c814 in ProtocolV2::run_continuation(Ct<ProtocolV2>&) () from /usr/lib/ceph/libceph-common.so.2
#14 0x00007fbfcf237918 in AsyncConnection::process() () from /usr/lib/ceph/libceph-common.so.2
#15 0x00007fbfcf28d50d in EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) () from /usr/lib/ceph/libceph-common.so.2
#16 0x00007fbfcf295310 in ?? () from /usr/lib/ceph/libceph-common.so.2
#17 0x00007fbfce85f6df in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x00007fbfd92626db in start_thread (arg=0x7fbfc98e7700) at pthread_create.c:463
#19 0x00007fbfd8d8771f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

#2 Updated by Patrick Donnelly almost 3 years ago

The problem seems to be that the cct (CephContext) is invalid. At the time of the crash, sqlite3 is exiting:

(gdb) thread 11
[Switching to thread 11 (Thread 0x7fbfd9e90740 (LWP 24739))]
#0  0x00007fbfccd58bb1 in ?? () from /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0
(gdb) bt
#0  0x00007fbfccd58bb1 in ?? () from /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0
#1  0x00007fbfccd59e67 in tracepoint_unregister_lib () from /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0
#2  0x00007fbfd7c23710 in ?? () from /usr/lib/librados.so.2
#3  0x00007fbfd9c8ed13 in _dl_fini () at dl-fini.c:138
#4  0x00007fbfd8ca9161 in __run_exit_handlers (status=1, listp=0x7fbfd9051718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#5  0x00007fbfd8ca925a in __GI_exit (status=<optimized out>) at exit.c:139
#6  0x00007fbfd8c87bfe in __libc_start_main (main=0x5608ef051590, argc=11, argv=0x7ffeb3a4d078, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffeb3a4d068) at ../csu/libc-start.c:344
#7  0x00005608ef052a5a in ?? ()

So the libcephsqlite structures have already been destructed, including its handle to Rados:

https://github.com/ceph/ceph/blob/26df5df247a002290a8e4da463adb417becedda4/src/libcephsqlite.cc#L135-L138

Apparently the handle to cct was also invalidated:

https://github.com/ceph/ceph/blob/26df5df247a002290a8e4da463adb417becedda4/src/librados/RadosClient.h#L52-L53

which caused the segfault. It looks like RadosClient was destructed while the messenger threads were still running.

I think the best way to resolve this is for RadosClient to tear down its messenger in its destructor?
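
A minimal sketch of the teardown order proposed above. This is not Ceph code: Context, Client, and run_until_first_lookup are hypothetical stand-ins for CephContext, RadosClient, and a caller, and the worker thread only loosely imitates a messenger thread doing a config lookup. The point it illustrates is that the owner stops and joins its thread in the destructor, before the state that thread reads is released.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <string>
#include <thread>

// Hypothetical stand-in for CephContext: state that worker threads read.
struct Context {
    std::string option = "example_option";
};

// Hypothetical stand-in for RadosClient owning a messenger-like thread.
class Client {
public:
    Client() : ctx_(std::make_unique<Context>()) {
        worker_ = std::thread([this] {
            while (!stop_.load()) {
                // Reads through ctx_, loosely analogous to the config
                // lookup (find_option) seen in the crashing backtrace.
                if (!ctx_->option.empty()) {
                    lookups_.fetch_add(1);
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    ~Client() {
        // Stop and join the worker first. Members (including ctx_) are
        // destroyed only after this body returns, so the worker can never
        // observe a dangling Context.
        stop_.store(true);
        if (worker_.joinable()) {
            worker_.join();
        }
        assert(!worker_.joinable());
    }

    Client(const Client&) = delete;
    Client& operator=(const Client&) = delete;

    long lookups() const { return lookups_.load(); }

private:
    std::unique_ptr<Context> ctx_;
    std::atomic<bool> stop_{false};
    std::atomic<long> lookups_{0};
    std::thread worker_;
};

// Creates a client, waits for the worker to perform at least one lookup,
// then lets the destructor shut the worker down cleanly.
long run_until_first_lookup() {
    Client client;
    while (client.lookups() == 0) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return client.lookups();
}
```

If the destructor skipped the join, the worker could keep dereferencing ctx_ after it was freed, which is the kind of use-after-free the backtrace in #1 suggests.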

#3 Updated by Patrick Donnelly almost 3 years ago

  • Related to Bug #50503: tasks/libcephsqlite throws "std::out_of_range" added

#4 Updated by Patrick Donnelly almost 3 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 50503

#6 Updated by Kefu Chai over 2 years ago

  • Status changed from Fix Under Review to Duplicate
  • Pull request ID deleted (50503)

#7 Updated by Kefu Chai over 2 years ago

  • Related to deleted (Bug #50503: tasks/libcephsqlite throws "std::out_of_range")

#8 Updated by Kefu Chai over 2 years ago

  • Duplicates Bug #50503: tasks/libcephsqlite throws "std::out_of_range" added

#9 Updated by Aishwarya Mathuria over 1 year ago

/a/yuriw-2022-11-28_21:10:48-rados-wip-yuri10-testing-2022-11-28-1042-pacific-distro-default-smithi/7095357
Coredump is available.
