Bug #41354

RBD image manipulation using python API crashing since Nautilus

Added by Nikola Ciprich 11 months ago. Updated 10 months ago.

Status: Pending Backport
Priority: Normal
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: luminous,mimic,nautilus
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature:

Description

Since Nautilus, our Python-based management tools keep crashing. From the GDB backtraces, we suspect a locking issue. I'm attaching simple reproducer scripts: the multithreaded one crashes within seconds, the single-threaded one within minutes. A backtrace is attached as well.

test_volumes.py (842 Bytes) Nikola Ciprich, 08/20/2019 04:07 PM

ceph_coredump_20190820 (7.37 KB) Nikola Ciprich, 08/20/2019 04:07 PM

test_volumes_thread.py (1.38 KB) Nikola Ciprich, 08/20/2019 04:07 PM
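
The attached scripts are not reproduced on this page. For readers without access to them, the following is a rough sketch of what a multithreaded reproducer of this kind might look like. It is not the attached test_volumes_thread.py. The assumptions are that each iteration opens a fresh cluster connection through the Python `rados` binding and then creates and removes one image through `rbd`, and the pool name, image prefix, thread count, and image size are all made up for illustration. The binding modules are passed in as parameters so the functions can also be exercised with stand-ins on a machine that has no Ceph cluster.

```python
import threading


def churn_images(rados_mod, rbd_mod, pool, prefix, iterations,
                 conffile="/etc/ceph/ceph.conf", size_bytes=4 * 1024 ** 2):
    """Per iteration: connect, create one image, remove it, disconnect.

    rados_mod / rbd_mod are expected to behave like the `rados` and `rbd`
    Python bindings. Reconnecting on every iteration is an assumption made
    for this sketch, not something taken from the attached scripts.
    """
    for i in range(iterations):
        cluster = rados_mod.Rados(conffile=conffile)
        cluster.connect()
        try:
            ioctx = cluster.open_ioctx(pool)
            try:
                name = f"{prefix}-{i}"  # hypothetical image name
                rbd_mod.RBD().create(ioctx, name, size_bytes)
                rbd_mod.RBD().remove(ioctx, name)
            finally:
                ioctx.close()
        finally:
            cluster.shutdown()


def run_threads(rados_mod, rbd_mod, pool="rbd", threads=4, iterations=100):
    """Run churn_images concurrently, one image-name prefix per thread."""
    workers = [
        threading.Thread(
            target=churn_images,
            args=(rados_mod, rbd_mod, pool, f"repro-{t}", iterations),
        )
        for t in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Against a real cluster this would be started with `import rados, rbd` followed by `run_threads(rados, rbd)`. Whether this particular pattern triggers the crash described above has not been verified here.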


Related issues

Copied to rbd - Backport #41770: mimic: RBD image manipulation using python API crashing since Nautilus In Progress
Copied to rbd - Backport #41771: nautilus: RBD image manipulation using python API crashing since Nautilus Resolved
Copied to rbd - Backport #41772: luminous: RBD image manipulation using python API crashing since Nautilus Resolved

History

#1 Updated by Jason Dillaman 11 months ago

  • Project changed from Linux kernel client to rbd
  • Category deleted (rbd)

#2 Updated by Jason Dillaman 11 months ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
  • Backport set to nautilus

It looks like a bug in librados associated with retrieving config overrides from the MON config store, but I'll take a look.

#3 Updated by Jason Dillaman 11 months ago

  • Backport changed from nautilus to luminous,mimic,nautilus

Regression introduced via [1]. It's already been backported to luminous and it's pending for mimic.

[1] https://tracker.ceph.com/issues/24823

#4 Updated by Jason Dillaman 11 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 29809

#5 Updated by Jason Dillaman 11 months ago

/home/jdillaman/ceph_nautilus/src/common/config_proxy.h: In function 'void ConfigProxy::call_gate_leave(ConfigProxy::md_config_obs_t*)' thread 7fffd6ffd700 time 2019-08-21 23:08:51.961269
/home/jdillaman/ceph_nautilus/src/common/config_proxy.h: 70: FAILED ceph_assert(p != obs_call_gate.end())
 ceph version 14.2.2-421-g5286e37857 (5286e37857aad3901636f3c9a3a301c0eaa35f68) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7fffe1574fcb]
 2: (()+0x2771b5) [0x7fffe15751b5]
 3: (()+0x65ba92) [0x7fffe1959a92]
 4: (FunctionContext::finish(int)+0x2c) [0x7fffe16588dc]
 5: (Context::complete(int)+0x9) [0x7fffe160a639]
 6: (Finisher::finisher_thread_entry()+0x15e) [0x7fffe1612c0e]
 7: (()+0x85a2) [0x7ffff7a635a2]
 8: (clone()+0x43) [0x7ffff7ec7303]

Thread 2198290 "fn_anonymous" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffd6ffd700 (LWP 10321)]
0x00007ffff7e03e75 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7e03e75 in raise () from /lib64/libc.so.6
#1  0x00007ffff7dee895 in abort () from /lib64/libc.so.6
#2  0x00007fffe1575026 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/jdillaman/ceph_nautilus/src/common/assert.cc:73
#3  0x00007fffe15751b5 in ceph::__ceph_assert_fail (ctx=...) at /home/jdillaman/ceph_nautilus/src/common/assert.cc:78
#4  0x00007fffe1959a92 in ConfigProxy::call_gate_leave (obs=<optimized out>, this=0x55555576b248) at /usr/include/c++/9/bits/stl_tree.h:1010
#5  ConfigProxy::call_observers (rev_obs=std::map with 1 element = {...}, this=0x55555576b248) at /home/jdillaman/ceph_nautilus/src/common/config_proxy.h:89
#6  ConfigProxy::set_mon_vals(CephContext*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) (
    config_cb=..., kv=..., cct=<optimized out>, this=0x55555576b248) at /home/jdillaman/ceph_nautilus/src/common/config_proxy.h:291
#7  MonClient::<lambda(int)>::operator() (__closure=0x7fffbc013d10, __closure=0x7fffbc013d10, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/mon/MonClient.cc:418
#8  boost::detail::function::void_function_obj_invoker1<MonClient::handle_config(MConfig*)::<lambda(int)>, void, int>::invoke(boost::detail::function::function_buffer &, int) (function_obj_ptr=..., a0=<optimized out>)
    at /home/jdillaman/ceph_nautilus/build/boost/include/boost/function/function_template.hpp:159
#9  0x00007fffe16588dc in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>) at /home/jdillaman/ceph_nautilus/build/boost/include/boost/function/function_template.hpp:682
#10 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/include/Context.h:487
#11 0x00007fffe160a639 in Context::complete (this=0x7fffbc013d00, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/include/Context.h:77
#12 0x00007fffe1612c0e in Finisher::finisher_thread_entry (this=0x5555557749c0) at /home/jdillaman/ceph_nautilus/src/common/Finisher.cc:67
#13 0x00007ffff7a635a2 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff7ec7303 in clone () from /lib64/libc.so.6
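
To make the failed check easier to read: the assertion text `p != obs_call_gate.end()` in `ConfigProxy::call_gate_leave` suggests that the function looks up the observer in a container named `obs_call_gate` and aborts when no entry is found. The following toy model in Python illustrates only that check. It is not Ceph code. The class `ToyConfigProxy`, its counter-based gate, and the manual deletion of the entry are all hypothetical, and the sketch makes no claim about how the entry went missing in the crash above.

```python
class ToyConfigProxy:
    """Toy stand-in for a per-observer call gate (hypothetical, not Ceph)."""

    def __init__(self):
        # observer -> number of callbacks currently in flight
        self.obs_call_gate = {}

    def add_observer(self, obs):
        self.obs_call_gate[obs] = 0

    def call_gate_enter(self, obs):
        if obs not in self.obs_call_gate:
            raise AssertionError("p != obs_call_gate.end()")
        self.obs_call_gate[obs] += 1

    def call_gate_leave(self, obs):
        # Mirrors the check named in the assertion message: the observer
        # must still have an entry when the callback finishes.
        if obs not in self.obs_call_gate:
            raise AssertionError("p != obs_call_gate.end()")
        self.obs_call_gate[obs] -= 1


def leave_with_missing_entry():
    """Trigger the check by removing the entry between enter and leave."""
    proxy = ToyConfigProxy()
    proxy.add_observer("obs")
    proxy.call_gate_enter("obs")
    # Hypothetical: the entry is deleted by hand purely to hit the check.
    del proxy.obs_call_gate["obs"]
    try:
        proxy.call_gate_leave("obs")
    except AssertionError as exc:
        return str(exc)
    return None
```

In the toy, a balanced enter and leave on a registered observer returns the counter to zero, while `leave_with_missing_entry()` returns the assertion message instead of completing.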

#6 Updated by Patrick Donnelly 10 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Target version changed from v14.2.2 to v15.0.0

#7 Updated by Nathan Cutler 10 months ago

  • Copied to Backport #41770: mimic: RBD image manipulation using python API crashing since Nautilus added

#8 Updated by Nathan Cutler 10 months ago

  • Copied to Backport #41771: nautilus: RBD image manipulation using python API crashing since Nautilus added

#9 Updated by Nathan Cutler 10 months ago

  • Copied to Backport #41772: luminous: RBD image manipulation using python API crashing since Nautilus added
