Bug #41354

RBD image manipulation using python API crashing since Nautilus

Added by Nikola Ciprich 30 days ago. Updated 8 days ago.

Status: Pending Backport
Priority: Normal
Target version: v15.0.0
Start date: 08/20/2019
Due date:
% Done: 0%
Source: Community (user)
Tags:
Backport: luminous,mimic,nautilus
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID: 29809

Description

Since Nautilus, our Python-based management tools keep crashing. From the GDB backtraces, we suspect a locking issue. I'm attaching simple reproducer scripts: the multithreaded one crashes within seconds, and the single-threaded one within minutes. A backtrace is attached as well.

test_volumes.py (842 Bytes), Nikola Ciprich, 08/20/2019 04:07 PM

ceph_coredump_20190820 (7.37 KB), Nikola Ciprich, 08/20/2019 04:07 PM

test_volumes_thread.py (1.38 KB), Nikola Ciprich, 08/20/2019 04:07 PM

Related issues

Copied to rbd - Backport #41770: mimic: RBD image manipulation using python API crashing since Nautilus New
Copied to rbd - Backport #41771: nautilus: RBD image manipulation using python API crashing since Nautilus New
Copied to rbd - Backport #41772: luminous: RBD image manipulation using python API crashing since Nautilus In Progress

History

#1 Updated by Jason Dillaman 30 days ago

  • Project changed from Linux kernel client to rbd
  • Category deleted (rbd)

#2 Updated by Jason Dillaman 30 days ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
  • Backport set to nautilus

It looks like a bug in librados related to retrieving config overrides from the MON config store, but I'll take a look.

#3 Updated by Jason Dillaman 28 days ago

  • Backport changed from nautilus to luminous,mimic,nautilus

Regression introduced by [1]. That change has already been backported to luminous, and the mimic backport is pending.

[1] https://tracker.ceph.com/issues/24823

#4 Updated by Jason Dillaman 28 days ago

  • Status changed from In Progress to Need Review
  • Pull request ID set to 29809

#5 Updated by Jason Dillaman 28 days ago

/home/jdillaman/ceph_nautilus/src/common/config_proxy.h: In function 'void ConfigProxy::call_gate_leave(ConfigProxy::md_config_obs_t*)' thread 7fffd6ffd700 time 2019-08-21 23:08:51.961269
/home/jdillaman/ceph_nautilus/src/common/config_proxy.h: 70: FAILED ceph_assert(p != obs_call_gate.end())
 ceph version 14.2.2-421-g5286e37857 (5286e37857aad3901636f3c9a3a301c0eaa35f68) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7fffe1574fcb]
 2: (()+0x2771b5) [0x7fffe15751b5]
 3: (()+0x65ba92) [0x7fffe1959a92]
 4: (FunctionContext::finish(int)+0x2c) [0x7fffe16588dc]
 5: (Context::complete(int)+0x9) [0x7fffe160a639]
 6: (Finisher::finisher_thread_entry()+0x15e) [0x7fffe1612c0e]
 7: (()+0x85a2) [0x7ffff7a635a2]
 8: (clone()+0x43) [0x7ffff7ec7303]

Thread 2198290 "fn_anonymous" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffd6ffd700 (LWP 10321)]
0x00007ffff7e03e75 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7e03e75 in raise () from /lib64/libc.so.6
#1  0x00007ffff7dee895 in abort () from /lib64/libc.so.6
#2  0x00007fffe1575026 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/jdillaman/ceph_nautilus/src/common/assert.cc:73
#3  0x00007fffe15751b5 in ceph::__ceph_assert_fail (ctx=...) at /home/jdillaman/ceph_nautilus/src/common/assert.cc:78
#4  0x00007fffe1959a92 in ConfigProxy::call_gate_leave (obs=<optimized out>, this=0x55555576b248) at /usr/include/c++/9/bits/stl_tree.h:1010
#5  ConfigProxy::call_observers (rev_obs=std::map with 1 element = {...}, this=0x55555576b248) at /home/jdillaman/ceph_nautilus/src/common/config_proxy.h:89
#6  ConfigProxy::set_mon_vals(CephContext*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) (
    config_cb=..., kv=..., cct=<optimized out>, this=0x55555576b248) at /home/jdillaman/ceph_nautilus/src/common/config_proxy.h:291
#7  MonClient::<lambda(int)>::operator() (__closure=0x7fffbc013d10, __closure=0x7fffbc013d10, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/mon/MonClient.cc:418
#8  boost::detail::function::void_function_obj_invoker1<MonClient::handle_config(MConfig*)::<lambda(int)>, void, int>::invoke(boost::detail::function::function_buffer &, int) (function_obj_ptr=..., a0=<optimized out>)
    at /home/jdillaman/ceph_nautilus/build/boost/include/boost/function/function_template.hpp:159
#9  0x00007fffe16588dc in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>) at /home/jdillaman/ceph_nautilus/build/boost/include/boost/function/function_template.hpp:682
#10 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/include/Context.h:487
#11 0x00007fffe160a639 in Context::complete (this=0x7fffbc013d00, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/include/Context.h:77
#12 0x00007fffe1612c0e in Finisher::finisher_thread_entry (this=0x5555557749c0) at /home/jdillaman/ceph_nautilus/src/common/Finisher.cc:67
#13 0x00007ffff7a635a2 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff7ec7303 in clone () from /lib64/libc.so.6
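The failed check at config_proxy.h:70 means the following. After `ConfigProxy::call_observers` invoked an observer, `call_gate_leave` could not find that observer in `obs_call_gate`.

The Python sketch that follows is a hypothetical, simplified model of that per-observer bookkeeping. It is not Ceph code. The class names, `remove_observer_unsafely`, and the single-threaded replay of the interleaving are all assumptions made for illustration. The sketch shows one way the invariant can break: the observer's gate entry is erased while a notification is still in flight. It makes no claim about the root cause addressed in PR 29809.

```python
import threading


class CallGateError(RuntimeError):
    """Raised where the C++ code would hit ceph_assert(p != obs_call_gate.end())."""


class Observer:
    """Hypothetical stand-in for a config observer."""

    def __init__(self):
        self.notifications = 0

    def handle_conf_change(self):
        self.notifications += 1


class CallGateRegistry:
    """Hypothetical model of per-observer call-gate bookkeeping (not Ceph code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._gates = {}  # observer -> number of notifications in flight

    def add_observer(self, obs):
        with self._lock:
            self._gates[obs] = 0

    def remove_observer_unsafely(self, obs):
        # Erases the entry immediately instead of waiting for in-flight calls.
        with self._lock:
            self._gates.pop(obs, None)

    def gate_enter(self, obs):
        with self._lock:
            self._gates[obs] += 1

    def gate_leave(self, obs):
        with self._lock:
            if obs not in self._gates:
                raise CallGateError("no call gate entry for observer")
            self._gates[obs] -= 1


def replay(remove_mid_call):
    """Replay one notifier/remover interleaving; return True if gate_leave fails."""
    reg = CallGateRegistry()
    obs = Observer()
    reg.add_observer(obs)
    reg.gate_enter(obs)  # notifier: enter the gate, then drop the lock
    if remove_mid_call:
        reg.remove_observer_unsafely(obs)  # remover runs while the call is pending
    obs.handle_conf_change()  # notifier: invoke the callback without the lock
    try:
        reg.gate_leave(obs)  # notifier: re-take the lock and leave the gate
    except CallGateError:
        return True
    return False


print(replay(False), replay(True))  # prints: False True
```

In Ceph the equivalent check is a `ceph_assert`, not an exception. The process therefore aborts with SIGABRT in the Finisher thread, as in the backtrace above. The reporter's multithreaded reproducer, which crashed within seconds, is consistent with a timing-dependent interleaving of this kind.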

#6 Updated by Patrick Donnelly 8 days ago

  • Status changed from Need Review to Pending Backport
  • Target version changed from v14.2.2 to v15.0.0

#7 Updated by Nathan Cutler 8 days ago

  • Copied to Backport #41770: mimic: RBD image manipulation using python API crashing since Nautilus added

#8 Updated by Nathan Cutler 8 days ago

  • Copied to Backport #41771: nautilus: RBD image manipulation using python API crashing since Nautilus added

#9 Updated by Nathan Cutler 8 days ago

  • Copied to Backport #41772: luminous: RBD image manipulation using python API crashing since Nautilus added
