Bug #41354

closed

RBD image manipulation using python API crashing since Nautilus

Added by Nikola Ciprich over 4 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: Jason Dillaman
Target version:
% Done: 0%
Source: Community (user)
Tags:
Backport: luminous,mimic,nautilus
Regression: Yes
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Since Nautilus, our Python-based management tools keep crashing. From the GDB backtraces, we suspect a locking issue. I'm attaching two simple reproducer scripts: the multithreaded one crashes within seconds, the single-threaded one within minutes. A backtrace is attached as well.
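
For readers without access to the attachments, the kind of workload described above can be sketched roughly as follows. This is not the attached test_volumes_thread.py; it is a minimal illustration that assumes each thread opens its own cluster connection and repeatedly creates and removes a small image. The helper names (churn_images, run_threads, ceph_session), the pool name "rbd", the config path, and the image size are all assumptions made for the example.

```python
import threading


def churn_images(rbd_api, ioctx, prefix, count, size_bytes=4 * 1024 ** 2):
    """Create and immediately remove `count` images; return completed cycles."""
    done = 0
    for i in range(count):
        name = "{}-{}".format(prefix, i)
        rbd_api.create(ioctx, name, size_bytes)
        rbd_api.remove(ioctx, name)
        done += 1
    return done


def run_threads(make_session, n_threads, count):
    """Run churn_images concurrently.

    make_session() must return (rbd_api, ioctx, close_fn); it is called once
    per thread so every thread gets its own connection (an assumption).
    """
    results = [0] * n_threads

    def worker(idx):
        rbd_api, ioctx, close = make_session()
        try:
            results[idx] = churn_images(
                rbd_api, ioctx, "crashtest-{}".format(idx), count)
        finally:
            close()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def ceph_session(pool="rbd", conffile="/etc/ceph/ceph.conf"):
    """Open a real cluster connection (pool and conffile are assumed values)."""
    import rados  # imported lazily so the helpers above work without Ceph
    import rbd

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    ioctx = cluster.open_ioctx(pool)

    def close():
        ioctx.close()
        cluster.shutdown()

    return rbd.RBD(), ioctx, close
```

Against a disposable test cluster this could be invoked as run_threads(ceph_session, 8, 100); setting n_threads to 1 gives the single-threaded variant.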


Files

test_volumes.py (842 Bytes), Nikola Ciprich, 08/20/2019 04:07 PM
ceph_coredump_20190820 (7.37 KB), Nikola Ciprich, 08/20/2019 04:07 PM
test_volumes_thread.py (1.38 KB), Nikola Ciprich, 08/20/2019 04:07 PM

Related issues 4 (0 open, 4 closed)

Related to RADOS - Bug #62832: common: config_proxy deadlock during shutdown (and possibly other times) (Resolved, Patrick Donnelly)
Copied to rbd - Backport #41770: mimic: RBD image manipulation using python API crashing since Nautilus (Rejected)
Copied to rbd - Backport #41771: nautilus: RBD image manipulation using python API crashing since Nautilus (Resolved, Nathan Cutler)
Copied to rbd - Backport #41772: luminous: RBD image manipulation using python API crashing since Nautilus (Resolved, Jason Dillaman)
Actions #1

Updated by Jason Dillaman over 4 years ago

  • Project changed from Linux kernel client to rbd
  • Category deleted (rbd)
Actions #2

Updated by Jason Dillaman over 4 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason Dillaman
  • Backport set to nautilus

It looks like a bug in librados associated with retrieving config overrides from the MON config store, but I'll take a look.

Actions #3

Updated by Jason Dillaman over 4 years ago

  • Backport changed from nautilus to luminous,mimic,nautilus

Regression introduced via [1]. It's already been backported to luminous and it's pending for mimic.

[1] https://tracker.ceph.com/issues/24823

Actions #4

Updated by Jason Dillaman over 4 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 29809
Actions #5

Updated by Jason Dillaman over 4 years ago

/home/jdillaman/ceph_nautilus/src/common/config_proxy.h: In function 'void ConfigProxy::call_gate_leave(ConfigProxy::md_config_obs_t*)' thread 7fffd6ffd700 time 2019-08-21 23:08:51.961269
/home/jdillaman/ceph_nautilus/src/common/config_proxy.h: 70: FAILED ceph_assert(p != obs_call_gate.end())
 ceph version 14.2.2-421-g5286e37857 (5286e37857aad3901636f3c9a3a301c0eaa35f68) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14f) [0x7fffe1574fcb]
 2: (()+0x2771b5) [0x7fffe15751b5]
 3: (()+0x65ba92) [0x7fffe1959a92]
 4: (FunctionContext::finish(int)+0x2c) [0x7fffe16588dc]
 5: (Context::complete(int)+0x9) [0x7fffe160a639]
 6: (Finisher::finisher_thread_entry()+0x15e) [0x7fffe1612c0e]
 7: (()+0x85a2) [0x7ffff7a635a2]
 8: (clone()+0x43) [0x7ffff7ec7303]

Thread 2198290 "fn_anonymous" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffd6ffd700 (LWP 10321)]
0x00007ffff7e03e75 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7e03e75 in raise () from /lib64/libc.so.6
#1  0x00007ffff7dee895 in abort () from /lib64/libc.so.6
#2  0x00007fffe1575026 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /home/jdillaman/ceph_nautilus/src/common/assert.cc:73
#3  0x00007fffe15751b5 in ceph::__ceph_assert_fail (ctx=...) at /home/jdillaman/ceph_nautilus/src/common/assert.cc:78
#4  0x00007fffe1959a92 in ConfigProxy::call_gate_leave (obs=<optimized out>, this=0x55555576b248) at /usr/include/c++/9/bits/stl_tree.h:1010
#5  ConfigProxy::call_observers (rev_obs=std::map with 1 element = {...}, this=0x55555576b248) at /home/jdillaman/ceph_nautilus/src/common/config_proxy.h:89
#6  ConfigProxy::set_mon_vals(CephContext*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::function<bool (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) (
    config_cb=..., kv=..., cct=<optimized out>, this=0x55555576b248) at /home/jdillaman/ceph_nautilus/src/common/config_proxy.h:291
#7  MonClient::<lambda(int)>::operator() (__closure=0x7fffbc013d10, __closure=0x7fffbc013d10, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/mon/MonClient.cc:418
#8  boost::detail::function::void_function_obj_invoker1<MonClient::handle_config(MConfig*)::<lambda(int)>, void, int>::invoke(boost::detail::function::function_buffer &, int) (function_obj_ptr=..., a0=<optimized out>)
    at /home/jdillaman/ceph_nautilus/build/boost/include/boost/function/function_template.hpp:159
#9  0x00007fffe16588dc in boost::function1<void, int>::operator() (a0=<optimized out>, this=<optimized out>) at /home/jdillaman/ceph_nautilus/build/boost/include/boost/function/function_template.hpp:682
#10 FunctionContext::finish (this=<optimized out>, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/include/Context.h:487
#11 0x00007fffe160a639 in Context::complete (this=0x7fffbc013d00, r=<optimized out>) at /home/jdillaman/ceph_nautilus/src/include/Context.h:77
#12 0x00007fffe1612c0e in Finisher::finisher_thread_entry (this=0x5555557749c0) at /home/jdillaman/ceph_nautilus/src/common/Finisher.cc:67
#13 0x00007ffff7a635a2 in start_thread () from /lib64/libpthread.so.0
#14 0x00007ffff7ec7303 in clone () from /lib64/libc.so.6
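
The failed check above, ceph_assert(p != obs_call_gate.end()) in call_gate_leave, means the observer's entry was not found in the obs_call_gate map when the callback finished. The toy Python model below illustrates one hypothetical interleaving that would trip such a check: an observer's entry disappears between "enter" and "leave". It is not Ceph code, the class and method names are invented for the example, and the ticket itself does not state which interleaving actually occurred.

```python
class CallGateMap:
    """Toy stand-in for an observer -> call-gate map (not the Ceph class)."""

    def __init__(self):
        self._gates = {}  # observer -> number of callbacks in flight

    def add_observer(self, obs):
        self._gates[obs] = 0

    def remove_observer(self, obs):
        del self._gates[obs]

    def pending(self, obs):
        return self._gates[obs]

    def enter(self, obs):
        if obs not in self._gates:
            raise AssertionError("enter: observer not registered")
        self._gates[obs] += 1

    def leave(self, obs):
        # Analogue of: ceph_assert(p != obs_call_gate.end())
        if obs not in self._gates:
            raise AssertionError("p != obs_call_gate.end()")
        self._gates[obs] -= 1


def demo():
    gates = CallGateMap()
    gates.add_observer("obs")
    gates.enter("obs")            # callback starts
    gates.remove_observer("obs")  # entry vanishes while callback is in flight
    try:
        gates.leave("obs")        # callback ends, lookup fails
    except AssertionError as exc:
        return str(exc)
    return None
```

In the real library the map is shared between threads, so the same symptom could also arise from unsynchronized access to the container rather than from an explicit removal; the model only shows why a missing entry at "leave" time is fatal.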

Actions #6

Updated by Patrick Donnelly over 4 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Target version changed from v14.2.2 to v15.0.0
Actions #7

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41770: mimic: RBD image manipulation using python API crashing since Nautilus added
Actions #8

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41771: nautilus: RBD image manipulation using python API crashing since Nautilus added
Actions #9

Updated by Nathan Cutler over 4 years ago

  • Copied to Backport #41772: luminous: RBD image manipulation using python API crashing since Nautilus added
Actions #10

Updated by Nathan Cutler about 3 years ago

  • Status changed from Pending Backport to Resolved

While running with --resolve-parent, the script "backport-create-issue" noticed that all backports of this issue are in status "Resolved" or "Rejected".

Actions #11

Updated by Patrick Donnelly 7 months ago

  • Related to Bug #62832: common: config_proxy deadlock during shutdown (and possibly other times) added