Bug #19348

Updated by Kefu Chai about 7 years ago

# Start a cluster with 3 monitors: mon.a, mon.b and mon.c.
 # Stop mon.c.
 # Run @ceph ping mon.c --connect-timeout=5@.

 The command prints the following backtrace:
 <pre> 
 timeout =    5 
 (-4, None, 'Interrupted!') 
 /var/ceph/ceph/src/msg/async/Event.cc: In function 'EventCenter::~EventCenter()' thread 7f2ccdffb700 time 2017-03-22 16:57:47.437992 
 /var/ceph/ceph/src/msg/async/Event.cc: 174: FAILED assert(time_events.empty()) 
  ceph version Development (no_version) 
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x137) [0x7f2cdd3516b8] 
  2: (EventCenter::~EventCenter()+0xd2) [0x7f2cdd56ebc6] 
  3: (Worker::~Worker()+0x7f) [0x7f2cdd57fb89] 
  4: (PosixWorker::~PosixWorker()+0x2a) [0x7f2cdd58298c] 
  5: (PosixWorker::~PosixWorker()+0x18) [0x7f2cdd5829a8] 
  6: (NetworkStack::~NetworkStack()+0xa9) [0x7f2cdd57fc71] 
  7: (PosixNetworkStack::~PosixNetworkStack()+0x4a) [0x7f2cdd582a00] 
  8: (void __gnu_cxx::new_allocator<PosixNetworkStack>::destroy<PosixNetworkStack>(PosixNetworkStack*)+0x23) [0x7f2cdd57e701] 
  9: (void std::allocator_traits<std::allocator<PosixNetworkStack> >::destroy<PosixNetworkStack>(std::allocator<PosixNetworkStack>&, PosixNetworkStack*)+0x23) [0x7f2cdd57e6cd] 
  10: (std::_Sp_counted_ptr_inplace<PosixNetworkStack, std::allocator<PosixNetworkStack>, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x37) [0x7f2cdd57e5b7] 
  11: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x42) [0x7f2ce678b93c] 
  12: (std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()+0x27) [0x7f2ce6785a69] 
  13: (std::__shared_ptr<NetworkStack, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()+0x1c) [0x7f2cdd56a750] 
  14: (std::shared_ptr<NetworkStack>::~shared_ptr()+0x18) [0x7f2cdd56a76c] 
  15: (StackSingleton::~StackSingleton()+0x34) [0x7f2cdd56a85c] 
  16: (CephContext::TypedSingletonWrapper<StackSingleton>::~TypedSingletonWrapper()+0x34) [0x7f2cdd56de92] 
  17: (CephContext::TypedSingletonWrapper<StackSingleton>::~TypedSingletonWrapper()+0x18) [0x7f2cdd56dec6] 
  18: (CephContext::~CephContext()+0x8f) [0x7f2cdd65f697] 
  19: (CephContext::put()+0x14a) [0x7f2cdd6600e8] 
  20: (()+0x1d5b9e) [0x7f2ce67bfb9e] 
  21: (()+0x1dcc87) [0x7f2ce67c6c87] 
  22: (std::function<void (CephContext*)>::operator()(CephContext*) const+0x49) [0x7f2ce67d5245] 
  23: (std::unique_ptr<CephContext, std::function<void (CephContext*)> >::~unique_ptr()+0x49) [0x7f2ce67d0f5b] 
  24: (librados::RadosClient::~RadosClient()+0x140) [0x7f2ce67c25a8] 
  25: (librados::RadosClient::~RadosClient()+0x18) [0x7f2ce67c25d0] 
  26: (rados_shutdown()+0x129) [0x7f2ce6751338] 
  27: (()+0x17f34) [0x7f2ce6b3ef34] 
  28: (PyEval_EvalFrameEx()+0x7a06) [0x555c52385736] 
  29: (PyEval_EvalCodeEx()+0x255) [0x555c5237c535] 
  30: (PyEval_EvalFrameEx()+0x6968) [0x555c52384698] 
  31: (PyEval_EvalFrameEx()+0x5eef) [0x555c52383c1f] 
  32: (PyEval_EvalCodeEx()+0x255) [0x555c5237c535] 
  33: (()+0x115cee) [0x555c52398cee] 
  34: (PyObject_Call()+0x43) [0x555c5236a673] 
  35: (()+0x12bfee) [0x555c523aefee] 
  36: (PyObject_Call()+0x43) [0x555c5236a673] 
  37: (PyEval_CallObjectWithKeywords()+0x30) [0x555c52388430] 
  38: (()+0x1ce8b2) [0x555c524518b2] 
  39: (()+0x7424) [0x7f2ce807d424] 
  40: (clone()+0x5f) [0x7f2ce749b9bf] 
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. 
 Aborted 
 </pre> 

 Instead, we should rely on the "timeout" parameter of the called Rados method. If the method involved does not support a timeout parameter, set "client_mount_timeout" before connecting to the monitor, for example:

 <pre><code class="python"> 
 cluster_handle.conf_set("client_mount_timeout", str(timeout)) 
 </code></pre> 
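
 The fallback described above could be sketched roughly as follows. This is only an illustration, not the actual fix: @call_with_timeout@ and @FakeClusterHandle@ are hypothetical names, and the fake handle stands in for a real Rados handle so the snippet runs without a cluster. It assumes the method's signature can be inspected; when it cannot, the sketch falls back to the configuration option.

```python
import inspect


def call_with_timeout(cluster_handle, method_name, timeout, *args):
    """Call a handle method, bounding the wait by `timeout` seconds.

    Hypothetical helper: prefer the method's own `timeout` parameter and
    fall back to the "client_mount_timeout" option otherwise.
    """
    method = getattr(cluster_handle, method_name)
    try:
        params = inspect.signature(method).parameters
    except (TypeError, ValueError):
        # Some extension-module methods expose no signature; treat them
        # as not supporting a timeout parameter.
        params = {}

    if "timeout" in params:
        # The method can bound its own wait.
        return method(*args, timeout=timeout)

    # Otherwise bound the connection phase through configuration,
    # set before the call that connects to the monitor.
    cluster_handle.conf_set("client_mount_timeout", str(timeout))
    return method(*args)


class FakeClusterHandle:
    """Hypothetical stand-in for a Rados handle, for illustration only."""

    def __init__(self):
        self.conf = {}

    def conf_set(self, option, value):
        self.conf[option] = value

    def connect(self, timeout=0):
        # Accepts a timeout directly.
        return ("connect", timeout)

    def ping_monitor(self, mon_id):
        # No timeout parameter; relies on the configured option.
        return ("ping", mon_id, self.conf.get("client_mount_timeout"))
```

 With this sketch, @call_with_timeout(handle, "connect", 5)@ passes the timeout straight through, while @call_with_timeout(handle, "ping_monitor", 5, "c")@ sets the option first and then calls the method.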

 Please note that, with the fix, SIGINT should still be able to terminate the wait. See @run_in_thread()@.
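
 One way to keep the wait interruptible is to join the worker thread in short slices, so the main thread regains control regularly and can act on a pending SIGINT. The following is a minimal sketch of that idea and not the existing @run_in_thread()@ implementation; the @timeout@ and @poll_interval@ parameters are assumptions made for illustration.

```python
import threading
import time


def run_in_thread(func, *args, timeout=None, poll_interval=0.1):
    """Run func(*args) in a worker thread and wait for its result.

    Illustrative sketch: the main thread waits in short slices so that a
    KeyboardInterrupt (SIGINT) can be raised between joins.
    """
    result = {}

    def target():
        try:
            result["value"] = func(*args)
        except Exception as exc:  # hand the error back to the caller
            result["error"] = exc

    # Daemon thread: an abandoned worker does not block interpreter exit.
    worker = threading.Thread(target=target, daemon=True)
    worker.start()

    deadline = None if timeout is None else time.monotonic() + timeout
    while worker.is_alive():
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("call did not finish within %s seconds" % timeout)
        # A short join returns control to the main thread regularly.
        worker.join(poll_interval)

    if "error" in result:
        raise result["error"]
    return result.get("value")
```

 In this sketch a worker that outlives the timeout is abandoned rather than cancelled, so any cleanup of the underlying handle would still have to be handled separately.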
