Bug #19348
closed: "ceph ping mon.c" cli prints assertion failure on timeout
Status: Can't reproduce
Priority: Low
Assignee: -
Category: Correctness/Safety
Target version: -
% Done: 0%
Source: Development
Tags: low-hanging-fruit
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS): MonClient, ceph cli, librados
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
- start a cluster with 3 monitors: mon.a, mon.b and mon.c
- stop mon.c
- ceph ping mon.c --connect-timeout=5
It prints out the following backtrace:
timeout = 5 (-4, None, 'Interrupted!')
/var/ceph/ceph/src/msg/async/Event.cc: In function 'EventCenter::~EventCenter()' thread 7f2ccdffb700 time 2017-03-22 16:57:47.437992
/var/ceph/ceph/src/msg/async/Event.cc: 174: FAILED assert(time_events.empty())
 ceph version Development (no_version)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x137) [0x7f2cdd3516b8]
 2: (EventCenter::~EventCenter()+0xd2) [0x7f2cdd56ebc6]
 3: (Worker::~Worker()+0x7f) [0x7f2cdd57fb89]
 4: (PosixWorker::~PosixWorker()+0x2a) [0x7f2cdd58298c]
 5: (PosixWorker::~PosixWorker()+0x18) [0x7f2cdd5829a8]
 6: (NetworkStack::~NetworkStack()+0xa9) [0x7f2cdd57fc71]
 7: (PosixNetworkStack::~PosixNetworkStack()+0x4a) [0x7f2cdd582a00]
 8: (void __gnu_cxx::new_allocator<PosixNetworkStack>::destroy<PosixNetworkStack>(PosixNetworkStack*)+0x23) [0x7f2cdd57e701]
 9: (void std::allocator_traits<std::allocator<PosixNetworkStack> >::destroy<PosixNetworkStack>(std::allocator<PosixNetworkStack>&, PosixNetworkStack*)+0x23) [0x7f2cdd57e6cd]
 10: (std::_Sp_counted_ptr_inplace<PosixNetworkStack, std::allocator<PosixNetworkStack>, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x37) [0x7f2cdd57e5b7]
 11: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x42) [0x7f2ce678b93c]
 12: (std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count()+0x27) [0x7f2ce6785a69]
 13: (std::__shared_ptr<NetworkStack, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr()+0x1c) [0x7f2cdd56a750]
 14: (std::shared_ptr<NetworkStack>::~shared_ptr()+0x18) [0x7f2cdd56a76c]
 15: (StackSingleton::~StackSingleton()+0x34) [0x7f2cdd56a85c]
 16: (CephContext::TypedSingletonWrapper<StackSingleton>::~TypedSingletonWrapper()+0x34) [0x7f2cdd56de92]
 17: (CephContext::TypedSingletonWrapper<StackSingleton>::~TypedSingletonWrapper()+0x18) [0x7f2cdd56dec6]
 18: (CephContext::~CephContext()+0x8f) [0x7f2cdd65f697]
 19: (CephContext::put()+0x14a) [0x7f2cdd6600e8]
 20: (()+0x1d5b9e) [0x7f2ce67bfb9e]
 21: (()+0x1dcc87) [0x7f2ce67c6c87]
 22: (std::function<void (CephContext*)>::operator()(CephContext*) const+0x49) [0x7f2ce67d5245]
 23: (std::unique_ptr<CephContext, std::function<void (CephContext*)> >::~unique_ptr()+0x49) [0x7f2ce67d0f5b]
 24: (librados::RadosClient::~RadosClient()+0x140) [0x7f2ce67c25a8]
 25: (librados::RadosClient::~RadosClient()+0x18) [0x7f2ce67c25d0]
 26: (rados_shutdown()+0x129) [0x7f2ce6751338]
 27: (()+0x17f34) [0x7f2ce6b3ef34]
 28: (PyEval_EvalFrameEx()+0x7a06) [0x555c52385736]
 29: (PyEval_EvalCodeEx()+0x255) [0x555c5237c535]
 30: (PyEval_EvalFrameEx()+0x6968) [0x555c52384698]
 31: (PyEval_EvalFrameEx()+0x5eef) [0x555c52383c1f]
 32: (PyEval_EvalCodeEx()+0x255) [0x555c5237c535]
 33: (()+0x115cee) [0x555c52398cee]
 34: (PyObject_Call()+0x43) [0x555c5236a673]
 35: (()+0x12bfee) [0x555c523aefee]
 36: (PyObject_Call()+0x43) [0x555c5236a673]
 37: (PyEval_CallObjectWithKeywords()+0x30) [0x555c52388430]
 38: (()+0x1ce8b2) [0x555c524518b2]
 39: (()+0x7424) [0x7f2ce807d424]
 40: (clone()+0x5f) [0x7f2ce749b9bf]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted
Instead, we should rely on the "timeout" parameter of the Rados method being called. If the method involved does not support a timeout parameter, set "client_mount_timeout" before connecting to the monitor, like:
cluster_handle.conf_set("client_mount_timeout", str(timeout))
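A minimal sketch of that idea, assuming `cluster_handle` is a `rados.Rados`-like object exposing `conf_set()` and `ping_monitor()`. The helper name `ping_monitor_with_timeout` is hypothetical and not part of the ceph CLI; whether the ping path actually honors "client_mount_timeout" depends on the Ceph version in use.

```python
def ping_monitor_with_timeout(cluster_handle, mon_id, timeout=None):
    """Ping one monitor, letting librados bound the wait.

    Hypothetical helper: `cluster_handle` is assumed to behave like
    rados.Rados (conf_set() and ping_monitor() are the only calls used).
    """
    if timeout:
        # Config values are passed as strings, as in the line above.
        cluster_handle.conf_set("client_mount_timeout", str(timeout))
    # No extra timer on the Python side: the call is expected to return
    # (or raise) once the configured timeout expires inside librados.
    return cluster_handle.ping_monitor(mon_id)
```

With this shape the CLI never tears down the Rados handle while a call is still in flight, which is the situation the backtrace above shows.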
Please note that the fix should still allow SIGINT to terminate the wait. See run_in_thread().
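One way to keep the wait interruptible can be sketched as follows. This is not the actual run_in_thread() from the ceph CLI; `run_interruptibly` and `poll_interval` are hypothetical names used only for illustration. The blocking call runs in a daemon thread while the main thread joins in short slices, so a KeyboardInterrupt raised by SIGINT can surface between slices.

```python
import threading


def run_interruptibly(func, *args, poll_interval=0.1, **kwargs):
    """Run func in a daemon thread while the main thread stays responsive.

    Hypothetical sketch: no timeout is enforced here; the called method is
    expected to time out on its own.
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = func(*args, **kwargs)
        except Exception as exc:
            # Hand the exception back to the caller's thread.
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    # Short joins return control to the interpreter regularly, so Ctrl-C
    # raises KeyboardInterrupt here instead of waiting for func to finish.
    while worker.is_alive():
        worker.join(poll_interval)
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("result")
```

Because the worker is a daemon thread, an interrupted process can exit without waiting for the blocked call to return.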
Updated by Greg Farnum almost 7 years ago
- Project changed from Ceph to RADOS
- Category changed from ceph cli to Correctness/Safety
- Component(RADOS) MonClient, ceph cli, librados added
Updated by Joao Eduardo Luis about 6 years ago
- Assignee deleted (Anonymous)
- Tags changed from low-hanging to low-hanging-fruit
Updated by Rishabh Dave about 6 years ago
Updated by Kefu Chai over 5 years ago
- Status changed from New to Can't reproduce
Not able to reproduce with master HEAD anymore.