Bug #52405

open

OSD deadlock occurs when calling “ceph daemon osd.x perf dump osd osd op_latency”

Added by changzhi tan over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When I run "ceph daemon osd.x perf dump osd osd op_latency", a deadlock occurs. The OSD is marked down by the mon once mon_osd_report_timeout (900s) is exceeded, but the OSD process does not actually exit, and the OSD reports a large number of clock skew errors.

ceph version: ceph-15.2.8
os: centos7.5

Thread 8 (Thread 0x7f24170c1700 (LWP 308221)):
#0 load (__m=std::memory_order_seq_cst, this=0x55983a3df130) at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/perf_counters.h:211
#1 operator std::__atomic_base<unsigned long>::__int_type (this=0x55983a3df130) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/atomic_base.h:259
#2 read_avg (this=0x55983a3df100) at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/perf_counters.h:211
#3 ceph::common::PerfCounters::dump_formatted_generic(ceph::Formatter*, bool, bool, std::string const&) const () at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/perf_counters.cc:422
#4 0x000055981b574bb6 in ceph::common::PerfCountersCollectionImpl::dump_formatted_generic (this=0x5598261437d0, f=0x559826ed2700, schema=false, histograms=false, logger="osd", counter="op_latency")
at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/perf_counters.cc:139
#5 0x000055981b5767f7 in dump_formatted (counter=..., logger="osd", schema=false, f=0x559826ed2700, this=0x5598261437d0) at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/perf_counters.h:329
#6 ceph::common::PerfCountersCollection::dump_formatted(ceph::Formatter*, bool, std::string const&, std::string const&) () at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/perf_counters_collection.cc:41
#7 0x000055981b4b38b4 in ceph::common::CephContext::_do_command(std::basic_string_view<char, std::char_traits<char> >, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, ceph::Formatter*, std::ostream&, ceph::buffer::v15_2_0::list*) () at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/ceph_context.cc:505
#8 0x000055981b4b5594 in ceph::common::CephContext::do_command(std::basic_string_view<char, std::char_traits<char> >, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, ceph::Formatter*, std::ostream&, ceph::buffer::v15_2_0::list*) () at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/ceph_context.cc:471
#9 0x000055981b4b9474 in ceph::common::CephContextHook::call (this=<optimized out>, command=..., cmdmap=..., f=<optimized out>, errss=..., out=...) at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/ceph_context.cc:422
Python Exception <type 'exceptions.ValueError'> Cannot find type const cmdmap_t::_Rep_type:
#10 0x000055981aece165 in AdminSocketHook::call_async(std::basic_string_view<char, std::char_traits<char> >, std::map<std::string, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::string const, boost::variant<std::string, bool, long, double, std::vector<std::string, std::allocator<std::string> >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&, ceph::Formatter*, ceph::buffer::v15_2_0::list const&, std::function<void (int, std::string const&, ceph::buffer::v15_2_0::list&)>) (this=this@entry=0x55982611a660, command=..., cmdmap=std::map with 3 elements, f=f@entry=0x559826ed2700, inbl=..., on_finish=...)
at /usr/src/debug/ceph-15.2.8.1.0.0/src/include/buffer.h:594
#11 0x000055981b495e19 in AdminSocket::execute_command(std::vector<std::string, std::allocator<std::string> > const&, ceph::buffer::v15_2_0::list const&, std::function<void (int, std::string const&, ceph::buffer::v15_2_0::list&)>) () at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:87
#12 0x000055981b496395 in AdminSocket::execute_command(std::vector<std::string, std::allocator<std::string> > const&, ceph::buffer::v15_2_0::list const&, std::ostream&, ceph::buffer::v15_2_0::list*) ()
at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:87
#13 0x000055981b496c7a in AdminSocket::do_accept() () at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/admin_socket.cc:367
#14 0x000055981b497c38 in AdminSocket::entry (this=0x559826154d80) at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/admin_socket.cc:255
#15 0x000055981bb0ee4f in execute_native_thread_routine ()
#16 0x00007f241d867dd5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f241c72cead in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f241403d700 (LWP 308233)):
#0 0x00007f241d86e4ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f241d869dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f241d869c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000055981b66439b in __gthread_mutex_lock (__mutex=0x5598261c0b80) at /opt/rh/devtoolset-8/root/usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:748
#4 lock (this=0x5598261c0b80) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:103
#5 lock_guard (__m=..., this=<synthetic pointer>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:162
#6 MgrClient::update_daemon_health(std::vector<DaemonHealthMetric, std::allocator<DaemonHealthMetric> >&&) (this=0x5598261c0a18,
metrics=<unknown type in /usr/lib/debug/usr/bin/ceph-osd.debug, CU 0xb4c44af, DIE 0xb74e6d4>) at /usr/src/debug/ceph-15.2.8.1.0.0/src/mgr/MgrClient.cc:613
#7 0x000055981ae6bf9a in OSD::tick_without_osd_lock() () at /usr/src/debug/ceph-15.2.8.1.0.0/src/osd/OSD.cc:5916
#8 0x000055981ae9d0b9 in Context::complete (this=0x5598489a1490, r=<optimized out>) at /usr/src/debug/ceph-15.2.8.1.0.0/src/include/Context.h:77
#9 0x000055981b48a588 in SafeTimer::timer_thread() () at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/Timer.cc:96
#10 0x000055981b48b97d in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/Timer.cc:30
#11 0x00007f241d867dd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f241c72cead in clone () from /lib64/libc.so.6

Thread 39 (Thread 0x7f2404971700 (LWP 312022)):
#0 0x00007f241d86e4ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f241d869dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f241d869c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000055981b66671f in __gthread_mutex_lock (__mutex=0x5598261c0b80) at /opt/rh/devtoolset-8/root/usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:748
#4 lock (this=0x5598261c0b80) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:103
#5 lock_guard (__m=..., this=<synthetic pointer>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:162
#6 MgrClient::ms_handle_reset(Connection*) () at /usr/src/debug/ceph-15.2.8.1.0.0/src/mgr/MgrClient.cc:267
#7 0x000055981b79f732 in ms_deliver_handle_reset (con=0x55984bf19400, this=<optimized out>) at /usr/src/debug/ceph-15.2.8.1.0.0/src/msg/Messenger.h:772
#8 DispatchQueue::entry() () at /usr/src/debug/ceph-15.2.8.1.0.0/src/msg/DispatchQueue.cc:185
#9 0x000055981b601d8d in DispatchQueue::DispatchThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.2.8.1.0.0/src/msg/DispatchQueue.h:101
#10 0x00007f241d867dd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f241c72cead in clone () from /lib64/libc.so.6

Thread 43 (Thread 0x7f240296d700 (LWP 312026)):
#0 0x00007f241d86e4ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f241d869dcb in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f241d869c98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000055981b57691a in __gthread_mutex_lock (__mutex=0x5598261437a8) at /opt/rh/devtoolset-8/root/usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:748
#4 lock (this=0x5598261437a8) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:103
#5 lock_guard (__m=..., this=<synthetic pointer>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_mutex.h:162
#6 ceph::common::PerfCountersCollection::with_counters(std::function<void (std::map<std::string, ceph::common::PerfCountersCollectionImpl::PerfCounterRef, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::common::PerfCountersCollectionImpl::PerfCounterRef> > > const&)>) const () at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/perf_counters_collection.cc:52
#7 0x000055981b666ac7 in MgrClient::_send_report() () at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:87
#8 0x000055981b667069 in MgrClient::_send_stats (this=0x5598261c0a18) at /usr/src/debug/ceph-15.2.8.1.0.0/src/mgr/MgrClient.cc:284
#9 0x000055981ae9d0b9 in Context::complete (this=0x559848780c80, r=<optimized out>) at /usr/src/debug/ceph-15.2.8.1.0.0/src/include/Context.h:77
#10 0x000055981b48a588 in SafeTimer::timer_thread() () at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/Timer.cc:96
#11 0x000055981b48b97d in SafeTimerThread::entry (this=<optimized out>) at /usr/src/debug/ceph-15.2.8.1.0.0/src/common/Timer.cc:30
#12 0x00007f241d867dd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f241c72cead in clone () from /lib64/libc.so.6
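
Reading the four traces together: threads 13 and 39 are both waiting on mutex 0x5598261c0b80 (inside MgrClient), thread 43 is waiting on mutex 0x5598261437a8 (inside PerfCountersCollection::with_counters), and thread 8 is not blocked on a mutex at all but is inside read_avg under dump_formatted. To check the same grouping against the full attached thread dump, a small script along these lines may help. This is only a sketch: the function name waiters_by_mutex and the SAMPLE text are made up for illustration, and it assumes the dump uses the same "Thread N (Thread 0x... (LWP ...)):" headers and "mutex=0x..." frame arguments seen above.

```python
import re
from collections import defaultdict

# Header line that gdb prints before each thread's backtrace.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")
# Matches both "__mutex=0x..." and the mangled "_mutex=0x..." form.
MUTEX_RE = re.compile(r"_mutex=(0x[0-9a-f]+)")


def waiters_by_mutex(gdb_text):
    """Return {mutex address: [gdb thread numbers whose trace mentions it]}."""
    waiters = defaultdict(list)
    current = None  # gdb thread number of the trace being scanned
    for raw in gdb_text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            continue
        hit = MUTEX_RE.search(line)
        if hit and current is not None:
            addr = hit.group(1)
            if current not in waiters[addr]:
                waiters[addr].append(current)
    return dict(waiters)


# Hypothetical, shortened sample built from the frames quoted in this report.
SAMPLE = """\
Thread 8 (Thread 0x7f24170c1700 (LWP 308221)):
#2 read_avg (this=0x55983a3df100)
Thread 13 (Thread 0x7f241403d700 (LWP 308233)):
#3 0x000055981b66439b in __gthread_mutex_lock (__mutex=0x5598261c0b80)
Thread 39 (Thread 0x7f2404971700 (LWP 312022)):
#3 0x000055981b66671f in __gthread_mutex_lock (__mutex=0x5598261c0b80)
Thread 43 (Thread 0x7f240296d700 (LWP 312026)):
#3 0x000055981b57691a in __gthread_mutex_lock (__mutex=0x5598261437a8)
"""

if __name__ == "__main__":
    for addr, threads in waiters_by_mutex(SAMPLE).items():
        print(addr, "waited on by threads", threads)
```

The script only shows who is waiting on which mutex. It does not identify the current owner of each mutex; that still has to be read from the traces or from the mutex owner field in gdb.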


Files

mon log.jpg (217 KB), changzhi tan, 08/25/2021 07:42 AM
osd log.jpg (322 KB), changzhi tan, 08/25/2021 07:42 AM
gdb threads info.txt (73 KB), changzhi tan, 08/25/2021 07:43 AM
