Bug #57782
[mon] high CPU usage by fn_monstore thread
Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Description
We observed high CPU usage by the ms_dispatch and fn_monstore threads (99-100% in top). Ceph was deployed with Rook.
We used gdb and perf to track down the origin and found the following:
    100.00%     0.00%  ms_dispatch  ceph-mon  [.] Monitor::handle_command
            |
            ---Monitor::handle_command
               PaxosService::dispatch
               OSDMonitor::prepare_update
               OSDMonitor::prepare_command
               OSDMonitor::prepare_command_impl
               OSDMonitor::prepare_new_pool
               CrushTester::test_with_fork
               |
                --99.81%--__libc_close (inlined)
                          |
                          |--33.42%--entry_SYSCALL_64_after_hwframe
                          |          do_syscall_64
                          |          |
                          |          |--22.22%--syscall_enter_from_user_mode
                          |          |
                          |          |--6.06%--__x64_sys_close
                          |          |          |
                          |          |           --4.57%--close_fd
                          |          |                     |
                          |          |                      --3.56%--pick_file
                          |          |                                |
                          |          |                                 --2.72%--_raw_spin_lock
                          |          |                                           |
                          |          |                                            --1.38%--preempt_count_add
                          |          |
                          |          |--2.48%--syscall_exit_to_user_mode
                          |          |          |
                          |          |           --2.11%--syscall_exit_to_user_mode_prepare
                          |          |                     |
                          |          |                      --0.62%--__audit_syscall_exit
                          |          |
                          |           --1.84%--syscall_trace_enter.constprop.0
                          |                     |
                          |                      --0.76%--__audit_syscall_entry
                          |
                          |--1.79%--__pthread_enable_asynccancel
                          |
                           --1.24%--__pthread_disable_asynccancel
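To make the proportions in this call graph explicit, the Python sketch below copies the percentages printed by perf and sums them. The dictionary and function names are illustrative labels, not anything from Ceph or perf. perf omits branches below its call-graph print threshold, so the residual for __libc_close mixes its own samples with any hidden branches; treat it as an upper bound on self time rather than an exact figure.

```python
# Percentages copied from the perf call graph in this report (share of all
# samples). perf hides branches below its call-graph threshold, so the
# residual computed below may include hidden callees, not only self time.
LIBC_CLOSE = 99.81

# Printed children of __libc_close (inlined).
LIBC_CLOSE_CHILDREN = {
    "entry_SYSCALL_64_after_hwframe": 33.42,
    "__pthread_enable_asynccancel": 1.79,
    "__pthread_disable_asynccancel": 1.24,
}

# Printed children of do_syscall_64.
DO_SYSCALL_64_CHILDREN = {
    "syscall_enter_from_user_mode": 22.22,
    "__x64_sys_close": 6.06,
    "syscall_exit_to_user_mode": 2.48,
    "syscall_trace_enter.constprop.0": 1.84,
}


def breakdown() -> dict:
    """Split the printed kernel-side time into the close handler itself
    versus syscall entry/exit/tracing, plus the __libc_close residual."""
    close_handler = DO_SYSCALL_64_CHILDREN["__x64_sys_close"]
    entry_exit = sum(v for k, v in DO_SYSCALL_64_CHILDREN.items()
                     if k != "__x64_sys_close")
    residual = LIBC_CLOSE - sum(LIBC_CLOSE_CHILDREN.values())
    return {
        "close_handler": round(close_handler, 2),
        "entry_exit_overhead": round(entry_exit, 2),
        "not_in_printed_children": round(residual, 2),
    }


if __name__ == "__main__":
    for name, value in breakdown().items():
        print(f"{name:>24}: {value}%")
```

Only 6.06% of samples fall in the kernel's close handler (__x64_sys_close), versus 26.54% in syscall entry, exit and tracing. That split would be consistent with a very large number of close() calls that each do little work, rather than a few expensive ones.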
CrushTester::test_with_fork ends up in __libc_close, which consumes about 99% of the CPU instead of creating the fork. The hottest path is:
    ---Monitor::handle_command
       PaxosService::dispatch
       OSDMonitor::prepare_update
       OSDMonitor::prepare_command
       OSDMonitor::prepare_command_impl
       OSDMonitor::prepare_new_pool
       CrushTester::test_with_fork
       __libc_close (inlined)
       entry_SYSCALL_64_after_hwframe
       do_syscall_64
       syscall_enter_from_user_mode
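The report does not show what test_with_fork does after forking. One hypothetical pattern that would produce a trace dominated by close() is a child that closes every descriptor number from 0 up to the open-file limit (RLIMIT_NOFILE), which can be very large inside containers. The Python sketch below assumes that pattern: the hypothetical helpers `naive_close_calls` and `estimate_seconds` count the calls such a loop would make and extrapolate its run time, using `close(-1)` (which always fails with EBADF, so no real descriptor is touched) to measure per-call cost. The limits in the loop are illustrative values, not measurements from this cluster.

```python
import os
import resource
import time


def naive_close_calls(max_fd: int) -> int:
    """close() calls made by a hypothetical loop over fd = 0..max_fd
    (inclusive) that skips stdin, stdout and stderr (fds 0, 1, 2)."""
    return max(0, max_fd - 2)


def per_call_seconds(samples: int = 100_000) -> float:
    """Average wall-clock cost of one failing close() call.

    close(-1) always fails with EBADF, so this measures syscall overhead
    without closing any descriptor this process actually uses. The figure
    also includes Python's own per-call overhead.
    """
    start = time.perf_counter()
    for _ in range(samples):
        try:
            os.close(-1)
        except OSError:
            pass  # EBADF is expected
    return (time.perf_counter() - start) / samples


def estimate_seconds(max_fd: int, per_call: float) -> float:
    """Extrapolated run time of the naive loop for a given descriptor limit."""
    return naive_close_calls(max_fd) * per_call


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    per_call = per_call_seconds()
    # Illustrative limits: a common default, a raised limit, and a very
    # large limit of the kind containers can inherit.
    for limit in (1024, 1_048_576, 1_073_741_816):
        print(f"limit={limit:>13,}  calls={naive_close_calls(limit):>13,}  "
              f"~{estimate_seconds(limit, per_call):,.1f} s")
    print(f"this process: RLIMIT_NOFILE soft={soft} hard={hard}")
```

The absolute times overstate what a C loop would take, since Python adds work to every call; the point is that the cost grows linearly with the descriptor limit, not with the number of descriptors actually open.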
The mon logs show the daemon being terminated:
    2022-09-15T14:29:06.155+0000 7fe451263700 10 -- [v2:10.6.125.48:3300/0,v1:10.6.125.48:6789/0] >> 10.6.137.218:0/260641845 conn(0x5638020d9400 msgr2=0x5637fa16f600 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send sent bytes 96 remaining bytes 0
    2022-09-15T14:29:06.209+0000 7fe458271700 -1 received signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
    2022-09-15T14:29:06.209+0000 7fe458271700 -1 mon.a@0(leader) e1 *** Got Signal Terminated ***
There was no assert failure or anything similar. We are filing this to see whether anyone has feedback or has hit this issue before.