Bug #57782

open

[mon] high cpu usage by fn_monstore thread

Added by Deepika Upadhyay over 1 year ago. Updated about 2 months ago.

Status: Fix Under Review
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We observed high CPU usage by the ms_dispatch and fn_monstore threads (99-100% in top). The Ceph deployment was done with Rook.

We ran gdb and perf to better understand the origin and found the following:

   100.00%     0.00%  ms_dispatch  ceph-mon             [.] Monitor::handle_command
            |
            ---Monitor::handle_command
               PaxosService::dispatch
               OSDMonitor::prepare_update
               OSDMonitor::prepare_command
               OSDMonitor::prepare_command_impl
               OSDMonitor::prepare_new_pool
               CrushTester::test_with_fork
               |          
                --99.81%--__libc_close (inlined)
                          |          
                          |--33.42%--entry_SYSCALL_64_after_hwframe
                          |          do_syscall_64
                          |          |          
                          |          |--22.22%--syscall_enter_from_user_mode
                          |          |          
                          |          |--6.06%--__x64_sys_close
                          |          |          |          
                          |          |           --4.57%--close_fd
                          |          |                     |          
                          |          |                      --3.56%--pick_file
                          |          |                                |          
                          |          |                                 --2.72%--_raw_spin_lock
                          |          |                                           |          
                          |          |                                            --1.38%--preempt_count_add
                          |          |          
                          |          |--2.48%--syscall_exit_to_user_mode
                          |          |          |          
                          |          |           --2.11%--syscall_exit_to_user_mode_prepare
                          |          |                     |          
                          |          |                      --0.62%--__audit_syscall_exit
                          |          |          
                          |           --1.84%--syscall_trace_enter.constprop.0
                          |                     |          
                          |                      --0.76%--__audit_syscall_entry
                          |          
                          |--1.79%--__pthread_enable_asynccancel
                          |          
                           --1.24%--__pthread_disable_asynccancel

CrushTester leads to __libc_close consuming 99% of the CPU instead of
creating a fork:

            ---Monitor::handle_command
               PaxosService::dispatch
               OSDMonitor::prepare_update
               OSDMonitor::prepare_command
               OSDMonitor::prepare_command_impl
               OSDMonitor::prepare_new_pool
               CrushTester::test_with_fork
               __libc_close (inlined)
               entry_SYSCALL_64_after_hwframe
               do_syscall_64
               syscall_enter_from_user_mode
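
The trace shows the time going to repeated close() calls. One possible cause, which is an assumption and not confirmed in this report, is a loop that closes every file descriptor up to the open-file limit around the fork; in a container with a very large limit such a loop could run for a long time. The sketch below is a hypothetical diagnostic (the function names are made up for illustration) that reads the limit on the mon host and estimates how long closing that many descriptors would take.

```python
import os
import resource
import time


def measure_close_rate(first_fd: int, count: int) -> float:
    """Time close() on `count` fd numbers starting at `first_fd`; return calls/second.

    The caller should pick a range with no open descriptors, so each call
    fails with EBADF but still pays the syscall entry/exit cost.
    """
    begin = time.perf_counter()
    for fd in range(first_fd, first_fd + count):
        try:
            os.close(fd)
        except OSError:
            pass  # EBADF is expected: nothing is open at this number
    elapsed = time.perf_counter() - begin
    return count / max(elapsed, 1e-9)


def estimated_seconds(fd_limit: int, calls_per_second: float) -> float:
    """Estimate the duration of a loop that closes every fd below `fd_limit`."""
    return fd_limit / calls_per_second


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"RLIMIT_NOFILE soft limit: {soft}")
    if soft == resource.RLIM_INFINITY:
        print("limit is unlimited; no estimate possible")
    else:
        # Sample at or above the soft limit, where no descriptor should be
        # open; the cap keeps the fd numbers inside the C int range.
        first = min(soft, 1_000_000_000)
        rate = measure_close_rate(first_fd=first, count=50_000)
        print(f"close() rate: {rate:,.0f} calls/s")
        print(f"estimated full sweep: {estimated_seconds(soft, rate):.1f} s")
```

If the estimated sweep is on the order of minutes, that would be consistent with the trace above, but it would not prove that this is what test_with_fork is doing.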

The mon logs show the monitor being terminated:

2022-09-15T14:29:06.155+0000 7fe451263700 10 -- [v2:10.6.125.48:3300/0,v1:10.6.125.48:6789/0] >> 10.6.137.218:0/260641845 conn(0x5638020d9400 msgr2=0x5637fa16f600 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send sent bytes 96 remaining bytes 0
2022-09-15T14:29:06.209+0000 7fe458271700 -1 received  signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-09-15T14:29:06.209+0000 7fe458271700 -1 mon.a@0(leader) e1 *** Got Signal Terminated ***

There has been no assert failure or anything of the sort. Adding it here to see whether anyone has feedback or has hit this issue so far.


Related issues 2 (2 open, 0 closed)

Related to RADOS - Bug #46266: Monitor crashed in creating pool in CrushTester::test_with_fork() (Need More Info)
Related to RADOS - Feature #58168: extra debugs for: [mon] high cpu usage by fn_monstore thread (Pending Backport)
