Project

General

Profile

Bug #57782

[mon] high cpu usage by fn_monstore thread

Added by Deepika Upadhyay over 1 year ago. Updated 8 months ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We observed high cpu usage by ms_dispatch and fn_monstore thread (amounting to 100-99% in top) Ceph [ deployment was with rook ]

We ran gdb and perf tooling to understand better what might be the origin and found to be :

   100.00%     0.00%  ms_dispatch  ceph-mon             [.] Monitor::handle_command
            |
            ---Monitor::handle_command
               PaxosService::dispatch
               OSDMonitor::prepare_update
               OSDMonitor::prepare_command
               OSDMonitor::prepare_command_impl
               OSDMonitor::prepare_new_pool
               CrushTester::test_with_fork
               |          
                --99.81%--__libc_close (inlined)
                          |          
                          |--33.42%--entry_SYSCALL_64_after_hwframe
                          |          do_syscall_64
                          |          |          
                          |          |--22.22%--syscall_enter_from_user_mode
                          |          |          
                          |          |--6.06%--__x64_sys_close
                          |          |          |          
                          |          |           --4.57%--close_fd
                          |          |                     |          
                          |          |                      --3.56%--pick_file
                          |          |                                |          
                          |          |                                 --2.72%--_raw_spin_lock
                          |          |                                           |          
                          |          |                                            --1.38%--preempt_count_add
                          |          |          
                          |          |--2.48%--syscall_exit_to_user_mode
                          |          |          |          
                          |          |           --2.11%--syscall_exit_to_user_mode_prepare
                          |          |                     |          
                          |          |                      --0.62%--__audit_syscall_exit
                          |          |          
                          |           --1.84%--syscall_trace_enter.constprop.0
                          |                     |          
                          |                      --0.76%--__audit_syscall_entry
                          |          
                          |--1.79%--__pthread_enable_asynccancel
                          |          
                           --1.24%--__pthread_disable_asynccancel

CrushTester leading to libc_close consuming 99% of the cpu insteading of
creating a fork:

            ---Monitor::handle_command
               PaxosService::dispatch
               OSDMonitor::prepare_update
               OSDMonitor::prepare_command
               OSDMonitor::prepare_command_impl
               OSDMonitor::prepare_new_pool
               CrushTester::test_with_fork
               __libc_close (inlined)
               entry_SYSCALL_64_after_hwframe
               do_syscall_64
               syscall_enter_from_user_mode

from mon logs termination can be observed

2022-09-15T14:29:06.155+0000 7fe451263700 10 -- [v2:10.6.125.48:3300/0,v1:10.6.125.48:6789/0] >> 10.6.137.218:0/260641845 conn(0x5638020d9400 msgr2=0x5637fa16f600 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send sent bytes 96 remaining bytes 0
2022-09-15T14:29:06.209+0000 7fe458271700 -1 received  signal: Terminated from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-09-15T14:29:06.209+0000 7fe458271700 -1 mon.a@0(leader) e1 *** Got Signal Terminated ***

there has been no assert failure or anything of sorts, adding it here to see if anyone has any feedback/hit this issue so far.


Related issues

Related to RADOS - Bug #46266: Monitor crashed in creating pool in CrushTester::test_with_fork() Need More Info
Related to RADOS - Feature #58168: extra debugs for: [mon] high cpu usage by fn_monstore thread Pending Backport

History

#1 Updated by Radoslaw Zarzynski over 1 year ago

  • Related to Bug #46266: Monitor crashed in creating pool in CrushTester::test_with_fork() added

#2 Updated by Radoslaw Zarzynski over 1 year ago

  • Status changed from New to Need More Info

It looks we're burning CPU in close(2). The single call site I can spot is in write_data_set_to_csv. Let's analyze the path to it from test_with_fork.

int CrushTester::test_with_fork(int timeout)
{
  ostringstream sink;
  int r = fork_function(timeout, sink, [&]() {
      return test();
    });
  if (r == -ETIMEDOUT) {
    err << "timed out during smoke test (" << timeout << " seconds)";
  }
  return r;
}

int CrushTester::test()
{
  // ...
  for (int r = min_rule; r < crush.get_max_rules() && r <= max_rule; r++) {
    // ...
    for (int nr = min_rep; nr <= max_rep; nr++) {
      // ...
      if (output_csv)
        write_data_set_to_csv(output_data_file_name+rule_tag,tester_data);
    }

What those loops' conditions are wrong? Unfortunately, this code entirely lacks debugs, so verifying this hypothesis is problematic. How about putting them to code?

#3 Updated by Deepika Upadhyay over 1 year ago

Hey Radek,

makes sense, I created a debug branch https://github.com/ceph/ceph-ci/pull/new/wip-crush-debug and might have packages tomorrow available, can rerun in either failing env or some teuthology test so that we can understand this better, will update soon.

#5 Updated by Deepika Upadhyay over 1 year ago

  • Pull request ID set to 48513

#6 Updated by Deepika Upadhyay over 1 year ago

  • Pull request ID deleted (48513)

#7 Updated by Yuri Weinstein about 1 year ago

Radoslaw Zarzynski wrote:

NOT A FIX (extra debugs): https://github.com/ceph/ceph/pull/48513

merged

#8 Updated by Radoslaw Zarzynski about 1 year ago

  • Related to Feature #58168: extra debugs for: [mon] high cpu usage by fn_monstore thread added

#9 Updated by Lucian Petrut 11 months ago

We hit the same issue while creating pools:

#0  0x00007fd77d91b5a0 in close@plt () from /usr/lib64/ceph/libceph-common.so.2
#1  0x00007fd77df6c26b in CrushTester::test_with_fork(int) () from /usr/lib64/ceph/libceph-common.so.2
#2  0x000055beecf7825a in OSDMonitor::prepare_new_pool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, float, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned long, OSDMonitor::FastReadType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::ostream*) ()
    at /usr/src/debug/ceph-17.2.5-0.el8.x86_64/src/mon/OSDMonitor.cc:8058
#3  0x000055beecf97315 in OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&) () at /usr/src/debug/ceph-17.2.5-0.el8.x86_64/src/mon/OSDMonitor.cc:13100
#4  0x000055beecf9accd in OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>) () at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/atomic_base.h:377
#5  0x000055beecf9b31b in OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>) () at /opt/rh/gcc-toolset-11/root/usr/include/c++/11/bits/atomic_base.h:377
#6  0x000055beecf0ac0e in PaxosService::dispatch (this=0x55beef035000, op=...) at /usr/src/debug/ceph-17.2.5-0.el8.x86_64/src/mon/PaxosService.cc:96

The fd close hangs and takes 100% cpu. We were able to work around it by disabling the crush smoke test: mon_osd_crush_smoke_test = false.

In our case, it only occurs while running ceph-mon in a container on top of flatcar and runs fine when running the ceph-mon container on top of an ubuntu host.

#10 Updated by Radoslaw Zarzynski 11 months ago

In the upcoming v17.2.6 we'll have the extra debugs (see https://github.com/ceph/ceph/pull/50406). Would you mind to retaking there? The alternative would to make a custom build.

#11 Updated by Peter Goron 9 months ago

Hello,

Reporting same issue with a ceph-16.2.10 cluster deployed on top of kubernetes (rook-1.9.12) when adding a new pool to an existing cluster.
ceph-mon subprocess doing 100% cpu.

(gdb) thread apply all bt

Thread 1 (Thread 0x7fcc3f68f700 (LWP 2216616)):
#0  0x00007fcc4af16a77 in close () from target:/lib64/libpthread.so.0
#1  0x00007fcc4d799da4 in CrushTester::test_with_fork(int) () from target:/usr/lib64/ceph/libceph-common.so.2
#2  0x000055d49cf23dfc in OSDMonitor::prepare_new_pool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, float, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned long, OSDMonitor::FastReadType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::ostream*) ()
#3  0x000055d49cf52403 in OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&) ()
#4  0x000055d49cf575d4 in OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>) ()
#5  0x000055d49cf5b8b3 in OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>) ()
#6  0x000055d49cedbecd in PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>) ()
#7  0x000055d49cdc1834 in Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>) ()
#8  0x000055d49cdc6300 in Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>) ()
#9  0x000055d49cdc7650 in Monitor::_ms_dispatch(Message*) ()
#10 0x000055d49cdf654c in Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&) ()
#11 0x00007fcc4d421baa in DispatchQueue::entry() () from target:/usr/lib64/ceph/libceph-common.so.2
#12 0x00007fcc4d4d3891 in DispatchQueue::DispatchThread::entry() () from target:/usr/lib64/ceph/libceph-common.so.2
#13 0x00007fcc4af0d17a in start_thread () from target:/lib64/libpthread.so.0
#14 0x00007fcc49c3bdc3 in clone () from target:/lib64/libc.so.6

strace extract:

strace: Process 2216616 attached
close(661150481)                        = -1 EBADF (Bad file descriptor)
close(661150482)                        = -1 EBADF (Bad file descriptor)
close(661150483)                        = -1 EBADF (Bad file descriptor)
close(661150484)                        = -1 EBADF (Bad file descriptor)
close(661150485)                        = -1 EBADF (Bad file descriptor)
close(661150486)                        = -1 EBADF (Bad file descriptor)
close(661150487)                        = -1 EBADF (Bad file descriptor)
close(661150488)                        = -1 EBADF (Bad file descriptor)
close(661150489)                        = -1 EBADF (Bad file descriptor)
close(661150490)                        = -1 EBADF (Bad file descriptor)
close(661150491)                        = -1 EBADF (Bad file descriptor)
close(661150492)                        = -1 EBADF (Bad file descriptor)
close(661150493)                        = -1 EBADF (Bad file descriptor)
close(661150494)                        = -1 EBADF (Bad file descriptor)
close(661150495)                        = -1 EBADF (Bad file descriptor)
close(661150496)                        = -1 EBADF (Bad file descriptor)
close(661150497)                        = -1 EBADF (Bad file descriptor)
...

I tried to reproduce smoketest with crushtool -i /tmp/crushmap --test --min_x 0 --min_x 50 --rule 8 --show-mappings without success so far.

https://github.com/ceph/ceph/blob/v16.2.10/src/mon/OSDMonitor.cc#L7986

  if (g_conf()->mon_osd_crush_smoke_test) {
    CrushWrapper newcrush;
    _get_pending_crush(newcrush);
    ostringstream err;
    CrushTester tester(newcrush, err);
    tester.set_min_x(0);
    tester.set_max_x(50);
    tester.set_rule(crush_rule);
    auto start = ceph::coarse_mono_clock::now();
    r = tester.test_with_fork(g_conf()->mon_lease);
    ...

I don't understand how "close" syscall can be invoked knowing that CrushTester csv_output is disabled by default, and not enabled by smoketest. I would bet somehow on a memory corruption.

#12 Updated by Radoslaw Zarzynski 8 months ago

Would you mind trying with a fresh quincy (has some extra debugs)? Custom pacific build also might be a way.

#13 Updated by Peter Goron 8 months ago

Hi Radoslaw,
I will try to reproduce first. We have disabled mon_osd_crush_smoke_test to create our pools.
If after re-enabling mon_osd_crush_smoke_test, crushmap is still in a state that triggers the problem, then I can try to do a custom build with your logs (for various reason we are stuck on 16.2.10 for now)

it will take a bit of time because I will have to create a custom build chain (we need to rebuild rpms & integrate non-official rpm sources in our docker images)

Also available in: Atom PDF