Project

General

Profile

Actions

Bug #64103

open

mon: ms_dispatch thread taking 100% CPU for ~2 minutes

Added by Ernesto Puerta 4 months ago. Updated 4 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Community (dev)
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Messenger, Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Issue

During the start-up phase of a minimalistic vstart cluster (v18.2.1, 1 mon, 1 mgr, 3 memstore OSD) inside a container (Fedora 38, Docker CE 24.0.7, build afdd53b), upon receiving the `ceph osd pool create rbd...` ceph-mon's ms_distpach threads start consuming 100% during ~120 seconds. After that another it produces another 100% CPU, and then the cluster comes back to normal.

An strace -kk -f -p $(pidof -s ceph-mon) inside the container during the 100% phase, produces the following dump:

1034  close(64692)                      = -1 EBADF (Bad file descriptor)
 > /usr/lib64/libc.so.6(__close+0x3b) [0x13f3cb]
 > /usr/lib64/ceph/libceph-common.so.2(CrushTester::test_with_fork(ceph::common::CephContext*, int)+0x3c2) [0x6222b2]
 > /usr/bin/ceph-mon(OSDMonitor::prepare_new_pool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocat
or<char> >&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int,
 unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, float, std::__cxx11::basic_string<char, std::c
har_traits<char>, std::allocator<char> > const&, unsigned int, unsigned long, OSDMonitor::FastReadType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, std::ostream*)+0x532) [0x47a8a2]
 > /usr/bin/ceph-mon(OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&)+0x1af07) [0x4a0167]
 > /usr/bin/ceph-mon(OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x314) [0x484a14]
 > /usr/bin/ceph-mon(OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x14a) [0x44c17a]
 > /usr/bin/ceph-mon(PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x81a) [0x422d6a]
 > /usr/bin/ceph-mon(Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x23ef) [0x32c0bf]
 > /usr/bin/ceph-mon(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x9e7) [0x3380e7]
 > /usr/bin/ceph-mon(Monitor::_ms_dispatch(Message*)+0x7b0) [0x339390]
 > /usr/bin/ceph-mon(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x4d) [0x2f3e8d]
 > /usr/lib64/ceph/libceph-common.so.2(DispatchQueue::entry()+0x539) [0x358cc9]
 > /usr/lib64/ceph/libceph-common.so.2(DispatchQueue::DispatchThread::entry()+0x10) [0x3e9750]
 > /usr/lib64/libc.so.6(start_thread+0x2d1) [0xa1911]
 > /usr/lib64/libc.so.6(__clone+0x43) [0x3f313]
1034  close(64693)                      = -1 EBADF (Bad file descriptor)

For the first 0 to 46 file descriptors, the close succeeds, but from 47 onwards it produces a EBADF error.

Ceph mon logs during that CPU peak are like:

2024-01-18T00:54:22.254+0000 7f1dbd151640  1 --2- [v2:192.168.13.2:10000/0,v1:192.168.13.2:10001/0] >>  conn(0x558310bde000 0x558310fad080 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 handle_auth_request con 0x558310bde000 (start) method 2 payload 22
2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader).auth v6 _assign_global_id 4156 (max 14096)
2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 handle_auth_request con 0x558310bde000 (more) method 2 payload 36
2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 ms_handle_fast_authentication session 0x558310c0f200 con 0x558310bde000 addr - MonSession(unknown.0  is open , features 0x0 (unknown))
2024-01-18T00:54:22.255+0000 7f1dbd151640  1 --2- [v2:192.168.13.2:10000/0,v1:192.168.13.2:10001/0] >> 192.168.13.2:0/2799323384 conn(0x558310bde000 0x558310fad080 secure :-1 s=READY pgs=1 cs=0 l=1 rev1=1 crypto rx=0x5583117633b0 tx=0x558311788960 comp rx=0 tx=0).ready entity=client.? client_cookie=e1596264e13397d6 server_cookie=0 in_seq=0 out_seq=0
2024-01-18T00:54:24.731+0000 7f1db8147640  1 -- 192.168.13.2:0/94 --> [v2:192.168.13.2:6800/263,v1:192.168.13.2:6801/263] -- mgrreport(mon.a +0-0 packed 1150) v9 -- 0x558310c0d180 con 0x558310ee5c00
2024-01-18T00:54:29.731+0000 7f1db8147640  1 -- 192.168.13.2:0/94 --> [v2:192.168.13.2:6800/263,v1:192.168.13.2:6801/263] -- mgrreport(mon.a +0-0 packed 1150) v9 -- 0x558310c0d500 con 0x558310ee5c00

Environment

Reproduced with Docker CE v24.0.7 (not with Moby-engine Docker v24.0.7, the Fedora default) in both Fedora and Photon OS (VMWare's OS for VMs).

Ulimit inside Docker CE environment:


ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 126697
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) unlimited
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) unlimited
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

Ulimit inside Docker Moby-engine:


real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 127006
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) unlimited
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) unlimited
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

The vstart cluster is created and launched as in the ceph nvmeof repo. It's RPM based (v18.2.1 packages), but uses vstart for the configuration and launch of the services.

git clone https://github.com/ceph/ceph-nvmeof.git

cd ceph-nvmeof

make pull SVC=ceph
make up SVC=ceph

# In a different terminal
make shell SVC=ceph
top

Workaround

Setting ulimit -n 1024 in the container is enough to "fix" this issue.

Actions #1

Updated by Ernesto Puerta 4 months ago

  • Description updated (diff)
  • Component(RADOS) deleted (Messenger)
Actions #2

Updated by Ernesto Puerta 4 months ago

  • Component(RADOS) Messenger added
Actions #3

Updated by Ernesto Puerta 4 months ago

  • Description updated (diff)
Actions

Also available in: Atom PDF