Bug #64103
Updated by Ernesto Puerta 4 months ago
h3. Issue

During the start-up phase of a minimalistic vstart cluster (v18.2.1; 1 mon, 1 mgr, 3 memstore OSDs) inside a container (Fedora 38, Docker CE 24.0.7, build afdd53b), upon receiving a @ceph osd pool create rbd...@ command, ceph-mon's ms_dispatch thread starts consuming 100% CPU for ~120 seconds. After that it produces another 100% CPU spike, and then the cluster comes back to normal.

Running @strace -kk -f -p $(pidof -s ceph-mon)@ inside the container during the 100% CPU phase produces the following dump:

<pre><code class="text">
1034 close(64692) = -1 EBADF (Bad file descriptor)
 > /usr/lib64/libc.so.6(__close+0x3b) [0x13f3cb]
 > /usr/lib64/ceph/libceph-common.so.2(CrushTester::test_with_fork(ceph::common::CephContext*, int)+0x3c2) [0x6222b2]
 > /usr/bin/ceph-mon(OSDMonitor::prepare_new_pool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, float, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned long, OSDMonitor::FastReadType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, std::ostream*)+0x532) [0x47a8a2]
 > /usr/bin/ceph-mon(OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&)+0x1af07) [0x4a0167]
 > /usr/bin/ceph-mon(OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x314) [0x484a14]
 > /usr/bin/ceph-mon(OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x14a) [0x44c17a]
 > /usr/bin/ceph-mon(PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x81a) [0x422d6a]
 > /usr/bin/ceph-mon(Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x23ef) [0x32c0bf]
 > /usr/bin/ceph-mon(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x9e7) [0x3380e7]
 > /usr/bin/ceph-mon(Monitor::_ms_dispatch(Message*)+0x7b0) [0x339390]
 > /usr/bin/ceph-mon(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x4d) [0x2f3e8d]
 > /usr/lib64/ceph/libceph-common.so.2(DispatchQueue::entry()+0x539) [0x358cc9]
 > /usr/lib64/ceph/libceph-common.so.2(DispatchQueue::DispatchThread::entry()+0x10) [0x3e9750]
 > /usr/lib64/libc.so.6(start_thread+0x2d1) [0xa1911]
 > /usr/lib64/libc.so.6(__clone+0x43) [0x3f313]
1034 close(64693) = -1 EBADF (Bad file descriptor)
</code></pre>

For file descriptors 0 through 46 the @close()@ calls succeed, but from 47 onwards every call returns an @EBADF@ error.
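The backtrace points at the descriptor-cleanup path in @CrushTester::test_with_fork@. A plausible reading (an inference from the trace, not confirmed in this report) is that the forked child tries to close every possible descriptor up to the process's open-file limit, and Docker CE's default @nofile@ limit is large enough that the loop spins for minutes on fds that were never open, producing the @EBADF@ storm. The limits involved can be inspected like this:

```shell
# Soft limit on open files for the current shell:
ulimit -n

# Kernel's view of the limits for any PID
# (substitute ceph-mon's PID for $$ inside the container):
grep "Max open files" /proc/$$/limits
```

On a Docker CE container exhibiting the bug, the first command is expected to print a very large number rather than a typical 1024/1048576.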
The ceph-mon logs during that CPU peak look like:

<pre><code class="text">
2024-01-18T00:54:22.254+0000 7f1dbd151640  1 --2- [v2:192.168.13.2:10000/0,v1:192.168.13.2:10001/0] >> conn(0x558310bde000 0x558310fad080 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0
2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 handle_auth_request con 0x558310bde000 (start) method 2 payload 22
2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader).auth v6 _assign_global_id 4156 (max 14096)
2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 handle_auth_request con 0x558310bde000 (more) method 2 payload 36
2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 ms_handle_fast_authentication session 0x558310c0f200 con 0x558310bde000 addr - MonSession(unknown.0 is open , features 0x0 (unknown))
2024-01-18T00:54:22.255+0000 7f1dbd151640  1 --2- [v2:192.168.13.2:10000/0,v1:192.168.13.2:10001/0] >> 192.168.13.2:0/2799323384 conn(0x558310bde000 0x558310fad080 secure :-1 s=READY pgs=1 cs=0 l=1 rev1=1 crypto rx=0x5583117633b0 tx=0x558311788960 comp rx=0 tx=0).ready entity=client.? client_cookie=e1596264e13397d6 server_cookie=0 in_seq=0 out_seq=0
2024-01-18T00:54:24.731+0000 7f1db8147640  1 -- 192.168.13.2:0/94 --> [v2:192.168.13.2:6800/263,v1:192.168.13.2:6801/263] -- mgrreport(mon.a +0-0 packed 1150) v9 -- 0x558310c0d180 con 0x558310ee5c00
2024-01-18T00:54:29.731+0000 7f1db8147640  1 -- 192.168.13.2:0/94 --> [v2:192.168.13.2:6800/263,v1:192.168.13.2:6801/263] -- mgrreport(mon.a +0-0 packed 1150) v9 -- 0x558310c0d500 con 0x558310ee5c00
</code></pre>

h3. Environment

Reproduced with "Docker CE v24.0.7":https://docs.docker.com/engine/install/fedora/ (but not with Moby-engine Docker v24.0.7, the Fedora default), on both Fedora and Photon OS (VMware's Linux for VMs). The vstart cluster is created and launched as in "the ceph-nvmeof repo":https://github.com/ceph/ceph-nvmeof.
It's RPM-based (v18.2.1 packages), but "uses vstart for the configuration and launch of the services":https://github.com/ceph/ceph-nvmeof/blob/devel/Dockerfile.ceph#L62-L93.

<pre><code class="text">
git clone https://github.com/ceph/ceph-nvmeof.git
cd ceph-nvmeof
make pull SVC=ceph
make up SVC=ceph

# In a different terminal
make shell SVC=ceph
top
</code></pre>

h3. Workaround

Setting @ulimit -n 1024@ in the container is enough to "fix" this issue.
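The same cap can also be applied when the container is launched, so it doesn't depend on running @ulimit@ interactively in a shell. The @--ulimit@ flag is a standard @docker run@ option and the @ulimits@ key is its compose equivalent; the 1024 value simply mirrors the workaround above (a sketch, not taken from the ceph-nvmeof repo):

```shell
# Inside the container (the workaround from this report).
# Lowering the soft limit is always permitted for an unprivileged shell:
ulimit -n 1024
ulimit -n    # now prints 1024

# Equivalent at launch time:
#   docker run --ulimit nofile=1024:1024 ...
# Or in docker-compose.yaml, per service:
#   ulimits:
#     nofile:
#       soft: 1024
#       hard: 1024
```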