Project

General

Profile

Bug #64103

Updated by Ernesto Puerta 4 months ago

h3. Issue 

 During the start-up phase of a minimalistic vstart cluster (v18.2.1, 1 mon, 1 mgr, 3 memstore OSD) inside a container (Fedora 38, Docker CE 24.0.7, build afdd53b), upon receiving the `ceph osd pool create rbd...` ceph-mon's ms_distpach threads start consuming 100% during ~120 seconds. After that another it produces another 100% CPU, and then the cluster comes back to normal. 

 An @strace -kk -f -p $(pidof -s ceph-mon)@ inside the container during the 100% phase, produces the following dump: 

 <pre><code class="text"> 
 1034    close(64692)                        = -1 EBADF (Bad file descriptor) 
  > /usr/lib64/libc.so.6(__close+0x3b) [0x13f3cb] 
  > /usr/lib64/ceph/libceph-common.so.2(CrushTester::test_with_fork(ceph::common::CephContext*, int)+0x3c2) [0x6222b2] 
  > /usr/bin/ceph-mon(OSDMonitor::prepare_new_pool(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocat 
 or<char> >&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, 
  unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, float, std::__cxx11::basic_string<char, std::c 
 har_traits<char>, std::allocator<char> > const&, unsigned int, unsigned long, OSDMonitor::FastReadType, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, std::ostream*)+0x532) [0x47a8a2] 
  > /usr/bin/ceph-mon(OSDMonitor::prepare_command_impl(boost::intrusive_ptr<MonOpRequest>, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > >, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, double, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<long, std::allocator<long> >, std::vector<double, std::allocator<double> > > > > > const&)+0x1af07) [0x4a0167] 
  > /usr/bin/ceph-mon(OSDMonitor::prepare_command(boost::intrusive_ptr<MonOpRequest>)+0x314) [0x484a14] 
  > /usr/bin/ceph-mon(OSDMonitor::prepare_update(boost::intrusive_ptr<MonOpRequest>)+0x14a) [0x44c17a] 
  > /usr/bin/ceph-mon(PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x81a) [0x422d6a] 
  > /usr/bin/ceph-mon(Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x23ef) [0x32c0bf] 
  > /usr/bin/ceph-mon(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x9e7) [0x3380e7] 
  > /usr/bin/ceph-mon(Monitor::_ms_dispatch(Message*)+0x7b0) [0x339390] 
  > /usr/bin/ceph-mon(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x4d) [0x2f3e8d] 
  > /usr/lib64/ceph/libceph-common.so.2(DispatchQueue::entry()+0x539) [0x358cc9] 
  > /usr/lib64/ceph/libceph-common.so.2(DispatchQueue::DispatchThread::entry()+0x10) [0x3e9750] 
  > /usr/lib64/libc.so.6(start_thread+0x2d1) [0xa1911] 
  > /usr/lib64/libc.so.6(__clone+0x43) [0x3f313] 
 1034    close(64693)                        = -1 EBADF (Bad file descriptor) 
 </code></pre> 

 For the first 0 to 46 file descriptors, the close succeeds, but from 47 onwards it produces a EBADF error. 

 Ceph mon logs during that CPU peak are like: 

 <pre><code class="text"> 
 2024-01-18T00:54:22.254+0000 7f1dbd151640    1 --2- [v2:192.168.13.2:10000/0,v1:192.168.13.2:10001/0] >>    conn(0x558310bde000 0x558310fad080 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rev1=0 crypto rx=0 tx=0 comp rx=0 tx=0)._handle_peer_banner_payload supported=3 required=0 
 2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 handle_auth_request con 0x558310bde000 (start) method 2 payload 22 
 2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader).auth v6 _assign_global_id 4156 (max 14096) 
 2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 handle_auth_request con 0x558310bde000 (more) method 2 payload 36 
 2024-01-18T00:54:22.254+0000 7f1dbd151640 10 mon.a@0(leader) e1 ms_handle_fast_authentication session 0x558310c0f200 con 0x558310bde000 addr - MonSession(unknown.0    is open , features 0x0 (unknown)) 
 2024-01-18T00:54:22.255+0000 7f1dbd151640    1 --2- [v2:192.168.13.2:10000/0,v1:192.168.13.2:10001/0] >> 192.168.13.2:0/2799323384 conn(0x558310bde000 0x558310fad080 secure :-1 s=READY pgs=1 cs=0 l=1 rev1=1 crypto rx=0x5583117633b0 tx=0x558311788960 comp rx=0 tx=0).ready entity=client.? client_cookie=e1596264e13397d6 server_cookie=0 in_seq=0 out_seq=0 
 2024-01-18T00:54:24.731+0000 7f1db8147640    1 -- 192.168.13.2:0/94 --> [v2:192.168.13.2:6800/263,v1:192.168.13.2:6801/263] -- mgrreport(mon.a +0-0 packed 1150) v9 -- 0x558310c0d180 con 0x558310ee5c00 
 2024-01-18T00:54:29.731+0000 7f1db8147640    1 -- 192.168.13.2:0/94 --> [v2:192.168.13.2:6800/263,v1:192.168.13.2:6801/263] -- mgrreport(mon.a +0-0 packed 1150) v9 -- 0x558310c0d500 con 0x558310ee5c00 
 </code></pre> 


 h3. Environment 

 Reproduced with "Docker CE v24.0.7":https://docs.docker.com/engine/install/fedora/ (not with Moby-engine Docker v24.0.7, the Fedora default) in both Fedora and Photon OS (VMWare's OS for VMs). 

 The vstart cluster is created and launched as in "the ceph nvmeof repo":https://github.com/ceph/ceph-nvmeof. It's RPM based (v18.2.1 packages), but "uses vstart for the configuration and launch of the services":https://github.com/ceph/ceph-nvmeof/blob/devel/Dockerfile.ceph#L62-L93. 

 <pre><code class="text"> 
 git clone https://github.com/ceph/ceph-nvmeof.git 

 cd ceph-nvmeof 

 make pull SVC=ceph 
 make up SVC=ceph 

 # In a different terminal 
 make shell SVC=ceph 
 top 
 </code></pre> 

 h3. Workaround 

 
 Setting @ulimit -n 1024@ in the container is enough to "fix" this issue.

Back