Bug #57674
Closed: fuse mount crashes the standby MDSes
Description
Mounting a file system with ceph-fuse on a large number of clients crashes the standby MDSes and hangs df, so 2000 FUSE clients cannot be reached. The crash reproduces most precisely when a single file system (the default one) is used. Mounting the default fs 'a' 1000 times with

    for i in {1..1000}; do sudo ./bin/ceph-fuse --client_mds_namespace a -m <ip>:<port> /mnt/cephfs$i/; done

crashes the standby MDSes at exactly the 990th mount:
    in thread 7fcd23364640 thread_name:msgr-worker-0
    ceph version 17.0.0-14822-ge13e17a6b87 (e13e17a6b870dd12ea2b6f0a9e0a7306b626f23f) quincy (dev)
    1: /home/jcollin/workspace/ceph/build/bin/ceph-mds(+0x6c5a27) [0x55ce344dda27]
    2: /lib64/libc.so.6(+0x3ea70) [0x7fcd2483ea70]
    3: /lib64/libc.so.6(+0x8ec4c) [0x7fcd2488ec4c]
    4: raise()
    5: abort()
    6: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x24c) [0x7fcd25d9f630]
    7: (Processor::accept()+0x4f9) [0x7fcd25f60233]
    8: (Processor::C_processor_accept::do_request(unsigned long)+0xd) [0x7fcd25f680c5]
    9: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x293) [0x7fcd25fa34eb]
    10: /home/jcollin/workspace/ceph/build/lib/libceph-common.so.2(+0x9a88e2) [0x7fcd25fa88e2]
    11: /home/jcollin/workspace/ceph/build/lib/libceph-common.so.2(+0x9a8a17) [0x7fcd25fa8a17]
    12: (std::function<void ()>::operator()() const+0xe) [0x7fcd25fa817e]
    13: (std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void ()> > > >::_M_run()+0xd) [0x7fcd25fa8195]
    14: /lib64/libstdc++.so.6(+0xdbb73) [0x7fcd24cdbb73]
    15: /lib64/libc.so.6(+0x8ce2d) [0x7fcd2488ce2d]
    16: /lib64/libc.so.6(+0x1121b0) [0x7fcd249121b0]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
However, when the 2000 clients are spread across several file systems, the exact mount at which the crash occurs cannot be pinned down.
Updated by Venky Shankar over 1 year ago
- Status changed from New to Triaged
- Assignee set to Kotresh Hiremath Ravishankar
- Target version set to v18.0.0
Updated by Greg Farnum over 1 year ago
- Assignee changed from Kotresh Hiremath Ravishankar to Jos Collin
Jos said he could take more of a look at this.
Updated by Jos Collin over 1 year ago
- Status changed from Triaged to In Progress
Updated by Jos Collin over 1 year ago
- Status changed from In Progress to Closed
This is not a bug, just the limit reached.
Processor -- accept open file descriptions limit reached sd = 20 errno -24 (24) Too many open files
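The failure mode in the log line above can be reproduced in miniature. The following is a minimal sketch in Python, not Ceph code: the helper name exhaust_fds and the limit of 128 are hypothetical choices for illustration, and it assumes a POSIX system where the standard resource module is available. It lowers the soft RLIMIT_NOFILE of the current process and then opens loopback TCP connections until a call fails. Every accepted connection consumes a file descriptor, so once the soft limit is used up, socket() or accept() fails with EMFILE, the same errno 24 ("Too many open files") that Processor::accept reported.

```python
import errno
import resource
import socket


def exhaust_fds(soft_limit: int = 128) -> tuple[int, int]:
    """Accept loopback TCP connections until the process runs out of descriptors.

    Hypothetical helper for illustration. Returns
    (connections_accepted, errno_of_the_failing_call).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # The soft limit may not exceed the hard limit.
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    opened: list[socket.socket] = []
    accepted = 0
    try:
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        opened.append(listener)
        listener.bind(("127.0.0.1", 0))
        listener.listen(16)
        addr = listener.getsockname()
        while True:
            # Client and server share this process, so each round costs two
            # descriptors: the connecting socket and the accepted one.
            client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            opened.append(client)
            client.connect(addr)
            conn, _ = listener.accept()
            opened.append(conn)
            accepted += 1
    except OSError as exc:
        # With no free descriptor below the soft limit, socket()/accept()
        # fail with EMFILE ("Too many open files").
        return accepted, exc.errno
    finally:
        for s in opened:
            s.close()
        # Restore the original soft limit so the rest of the process is unaffected.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```

In the reported setup the corresponding knob would be the open-files limit of the ceph-mds process (for example `ulimit -n` in the shell that starts a vstart cluster). The crash at roughly the 990th mount is consistent with a default soft limit of 1024 descriptors if each mount costs the daemon about one descriptor; that reading is an inference, not something confirmed in this thread.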
Updated by Xiubo Li over 1 year ago
Jos Collin wrote:
This is not a bug, just the limit reached.
Processor -- accept open file descriptions limit reached sd = 20 errno -24 (24) Too many open files
This is similar to https://tracker.ceph.com/issues/43039.