Bug #57674

fuse mount crashes the standby MDSes

Added by Jos Collin 2 months ago. Updated about 2 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: Correctness/Safety
Target version:
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): ceph-fuse
Labels (FS): crash
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Mounting a filesystem with ceph-fuse on a large number of clients crashes the standby MDSes and hangs df, so 2000 fuse clients cannot be reached. This reproduces most predictably with a single filesystem (the default one). Mounting the default fs 'a' 1000 times with

for i in {1..1000}; do sudo ./bin/ceph-fuse --client_mds_namespace a -m <ip>:<port> /mnt/cephfs$i/; done 
crashes the standby MDSes exactly at the 990th mount:

 in thread 7fcd23364640 thread_name:msgr-worker-0

 ceph version 17.0.0-14822-ge13e17a6b87 (e13e17a6b870dd12ea2b6f0a9e0a7306b626f23f) quincy (dev)
 1: /home/jcollin/workspace/ceph/build/bin/ceph-mds(+0x6c5a27) [0x55ce344dda27]
 2: /lib64/libc.so.6(+0x3ea70) [0x7fcd2483ea70]
 3: /lib64/libc.so.6(+0x8ec4c) [0x7fcd2488ec4c]
 4: raise()
 5: abort()
 6: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x24c) [0x7fcd25d9f630]
 7: (Processor::accept()+0x4f9) [0x7fcd25f60233]
 8: (Processor::C_processor_accept::do_request(unsigned long)+0xd) [0x7fcd25f680c5]
 9: (EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0x293) [0x7fcd25fa34eb]
 10: /home/jcollin/workspace/ceph/build/lib/libceph-common.so.2(+0x9a88e2) [0x7fcd25fa88e2]
 11: /home/jcollin/workspace/ceph/build/lib/libceph-common.so.2(+0x9a8a17) [0x7fcd25fa8a17]
 12: (std::function<void ()>::operator()() const+0xe) [0x7fcd25fa817e]
 13: (std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::function<void ()> > > >::_M_run()+0xd) [0x7fcd25fa8195]
 14: /lib64/libstdc++.so.6(+0xdbb73) [0x7fcd24cdbb73]
 15: /lib64/libc.so.6(+0x8ce2d) [0x7fcd2488ce2d]
 16: /lib64/libc.so.6(+0x1121b0) [0x7fcd249121b0]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

However, when more filesystems are used to reach 2000 clients, the exact mount at which the crash occurs cannot be determined.

History

#1 Updated by Venky Shankar 2 months ago

  • Status changed from New to Triaged
  • Assignee set to Kotresh Hiremath Ravishankar
  • Target version set to v18.0.0

#2 Updated by Greg Farnum 2 months ago

  • Assignee changed from Kotresh Hiremath Ravishankar to Jos Collin

Jos said he could take more of a look at this.

#3 Updated by Jos Collin 2 months ago

  • Status changed from Triaged to In Progress

#4 Updated by Jos Collin about 2 months ago

  • Status changed from In Progress to Closed

This is not a bug, just the limit reached.

Processor -- accept open file descriptions limit reached sd = 20 errno -24 (24) Too many open files
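
errno 24 (EMFILE) in the log line above means the process had run out of file descriptors when it tried to accept another connection. As a rough way to check how close a process is to that limit, here is a minimal sketch that reads the Linux /proc filesystem. The helper names fd_soft_limit and fd_open_count are hypothetical and not part of Ceph. The example inspects the current shell; to look at an MDS, pass the PID of the ceph-mds process instead, which may require root.

```shell
# Hypothetical helpers (not part of Ceph); they assume a Linux /proc filesystem.

fd_soft_limit() {
    # Print the soft "Max open files" limit of the process with PID $1.
    awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}

fd_open_count() {
    # Print how many file descriptors the process with PID $1 has open.
    ls "/proc/$1/fd" | wc -l
}

# Example: inspect the current shell. For the MDS, use the ceph-mds PID.
pid=$$
echo "pid $pid: $(fd_open_count "$pid") open fds, soft limit $(fd_soft_limit "$pid")"
```

If the open count is close to the soft limit, further accept() calls can be expected to fail with "Too many open files", as in the log line above.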

#5 Updated by Xiubo Li about 2 months ago

Jos Collin wrote:

This is not a bug, just the limit reached.

Processor -- accept open file descriptions limit reached sd = 20 errno -24 (24) Too many open files

This is similar to https://tracker.ceph.com/issues/43039.
