Bug #20670
Closed
OSD suicide on msgr exceeding fd limit
Description
On a fresh 12.1.0 install (and later on 12.1.0-707), with bluestore + fuse + cephfs + ec_overwrites, I saw some OSDs flapping under write pressure.
Stack trace:
-2> <date> 7fe375393700 0 -- <osd_ip>:6821/53900 >> <ip>:0/1074864065 conn(0x7fe3bc26d800 :6821 s=STATE_OPEN pgs=1126 cs=1 l=1).process bad tag 50
-1> <date> 7fe375393700 0 -- <osd_ip>:6821/53900 >> <ip>:0/1074864065 conn(0x7fe3a3e6d000 :6821 s=STATE_OPEN pgs=1128 cs=1 l=1).process bad tag 50
0> <date> 7fe375393700 -1 *** Caught signal (Aborted) **
in thread 7fe375393700 thread_name:msgr-worker-0
ceph version 12.1.0-707-g5a197c5 (5a197c5817f591fc514f55b9929982e90d90084e) luminous (rc)
1: (()+0x9f2f71) [0x7fe37b24ef71]
2: (()+0xf370) [0x7fe377e8c370]
3: (gsignal()+0x37) [0x7fe376eb61d7]
4: (abort()+0x148) [0x7fe376eb78c8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fe3777ba9d5]
6: (()+0x5e946) [0x7fe3777b8946]
7: (()+0x5e973) [0x7fe3777b8973]
8: (()+0xb52c5) [0x7fe37780f2c5]
9: (()+0x7dc5) [0x7fe377e84dc5]
10: (clone()+0x6d) [0x7fe376f7876d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
2/ 2 mds
1/ 5 mds_balancer
...
This looks like a legitimate suicide on the OSD side, referring to src/msg/async/AsyncConnection.cc.
I tried to find the root cause, without success.
Updated by Greg Farnum almost 7 years ago
- Status changed from New to Need More Info
Can you install the debug packages to get symbols? This is pretty unintelligible without them. :(
Updated by red ref over 6 years ago
In the meantime, I found the root cause: running ceph-fuse in interactive mode (not via fstab) showed "Too many open files" messages. Raising the limit (ulimit) made the messages go away and fixed the OSD problem.
Looking at /proc/<pid>/fd/, the number of open file descriptors slowly approaches the number of OSDs during operations (more than my previous limit).
I will still get back soon with a stack trace.
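For anyone wanting to repeat the check described above, here is a minimal Python sketch that counts the entries in /proc/<pid>/fd and compares them with the "Max open files" line of /proc/<pid>/limits. It is Linux-only and not part of Ceph; the function names (`count_open_fds`, `parse_nofile_limits`, `fd_headroom`) are made up for illustration, and which process to inspect is left to the reader.

```python
import os
import sys


def count_open_fds(pid="self"):
    """Count the entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_nofile_limits(limits_text):
    """Extract (soft, hard) 'Max open files' from the text of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: "Max", "open", "files", <soft>, <hard>, "files"
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def fd_headroom(open_fds, soft_limit):
    """Descriptors still available before the soft limit, or None if unlimited."""
    if soft_limit is None:
        return None
    return soft_limit - open_fds


if __name__ == "__main__":
    # Pass a pid as the first argument; defaults to this script's own process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    if not os.path.isdir(f"/proc/{pid}/fd"):
        sys.exit(f"/proc/{pid}/fd not available (not Linux, or no such pid)")
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_nofile_limits(f.read())
    used = count_open_fds(pid)
    print(f"pid={pid} open_fds={used} soft={soft} hard={hard} "
          f"headroom={fd_headroom(used, soft)}")
```

Running it periodically against the suspect pid should show whether the descriptor count is creeping toward the soft limit. Reading another user's /proc/<pid>/fd usually requires root.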
Updated by Greg Farnum over 6 years ago
- Subject changed from OSD suicide on msgr bug (fuse client). to OSD suicide on msgr exceeding fd limit
Updated by Greg Farnum about 5 years ago
- Project changed from Ceph to Messengers
- Category deleted (msgr)
Updated by Sage Weil about 5 years ago
- Status changed from Need More Info to Closed