Bug #1788
msgr file descriptor leak
Status: closed
% Done: 0%
Description
With our Hadoop workload (lots of client connections), this problem occurs every couple of hours. This is the first time it has caused a crash, though; in most other instances the MDS simply stopped accepting requests.
Currently ulimit -n reports 65536 for the root user under which the MDS runs.
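To see how close a process is to that limit, one can compare the number of descriptors it holds with its soft limit. This is a minimal sketch, assuming a Linux host where /proc is mounted; the helper names `count_open_fds` and `fd_headroom` are hypothetical and not part of Ceph. By default it inspects its own process. Passing the pid of another process (for example ceph-mds) to `count_open_fds` requires permission to read /proc/<pid>/fd, and the limit returned by `fd_headroom` is always that of the calling process, not of the target.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Return the number of file descriptors currently open in a process.

    Linux-specific: each entry in /proc/<pid>/fd is one open descriptor.
    The directory handle used for the listing is itself counted, so the
    result for the calling process may be one higher than the steady state.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom():
    """Return (open_fds, soft_limit) for the calling process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return count_open_fds(), soft


if __name__ == "__main__":
    used, limit = fd_headroom()
    print(f"{used} of {limit} descriptors in use")
```

Sampling this count periodically while the workload runs would show whether descriptors grow steadily, which is what a leak looks like, or only spike with the number of client connections.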
2011-12-05 15:45:10.569880 7f62483db700 -- 192.168.141.123:6800/1297 <== mon.0 192.168.141.123:6789/0 1080 ==== mdsbeacon(7997/a up:active seq 1032 v132) v2 ==== 103+0+0 (4146415485 0 0) 0x31e0780 con 0x1af68c0
2011-12-05 15:45:10.569930 7f62483db700 mds.0.6 handle_mds_beacon up:active seq 1032 rtt 0.000454
2011-12-05 15:45:10.669546 7f62483db700 mds.0.6 ms_handle_reset on 192.168.141.131:6800/1456
2011-12-05 15:45:10.669571 7f62483db700 -- 192.168.141.123:6800/1297 mark_down 0x2b4edc0 -- 0x430ac80
2011-12-05 15:45:10.669714 7f62483db700 mds.0.6 ms_handle_reset on 192.168.141.124:6800/1314
2011-12-05 15:45:10.669730 7f62483db700 -- 192.168.141.123:6800/1297 mark_down 0x1b1d000 -- 0x428ba00
2011-12-05 15:45:10.669899 7f61c5997700 -- 192.168.141.123:6800/1297 >> 192.168.141.124:6800/1314 pipe(0x31e0280 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
msg/SimpleMessenger.cc: In function 'int SimpleMessenger::Pipe::connect()', in thread '7f61c5997700'
msg/SimpleMessenger.cc: 1032: FAILED assert(0)
 ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
 1: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
 2: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
 3: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
 4: (()+0x7971) [0x7f624c650971]
 5: (clone()+0x6d) [0x7f624aedf92d]
 ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
 1: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
 2: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
 3: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
 4: (()+0x7971) [0x7f624c650971]
 5: (clone()+0x6d) [0x7f624aedf92d]
*** Caught signal (Aborted) **
 in thread 7f61c5997700
 ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
 1: /usr/bin/ceph-mds() [0x7adfa4]
 2: (()+0xfb40) [0x7f624c658b40]
 3: (gsignal()+0x35) [0x7f624ae2cba5]
 4: (abort()+0x180) [0x7f624ae306b0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f624b6d06bd]
 6: (()+0xb9906) [0x7f624b6ce906]
 7: (()+0xb9933) [0x7f624b6ce933]
 8: (()+0xb9a3e) [0x7f624b6cea3e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x72e15f]
 10: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
 11: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
 12: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
 13: (()+0x7971) [0x7f624c650971]
 14: (clone()+0x6d) [0x7f624aedf92d]
root@issdm-23:/var/log/ceph#
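The trace shows socket creation in SimpleMessenger::Pipe::connect() failing with "Too many open files" (EMFILE), after which the process aborts on assert(0). As an illustration of the alternative, treating descriptor exhaustion as a retryable condition rather than a fatal one, here is a minimal Python sketch. It is not the Ceph code and not necessarily how this bug was fixed; `create_socket_with_retry` and its parameters (including the injectable `factory`) are hypothetical.

```python
import errno
import socket
import time


def create_socket_with_retry(attempts=3, delay=0.1, factory=socket.socket):
    """Try to create a TCP socket, backing off when descriptors run out.

    Returns the socket, or None if every attempt failed with EMFILE or
    ENFILE. Any other OSError is re-raised unchanged.
    """
    for attempt in range(attempts):
        try:
            return factory(socket.AF_INET, socket.SOCK_STREAM)
        except OSError as exc:
            if exc.errno not in (errno.EMFILE, errno.ENFILE):
                raise
            # Out of descriptors: wait (with exponential backoff) for
            # other connections to close before trying again.
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))
    return None
```

Retrying only keeps the daemon alive under temporary pressure. If descriptors are genuinely leaked, as the title of this report suggests, the retries will eventually fail as well and the leak itself still has to be found.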