Bug #1788
msgr file descriptor leak
Description
With our Hadoop workload (lots of client connections), this problem occurs every couple of hours. This is the first time it has caused a crash, though; in most other instances the MDS simply stopped accepting requests.
Currently ulimit -n reports 65536 for the root user under which the MDS runs.
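To check how close a daemon is to that limit, one option is to count the entries under /proc/<pid>/fd and compare them with the "Max open files" row in /proc/<pid>/limits. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names open_fd_count and soft_nofile_limit are hypothetical and not part of Ceph.

```python
import os


def open_fd_count(pid="self"):
    """Count descriptors currently open in a process (Linux /proc only)."""
    # Each entry in /proc/<pid>/fd is one open descriptor. When pid is
    # "self", the directory handle used for the listing is counted too.
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid="self"):
    """Return the soft 'Max open files' limit, or None if unlimited."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files" <soft> <hard> <units>
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("no 'Max open files' row found")


if __name__ == "__main__":
    # Pass the ceph-mds pid instead of "self" to inspect the daemon
    # (reading another user's /proc/<pid>/fd typically requires root).
    print(open_fd_count(), "open of", soft_nofile_limit())
```

Sampling this count periodically should show whether it climbs steadily toward the limit, which is what a leak would look like.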
2011-12-05 15:45:10.569880 7f62483db700 -- 192.168.141.123:6800/1297 <== mon.0 192.168.141.123:6789/0 1080 ==== mdsbeacon(7997/a up:active seq 1032 v132) v2 ==== 103+0+0 (4146415485 0 0) 0x31e0780 con 0x1af68c0
2011-12-05 15:45:10.569930 7f62483db700 mds.0.6 handle_mds_beacon up:active seq 1032 rtt 0.000454
2011-12-05 15:45:10.669546 7f62483db700 mds.0.6 ms_handle_reset on 192.168.141.131:6800/1456
2011-12-05 15:45:10.669571 7f62483db700 -- 192.168.141.123:6800/1297 mark_down 0x2b4edc0 -- 0x430ac80
2011-12-05 15:45:10.669714 7f62483db700 mds.0.6 ms_handle_reset on 192.168.141.124:6800/1314
2011-12-05 15:45:10.669730 7f62483db700 -- 192.168.141.123:6800/1297 mark_down 0x1b1d000 -- 0x428ba00
2011-12-05 15:45:10.669899 7f61c5997700 -- 192.168.141.123:6800/1297 >> 192.168.141.124:6800/1314 pipe(0x31e0280 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
msg/SimpleMessenger.cc: In function 'int SimpleMessenger::Pipe::connect()', in thread '7f61c5997700'
msg/SimpleMessenger.cc: 1032: FAILED assert(0)
ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
1: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
2: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
3: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
4: (()+0x7971) [0x7f624c650971]
5: (clone()+0x6d) [0x7f624aedf92d]
ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
1: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
2: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
3: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
4: (()+0x7971) [0x7f624c650971]
5: (clone()+0x6d) [0x7f624aedf92d]
*** Caught signal (Aborted) **
in thread 7f61c5997700
ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
1: /usr/bin/ceph-mds() [0x7adfa4]
2: (()+0xfb40) [0x7f624c658b40]
3: (gsignal()+0x35) [0x7f624ae2cba5]
4: (abort()+0x180) [0x7f624ae306b0]
5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f624b6d06bd]
6: (()+0xb9906) [0x7f624b6ce906]
7: (()+0xb9933) [0x7f624b6ce933]
8: (()+0xb9a3e) [0x7f624b6cea3e]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x72e15f]
10: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
11: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
12: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
13: (()+0x7971) [0x7f624c650971]
14: (clone()+0x6d) [0x7f624aedf92d]
root@issdm-23:/var/log/ceph#
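The "Too many open files" text in the connect failure above is the standard message for errno EMFILE, which socket() returns once a process has reached its descriptor limit. The sketch below is an illustration only, not the Ceph code path: it lowers the soft RLIMIT_NOFILE inside a test process, deliberately leaks sockets, and reports the errno of the first failed socket() call. The function name leak_until_emfile and the limit of 64 are assumptions chosen for the demo.

```python
import errno
import resource
import socket


def leak_until_emfile(limit=64):
    """Leak sockets under a lowered soft limit; return (count, errno)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never try to raise the soft limit here, only lower it.
    new_soft = limit if soft == resource.RLIM_INFINITY else min(limit, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    leaked = []
    try:
        while True:
            # Each socket holds one descriptor until it is closed.
            leaked.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    except OSError as e:
        return len(leaked), e.errno
    finally:
        # Release the descriptors and restore the original limit.
        for s in leaked:
            s.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    count, err = leak_until_emfile()
    print(count, "sockets created before", errno.errorcode.get(err, err))
```

In this sketch the failure is caught and reported, whereas the log above shows the messenger hitting assert(0) in the same situation.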
Updated by Sage Weil over 12 years ago
Updated by Sage Weil over 12 years ago
- Subject changed from MDS file descriptor leak to msgr file descriptor leak
Updated by Sage Weil over 12 years ago
Updated by Greg Farnum over 12 years ago
- Category set to 1
- Status changed from New to 7
- Assignee set to Greg Farnum
I guess this bug should be considered fixed by commit:8c4f4748e8b683f5b4ea939295793421c0ab7b61 in the wip-messenger branch.
#1803 is a more serious fix for the issue.
Updated by Greg Farnum over 12 years ago
- Status changed from 7 to Resolved
Haven't heard any new issues from Noah; merged to master in commit:18d996370efc2fc32d4973e9e6934901558bcbaf.
Updated by Noah Watkins over 12 years ago
Forgot to update this. Haven't run into it yet and wip-messenger seemed to have fixed things. Thanks Greg!
Updated by John Spray over 7 years ago
- Project changed from Ceph to CephFS
- Category deleted (1)
- Target version deleted (v0.40)
Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.