Bug #1788

closed

msgr file descriptor leak

Added by Noah Watkins over 12 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%


Description

With our Hadoop workload (lots of client connections), this problem occurs every couple of hours. This is the first time it has caused a crash; in most other instances the MDS simply stopped accepting requests.

Currently ulimit -n reports 65536 for the root user under which the MDS runs.
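
One caveat: `ulimit -n` reports the limit of the shell it is run in, and a daemon started some other way can carry a different limit. A minimal sketch, assuming Linux and its `/proc/<pid>/limits` file, for reading the limit a running process actually has. The function names are illustrative and not part of Ceph.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units.
            fields = line[len(prefix):].split()

            def to_int(value):
                return None if value == "unlimited" else int(value)

            return to_int(fields[0]), to_int(fields[1])
    raise ValueError("no 'Max open files' row found")


def max_open_files(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


if __name__ == "__main__":
    # Example: inspect this process; pass the MDS pid instead to check the daemon.
    print(max_open_files(os.getpid()))
```

Running this against the MDS pid would confirm whether the 65536 limit really applies to the daemon.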

2011-12-05 15:45:10.569880 7f62483db700 -- 192.168.141.123:6800/1297 <== mon.0 192.168.141.123:6789/0 1080 ==== mdsbeacon(7997/a up:active seq 1032 v132) v2 ==== 103+0+0 (4146415485 0 0) 0x31e0780 con 0x1af68c0
2011-12-05 15:45:10.569930 7f62483db700 mds.0.6 handle_mds_beacon up:active seq 1032 rtt 0.000454
2011-12-05 15:45:10.669546 7f62483db700 mds.0.6 ms_handle_reset on 192.168.141.131:6800/1456
2011-12-05 15:45:10.669571 7f62483db700 -- 192.168.141.123:6800/1297 mark_down 0x2b4edc0 -- 0x430ac80
2011-12-05 15:45:10.669714 7f62483db700 mds.0.6 ms_handle_reset on 192.168.141.124:6800/1314
2011-12-05 15:45:10.669730 7f62483db700 -- 192.168.141.123:6800/1297 mark_down 0x1b1d000 -- 0x428ba00
2011-12-05 15:45:10.669899 7f61c5997700 -- 192.168.141.123:6800/1297 >> 192.168.141.124:6800/1314 pipe(0x31e0280 sd=-1 pgs=0 cs=0 l=0).connect couldn't created socket Too many open files
msg/SimpleMessenger.cc: In function 'int SimpleMessenger::Pipe::connect()', in thread '7f61c5997700'
msg/SimpleMessenger.cc: 1032: FAILED assert(0)
 ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
 1: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
 2: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
 3: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
 4: (()+0x7971) [0x7f624c650971]
 5: (clone()+0x6d) [0x7f624aedf92d]
 ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
 1: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
 2: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
 3: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
 4: (()+0x7971) [0x7f624c650971]
 5: (clone()+0x6d) [0x7f624aedf92d]
*** Caught signal (Aborted) **
 in thread 7f61c5997700
 ceph version 0.38-259-gd4aef20 (commit:d4aef20210d43e25eefe945009e6f77d5b045381)
 1: /usr/bin/ceph-mds() [0x7adfa4]
 2: (()+0xfb40) [0x7f624c658b40]
 3: (gsignal()+0x35) [0x7f624ae2cba5]
 4: (abort()+0x180) [0x7f624ae306b0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f624b6d06bd]
 6: (()+0xb9906) [0x7f624b6ce906]
 7: (()+0xb9933) [0x7f624b6ce933]
 8: (()+0xb9a3e) [0x7f624b6cea3e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x39f) [0x72e15f]
 10: (SimpleMessenger::Pipe::connect()+0xb10) [0x768220]
 11: (SimpleMessenger::Pipe::writer()+0xc77) [0x76b3b7]
 12: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x48edcd]
 13: (()+0x7971) [0x7f624c650971]
 14: (clone()+0x6d) [0x7f624aedf92d]
root@issdm-23:/var/log/ceph# 
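
To tell a leak apart from a workload that simply needs more descriptors, it can help to sample the daemon's open descriptor count over time. A rough sketch, assuming Linux (`/proc/<pid>/fd`); the helper names and the growth threshold are hypothetical and not taken from Ceph.

```python
import os


def count_open_fds(pid="self"):
    """Count entries in /proc/<pid>/fd (Linux only).

    For pid="self" the count includes the descriptor used to list the
    directory, so compare samples with each other rather than reading
    the absolute value.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def looks_like_leak(samples, min_growth):
    """Heuristic: the count never drops and grows by at least min_growth."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] - samples[0] >= min_growth


if __name__ == "__main__":
    # Example: print the current count for this process.
    # Pass the MDS pid and sample periodically to watch for growth.
    print(count_open_fds())
```

A count that climbs steadily toward the limit while the number of client connections stays flat would point to a leak rather than to load.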
Actions #1

Updated by Sage Weil over 12 years ago

  • Target version set to v0.40
Actions #3

Updated by Sage Weil over 12 years ago

  • Subject changed from MDS file descriptor leak to msgr file descriptor leak
Actions #5

Updated by Greg Farnum over 12 years ago

  • Category set to 1
  • Status changed from New to 7
  • Assignee set to Greg Farnum

I guess this bug should be considered fixed by commit:8c4f4748e8b683f5b4ea939295793421c0ab7b61 in the wip-messenger branch.
#1803 is a more serious fix for the issue.

Actions #6

Updated by Greg Farnum over 12 years ago

  • Status changed from 7 to Resolved

Haven't heard any new issues from Noah; merged to master in commit:18d996370efc2fc32d4973e9e6934901558bcbaf.

Actions #7

Updated by Noah Watkins over 12 years ago

Forgot to update this. I haven't run into it again, and wip-messenger seems to have fixed things. Thanks Greg!

Actions #8

Updated by John Spray over 7 years ago

  • Project changed from Ceph to CephFS
  • Category deleted (1)
  • Target version deleted (v0.40)

Bulk updating project=ceph category=mds bugs so that I can remove the MDS category from the Ceph project to avoid confusion.
