Bug #4385
mds: refusing connections with high open socket count
Status: Closed
Description
My MDS has become unresponsive after a long period of map-reduce jobs. The MDS process is idle, but is eating up 16 GB of virt memory. It also has over 8000 open socket file descriptors.
New clients are seeing a lot of "connection refused" messages:
2013-03-07 18:00:11.169075 7f0080131700 10 -- 192.168.141.127:0/3067547546 >> 192.168.141.144:6800/8274 pipe(0x7f008437bb70 sd=63 :0 s=1 pgs=0 cs=0 l=0).connecting to 192.168.141.144:6800/8274
2013-03-07 18:00:11.169166 7f0080131700 2 -- 192.168.141.127:0/3067547546 >> 192.168.141.144:6800/8274 pipe(0x7f008437bb70 sd=63 :0 s=1 pgs=0 cs=0 l=0).connect error 192.168.141.144:6800/8274, 111: Connection refused
2013-03-07 18:00:11.169193 7f0080131700 2 -- 192.168.141.127:0/3067547546 >> 192.168.141.144:6800/8274 pipe(0x7f008437bb70 sd=63 :0 s=1 pgs=0 cs=0 l=0).fault 111: Connection refused
2013-03-07 18:00:11.169205 7f0080131700 10 -- 192.168.141.127:0/3067547546 >> 192.168.141.144:6800/8274 pipe(0x7f008437bb70 sd=63 :0 s=1 pgs=0 cs=0 l=0).fault waiting 15.000000
Updated by Greg Farnum about 11 years ago
The direct cause of this is almost certainly an open fd limit coming from the OS, which you can probably work around by changing the ulimit or whatever.
The more interesting question is why on earth it has so many open sockets — it ought to be closing them down after they're idle for enough time.
Updated by Noah Watkins about 11 years ago
I'll test out the ulimit as a workaround, which should also verify the open fd limit theory.
I checked all my client nodes and they don't seem to have any stuck client processes with open connections. If I bump up the logging level on the OSD, will we be able to see the FDs, so we can correlate the log with the lsof output?
Updated by Greg Farnum about 11 years ago
Doesn't /proc tell you whether the fd is a socket or not? Or do you mean correlate activity?
In any case, all the Pipe debugging output includes the sd it's using — you'll probably want debug_ms 10 in order to get anything useful.
(Side note: while of course I want to debug this, restarting the MDS will presumably free up all those descriptors if necessary — and if not then this is either much weirder or not the kind of bug I think it is.)
Updated by Noah Watkins about 11 years ago
I had this thought that the set of FDs in the logs would be >> than the set shown in lsof, and that we'd want to cross-reference lsof from the log to figure out which class of socket was being left open. But maybe that's overkill :P
I'll do some investigation on this. We aren't under a time crunch or anything, but want to be able to keep MR jobs running continuously.
Updated by Greg Farnum about 11 years ago
It might be contributing, but I believe the sockets should still be getting closed after a timeout period, right?
Updated by Noah Watkins about 11 years ago
Is there anything I can do to get more information for this ticket?
Updated by Noah Watkins about 11 years ago
To Greg's question: it seems the connections were not timing out. Rough estimate, they were still there about 45 minutes later, right up until I took down the MDS.
Updated by Sage Weil about 11 years ago
can you reproduce with debug ms = 20 and debug mds = 20 ? those logs would be helpful
Updated by Noah Watkins about 11 years ago
Would you like the logs up to the point that the MDS stops accepting connections, or just a snapshot after the FD list has grown large? For example, after a few rounds of MR jobs with 256 map-tasks each (that have completed), I'm up to about 800 open sockets. Either is OK, but the logs for the former will be enormous.
Updated by Noah Watkins about 11 years ago
Log file fun. Here is the MDS log up until it stopped accepting connections.
Updated by Noah Watkins about 11 years ago
Here's some more info after investigating this a bit further.
Open socket counts by category after a fresh MDS reboot.
  378 REG
  180 FIFO
   90 IPv4
   54 CHR
   36 DIR
   18 unix
Open sockets after 1000 clean, mount/shutdown iterations.
1993964 sock
  42126 REG
  20060 FIFO
  10030 IPv4
   6018 CHR
   4012 DIR
   2006 unix
Driver program
#include <assert.h>
#include <stddef.h>
#include <cephfs/libcephfs.h>

int main(void)
{
    struct ceph_mount_info *cmount;
    for (int i = 0; i < 1000; i++) {
        assert(ceph_create(&cmount, NULL) == 0);
        assert(ceph_conf_read_file(cmount, NULL) == 0);
        assert(ceph_mount(cmount, "/") == 0);
        assert(ceph_unmount(cmount) == 0);
        assert(ceph_release(cmount) == 0);
    }
    return 0;
}
Taking the diff between successive lsof output for 1, 2, ... mount attempts reveals the pattern. After one mount there are 20 threads each with an open socket:
ceph-mds 10361 10362 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10364 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10365 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10366 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10367 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10369 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10370 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10371 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10372 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10373 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10376 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10387 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10612 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10614 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10615 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10619 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10622 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10623 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10671 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10672 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
After a second mount, each of the 20 threads has 2 open sockets, plus there are 2 new threads, each with 2 sockets.
ceph-mds 10361 10362 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10362 nwatkins 21u sock 0,7 0t0 11657837 can't identify protocol
ceph-mds 10361 10364 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10364 nwatkins 21u sock 0,7 0t0 11657837 can't identify protocol
ceph-mds 10361 10365 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
... clip ...
And so on. After 3 mounts, each of the previous 22 threads has 3 open sockets, plus there are 2 new threads with 3 open sockets each.
Updated by Greg Farnum about 11 years ago
Hmm, did we screw up our refactoring work so that replaced sockets are no longer actually closed? That might explain this behavior.
Updated by Noah Watkins about 11 years ago
Although the high counts were due to double counting by lsof, the sockets are still not being closed. Without any active clients running, all the sockets created by previous mounts are still around after 30 minutes. I'm using the default settings.
Updated by Sage Weil about 11 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
Updated by Noah Watkins about 11 years ago
Well hot damn. That branch seems to solve two problems. First, clients that do a clean unmount don't leave lots of FDs lingering in the MDS. Second, unclean mounts are now timing out the MDS-side sockets, after about 7 minutes. Is this mds_session_autoclose (defaulting to 5 minutes) ?
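For reference, if the timeout in play is indeed mds_session_autoclose, its window is tunable in ceph.conf (a sketch; 300 seconds is the default, so this merely makes the setting explicit):

[mds]
    mds session autoclose = 300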
Updated by Noah Watkins about 11 years ago
Err, "unclean mounts" = "exiting without unmounting"
Updated by Sage Weil about 11 years ago
- Status changed from In Progress to Fix Under Review
sounds right. thanks for testing!
Updated by Sage Weil about 11 years ago
- Status changed from Fix Under Review to Resolved
commit:8b713371447f9761597457af2c81f0b870d3c4ba