Bug #4385
mds: refusing connections with high open socket count
Status: Closed
Description
My MDS has become unresponsive after a long period of map-reduce jobs. The MDS process is idle, but is eating up 16 GB of virt memory. It also has over 8000 open socket file descriptors.
New clients are seeing a lot of "connection refused" messages:
2013-03-07 18:00:11.169075 7f0080131700 10 -- 192.168.141.127:0/3067547546 >> 192.168.141.144:6800/8274 pipe(0x7f008437bb70 sd=63 :0 s=1 pgs=0 cs=0 l=0).connecting to 192.168.141.144:6800/8274
2013-03-07 18:00:11.169166 7f0080131700 2 -- 192.168.141.127:0/3067547546 >> 192.168.141.144:6800/8274 pipe(0x7f008437bb70 sd=63 :0 s=1 pgs=0 cs=0 l=0).connect error 192.168.141.144:6800/8274, 111: Connection refused
2013-03-07 18:00:11.169193 7f0080131700 2 -- 192.168.141.127:0/3067547546 >> 192.168.141.144:6800/8274 pipe(0x7f008437bb70 sd=63 :0 s=1 pgs=0 cs=0 l=0).fault 111: Connection refused
2013-03-07 18:00:11.169205 7f0080131700 10 -- 192.168.141.127:0/3067547546 >> 192.168.141.144:6800/8274 pipe(0x7f008437bb70 sd=63 :0 s=1 pgs=0 cs=0 l=0).fault waiting 15.000000
Updated by Greg Farnum about 11 years ago
The direct cause of this is almost certainly an open fd limit coming from the OS, which you can probably work around by changing the ulimit or whatever.
The more interesting question is why on earth it has so many open sockets — it ought to be closing them down after they're idle for enough time.
Updated by Noah Watkins about 11 years ago
I'll test out the ulimit as a workaround, which should also verify the open fd limit theory.
I checked all my client nodes and they don't seem to have any stuck client processes with open connections. If I bump up the logging level on the OSD, will we be able to see the FDs, so we can correlate the log with the lsof output?
Updated by Greg Farnum about 11 years ago
Doesn't /proc tell you whether the fd is a socket or not? Or do you mean correlate activity?
In any case, all the Pipe debugging output includes the sd it's using — you'll probably want debug_ms 10 in order to get anything useful.
(Side note: while of course I want to debug this, restarting the MDS will presumably free up all those descriptors if necessary — and if not then this is either much weirder or not the kind of bug I think it is.)
Updated by Noah Watkins about 11 years ago
I had this thought that the set of FDs in the logs would be >> than the set shown in lsof, and that we'd want to cross-reference lsof from the log to figure out which class of socket was being left open. But maybe that's overkill :P
I'll do some investigation on this. We aren't under a time crunch or anything, but want to be able to keep MR jobs running continuously.
Updated by Greg Farnum about 11 years ago
It might be contributing, but I believe the sockets should still be getting closed after a timeout period, right?
Updated by Noah Watkins about 11 years ago
Is there anything I can do to get more information for this ticket?
Updated by Noah Watkins about 11 years ago
To Greg's question: it seems the connections were not timing out. Rough estimate, they were still there about 45 minutes later, right up until I took down the MDS.
Updated by Sage Weil about 11 years ago
can you reproduce with debug ms = 20 and debug mds = 20 ? those logs would be helpful
Updated by Noah Watkins about 11 years ago
Would you like the logs up to the point that the MDS stops accepting connections, or just a snapshot after the FD list has grown large? For example, after a few rounds of MR jobs with 256 map-tasks each (that have completed), I'm up to about 800 open sockets. Either is OK, but the logs for the former will be enormous.
Updated by Noah Watkins about 11 years ago
Log file fun. Here is the MDS log up until it stopped accepting connections.
Updated by Noah Watkins about 11 years ago
Here's some more info after investigating this a bit further.
Open socket counts by category after a fresh MDS reboot.
  378 REG
  180 FIFO
   90 IPv4
   54 CHR
   36 DIR
   18 unix
Open sockets after 1000 clean, mount/shutdown iterations.
1993964 sock
  42126 REG
  20060 FIFO
  10030 IPv4
   6018 CHR
   4012 DIR
   2006 unix
Driver program
#include <assert.h>
#include <stddef.h>
#include <cephfs/libcephfs.h>

int main(void)
{
    struct ceph_mount_info *cmount;
    for (int i = 0; i < 1000; i++) {
        assert(ceph_create(&cmount, NULL) == 0);
        assert(ceph_conf_read_file(cmount, NULL) == 0);
        assert(ceph_mount(cmount, "/") == 0);
        assert(ceph_unmount(cmount) == 0);
        assert(ceph_release(cmount) == 0);
    }
    return 0;
}
Taking the diff between successive lsof output for 1, 2, ... mount attempts reveals the pattern. After one mount there are 20 threads each with an open socket:
ceph-mds 10361 10362 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10364 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10365 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10366 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10367 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10369 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10370 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10371 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10372 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10373 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10376 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10387 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10612 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10614 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10615 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10619 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10622 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10623 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10671 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10672 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
After a second mount, each of the 20 threads has 2 open sockets, plus there are 2 new threads, each with 2 sockets.
ceph-mds 10361 10362 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10362 nwatkins 21u sock 0,7 0t0 11657837 can't identify protocol
ceph-mds 10361 10364 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
ceph-mds 10361 10364 nwatkins 21u sock 0,7 0t0 11657837 can't identify protocol
ceph-mds 10361 10365 nwatkins 20u sock 0,7 0t0 11657833 can't identify protocol
... clip ...
And so on. After 3 mounts, each of the previous 22 threads has 3 open sockets, plus there are 2 new threads with 3 open sockets each.
Updated by Greg Farnum about 11 years ago
Hmm, did we screw up our refactoring work so that replaced sockets are no longer actually closed? That might explain this behavior.
Updated by Noah Watkins about 11 years ago
Although the high counts were due to double counting by lsof, the sockets are still not being closed. Without any active clients running, all the sockets created by previous mounts are still around after 30 minutes. I'm using the default settings.
Updated by Sage Weil about 11 years ago
- Status changed from New to In Progress
- Assignee set to Sage Weil
Updated by Noah Watkins about 11 years ago
Well hot damn. That branch seems to solve two problems. First, clients that do a clean unmount don't leave lots of FDs lingering in the MDS. Second, unclean mounts are now timing out the MDS-side sockets, after about 7 minutes. Is this mds_session_autoclose (defaulting to 5 minutes) ?
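For reference, if the timeout in play is indeed mds_session_autoclose, its window is tunable in ceph.conf (a sketch; 300 seconds is the default, so this merely makes the setting explicit):

[mds]
    mds session autoclose = 300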
Updated by Noah Watkins about 11 years ago
Err, "unclean mounts" = "exiting without unmounting"
Updated by Sage Weil about 11 years ago
- Status changed from In Progress to Fix Under Review
sounds right. thanks for testing!
Updated by Sage Weil about 11 years ago
- Status changed from Fix Under Review to Resolved
commit:8b713371447f9761597457af2c81f0b870d3c4ba