Bug #14048

MDS heap stats standby threads

Added by Vitaliy Dutchak over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 12/10/2015
Due date:
% Done: 0%
Source: other
Tags: mds, heap stats
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:

Description

Hello, I have a problem with the heap stats (and dump, start_profiler, stop_profiler) command:

When I run it, I get some results, but ceph-mds creates 4 new threads and never exits them. So if I call
ceph tell mds.a heap stats every 10 seconds, after 5-6 hours I have 4000+ threads and the cluster slows down.

In the ceph-mds log I see a lot of lines like these:

2015-12-10 15:39:00.121346 7f6187fe6700 0 -- 10.10.10.11:6800/25374 >> 10.10.10.11:0/1766420499 pipe(0x7f61972f8000 sd=17 :6800 s=2 pgs=2 cs=1 l=0 c=0x7f6197194b00).fault, server, going to standby
2015-12-10 15:39:00.152101 7f6187de4700 0 -- 10.10.10.11:6800/25374 >> 10.10.10.11:0/1665626131 pipe(0x7f61972fd000 sd=18 :6800 s=2 pgs=2 cs=1 l=0 c=0x7f6197194dc0).fault, server, going to standby

I tried versions 0.94.3, 0.94.5 and 9.2; all of them show the same behavior.
I also tried the other daemons (osd and mon) and the command works fine there, but not for mds.
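
The growth described above could be observed with a small script along these lines. This is only a sketch, assuming a Linux host where the thread count of a process can be read from the Threads: field of /proc/<pid>/status. The helper names (parse_thread_count, read_thread_count, watch) are hypothetical and not part of Ceph, and the ceph-mds PID has to be supplied by hand.

```python
import re
import subprocess
import time


def parse_thread_count(status_text):
    """Return the value of the 'Threads:' field from /proc/<pid>/status text."""
    match = re.search(r"^Threads:\s*(\d+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Threads: field found")
    return int(match.group(1))


def read_thread_count(pid):
    """Read the current thread count of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_thread_count(f.read())


def watch(pid, mds_name="mds.a", iterations=5, interval=10.0):
    """Run the tell command repeatedly and print the thread count after each call."""
    for i in range(iterations):
        # Same command as in the report; its output is not inspected here.
        subprocess.run(["ceph", "tell", mds_name, "heap", "stats"], check=False)
        print(i, read_thread_count(pid))
        time.sleep(interval)
```

Calling watch(<pid of ceph-mds>) on an affected version would be expected to print a thread count that keeps rising between iterations; on a fixed version it should stay roughly flat.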

Associated revisions

Revision b1429aa7 (diff)
Added by John Spray over 3 years ago

mds: tear down connections from `tell` commands

We can identify sessions that were never really
opened, just created when a client sent in
a 'tell' message. When we see one of those, mark
the associated connection disposable when responding
to the command.

Fixes: #14048
Signed-off-by: John Spray <>
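
The idea in the commit message above can be illustrated with a toy model. This is not the Ceph code: the Connection and Session classes, the opened flag and handle_tell are hypothetical stand-ins, used only to show the rule "if the session was never really opened, mark its connection disposable when replying".

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Toy stand-in for a messenger connection."""
    disposable: bool = False

    def mark_disposable(self):
        self.disposable = True


@dataclass
class Session:
    """Toy stand-in for an MDS session bound to one connection."""
    connection: Connection
    opened: bool = False  # True only after a real client session open


def handle_tell(session, command):
    """Reply to a command; flag the connection if the session was never opened."""
    reply = f"ran {command}"
    if not session.opened:
        # Session exists only because a 'tell' arrived, so the
        # connection can be torn down once the reply is sent.
        session.connection.mark_disposable()
    return reply
```

In this model a mounted client (opened=True) keeps its connection, while a session created only by a tell command has its connection flagged for teardown.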

Revision b010b6fe (diff)
Added by John Spray over 3 years ago

tasks: add TestSessionMap.test_[mount|tell]_conn_close

To check that teardown is happening correctly when
sending commands to an MDS.

Fixes: #14048
Signed-off-by: John Spray <>

History

#1 Updated by Greg Farnum over 3 years ago

  • Project changed from Ceph to fs
  • Category deleted (1)

Huh. The tell command is setting up a new client connection — are we failing to clean these up for some reason?

#2 Updated by Vitaliy Dutchak over 3 years ago

I think these threads should be cleaned up.

#3 Updated by John Spray over 3 years ago

  • Status changed from New to Verified

Reproduced on master.

#4 Updated by John Spray over 3 years ago

  • Status changed from Verified to In Progress
  • Assignee set to John Spray

#6 Updated by Vitaliy Dutchak over 3 years ago

Sorry for the question: do I need to review it, or will the CephFS team? )

#7 Updated by Zheng Yan over 3 years ago

Vitaliy Dutchak wrote:

Sorry for the question: do I need to review it, or will the CephFS team? )

We will review it. If you like, you can test or review it too.

#8 Updated by Zheng Yan over 3 years ago

  • Status changed from Need Review to Resolved
