Support #13211

profiler and getting some memory info with it

Added by Sergey Mir over 8 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%

Description

ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
3.13.0-61-generic #100-Ubuntu 2015 x86_64 x86_64 x86_64 GNU/Linux

I turned on the heap profiler on osd.00 (I did this from the mon02 node, which runs monitor and MDS daemons) and then requested a dump with ceph tell osd.00 heap dump. After that I got the warning "mon.mon00@0(leader).osd e1109 we have enough reports/reporters to mark osd.0 down" along with other slow request messages, and osd.00 stopped working. After I turned the profiler off, the OSD went back to work.
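The steps described above could be sketched roughly as follows. The exact commands used are not given in this report beyond the heap dump call, so the start_profiler and stop_profiler subcommands, the osd.0 target, the helper names and the CEPH_BIN override (added only to allow a dry run) are assumptions.

```shell
#!/bin/sh
# Hypothetical helpers: CEPH_BIN can be overridden (e.g. CEPH_BIN=echo)
# to print the commands instead of running them.
CEPH_BIN="${CEPH_BIN:-ceph}"

heap_cmd() {
    # $1 = daemon name such as osd.0, $2 = heap subcommand
    "$CEPH_BIN" tell "$1" heap "$2"
}

profile_cycle() {
    # Start the heap profiler, take one dump, then stop the profiler.
    # Stops at the first command that fails.
    daemon="$1"
    heap_cmd "$daemon" start_profiler &&
    heap_cmd "$daemon" dump &&
    heap_cmd "$daemon" stop_profiler
}
```

Calling profile_cycle osd.0 with CEPH_BIN=echo prints the three ceph tell invocations without touching the cluster.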
Here is part of the osd.0 log file from the moment the profiler was turned on:
2015-09-23 16:53:32.904799 7f4d007a5700 0 osd.0 1109 do_command r=0
2015-09-23 16:53:32.923139 7f4d007a5700 0 turning on heap profiler with prefix /var/log/ceph//osd.0.profile
2015-09-23 16:53:32.933646 7f4d007a5700 0 osd.0 1109 do_command r=0
2015-09-23 16:53:40.971444 7f4d007a5700 0 osd.0 1109 do_command r=0
2015-09-23 16:53:41.152338 7f4d007a5700 0 osd.0 1109 do_command r=0
2015-09-23 16:53:51.230343 7f4d0dfc0700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f4d03fac700' had timed out after 15
2015-09-23 16:53:51.230996 7f4d0dfc0700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f4d047ad700' had timed out after 15
2015-09-23 16:53:51.231905 7f4d0c7bd700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f4d03fac700' had timed out after 15
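
To see which op threads tripped the heartbeat check in an excerpt like the one above, a small sketch such as this could be used. The function name is hypothetical, and the log line layout is assumed to match the lines shown here.

```shell
#!/bin/sh
# Hypothetical helper: count osd_op_tp heartbeat timeouts per thread id.
count_op_tp_timeouts() {
    # $1 = path to an OSD log file
    awk '/heartbeat_map is_healthy/ && /OSD::osd_op_tp thread/ {
        for (i = 1; i < NF; i++) {
            if ($i == "thread") {
                t = $(i + 1)
                gsub(/[^0-9a-fx]/, "", t)   # drop the trailing quote character
                n[t]++
            }
        }
    }
    END { for (t in n) print t, n[t] }' "$1" | sort
}
```

Run against a file holding the three heartbeat_map lines above, it reports 0x7f4d03fac700 with a count of 2 and 0x7f4d047ad700 with a count of 1.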

--
Another problem: I cannot get the MDS heap stats.
root@mon02 ~ # ceph tell mds.mon02 heap stats
Error EPERM: problem getting command descriptions from mds.

root@mon02 ~ # cat /var/log/ceph/ceph-mds.mon02.log
2015-09-23 16:36:16.408296 7f825e570700 1 mds.-1.0 handle_command: received command from client without `tell` capability: (mon/mds2_ip):0/4048052415
2015-09-23 16:36:16.408952 7f8259465700 0 -- (mon/mds2_ip):6800/10787 >> (mon/mds2_ip):0/4048052415 pipe(0x49b5000 sd=17 :6800 s=2 pgs=2 cs=1 l=0 c=0x494d440).fault, server, going to standby

Here is some info about the MDS daemons:
379044: (mon/mds2_ip):6800/10787 'mon02' mds.-1.0 up:standby seq 1
416149: (mon/mds0_ip):6800/31743 'mon00' mds.0.39 up:active seq 5 export_targets=1
399506: (mon/mds1_ip):6800/3114 'mon01' mds.1.10 up:active seq 6 export_targets=0

Is there any way to fix this?
