Support #21933

OSDs consume around 50% CPU on idle cluster

Added by Tzachi Strul over 6 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

0%

Tags:
OSD,cpu,bluestore
Reviewed:
Affected Versions:
Pull request ID:

Description

Hi all,
We have just installed a new cluster using ceph-ansible.
We installed the Luminous release (12.2.0-1).
The cluster contains 6 Intel S2600WTTR server boards, each node using only SSD drives (16 x 1.6 TB).
64 GB RAM, CPU: Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz, 32 cores.

We are using BlueStore.

The OSDs consume a lot of CPU although we generate very few IOPS at this stage (almost nothing). We can't figure out why this behaviour occurs.

Main usage is RBDs for our OpenStack environment (Ocata).

top - 07:41:55 up 49 days, 2:54, 2 users, load average: 6.85, 6.40, 6.37
Tasks: 518 total, 1 running, 517 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.8 us, 4.3 sy, 0.0 ni, 80.3 id, 0.0 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem : 65853584 total, 23953788 free, 40342680 used, 1557116 buff/cache
KiB Swap: 3997692 total, 3997692 free, 0 used. 18020584 avail Mem

  PID USER PR NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
36713 ceph 20  0 3869588 2.826g  28896 S 47.2  4.5   6079:20 ceph-osd
53981 ceph 20  0 3998732 2.666g  28628 S 45.8  4.2   5939:28 ceph-osd
55879 ceph 20  0 3707004 2.286g  28844 S 44.2  3.6   5854:29 ceph-osd
46026 ceph 20  0 3631136 1.930g  29100 S 43.2  3.1   6008:50 ceph-osd
39021 ceph 20  0 4091452 2.698g  28936 S 42.9  4.3   5687:39 ceph-osd
47210 ceph 20  0 3598572 1.871g  29092 S 42.9  3.0   5759:19 ceph-osd
52763 ceph 20  0 3843216 2.410g  28896 S 42.2  3.8   5540:11 ceph-osd
49317 ceph 20  0 3794760 2.142g  28932 S 41.5  3.4   5872:24 ceph-osd
42653 ceph 20  0 3915476 2.489g  28840 S 41.2  4.0   5605:13 ceph-osd
41560 ceph 20  0 3460900 1.801g  28660 S 38.5  2.9   5128:01 ceph-osd
50675 ceph 20  0 3590288 1.827g  28840 S 37.9  2.9   5196:58 ceph-osd
37897 ceph 20  0 4034180 2.814g  29000 S 34.9  4.5   4789:10 ceph-osd
50237 ceph 20  0 3379780 1.930g  28892 S 34.6  3.1   4846:36 ceph-osd
48608 ceph 20  0 3893684 2.721g  28880 S 33.9  4.3   4752:43 ceph-osd
40323 ceph 20  0 4227864 2.959g  28800 S 33.6  4.7   4712:36 ceph-osd
44638 ceph 20  0 3656780 2.437g  28896 S 33.2  3.9   4793:58 ceph-osd
61639 ceph 20  0  527512 114300  20988 S  2.7  0.2   2722:03 ceph-mgr
31586 ceph 20  0  765672 304140  21816 S  0.7  0.5 409:06.09 ceph-mon
   68 root 20  0       0      0      0 S  0.3  0.0   3:09.69 ksoftirqd/12
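
A per-thread view of one of these processes can be taken with top in threads mode (the PID is one ceph-osd from the listing above, used purely as an example):

top -H -p 36713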

A core dump of an OSD, an OSD debug log, and a debugpack are provided:
ceph-post-file: 9e1eaaf1-033b-438c-abaa-9949353f843e
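
For reference, ceph-post-file uploads files to the Ceph developers and prints an ID like the one above; a minimal invocation would look like this (the paths are illustrative, not the actual files uploaded):

ceph-post-file /var/log/ceph/ceph-osd.0.log /tmp/core.ceph-osd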

History

#1 Updated by Sage Weil over 6 years ago

Did you turn up the debug levels just to generate the log, or were they already up? Debug logs eat gobs of CPU, so I'd remeasure with them at normal levels.
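
For reference, debug levels can be lowered on all running OSDs without a restart; a minimal sketch (the subsystems shown are illustrative, not a list Sage specified):

ceph tell osd.* injectargs '--debug_osd 1/5 --debug_bluestore 1/5 --debug_ms 0/0'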

#2 Updated by Tzachi Strul over 6 years ago

Sage Weil wrote:

Did you turn up the debug levels just to generate the log, or were they already up? Debug logs eat gobs of CPU, so I'd remeasure with them at normal levels.

Just to generate the logs. The normal state is "debug_osd": "1/5".

#3 Updated by Sage Weil over 6 years ago

Can you try installing ceph-osd-dbg or ceph-debuginfo and then run 'perf top'? That should give us a bird's-eye view of where the time is being spent.
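
A sketch of the suggested workflow; package names vary by distro (ceph-osd-dbg on Debian/Ubuntu, ceph-debuginfo on RPM-based systems) and the PID is an example taken from the top output above:

apt-get install ceph-osd-dbg    # Debian/Ubuntu; 'yum install ceph-debuginfo' on RPM-based systems
perf top -p 36713               # profile a single ceph-osd process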

#4 Updated by Tzachi Strul over 6 years ago

Sage Weil wrote:

Can you try installing ceph-osd-dbg or ceph-debuginfo and then run 'perf top'? That should give us a bird's-eye view of where the time is being spent.

Sorry for the late response.
I have uploaded "perf record" output via ceph-post-file, id is: b8e0a533-8a26-4f08-bbf1-34444a004338
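
A typical way to capture such a profile (the exact flags and duration used here were not recorded in the ticket, so this is only a sketch):

perf record -g -p <ceph-osd pid> -- sleep 60
ceph-post-file perf.data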

Live data if it helps:
17.35% libc-2.23.so [.] vfprintf
10.87% [kernel] [k] get_futex_key_refs.isra.11
9.66% libnss3.so [.] PK11_FreeSymKey
9.66% libstdc++.so.6.0.21 (deleted) [.] 0x000000000008bfca
8.45% libpthread-2.23.so [.] __pthread_rwlock_rdlock
8.11% ceph-osd (deleted) [.] 0x0000000000dc1691
4.14% libc-2.23.so [.] __memcmp_sse4_1
2.76% libtcmalloc.so.4.2.6 [.] operator delete[]
2.76% ceph-osd (deleted) [.] 0x0000000000f2e6b7
2.76% libc-2.23.so [.] strlen
2.76% ceph-osd (deleted) [.] 0x0000000000a8fbb4
2.76% ceph-osd (deleted) [.] 0x00000000006680d9
2.76% libstdc++.so.6.0.21 (deleted) [.] 0x0000000000089490

#5 Updated by Tzachi Strul over 6 years ago

Update:
The cluster was upgraded to 12.2.1. I have restarted all daemons and the problem is still the same... ~50% CPU from all OSDs.
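
For reference, restarting all Ceph daemons on a node with the standard systemd units looks like this (unit names assume the stock Luminous packaging):

systemctl restart ceph-osd.target ceph-mon.target ceph-mgr.target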
