Project

General

Profile

Actions

Bug #44945

open

Mon High CPU usage when another mon syncing from it

Added by Xiaoxi Chen about 4 years ago. Updated over 2 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Each sync request take very long time to come back, as show below. And from the TOP of source-mon, the CPU was on dispatcher, perf shows mostly on buffer operations.

One special thing in this cluster is , the cluster is heavily used by rbd snapshot, where 1K + volumes make snapshots every hour and keep for 7 days.

2020-04-05 19:53:09.722 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199af980
2020-04-05 19:53:09.722 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199af980 con 0xc632d80
2020-04-05 19:53:09.758 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 244 ==== mon_sync(chunk cookie 79456894977 lc 40807336 bl 1625633 bytes) v2 ==== 1625721+0+0 (secure 0 0 0) 0x199af980 con 0xc632d80
2020-04-05 19:53:09.766 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199ae580
2020-04-05 19:53:09.766 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199ae580 con 0xc632d80
2020-04-05 19:53:23.219 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 245 ==== mon_sync(chunk cookie 79456894977 lc 40807337 bl 998724 bytes last_key osd_snap,removed_snap_1_00000000004fedcb) v2 ==== 998851+0+0 (secure 0 0 0) 0x199ae580 con 0xc632d80
2020-04-05 19:53:23.235 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199ae300
2020-04-05 19:53:23.235 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199ae300 con 0xc632d80
2020-04-05 19:53:23.271 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 246 ==== mon_sync(chunk cookie 79456894977 lc 40807338 bl 1175179 bytes) v2 ==== 1175267+0+0 (secure 0 0 0) 0x199ae300 con 0xc632d80
2020-04-05 19:53:23.279 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199af200
2020-04-05 19:53:23.279 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199af200 con 0xc632d80
2020-04-05 19:53:40.436 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 247 ==== mon_sync(chunk cookie 79456894977 lc 40807338 bl 1048478 bytes last_key osd_snap,removed_snap_1_0000000000504aa4) v2 ==== 1048605+0+0 (secure 0 0 0) 0x199af200 con 0xc632d80
2020-04-05 19:53:40.448 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199ac780
2020-04-05 19:53:40.448 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199ac780 con 0xc632d80
2020-04-05 19:53:54.404 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 248 ==== mon_sync(chunk cookie 79456894977 lc 40807339 bl 998698 bytes last_key osd_snap,removed_snap_1_000000000050a0f4) v2 ==== 998825+0+0 (secure 0 0 0) 0x199ac780 con 0xc632d80
2020-04-05 19:53:54.420 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199afc00
2020-04-05 19:53:54.420 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199afc00 con 0xc632d80
2020-04-05 19:54:12.133 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 249 ==== mon_sync(chunk cookie 79456894977 lc 40807339 bl 1048478 bytes last_key osd_snap,removed_snap_1_000000000051020c) v2 ==== 1048605+0+0 (secure 0 0 0) 0x199afc00 con 0xc632d80
2020-04-05 19:54:12.149 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199ac280
2020-04-05 19:54:12.149 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199ac280 con 0xc632d80
2020-04-05 19:54:12.173 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 250 ==== mon_sync(chunk cookie 79456894977 lc 40807340 bl 1626593 bytes) v2 ==== 1626681+0+0 (secure 0 0 0) 0x199ac280 con 0xc632d80
2020-04-05 19:54:12.181 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199af480
2020-04-05 19:54:12.181 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199af480 con 0xc632d80
2020-04-05 19:54:26.138 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 251 ==== mon_sync(chunk cookie 79456894977 lc 40807341 bl 998690 bytes last_key osd_snap,removed_snap_1_0000000000515c15) v2 ==== 998817+0+0 (secure 0 0 0) 0x199af480 con 0xc632d80
2020-04-05 19:54:26.154 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0xce4ed00
2020-04-05 19:54:26.154 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0xce4ed00 con 0xc632d80
2020-04-05 19:54:43.459 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 252 ==== mon_sync(chunk cookie 79456894977 lc 40807341 bl 1048478 bytes last_key osd_snap,removed_snap_1_000000000051c031) v2 ==== 1048605+0+0 (secure 0 0 0) 0xce4ed00 con 0xc632d80
2020-04-05 19:54:43.475 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0xce4ef80
2020-04-05 19:54:43.475 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0xce4ef80 con 0xc632d80
2020-04-05 19:54:43.507 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 253 ==== mon_sync(chunk cookie 79456894977 lc 40807342 bl 1624418 bytes) v2 ==== 1624506+0+0 (secure 0 0 0) 0xce4ef80 con 0xc632d80
2020-04-05 19:54:43.511 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0xce4f200
2020-04-05 19:54:43.511 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0xce4f200 con 0xc632d80
2020-04-05 19:54:57.272 7f34ada19700  1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 254 ==== mon_sync(chunk cookie 79456894977 lc 40807343 bl 998702 bytes last_key osd_snap,removed_snap_1_0000000000521b3f) v2 ==== 998829+0+0 (secure 0 0 0) 0xce4f200 con 0xc632d80
top - 19:52:23 up 333 days,  8:14,  2 users,  load average: 1.21, 1.17, 0.95
Threads:  27 total,   1 running,  26 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.6 us,  0.2 sy,  0.0 ni, 86.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16431696 total,  8569568 free,  1426536 used,  6435592 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 14443308 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
3812474 ceph      20   0 1654848 940032  28164 R 99.0  5.7  33:25.47 ms_dispatch
3812472 ceph      20   0 1654848 940032  28164 S  2.7  5.7  17:08.01 cpu_tp
3812470 ceph      20   0 1654848 940032  28164 S  2.3  5.7  18:23.44 cpu_tp
3812471 ceph      20   0 1654848 940032  28164 S  2.3  5.7  17:27.31 cpu_tp
3812473 ceph      20   0 1654848 940032  28164 S  2.3  5.7  18:07.42 cpu_tp
3812439 ceph      20   0 1654848 940032  28164 S  0.3  5.7   0:08.48 log
3812464 ceph      20   0 1654848 940032  28164 S  0.3  5.7   0:14.46 msgr-worker-0
3812437 ceph      20   0 1654848 940032  28164 S  0.0  5.7   0:00.47 ceph-mon
3812440 ceph      20   0 1654848 940032  28164 S  0.0  5.7   0:00.69 service
3812441 ceph      20   0 1654848 940032  28164 S  0.0  5.7   0:03.91 admin_socket
3812442 ceph      20   0 1654848 940032  28164 S  0.0  5.7   0:00.00 signal_handler
3812443 ceph      20   0 1654848 940032  28164 S  0.0  5.7  27:21.85 rocksdb:low0
3812444 ceph      20   0 1654848 940032  28164 S  0.0  5.7   1:18.43 rocksdb:high0
3812445 ceph      20   0 1654848 940032  28164 S  0.0  5.7   0:00.00 ceph-mon
3812461 ceph      20   0 1654848 940032  28164 S  0.0  5.7   0:00.05 rocksdb:dump_st
  15.16%  libceph-common.so.0   [.] _ZN4ceph6buffer7v14_2_03ptr6appendEPKcj                                                                                                                                                                                                    ▒
  12.67%  libceph-common.so.0   [.] _ZN4ceph6buffer7v14_2_03ptr7releaseEv                                                                                                                                                                                                      ▒
  11.29%  libceph-common.so.0   [.] _ZN4ceph6buffer7v14_2_04list6appendEPKcj                                                                                                                                                                                                   ▒
   9.08%  libceph-common.so.0   [.] _ZN4ceph6buffer7v14_2_03ptrC1ERKS2_                                                                                                                                                                                                        ▒
   8.63%  libtcmalloc.so.4.2.6  [.] tc_malloc                                                                                                                                                                                                                                  ▒
   4.26%  libtcmalloc.so.4.2.6  [.] _ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi                                                                                                                                                                         ▒
   4.02%  libtcmalloc.so.4.2.6  [.] tc_deletearray                                                                                                                                                                                                                             ▒
   3.99%  libceph-common.so.0   [.] _ZNK4ceph6buffer7v14_2_03ptr18unused_tail_lengthEv                                                                                                                                                                                         ▒
   3.90%  ceph-mon              [.] _ZNK14MonitorDBStore11Transaction6encodeERN4ceph6buffer7v14_2_04listE

Actions

Also available in: Atom PDF