Actions
Bug #44945
openMon High CPU usage when another mon syncing from it
Status:
Need More Info
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Each sync request take very long time to come back, as show below. And from the TOP of source-mon, the CPU was on dispatcher, perf shows mostly on buffer operations.
One special thing in this cluster is , the cluster is heavily used by rbd snapshot, where 1K + volumes make snapshots every hour and keep for 7 days.
2020-04-05 19:53:09.722 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199af980 2020-04-05 19:53:09.722 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199af980 con 0xc632d80 2020-04-05 19:53:09.758 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 244 ==== mon_sync(chunk cookie 79456894977 lc 40807336 bl 1625633 bytes) v2 ==== 1625721+0+0 (secure 0 0 0) 0x199af980 con 0xc632d80 2020-04-05 19:53:09.766 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199ae580 2020-04-05 19:53:09.766 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199ae580 con 0xc632d80 2020-04-05 19:53:23.219 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 245 ==== mon_sync(chunk cookie 79456894977 lc 40807337 bl 998724 bytes last_key osd_snap,removed_snap_1_00000000004fedcb) v2 ==== 998851+0+0 (secure 0 0 0) 0x199ae580 con 0xc632d80 2020-04-05 19:53:23.235 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199ae300 2020-04-05 19:53:23.235 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199ae300 con 0xc632d80 2020-04-05 19:53:23.271 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 246 ==== mon_sync(chunk cookie 79456894977 lc 40807338 bl 1175179 bytes) v2 ==== 1175267+0+0 (secure 0 0 0) 0x199ae300 con 0xc632d80 2020-04-05 19:53:23.279 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199af200 2020-04-05 19:53:23.279 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199af200 con 0xc632d80 2020-04-05 19:53:40.436 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 247 ==== mon_sync(chunk cookie 79456894977 lc 40807338 bl 1048478 bytes last_key osd_snap,removed_snap_1_0000000000504aa4) v2 ==== 1048605+0+0 (secure 0 0 0) 0x199af200 con 0xc632d80 2020-04-05 19:53:40.448 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199ac780 2020-04-05 19:53:40.448 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199ac780 con 0xc632d80 2020-04-05 19:53:54.404 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 248 ==== mon_sync(chunk cookie 79456894977 lc 40807339 bl 998698 bytes last_key osd_snap,removed_snap_1_000000000050a0f4) v2 ==== 998825+0+0 (secure 0 0 0) 0x199ac780 con 0xc632d80 2020-04-05 19:53:54.420 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199afc00 2020-04-05 19:53:54.420 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199afc00 con 0xc632d80 2020-04-05 19:54:12.133 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 249 ==== mon_sync(chunk cookie 79456894977 lc 40807339 bl 1048478 bytes last_key osd_snap,removed_snap_1_000000000051020c) v2 ==== 1048605+0+0 (secure 0 0 0) 0x199afc00 con 0xc632d80 2020-04-05 19:54:12.149 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199ac280 2020-04-05 19:54:12.149 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199ac280 con 0xc632d80 2020-04-05 19:54:12.173 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 250 ==== mon_sync(chunk cookie 79456894977 lc 40807340 bl 1626593 bytes) v2 ==== 1626681+0+0 (secure 0 0 0) 0x199ac280 con 0xc632d80 2020-04-05 19:54:12.181 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0x199af480 2020-04-05 19:54:12.181 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0x199af480 con 0xc632d80 2020-04-05 19:54:26.138 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 251 ==== mon_sync(chunk cookie 79456894977 lc 40807341 bl 998690 bytes last_key osd_snap,removed_snap_1_0000000000515c15) v2 ==== 998817+0+0 (secure 0 0 0) 0x199af480 con 0xc632d80 2020-04-05 19:54:26.154 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0xce4ed00 2020-04-05 19:54:26.154 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0xce4ed00 con 0xc632d80 2020-04-05 19:54:43.459 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 252 ==== mon_sync(chunk cookie 79456894977 lc 40807341 bl 1048478 bytes last_key osd_snap,removed_snap_1_000000000051c031) v2 ==== 1048605+0+0 (secure 0 0 0) 0xce4ed00 con 0xc632d80 2020-04-05 19:54:43.475 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0xce4ef80 2020-04-05 19:54:43.475 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0xce4ef80 con 0xc632d80 2020-04-05 19:54:43.507 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 253 ==== mon_sync(chunk cookie 79456894977 lc 40807342 bl 1624418 bytes) v2 ==== 1624506+0+0 (secure 0 0 0) 0xce4ef80 con 0xc632d80 2020-04-05 19:54:43.511 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] send_to--> mon [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- ?+0 0xce4f200 2020-04-05 19:54:43.511 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] --> [v2:10.75.88.177:3300/0,v1:10.75.88.177:6789/0] -- mon_sync(get_chunk cookie 79456894977) v2 -- 0xce4f200 con 0xc632d80 2020-04-05 19:54:57.272 7f34ada19700 1 -- [v2:10.75.86.255:3300/0,v1:10.75.86.255:6789/0] <== mon.2 v2:10.75.88.177:3300/0 254 ==== mon_sync(chunk cookie 79456894977 lc 40807343 bl 998702 bytes last_key osd_snap,removed_snap_1_0000000000521b3f) v2 ==== 998829+0+0 (secure 0 0 0) 0xce4f200 con 0xc632d80
top - 19:52:23 up 333 days, 8:14, 2 users, load average: 1.21, 1.17, 0.95 Threads: 27 total, 1 running, 26 sleeping, 0 stopped, 0 zombie %Cpu(s): 13.6 us, 0.2 sy, 0.0 ni, 86.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st KiB Mem : 16431696 total, 8569568 free, 1426536 used, 6435592 buff/cache KiB Swap: 0 total, 0 free, 0 used. 14443308 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3812474 ceph 20 0 1654848 940032 28164 R 99.0 5.7 33:25.47 ms_dispatch 3812472 ceph 20 0 1654848 940032 28164 S 2.7 5.7 17:08.01 cpu_tp 3812470 ceph 20 0 1654848 940032 28164 S 2.3 5.7 18:23.44 cpu_tp 3812471 ceph 20 0 1654848 940032 28164 S 2.3 5.7 17:27.31 cpu_tp 3812473 ceph 20 0 1654848 940032 28164 S 2.3 5.7 18:07.42 cpu_tp 3812439 ceph 20 0 1654848 940032 28164 S 0.3 5.7 0:08.48 log 3812464 ceph 20 0 1654848 940032 28164 S 0.3 5.7 0:14.46 msgr-worker-0 3812437 ceph 20 0 1654848 940032 28164 S 0.0 5.7 0:00.47 ceph-mon 3812440 ceph 20 0 1654848 940032 28164 S 0.0 5.7 0:00.69 service 3812441 ceph 20 0 1654848 940032 28164 S 0.0 5.7 0:03.91 admin_socket 3812442 ceph 20 0 1654848 940032 28164 S 0.0 5.7 0:00.00 signal_handler 3812443 ceph 20 0 1654848 940032 28164 S 0.0 5.7 27:21.85 rocksdb:low0 3812444 ceph 20 0 1654848 940032 28164 S 0.0 5.7 1:18.43 rocksdb:high0 3812445 ceph 20 0 1654848 940032 28164 S 0.0 5.7 0:00.00 ceph-mon 3812461 ceph 20 0 1654848 940032 28164 S 0.0 5.7 0:00.05 rocksdb:dump_st
15.16% libceph-common.so.0 [.] _ZN4ceph6buffer7v14_2_03ptr6appendEPKcj ▒ 12.67% libceph-common.so.0 [.] _ZN4ceph6buffer7v14_2_03ptr7releaseEv ▒ 11.29% libceph-common.so.0 [.] _ZN4ceph6buffer7v14_2_04list6appendEPKcj ▒ 9.08% libceph-common.so.0 [.] _ZN4ceph6buffer7v14_2_03ptrC1ERKS2_ ▒ 8.63% libtcmalloc.so.4.2.6 [.] tc_malloc ▒ 4.26% libtcmalloc.so.4.2.6 [.] _ZN8tcmalloc11ThreadCache21ReleaseToCentralCacheEPNS0_8FreeListEmi ▒ 4.02% libtcmalloc.so.4.2.6 [.] tc_deletearray ▒ 3.99% libceph-common.so.0 [.] _ZNK4ceph6buffer7v14_2_03ptr18unused_tail_lengthEv ▒ 3.90% ceph-mon [.] _ZNK14MonitorDBStore11Transaction6encodeERN4ceph6buffer7v14_2_04listE
Actions