Project

General

Profile

Actions

Bug #57920

open

mds:ESubtreeMap event size is too large

Added by zhikuo du over 1 year ago. Updated over 1 year ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

In production environment, we have a problem: The ESubtreeMap event size is too large.

1,The ESubtreeMap event size is about 75M (79384286).

For example :(from a mds's log)
3701845:2022-10-25T14:45:27.983+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682556023939~79384286 : ESubtreeMap 2 subtrees , 0 ambiguous [metablob 0x1, 2904 dirs]
3701923:2022-10-25T14:45:29.171+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682635408245~1026 : EUpdate scatter_writebehind [metablob 0x10000000000, 1 dirs]
3701928:2022-10-25T14:45:29.171+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682635409291~1026 : EUpdate scatter_writebehind [metablob 0x10000000000, 1 dirs]
3701956:2022-10-25T14:45:29.171+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682635410337~411678 : EUpdate scatter_writebehind [metablob 0x1, 1128 dirs]
3701963:2022-10-25T14:45:29.171+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682635822035~6224 : EUpdate cap update [metablob 0x10000000000, 7 dirs]
3702858:2022-10-25T14:45:30.155+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682635828279~79384286 : ESubtreeMap 2 subtrees , 0 ambiguous [metablob 0x1, 2904 dirs]
3702954:2022-10-25T14:45:31.415+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715212585~7606 : EUpdate cap update [metablob 0x1, 8 dirs]
3702959:2022-10-25T14:45:31.415+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715220211~1856 : EUpdate scatter_writebehind [metablob 0x1, 2 dirs]
3702964:2022-10-25T14:45:31.415+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715222087~1958 : EUpdate cap update [metablob 0x1000d2632f8, 2 dirs]
3702969:2022-10-25T14:45:31.415+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715224065~1958 : EUpdate cap update [metablob 0x1000d2632f8, 2 dirs]
3703027:2022-10-25T14:45:31.415+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715251247~1856 : EUpdate scatter_writebehind [metablob 0x1, 2 dirs]
......
3703086:2022-10-25T14:45:31.419+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715379407~29284 : EUpdate openc [metablob 0x50001b4b21c, 2 dirs]
3703094:2022-10-25T14:45:31.419+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715408711~1874 : EUpdate scatter_writebehind [metablob 0x40001f0d358, 2 dirs]
3703098:2022-10-25T14:45:31.419+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715410605~405212 : EUpdate scatter_writebehind [metablob 0x50001b4b1a6, 1121 dirs]
3703106:2022-10-25T14:45:31.423+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715815837~2408 : EUpdate scatter_writebehind [metablob 0x1, 2 dirs]
3703296:2022-10-25T14:45:32.439+0800 7f7ab4013700 5 mds.0.log _submit_thread 161682715818265~79384286 : ESubtreeMap 2 subtrees , 0 ambiguous [metablob 0x1, 2904 dirs]

2,In this situation:

(1)The cost time of ESubtreeMap::encode is mainly between 512ms and 1000ms:
my_hist:
[512M, 1G) 23 |
@@@@@@@@@@@@@@@@@@@@@@@@@@@|
real 0m33.978s
user 0m0.157s
sys 0m0.045s
(2)The cost time of MDCache::create_subtree_map is mainly between 256ms and 512ms:
my_hist:
[128M, 256M) 7 |
@@@@@ |
[256M, 512M) 32 |
@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512M, 1G) 6 |
@@@@ |
real 0m26.269s
user 0m0.169s
sys 0m0.035s

3,So the cpu utilization of md_submit(who uses ESubtreeMap::encode) and ms_dispatch(who use MDCache::create_subtree_map ) is high!

12085 ceph      20   0  133.2g  19.7g   9660 R 93.8  5.2  15040:53 ms_dispatch                                                                                                                                                                                              
18518 ceph 20 0 133.2g 19.7g 9660 R 93.8 5.2 29573:53 md_submit
62468 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 290:34.59 ceph-mds
12074 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 164:03.83 log
12075 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 2172:08 msgr-worker-0
12076 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 2003:44 msgr-worker-1
12077 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 2067:13 msgr-worker-2
12082 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 1:01.53 service
12083 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 0:02.35 admin_socket
12084 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 0:00.01 signal_handler
12086 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 0:00.00 ms_local
12087 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 6:01.68 safe_timer
12088 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 0:00.05 fn_anonymous
12089 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 162:41.85 safe_timer
12090 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 2:26.91 ceph-mds
12121 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 6:57.85 safe_timer
18510 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 11:42.19 ms_dispatch
18511 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 89:07.23 OpHistorySvc
18512 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 262:38.54 ms_dispatch
18513 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 13:08.91 mds_rank_progr
18514 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 97:22.64 PQ_Finisher
18515 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 0:28.42 safe_timer
18516 ceph 20 0 133.2g 19.7g 9660 S 0.0 5.2 2646:53 MR_Finisher

4,I think this problem will seriously affect the IOPS of write and read.

5, @Xiubo Li has provide a patch:
https://github.com/ceph/ceph/pull/44180 《mds/MDLog: use a larger sigment size if the event max size is too large》
But this patch has not been committed.
Will anybody review it again?

we use: ceph version 15.2.16 octopus (stable)


Files

cpu.png (113 KB) cpu.png zhikuo du, 10/25/2022 09:23 AM
create_subtree_map.png (21.2 KB) create_subtree_map.png zhikuo du, 10/25/2022 09:23 AM
encode.png (17.3 KB) encode.png zhikuo du, 10/25/2022 09:23 AM
ESubtree Size.png (318 KB) ESubtree Size.png zhikuo du, 10/25/2022 09:24 AM
Actions

Also available in: Atom PDF