Backport #18531

Updated by Nathan Cutler over 7 years ago

We hit an MDS CPU bottleneck (100% on one core, since the MDS is single-threaded) in our CephFS production environment.

Troubleshooting showed that one application calls readdir frequently; what is worse, the directory holds 130K files. Profiling shows that at this scale each readdir call takes ~20 ms on average and wastes significant time (and CPU) skipping unwanted dentries.

Take this request as an example: 10 ms was spent skipping dentries and 7 ms was spent encoding the 1024 wanted dentries. This patch addresses the 10 ms and attempts to minimize it.

 2017-01-09 19:49:03.023878 7ff8d74ce700 10 mds.0.server snapid head 

 [Iterating over and skipping all dentries < offset]
 https://github.com/ceph/ceph/blob/v10.2.2/src/mds/Server.cc#L3379 

 2017-01-09 19:49:03.033836 7ff8d74ce700 10 mds.0.cache.ino(100000edea3) encode_inodestat issuing pAsLsXsFscr seq 1867 
 [Encoding inodes into the reply message, 1024 inodes in total, +10.836ms]
 2017-01-09 19:49:03.040745 7ff8d74ce700 10 mds.0.cache.ino(100000eb3e7) encode_inodestat issuing pAsLsXsFscr seq 1867 
 …… 

 [Finished encoding, reply to client, total size ~300KB, +17.752ms]
 2017-01-09 19:49:03.040752 7ff8d74ce700 10 mds.0.server reply to client_request(client.26741659:858764 readdir #100000cbe8d lvs3b02c-cb62.stratus.lvs.ebay.com.pem 2017-01-09 19:49:02.991586) v2 readdir num=1024 bytes=310510 end=0 complete=0
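
The skipping cost described above can be illustrated with a minimal sketch. It is not the MDS code and not the patch itself: `Dir`, `make_dir`, `page_linear` and `page_seek` are hypothetical names, the directory is modelled as a plain `std::map` sorted by name, and the offset is assumed to be the first name the client wants. The first function walks from the beginning and skips every entry below the offset; the second seeks straight to the offset with `lower_bound`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory: dentry name -> inode number,
// kept sorted by name. This is NOT the real MDS data structure.
using Dir = std::map<std::string, unsigned long>;

// Build a directory with n zero-padded names: f000000, f000001, ...
Dir make_dir(std::size_t n) {
    assert(n <= 1000000);  // names are padded to six digits
    Dir d;
    for (std::size_t i = 0; i < n; ++i) {
        std::string s = std::to_string(i);
        d["f" + std::string(6 - s.size(), '0') + s] = i;
    }
    return d;
}

// Baseline: iterate from the start and skip every name < offset.
// `visited` reports how many entries were touched, skipped ones included.
std::vector<std::string> page_linear(const Dir& dir, const std::string& offset,
                                     std::size_t max_entries,
                                     std::size_t& visited) {
    assert(max_entries > 0);
    std::vector<std::string> out;
    visited = 0;
    for (const auto& kv : dir) {
        ++visited;
        if (kv.first < offset) continue;  // skipped entry: pure overhead
        out.push_back(kv.first);
        if (out.size() == max_entries) break;
    }
    return out;
}

// Alternative: jump to the first name >= offset in O(log n), then read.
std::vector<std::string> page_seek(const Dir& dir, const std::string& offset,
                                   std::size_t max_entries,
                                   std::size_t& visited) {
    assert(max_entries > 0);
    std::vector<std::string> out;
    visited = 0;
    for (auto it = dir.lower_bound(offset); it != dir.end(); ++it) {
        ++visited;
        out.push_back(it->first);
        if (out.size() == max_entries) break;
    }
    return out;
}
```

With a 130,000-entry directory and an offset at entry 100,000, both functions return the same 1024 names, but the linear version touches 101,024 entries while the seek version touches only 1024.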
