Bug #55840
Windows clients unable to perform IO to clusters with 200+ OSDs
% Done:
0%
Source:
Tags:
windows,messenger,msg,async backport_processed
Backport:
pacific quincy
Regression:
No
Severity:
3 - minor
Reviewed:
06/03/2022
Description
When the cluster has a large number of OSDs (around 200 or more), Windows clients can only perform IO for a short period before it stalls. For example, a rados bench run looks like this:
PS C:\Users\Administrator\Downloads\ceph\ceph> rados.exe -p rbdec bench 60 write -b 1
hints = 1
Maintaining 16 concurrent writes of 1 bytes to objects of size 1 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_WIN-TEST_7452
  sec Cur ops   started  finished     avg MB/s     cur MB/s last lat(s)  avg lat(s)
    0       1         1         0            0            0           -           0
    1      16       991       975  0.000921874  0.000929832   0.0170945    0.016306
    2      16      1853      1837  0.000865209  0.000822067   0.0094437   0.0163601
    3      16      2266      2250  0.000705895  0.000393867   0.0115127   0.0159883
    4      16      2518      2502  0.000588372  0.000240326   0.0126331   0.0158619
    5      16      2733      2717  0.000510963   0.00020504    0.012853   0.0157324
    6      16      2826      2810  0.000440277  8.86917e-05   0.0143737   0.0157153
    7      16      2884      2868  0.000385106  5.53131e-05   0.0137714   0.0157497
    8      16      2941      2925  0.000343621  5.43594e-05   0.0139032   0.0157918
    9      16      2941      2925  0.000305395            0           -   0.0157918
   10      16      2941      2925  0.000274849            0           -   0.0157918
   11      16      2941      2925  0.000249846            0           -   0.0157918
   12      16      2941      2925   0.00022901            0           -   0.0157918
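The cause is that FD_SETSIZE, the maximum number of sockets a single select() call can watch on Windows, is 64. As a result, the client can only manage ms_async_op_threads * 64 FDs for IO to the cluster (3 * 64 = 192 with the default of 3 threads), which is fewer than the roughly 200 OSDs at which the stall appears.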
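The arithmetic above can be sketched as a small model. This is illustrative Python, not Ceph code. It assumes each async messenger worker thread runs its own select() loop capped at 64 sockets, that each OSD the client talks to costs exactly one socket, and that connections are spread evenly across workers. The names max_sockets and unserviceable_connections are hypothetical.

```python
# Minimal model of the Windows select() limit described in this report.
# Assumptions (not taken from Ceph source):
#   - one select() loop per worker thread (ms_async_op_threads of them),
#   - each loop can watch at most FD_SETSIZE sockets (64 on Windows),
#   - one socket per OSD, spread evenly across workers.
WINDOWS_FD_SETSIZE = 64


def max_sockets(op_threads: int, fd_setsize: int = WINDOWS_FD_SETSIZE) -> int:
    """Upper bound on sockets the client can watch across all workers."""
    return op_threads * fd_setsize


def unserviceable_connections(num_osds: int, op_threads: int,
                              fd_setsize: int = WINDOWS_FD_SETSIZE) -> int:
    """OSD connections left over once every worker's select() set is full."""
    return max(0, num_osds - max_sockets(op_threads, fd_setsize))


if __name__ == "__main__":
    # Default ms_async_op_threads = 3 gives 3 * 64 = 192 sockets.
    print(max_sockets(3))                      # 192
    print(unserviceable_connections(200, 3))   # 8
```

With the default of 3 threads, a 200-OSD cluster leaves 8 OSD connections that the model cannot place, which is consistent with IO stalling once requests start targeting those OSDs.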
The workaround is to increase ms_async_op_threads in the client's ceph.conf until ms_async_op_threads * 64 covers the number of OSDs in the cluster.
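A minimal sketch for choosing a value, assuming the same 64-sockets-per-thread limit. The helpers min_op_threads and conf_line are hypothetical, and extra_conns is an illustrative allowance for non-OSD connections (monitors, manager) that this report does not quantify. The rendered line is meant for the client's ceph.conf, for example under a [client] section.

```python
# Hypothetical helper: smallest ms_async_op_threads such that
# threads * FD_SETSIZE >= number of sockets the client needs.
WINDOWS_FD_SETSIZE = 64


def min_op_threads(num_osds: int, extra_conns: int = 0,
                   fd_setsize: int = WINDOWS_FD_SETSIZE) -> int:
    """Round up (num_osds + extra_conns) / fd_setsize, with a floor of 1.

    extra_conns is an assumed allowance for non-OSD connections;
    the bug report itself only counts OSDs.
    """
    needed = num_osds + extra_conns
    # Integer ceiling division avoids floating-point rounding issues.
    return max(1, (needed + fd_setsize - 1) // fd_setsize)


def conf_line(num_osds: int, extra_conns: int = 0) -> str:
    """Render the client-side ceph.conf setting for the computed value."""
    return f"ms_async_op_threads = {min_op_threads(num_osds, extra_conns)}"


if __name__ == "__main__":
    print(min_op_threads(200))             # 4  (4 * 64 = 256 >= 200)
    print(conf_line(500, extra_conns=5))   # ms_async_op_threads = 8
```

Rounding up gives the smallest thread count that fits; adding a thread beyond that for headroom is a judgment call, not something the report prescribes.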