Bug #55840
windows clients unable to perform IO to clusters with over 200+ OSDs
% Done: 0%
Source:
Tags: windows,messenger,msg,async backport_processed
Backport: pacific quincy
Regression: No
Severity: 3 - minor
Reviewed: 06/03/2022
Description
When a cluster has a large number of OSDs (around 200 or more), Windows clients can only do IO for a short period before it stops. For example, a rados bench test looks like this:
PS C:\Users\Administrator\Downloads\ceph\ceph> rados.exe -p rbdec bench 60 write -b 1
hints = 1
Maintaining 16 concurrent writes of 1 bytes to objects of size 1 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_WIN-TEST_7452
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 1 1 0 0 0 - 0
1 16 991 975 0.000921874 0.000929832 0.0170945 0.016306
2 16 1853 1837 0.000865209 0.000822067 0.0094437 0.0163601
3 16 2266 2250 0.000705895 0.000393867 0.0115127 0.0159883
4 16 2518 2502 0.000588372 0.000240326 0.0126331 0.0158619
5 16 2733 2717 0.000510963 0.00020504 0.012853 0.0157324
6 16 2826 2810 0.000440277 8.86917e-05 0.0143737 0.0157153
7 16 2884 2868 0.000385106 5.53131e-05 0.0137714 0.0157497
8 16 2941 2925 0.000343621 5.43594e-05 0.0139032 0.0157918
9 16 2941 2925 0.000305395 0 - 0.0157918
10 16 2941 2925 0.000274849 0 - 0.0157918
11 16 2941 2925 0.000249846 0 - 0.0157918
12 16 2941 2925 0.00022901 0 - 0.0157918
The issue is that FD_SETSIZE, which caps the number of sockets select() can handle on Windows, is limited to 64. This means the client can only manage ms_async_op_threads * 64 FDs (3 * 64 = 192 by default) for IO to the cluster.
The workaround is to increase ms_async_op_threads in the client's ceph.conf so that ms_async_op_threads * 64 covers the number of OSDs in the cluster.
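To estimate a suitable value for the workaround, the arithmetic above can be sketched as follows. This is a rough sizing aid, not part of Ceph: the helper names and the extra_fds allowance are hypothetical, and it assumes one FD per OSD with connections spread evenly across the messenger threads, so treat the result as a lower bound.

```python
import math

FD_SETSIZE_WINDOWS = 64  # select() limit on Windows, as stated in the description
DEFAULT_OP_THREADS = 3   # default ms_async_op_threads, as stated in the description


def max_client_fds(op_threads: int, fd_setsize: int = FD_SETSIZE_WINDOWS) -> int:
    """Upper bound on FDs the client can manage: one select() set per thread."""
    return op_threads * fd_setsize


def min_op_threads(num_osds: int, extra_fds: int = 0,
                   fd_setsize: int = FD_SETSIZE_WINDOWS) -> int:
    """Smallest ms_async_op_threads whose combined capacity covers the OSDs.

    extra_fds is a hypothetical allowance for non-OSD connections; it is an
    assumption of this sketch, not a documented Ceph parameter.
    Never returns less than the default thread count.
    """
    needed = math.ceil((num_osds + extra_fds) / fd_setsize)
    return max(DEFAULT_OP_THREADS, needed)


if __name__ == "__main__":
    print("default capacity:", max_client_fds(DEFAULT_OP_THREADS))
    for osds in (200, 500, 1000):
        print(osds, "OSDs -> ms_async_op_threads >=", min_op_threads(osds))
```

The resulting value would go under ms_async_op_threads in the client's ceph.conf. Leaving some headroom above the computed minimum is probably sensible, since the even-distribution assumption may not hold in practice.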
Updated by Rafael Lopez almost 2 years ago
Submitted PR: https://github.com/ceph/ceph/pull/46525
Updated by Laura Flores almost 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 46525
Updated by Lucian Petrut 11 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to pacific quincy
Updated by Lucian Petrut 11 months ago
The fix was merged upstream during the Reef cycle and backported downstream to Pacific and Quincy (cloudbase.it builds).
Updated by Backport Bot 11 months ago
- Copied to Backport #61614: pacific: windows clients unable to perform IO to clusters with over 200+ OSDs added
Updated by Backport Bot 11 months ago
- Copied to Backport #61615: quincy: windows clients unable to perform IO to clusters with over 200+ OSDs added
Updated by Backport Bot 11 months ago
- Tags changed from windows,messenger,msg,async to windows,messenger,msg,async backport_processed