Bug #55840

open

windows clients unable to perform IO to clusters with over 200+ OSDs

Added by Rafael Lopez almost 2 years ago. Updated 11 months ago.

Status: Pending Backport
Priority: Normal
Assignee:
Category: msgr
Target version:
% Done: 0%

Source:
Tags: windows,messenger,msg,async backport_processed
Backport: pacific quincy
Regression: No
Severity: 3 - minor
Reviewed: 06/03/2022
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When a cluster has a large number of OSDs (around 200 or more), Windows clients can perform IO only for a short period before it stalls. For example, a rados bench test looks like this:

PS C:\Users\Administrator\Downloads\ceph\ceph> rados.exe -p rbdec bench 60 write -b 1
hints = 1
Maintaining 16 concurrent writes of 1 bytes to objects of size 1 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_WIN-TEST_7452
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       1         1         0         0         0           -           0
    1      16       991       975 0.000921874 0.000929832   0.0170945    0.016306
    2      16      1853      1837 0.000865209 0.000822067   0.0094437   0.0163601
    3      16      2266      2250 0.000705895 0.000393867   0.0115127   0.0159883
    4      16      2518      2502 0.000588372 0.000240326   0.0126331   0.0158619
    5      16      2733      2717 0.000510963 0.00020504    0.012853   0.0157324
    6      16      2826      2810 0.000440277 8.86917e-05   0.0143737   0.0157153
    7      16      2884      2868 0.000385106 5.53131e-05   0.0137714   0.0157497
    8      16      2941      2925 0.000343621 5.43594e-05   0.0139032   0.0157918
    9      16      2941      2925 0.000305395         0           -   0.0157918
   10      16      2941      2925 0.000274849         0           -   0.0157918
   11      16      2941      2925 0.000249846         0           -   0.0157918
   12      16      2941      2925 0.00022901         0           -   0.0157918

The cause is that FD_SETSIZE, which caps the number of descriptors select() can watch, is limited to 64 on Windows. This means the client can only manage ms_async_op_threads * 64 FDs (3 * 64 = 192 by default) for IO to the cluster.
The workaround is to increase ms_async_op_threads in the client's ceph.conf until it covers the number of OSDs in the cluster.
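
The arithmetic behind the workaround can be sketched as follows. This is a rough estimate only: it assumes about one socket per OSD and ignores connections to monitors and managers, so treat the result as a lower bound. The function names are hypothetical helpers for illustration, not part of Ceph.

```python
import math

# select() fd_set capacity on Windows, as stated in this report
FD_SETSIZE_WINDOWS = 64


def max_fds(op_threads: int, fd_setsize: int = FD_SETSIZE_WINDOWS) -> int:
    """FDs the client can manage: one select() set per async op thread."""
    return op_threads * fd_setsize


def min_op_threads(connections: int, fd_setsize: int = FD_SETSIZE_WINDOWS) -> int:
    """Smallest ms_async_op_threads value that covers `connections` sockets.

    Assumption: each OSD connection consumes exactly one FD.
    """
    return max(1, math.ceil(connections / fd_setsize))


if __name__ == "__main__":
    print(max_fds(3))           # default of 3 threads -> 192
    print(min_op_threads(200))  # 200 OSDs -> 4
```

With the default of 3 threads the ceiling is 192 descriptors, which matches the observation that IO stalls at around 200 OSDs.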


Related issues 2 (1 open, 1 closed)

Copied to Ceph - Backport #61614: pacific: windows clients unable to perform IO to clusters with over 200+ OSDs (Rejected)
Copied to Ceph - Backport #61615: quincy: windows clients unable to perform IO to clusters with over 200+ OSDs (New)
Actions #1

Updated by Dan Mick almost 2 years ago

  • Assignee set to Rafael Lopez
Actions #3

Updated by Laura Flores almost 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 46525
Actions #4

Updated by Lucian Petrut 11 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport set to pacific quincy
Actions #5

Updated by Lucian Petrut 11 months ago

The fix was merged upstream during the Reef cycle and backported downstream to Pacific and Quincy (cloudbase.it builds).

Actions #6

Updated by Backport Bot 11 months ago

  • Copied to Backport #61614: pacific: windows clients unable to perform IO to clusters with over 200+ OSDs added
Actions #7

Updated by Backport Bot 11 months ago

  • Copied to Backport #61615: quincy: windows clients unable to perform IO to clusters with over 200+ OSDs added
Actions #8

Updated by Backport Bot 11 months ago

  • Tags changed from windows,messenger,msg,async to windows,messenger,msg,async backport_processed