Bug #53758

closed

fsync so slow

Added by jichan lee over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
librbd
Target version:
-
% Done:
0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Docker Image: quay.io/cephcsi/cephcsi:v3.2.2
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)

fsync takes several seconds on an RBD-backed filesystem.
Is there any way to determine the cause?

...

root@localhost# time dd if=/dev/urandom of=temp01.bin bs=512 count=1000 status=progress conv=fsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 3.7859 s, 135 kB/s

root@localhost# time dd if=/dev/urandom of=temp01.bin bs=512 count=10000 status=progress conv=fsync
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB, 4.9 MiB) copied, 6.10873 s, 838 kB/s

root@localhost# fio -ioengine=libaio -name=test -filename=aaaaa.bin -fsync=1 -direct=1 -bs=4k -iodepth=32 -rw=write -size=1M
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [W(1)][10.9%][w=4KiB/s][w=1 IOPS][eta 00m:57s]
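Before digging into the cluster, it can help to confirm that the stall is in the flush path rather than in plain writes. A minimal sketch (the probe filename is illustrative; run it on the RBD-backed mount):

```shell
PROBE=./fsync_probe.bin

# Buffered writes: dd returns once the data is in the page cache.
dd if=/dev/zero of="$PROBE" bs=512 count=1000

# Per-block sync: oflag=dsync flushes each 512-byte write to the
# device before the next one starts, so every block pays the
# flush cost that conv=fsync pays only once at the end.
dd if=/dev/zero of="$PROBE" bs=512 count=1000 oflag=dsync

rm -f "$PROBE"
```

dd prints a throughput summary to stderr for each run; a large gap between the two numbers confirms the time is going into flushes, not into the writes themselves.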

Actions #1

Updated by jichan lee over 2 years ago

  1. rados bench -p k8s-default-class-c-blockpool 10 write -b 4M -t 16
    hints = 1
    Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
    Object prefix: benchmark_data_rook-ceph-tools-599d59ffcc-x9_28741
    sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
    0 0 0 0 0 0 - 0
    1 16 16 0 0 0 - 0
    2 16 26 10 19.9987 20 1.83676 1.52928
    3 16 35 19 25.3313 36 2.80031 1.70759
    4 16 44 28 27.9975 36 2.4809 1.76429
    5 16 54 38 30.3969 40 1.25239 1.71289
    6 16 62 46 30.6633 32 1.43621 1.72136
    7 16 74 58 33.1392 48 0.87739 1.64062
    8 16 80 64 31.9897 24 1.93673 1.63696
    9 16 88 72 31.9903 32 1.24941 1.69071
    10 16 93 77 30.7911 20 2.31739 1.77738
    11 16 93 77 27.9923 0 - 1.77738
    12 15 93 78 25.9931 2 3.33216 1.79732
    13 5 93 88 27.0699 40 3.21164 2.02457
    14 2 93 91 25.9935 12 5.75782 2.13017
    Total time run: 14.034
    Total writes made: 93
    Write size: 4194304
    Object size: 4194304
    Bandwidth (MB/sec): 26.5071
    Stddev Bandwidth: 15.9649
    Max bandwidth (MB/sec): 48
    Min bandwidth (MB/sec): 0
    Average IOPS: 6
    Stddev IOPS: 4.04711
    Max IOPS: 12
    Min IOPS: 0
    Average Latency(s): 2.20728
    Stddev Latency(s): 1.28793
    Max latency(s): 6.23165
    Min latency(s): 0.760782
    Cleaning up (deleting benchmark objects)
    Removed 93 objects
    Clean up completed and total clean up time :14.6214
Actions #2

Updated by jichan lee over 2 years ago

Fixed.
One OSD was behaving abnormally, but it was never flagged with SLOW_OPS, so it was difficult to spot on the dashboard.
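For future readers: when a degraded OSD does not trip the SLOW_OPS health check, its latency can still be read directly from the CLI. A sketch using standard Ceph commands (run from a node with an admin keyring, e.g. the rook-ceph-tools pod; osd.7 is a placeholder for the suspect OSD):

```shell
# Per-OSD commit/apply latency in milliseconds; a single outlier
# usually identifies the slow OSD.
ceph osd perf

# Write test objects through one OSD's data path to measure it
# in isolation from clients and peers.
ceph tell osd.7 bench

# Show the slowest recent ops on that OSD, with per-stage
# timestamps indicating where each op stalled.
ceph tell osd.7 dump_historic_ops
```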

Actions #3

Updated by jichan lee over 2 years ago

dd if=/dev/urandom of=target_disk oflag=direct bs=4096 count=1000 status=progress
3944448 bytes (3.9 MB, 3.8 MiB) copied, 14 s, 282 kB/s
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB, 3.9 MiB) copied, 14.4186 s, 284 kB/s
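Aggregate dd throughput like the above can hide a long tail. A small sketch (probe filename illustrative) that times each synced 4 KiB write separately, so a single multi-second flush stands out instead of being averaged away:

```shell
TARGET=./latency_probe.bin
for i in 1 2 3 4 5; do
  start=$(date +%s%N)                       # nanoseconds since epoch (GNU date)
  dd if=/dev/zero of="$TARGET" bs=4096 count=1 \
     oflag=dsync conv=notrunc status=none   # one synced 4 KiB write
  end=$(date +%s%N)
  echo "write $i: $(( (end - start) / 1000000 )) ms"
done
rm -f "$TARGET"
```

On a healthy pool each line should be single-digit milliseconds; one write taking seconds while the rest are fast points at a replica on a bad OSD rather than uniform backend slowness.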

Actions #4

Updated by Josh Durgin over 2 years ago

  • Status changed from New to Resolved