Project

General

Profile

Actions

Bug #18742

closed

Bluestore: up to 90 sec delays observed during 'dummy' write req processing

Added by Igor Fedotov about 7 years ago. Updated almost 7 years ago.

Status:
Can't reproduce
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
Yes
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

BlueStore 'dummy' 64K write request handling might stall when doing BlueStore benchmarking using FIO.
This increases op latency up to 90 seconds.
BlueStore is configured in all-ssd mode using one SSD drive for DB/WAl and another one for Block storage.

'dummy' write = write request that causes no block device write, object metadata is updated only.
This is enabled by bluestore_debug_omit_block_device_write = true and intended to speed-up store preconditioning.

Test case performs 64 append-only jobs over 32K objects 4 Mb size each. 128 Gb write in total.
Delays tend to appear when >60% of the write is completed.

Git bisect analysis shows that the issue was introduced with commit:
c776710a8ec04a2fd810fc5a1899fa8868cbc80a
"os/bluestore/BlueFS: increase size threshold before we flush"

Indeed reverting bluestore_min_flush_size parameter to 64K fixes the issue.

Here is the log snippet, full one is available at
https://drive.google.com/file/d/0B-4q9QFReegLd1lNVTdzUXRxZ2M/view?usp=sharing

2017-01-30 17:08:01.563477 7f72497f2700 20 bdev(./fio-bluestore/block.wal) aio_write 0x233a000~81000 (direct)
2017-01-30 17:08:01.563650 7f72497f2700 20 bdev(./fio-bluestore/block.wal) aio_write rebuilding buffer to be aligned
2017-01-30 17:08:01.563660 7f72497f2700 20 bdev(./fio-bluestore/block.wal) _aio_log_start 0x233a000~81000
2017-01-30 17:08:01.563666 7f72497f2700 5 bdev(./fio-bluestore/block.wal) aio_write 0x233a000~81000 aio 0x7f72301b3940
2017-01-30 17:08:01.563669 7f72497f2700 20 bdev(./fio-bluestore/block.wal) aio_submit ioc 0x7f7230091ee0 pending 1 running 0
2017-01-30 17:08:01.563671 7f72497f2700 20 bdev(./fio-bluestore/block.wal) aio_submit aio 0x7f72301b3940 fd 13 0x233a000~81000
2017-01-30 17:08:01.563757 7f72497f2700 20 bluefs _flush_range h 0x7f72301dd090 pos now 0x1eba17c

2017-01-30 17:08:01.563807 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 537 < min_flush_size 524288
2017-01-30 17:08:01.563831 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 1080 < min_flush_size 524288
2017-01-30 17:08:01.563851 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 1618 < min_flush_size 524288
2017-01-30 17:08:01.563874 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 2163 < min_flush_size 524288
2017-01-30 17:08:01.563896 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 2701 < min_flush_size 524288
2017-01-30 17:08:01.563923 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 3239 < min_flush_size 524288
2017-01-30 17:08:01.563947 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 3776 < min_flush_size 524288
2017-01-30 17:08:01.563971 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 4320 < min_flush_size 524288
2017-01-30 17:08:01.563995 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 4864 < min_flush_size 524288
2017-01-30 17:08:01.564026 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 5409 < min_flush_size 524288
2017-01-30 17:08:01.564043 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 5946 < min_flush_size 524288
2017-01-30 17:08:01.564069 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 6483 < min_flush_size 524288
2017-01-30 17:08:01.564097 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 7028 < min_flush_size 524288
2017-01-30 17:08:01.564120 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 7565 < min_flush_size 524288
2017-01-30 17:08:01.564141 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 8103 < min_flush_size 524288
2017-01-30 17:08:01.564163 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 8646 < min_flush_size 524288
2017-01-30 17:08:01.564180 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 9183 < min_flush_size 524288
2017-01-30 17:08:01.564204 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 9720 < min_flush_size 524288
2017-01-30 17:08:01.564225 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 10265 < min_flush_size 524288
2017-01-30 17:08:01.564253 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 10802 < min_flush_size 524288
2017-01-30 17:08:01.564275 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 11346 < min_flush_size 524288
2017-01-30 17:08:01.564297 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 11883 < min_flush_size 524288
2017-01-30 17:08:01.564331 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 12426 < min_flush_size 524288
2017-01-30 17:08:01.564357 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 12963 < min_flush_size 524288
2017-01-30 17:08:01.564379 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 13508 < min_flush_size 524288
2017-01-30 17:08:01.564408 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 14051 < min_flush_size 524288
2017-01-30 17:08:01.564431 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 14595 < min_flush_size 524288
2017-01-30 17:08:01.564499 7f72497f2700 10 bluefs get_usage bdev 0 free 160581025792 (149 GB) / 161061269504 (149 GB), used 0%
2017-01-30 17:08:01.564510 7f72497f2700 10 bluefs get_usage bdev 1 free 107338530816 (102366 MB) / 107373125632 (102398 MB), used 0%
2017-01-30 17:08:01.564513 7f72497f2700 10 bluefs get_usage bdev 2 free 15032385536 (14336 MB) / 15033425920 (14336 MB), used 0%
2017-01-30 17:08:01.564519 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 14618 < min_flush_size 524288
2017-01-30 17:08:01.564521 7f72497f2700 10 bluefs _flush 0x7f72301dd090 ignoring, length 14618 < min_flush_size 524288
2017-01-30 17:08:01.564524 7f72497f2700 10 bluefs _fsync 0x7f72301dd090 file(ino 15 size 0xdbe066c mtime 2017-01-30 17:07:58.151831 bdev 0 extents [0:0x500000+c400000,0:0xcd00000+1800000])
2017-01-30 17:08:01.564538 7f72497f2700 10 bluefs _flush 0x7f72301dd090 0x1eba17c~391a to file(ino 15 size 0xdbe066c mtime 2017-01-30 17:07:58.151831 bdev 0 extents [0:0x500000+c400000,0:0xcd00000+1800000])
2017-01-30 17:08:01.564543 7f72497f2700 10 bluefs _flush_range 0x7f72301dd090 pos 0x1eba17c 0x1eba17c~391a to file(ino 15 size 0xdbe066c mtime 2017-01-30 17:07:58.151831 bdev 0 extents [0:0x500000+c400000,0:0xcd00000+1800000])
2017-01-30 17:08:01.564548 7f72497f2700 20 bluefs _flush_range file now file(ino 15 size 0xdbe066c mtime 2017-01-30 17:07:58.151831 bdev 0 extents [0:0x500000+c400000,0:0xcd00000+1800000])
2017-01-30 17:08:01.564555 7f72497f2700 20 bluefs _flush_range in 0:0x500000+c400000 x_off 0x1eba17c
2017-01-30 17:08:01.564558 7f72497f2700 20 bluefs _flush_range using partial tail 0x17c
2017-01-30 17:08:01.564560 7f72497f2700 20 bluefs _flush_range waiting for previous aio to complete
2017-01-30 17:08:01.564562 7f72497f2700 10 bdev aio_wait 0x7f7230091ee0 waiting for 1 aios and/or 0 readers to complete

90 seconds!

2017-01-30 17:09:32.117785 7f725d108700 20 bdev(./fio-bluestore/block.wal) _aio_log_finish 1 0x233a000~81000
2017-01-30 17:09:32.117819 7f725d108700 10 bdev(./fio-bluestore/block.wal) _aio_thread finished aio 0x7f72301b3940 r 528384 ioc 0x7f7230091ee0 with aios left
2017-01-30 17:09:32.117886 7f72497f2700 20 bdev aio_wait 0x7f7230091ee0 done
2017-01-30 17:09:32.117904 7f72497f2700 -1 bdev LONG AIO_WAIT!!!
2017-01-30 17:09:32.117914 7f72497f2700 20 bdev aio_wait 0x7f7230083f60 done
2017-01-30 17:09:32.117949 7f72497f2700 20 bdev aio_wait 0x7f72301ae810 done
2017-01-30 17:09:32.118018 7f72497f2700 20 bluefs _flush_range caching tail of 0xa96 and padding block with 0x56a
2017-01-30 17:09:32.118043 7f72497f2700 20 bdev(./fio-bluestore/block.wal) aio_write 0x23ba000~4000 (direct)
2017-01-30 17:09:32.118100 7f72497f2700 20 bdev(./fio-bluestore/block.wal) aio_write rebuilding buffer to be aligned
2017-01-30 17:09:32.118109 7f72497f2700 20 bdev(./fio-bluestore/block.wal) _aio_log_start 0x23ba000~4000
2017-01-30 17:09:32.118123 7f72497f2700 5 bdev(./fio-bluestore/block.wal) aio_write 0x23ba000~4000 aio 0x7f723000d6d0
2017-01-30 17:09:32.118127 7f72497f2700 20 bdev(./fio-bluestore/block.wal) aio_submit ioc 0x7f7230091ee0 pending 1 running 0
2017-01-30 17:09:32.118133 7f72497f2700 20 bdev(./fio-bluestore/block.wal) aio_submit aio 0x7f723000d6d0 fd 13 0x23ba000~4000
2017-01-30 17:09:32.118205 7f72497f2700 20 bluefs _flush_range h 0x7f72301dd090 pos now 0x1ebda96
2017-01-30 17:09:32.118219 7f72497f2700 10 bluefs _claim_completed_aios got 2 aios
2017-01-30 17:09:32.118224 7f72497f2700 10 bluefs wait_for_aio 0x7f72301dd090
2017-01-30 17:09:32.118227 7f72497f2700 10 bdev aio_wait 0x7f7230091ee0 waiting for 1 aios and/or 0 readers to complete
2017-01-30 17:09:32.118316 7f725d108700 20 bdev(./fio-bluestore/block.wal) _aio_log_finish 1 0x23ba000~4000
2017-01-30 17:09:32.118332 7f725d108700 10 bdev(./fio-bluestore/block.wal) _aio_thread finished aio 0x7f723000d6d0 r 16384 ioc 0x7f7230091ee0 with aios left
2017-01-30 17:09:32.118376 7f72497f2700 20 bdev aio_wait 0x7f7230091ee0 done
2017-01-30 17:09:32.118381 7f72497f2700 20 bdev aio_wait 0x7f7230083f60 done
2017-01-30 17:09:32.118402 7f72497f2700 20 bdev aio_wait 0x7f72301ae810 done


Files

pLL64.fio (393 Bytes) pLL64.fio FIO task Igor Fedotov, 01/31/2017 11:49 AM
ceph.conf (1.08 KB) ceph.conf Igor Fedotov, 01/31/2017 11:50 AM
out (111 KB) out Igor Fedotov, 01/31/2017 03:57 PM
Actions #1

Updated by Alexey Sheplyakov about 7 years ago

bdev(./fio-bluestore/block.wal)

Do you use ordinary files (on local Linux' filesystem) as data and WAL devices?

Actions #2

Updated by Igor Fedotov about 7 years ago

Alexey Sheplyakov wrote:

bdev(./fio-bluestore/block.wal)

Do you use ordinary files (on local Linux' filesystem) as data and WAL devices?

No, these are ust a symlinks to /dev/sdXX
See ceph.conf attached.

Actions #3

Updated by Igor Fedotov about 7 years ago

Brief blktrace started shortly before stall is attached.

Full blktrace (different case) is available at
https://drive.google.com/open?id=0B-4q9QFReegLU2k3ajJxLTdCOTQ

Actions #4

Updated by Sage Weil almost 7 years ago

  • Status changed from New to Can't reproduce
Actions

Also available in: Atom PDF