Project

General

Profile

Actions

Bug #45337

closed

Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true

Added by Igor Fedotov almost 4 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
octopus, nautilus, mimic
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Large buffered writes triggers _pwritev() call from KernelDevice::_sync_write() func which doesn't support more than RW_MAX_IO bytes write per single call.
This results in incomplete writes and might cause RocksDB failures on subsequent reads, e.g.
-126> 2020-04-29 11:40:52.497 7ff1a1e2ca80 10 bluefs open_for_write db/068269.sst
-125> 2020-04-29 11:40:52.497 7ff1a1e2ca80 20 bluefs open_for_write mapping db/068269.sst to bdev 1
-124> 2020-04-29 11:40:52.497 7ff1a1e2ca80 10 bluefs open_for_write h 0x55e54ec3ba40 on file(ino 53370 size 0
x0 mtime 2020-04-29 11:40:52.498550 bdev 1 allocated 0 extents [])
-123> 2020-04-29 11:40:56.187 7ff1a1e2ca80 10 bluefs _flush 0x55e54ec3ba40 0x0~c496f919 to file(ino 53370 siz
e 0x0 mtime 2020-04-29 11:40:52.498550 bdev 1 allocated 0 extents [])
-122> 2020-04-29 11:40:56.187 7ff1a1e2ca80 10 bluefs _flush_range 0x55e54ec3ba40 pos 0x0 0x0~c496f919 to file
(ino 53370 size 0x0 mtime 2020-04-29 11:40:52.498550 bdev 1 allocated 0 extents [])
-121> 2020-04-29 11:40:56.187 7ff1a1e2ca80 10 bluefs _allocate len 0xc496f919 from 1
-120> 2020-04-29 11:40:56.187 7ff1a1e2ca80 20 bluefs _flush_range dirty_seq = 934190 (was clean)
-119> 2020-04-29 11:40:56.187 7ff1a1e2ca80 20 bluefs _flush_range file now file(ino 53370 size 0xc496f919 mti
me 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-118> 2020-04-29 11:40:56.187 7ff1a1e2ca80 20 bluefs _flush_range in 1:0x383db280000~c4970000 x_off 0x0
-117> 2020-04-29 11:40:56.187 7ff1a1e2ca80 20 bluefs _flush_range caching tail of 0x919 and padding block wit
h 0x6e7
-116> 2020-04-29 11:40:56.187 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) aio_write 0
x383db280000~c4970000 (buffered)
-115> 2020-04-29 11:40:58.803 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) aio_write r
ebuilding buffer to be aligned
-114> 2020-04-29 11:40:58.803 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) _aio_log_st
art 0x383db280000~c4970000
-113> 2020-04-29 11:40:58.803 7ff1a1e2ca80 5 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) _sync_write
0x383db280000~c4970000 (buffered)
-112> 2020-04-29 11:41:00.510 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) _aio_log_fi
nish 1 0x383db280000~c4970000
-111> 2020-04-29 11:41:00.510 7ff1a1e2ca80 20 bluefs _flush_range h 0x55e54ec3ba40 pos now 0xc496f919
-110> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _fsync 0x55e54ec3ba40 file(ino 53370 size 0xc496f919 mti
me 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-109> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _flush 0x55e54ec3ba40 no dirty data on file(ino 53370 si
ze 0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-108> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _claim_completed_aios got 0 aios
-107> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs wait_for_aio 0x55e54ec3ba40
-106> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bdev aio_wait 0x55e54ee30fa0 done
-105> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs wait_for_aio 0x55e54ec3ba40 done in 0.000020
-104> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs flush_bdev
-103> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) flush start
-102> 2020-04-29 11:41:00.814 7ff1a1e2ca80 5 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) flush in 0.
000018
-101> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _fsync file metadata was dirty (934190) on file(ino 5337
0 size 0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000]
), flushing log
-100> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _flush_and_sync_log 1 dirty_files
-99> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _flush_and_sync_log op_file_update file(ino 53370 size
0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-98> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _flush_and_sync_log txn(seq 934190 len 0x60 crc 0xf47e2f
01)
-97> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _flush 0x55e54d0a7cc0 0x29f000~1000 to file(ino 1 size 0
x29f000 mtime 0.000000 bdev 0 allocated 410000 extents [1:0x37ecb9b0000~10000,1:0x37ec9bf0000~110000,1:0x37ecb6
c0000~2f0000])
-96> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _flush_range 0x55e54d0a7cc0 pos 0x29f000 0x29f000~1000 t
o file(ino 1 size 0x29f000 mtime 0.000000 bdev 0 allocated 410000 extents [1:0x37ecb9b0000~10000,1:0x37ec9bf000
0~110000,1:0x37ecb6c0000~2f0000])
-95> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _flush_range file now file(ino 1 size 0x2a0000 mtime 0.0
00000 bdev 0 allocated 410000 extents [1:0x37ecb9b0000~10000,1:0x37ec9bf0000~110000,1:0x37ecb6c0000~2f0000])
-94> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _flush_range in 1:0x37ecb6c0000~2f0000 x_off 0x17f000
-93> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) aio_write 0x37ecb83f000~1000 (direct)
-92> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) _aio_log_start 0x37ecb83f000~1000
-91> 2020-04-29 11:41:00.814 7ff1a1e2ca80 5 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) aio_write 0x37ecb83f000~1000 aio 0x55e54ec3a510
-90> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) aio_submit ioc 0x55e54ece37c0 pending 1 running 0
-89> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _flush_range h 0x55e54d0a7cc0 pos now 0x2a0000
-88> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _claim_completed_aios got 1 aios
-87> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs wait_for_aio 0x55e54d0a7cc0
-86> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bdev aio_wait 0x55e54ece37c0 waiting for 1 aios to complete
-85> 2020-04-29 11:41:00.814 7ff19229b700 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) _aio_log_finish 1 0x37ecb83f000~1000
-84> 2020-04-29 11:41:00.814 7ff19229b700 10 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) _aio_thread finished aio 0x55e54ec3a510 r 4096 ioc 0x55e54ece37c0 with 0 aios left
-83> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bdev aio_wait 0x55e54ece37c0 done
-82> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs wait_for_aio 0x55e54d0a7cc0 done in 0.000097
-81> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs flush_bdev
-80> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) flush start
-79> 2020-04-29 11:41:00.814 7ff1a1e2ca80 5 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) flush in 0.000011
-78> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _flush_and_sync_log log_seq_stable 934190
-77> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _flush_and_sync_log cleaned file file(ino 53370 size 0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-76> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _flush 0x55e54ec3ba40 ignoring, length 0 < min_flush_size 524288
-75> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _flush 0x55e54ec3ba40 ignoring, length 0 < min_flush_size 524288
-74> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _close_writer 0x55e54ec3ba40 type 2
-73> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bdev aio_wait 0x55e54ee30fa0 done
-72> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs open_for_read db/068269.sst (random)
-71> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs open_for_read h 0x55e54f31cb00 on file(ino 53370 size 0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-70> 2020-04-29 11:41:00.814 7ff1a1e2ca80 10 bluefs _read h 0x55e54f31cb00 0xc496f5c2~357 from file(ino 53370 size 0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000]) prefetch
-69> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bluefs _read fetching 0xc496f000~1000 of 1:0x383db280000~c4970000
-68> 2020-04-29 11:41:00.814 7ff1a1e2ca80 5 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) read 0x3849fbef000~1000 (buffered)
-67> 2020-04-29 11:41:00.814 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) _aio_log_start 0x3849fbef000~1000
-66> 2020-04-29 11:41:01.064 7ff19229b700 20 bdev reap_ioc reap ioc 0x55e54ee30fa0
-65> 2020-04-29 11:41:01.489 7ff1a1e2ca80 20 bdev(0x55e54dba0e00 /var/lib/ceph/osd/ceph-5/block) _aio_log_finish 1 0x3849fbef000~1000
-64> 2020-04-29 11:41:01.489 7ff1a1e2ca80 20 bluefs _read left 0xa3e len 0x357
-63> 2020-04-29 11:41:01.489 7ff1a1e2ca80 20 bluefs _read got 855
-62> 2020-04-29 11:41:01.489 7ff1a1e2ca80 10 bluefs _read_random h 0x55e54f31cb00 0xc496f8e4~35 from file(ino 53370 size 0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-61> 2020-04-29 11:41:01.489 7ff1a1e2ca80 20 bluefs _read_random left 0x71c 0xc496f8e4~35
-60> 2020-04-29 11:41:01.489 7ff1a1e2ca80 20 bluefs _read_random got 53
-59> 2020-04-29 11:41:01.489 7ff1a1e2ca80 10 bluefs unlink db/068269.sst
-58> 2020-04-29 11:41:01.489 7ff1a1e2ca80 20 bluefs _drop_link had refs 1 on file(ino 53370 size 0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-57> 2020-04-29 11:41:01.489 7ff1a1e2ca80 20 bluefs _drop_link destroying file(ino 53370 size 0xc496f919 mtime 2020-04-29 11:40:56.187276 bdev 1 allocated c4970000 extents [1:0x383db280000~c4970000])
-56> 2020-04-29 11:41:01.539 7ff1a1e2ca80 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
-55> 2020-04-29 11:41:01.542 7ff1a1e2ca80 10 bluefs unlock_file 0x55e54cfc22c0 on file(ino 2 size 0x0 mtime 2019-04-01 11:15:08.211377 bdev 0 allocated 0 extents [])
-54> 2020-04-29 11:41:01.542 7ff1a1e2ca80 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
-53> 2020-04-29 11:41:01.542 7ff1a1e2ca80 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 12950032858166034944 in db/068269.sst


Related issues 3 (0 open3 closed)

Copied to bluestore - Backport #45682: octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = trueResolvedNathan CutlerActions
Copied to bluestore - Backport #45683: mimic: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = trueRejectedIgor FedotovActions
Copied to bluestore - Backport #45684: nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = trueResolvedNathan CutlerActions
Actions #1

Updated by Igor Fedotov almost 4 years ago

  • Subject changed from Large (>=2 GB) writes are broken when bluefs_buffered_io = true to Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true
Actions #2

Updated by Igor Fedotov almost 4 years ago

  • Status changed from In Progress to Pending Backport
  • Pull request ID set to 34836
Actions #3

Updated by Igor Fedotov almost 4 years ago

  • Status changed from Pending Backport to Fix Under Review
Actions #4

Updated by Kefu Chai almost 4 years ago

  • Status changed from Fix Under Review to Resolved
Actions #5

Updated by Kefu Chai almost 4 years ago

  • Status changed from Resolved to Pending Backport
Actions #6

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45682: octopus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true added
Actions #7

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45683: mimic: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true added
Actions #8

Updated by Nathan Cutler almost 4 years ago

  • Copied to Backport #45684: nautilus: Large (>=2 GB) writes are incomplete when bluefs_buffered_io = true added
Actions #9

Updated by Igor Fedotov almost 3 years ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF