Bug #17824
closed
bluestore: multiple aios allocated at same address?
Added by Sage Weil over 7 years ago.
Updated over 7 years ago.
Description
-284> 2016-11-08 08:39:58.754161 7fcb14d71700 5 bdev(/var/lib/ceph/osd/ceph-0/block) aio_write 0x3b8b000~2000 aio 0x559e30f904a0
-283> 2016-11-08 08:39:58.754163 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit ioc 0x559e29f6fdd0 pending 1 running 0
-282> 2016-11-08 08:39:58.754164 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit aio <b>0x559e30f904a0</b> fd 24 0x3b8b000~2000
...
-260> 2016-11-08 08:39:58.754827 7fcb14d71700 5 bdev(/var/lib/ceph/osd/ceph-0/block) aio_write 0x241d8000~1000 aio 0x559e30f904a0
-259> 2016-11-08 08:39:58.754829 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit ioc 0x559e29f70490 pending 1 running 0
-258> 2016-11-08 08:39:58.754830 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit aio <b>0x559e30f904a0</b> fd 24 0x241d8000~1000
...
-257> 2016-11-08 08:39:58.754854 7fcb1bd7f700 10 bdev(/var/lib/ceph/osd/ceph-0/block) _aio_thread finished aio 0x559e30f904a0 r -1000 ioc 0x559e29f6fdd0 with 0 aios left
/a/sage-2016-11-08_02:52:28-rados-wip-sage-testing---basic-smithi/531641
This is very perplexing. The aio address is coming from a push_back on a list.. in this case two different lists. Somehow they ended up with the same address?
Also surprising (although not impossible) is that the ioc value in the aio_thread finish is the first ioc, not the second... but for hte second one it was just assigned before -258 line was printed in a different thread, so this could be blamed on the CPU cache.
In any case, not sure what to make of it. Heap corruption?
- Status changed from Need More Info to Resolved
- Status changed from Resolved to In Progress
It looks like this is the fault of aio_wait not actually waiting? Looking at logs from /a/sage-2016-11-23_20:05:59-rados-wip-sage-testing---basic-smithi/573264
-50> 2016-11-23 23:29:29.635616 7fbea6cf7700 20 bluefs _flush_range caching tail of 0x176 and padding block with 0xe8a
-49> 2016-11-23 23:29:29.635620 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x173d000~3000 (direct)
-48> 2016-11-23 23:29:29.635626 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write rebuilding buffer to be aligned
..
-46> 2016-11-23 23:29:29.635627 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_log_start 0x173d000~3000
..
-44> 2016-11-23 23:29:29.635631 7fbea6cf7700 5 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x173d000~3000 aio 0x5630e05d8e20
..
-42> 2016-11-23 23:29:29.635633 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit ioc 0x5630dd107d40 pending 1 running 0
-41> 2016-11-23 23:29:29.635636 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit aio 0x5630e05d8e20 fd 25 0x173d000~3000
..
-28> 2016-11-23 23:29:29.635729 7fbea6cf7700 20 bluefs _flush_range h 0x5630dd101700 pos now 0x33f176
..look at this..
-27> 2016-11-23 23:29:29.635737 7fbea6cf7700 10 bluefs _claim_completed_aios got 1 aios
..
-26> 2016-11-23 23:29:29.635739 7fbea6cf7700 10 bluefs wait_for_aio 0x5630dd101700
-25> 2016-11-23 23:29:29.635740 7fbea6cf7700 20 bdev aio_wait 0x5630dd107d40 done
.. it didn't wait! ..
-24> 2016-11-23 23:29:29.635742 7fbea6cf7700 10 bluefs wait_for_aio 0x5630dd101700 done in 0.000001
-23> 2016-11-23 23:29:29.635745 7fbea6cf7700 20 bluefs _fsync file metadata was dirty (1415) on file(ino 25 size 0x33f176 mtime 2016-11-23 23:29:29.635598 bdev 1 extents [1:0x1000000+100000,1:0x1500000+300000]), flushing log
-22> 2016-11-23 23:29:29.635750 7fbea6cf7700 20 bluefs _flush_and_sync_log 1 dirty_files
-21> 2016-11-23 23:29:29.635752 7fbea6cf7700 20 bluefs _flush_and_sync_log op_file_update file(ino 25 size 0x33f176 mtime 2016-11-23 23:29:29.635598 bdev 1 extents [1:0x1000000+100000,1:0x1500000+300000])
-20> 2016-11-23 23:29:29.635760 7fbea6cf7700 10 bluefs _flush_and_sync_log txn(seq 1415 len 0x33 crc 0xbf81f265)
-19> 2016-11-23 23:29:29.635765 7fbea6cf7700 10 bluefs _pad_bl padding with 0xfa7 zeros
-18> 2016-11-23 23:29:29.635769 7fbea6cf7700 20 bluefs flush_bdev
-17> 2016-11-23 23:29:29.635770 7fbea6cf7700 10 bdev(/var/lib/ceph/osd/ceph-4/block) flush start
-16> 2016-11-23 23:29:29.635782 7fbea6cf7700 5 bdev(/var/lib/ceph/osd/ceph-4/block) flush in 0.000010
-15> 2016-11-23 23:29:29.635784 7fbea6cf7700 10 bluefs _flush 0x5630dd101200 0x586000~1000 to file(ino 1 size 0x586000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000])
-14> 2016-11-23 23:29:29.635788 7fbea6cf7700 10 bluefs _flush_range 0x5630dd101200 pos 0x586000 0x586000~1000 to file(ino 1 size 0x586000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000])
-13> 2016-11-23 23:29:29.635792 7fbea6cf7700 20 bluefs _flush_range file now file(ino 1 size 0x587000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000])
-12> 2016-11-23 23:29:29.635794 7fbea6cf7700 20 bluefs _flush_range in 1:0x1100000+400000 x_off 0x186000
-11> 2016-11-23 23:29:29.635797 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x1286000~1000 (direct)
-10> 2016-11-23 23:29:29.635802 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write rebuilding buffer to be aligned
-9> 2016-11-23 23:29:29.635803 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_log_start 0x1286000~1000
-8> 2016-11-23 23:29:29.635806 7fbea6cf7700 5 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x1286000~1000 aio 0x5630e05d8e20
-7> 2016-11-23 23:29:29.635808 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit ioc 0x5630dd107950 pending 1 running 0
-6> 2016-11-23 23:29:29.635808 7fbeadd05700 10 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_thread finished aio 0x5630e05d8e20 r -1000 ioc 0x5630dd107d40 with 0 aios left
...ok *now* it finished..
- Status changed from In Progress to Fix Under Review
- Priority changed from Urgent to Immediate
- Status changed from Fix Under Review to Resolved
Also available in: Atom
PDF