Actions
Bug #17824
closedbluestore: multiple aios allocated at same address?
% Done:
0%
Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-284> 2016-11-08 08:39:58.754161 7fcb14d71700 5 bdev(/var/lib/ceph/osd/ceph-0/block) aio_write 0x3b8b000~2000 aio 0x559e30f904a0 -283> 2016-11-08 08:39:58.754163 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit ioc 0x559e29f6fdd0 pending 1 running 0 -282> 2016-11-08 08:39:58.754164 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit aio <b>0x559e30f904a0</b> fd 24 0x3b8b000~2000 ... -260> 2016-11-08 08:39:58.754827 7fcb14d71700 5 bdev(/var/lib/ceph/osd/ceph-0/block) aio_write 0x241d8000~1000 aio 0x559e30f904a0 -259> 2016-11-08 08:39:58.754829 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit ioc 0x559e29f70490 pending 1 running 0 -258> 2016-11-08 08:39:58.754830 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit aio <b>0x559e30f904a0</b> fd 24 0x241d8000~1000 ... -257> 2016-11-08 08:39:58.754854 7fcb1bd7f700 10 bdev(/var/lib/ceph/osd/ceph-0/block) _aio_thread finished aio 0x559e30f904a0 r -1000 ioc 0x559e29f6fdd0 with 0 aios left
/a/sage-2016-11-08_02:52:28-rados-wip-sage-testing---basic-smithi/531641
This is very perplexing. The aio address is coming from a push_back on a list.. in this case two different lists. Somehow they ended up with the same address?
Also surprising (although not impossible) is that the ioc value in the aio_thread finish is the first ioc, not the second... but for hte second one it was just assigned before -258 line was printed in a different thread, so this could be blamed on the CPU cache.
In any case, not sure what to make of it. Heap corruption?
Updated by Sage Weil over 7 years ago
- Status changed from Need More Info to Resolved
Updated by Sage Weil over 7 years ago
- Status changed from Resolved to In Progress
It looks like this is the fault of aio_wait not actually waiting? Looking at logs from /a/sage-2016-11-23_20:05:59-rados-wip-sage-testing---basic-smithi/573264
-50> 2016-11-23 23:29:29.635616 7fbea6cf7700 20 bluefs _flush_range caching tail of 0x176 and padding block with 0xe8a -49> 2016-11-23 23:29:29.635620 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x173d000~3000 (direct) -48> 2016-11-23 23:29:29.635626 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write rebuilding buffer to be aligned .. -46> 2016-11-23 23:29:29.635627 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_log_start 0x173d000~3000 .. -44> 2016-11-23 23:29:29.635631 7fbea6cf7700 5 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x173d000~3000 aio 0x5630e05d8e20 .. -42> 2016-11-23 23:29:29.635633 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit ioc 0x5630dd107d40 pending 1 running 0 -41> 2016-11-23 23:29:29.635636 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit aio 0x5630e05d8e20 fd 25 0x173d000~3000 .. -28> 2016-11-23 23:29:29.635729 7fbea6cf7700 20 bluefs _flush_range h 0x5630dd101700 pos now 0x33f176 ..look at this.. -27> 2016-11-23 23:29:29.635737 7fbea6cf7700 10 bluefs _claim_completed_aios got 1 aios .. -26> 2016-11-23 23:29:29.635739 7fbea6cf7700 10 bluefs wait_for_aio 0x5630dd101700 -25> 2016-11-23 23:29:29.635740 7fbea6cf7700 20 bdev aio_wait 0x5630dd107d40 done .. it didn't wait! .. -24> 2016-11-23 23:29:29.635742 7fbea6cf7700 10 bluefs wait_for_aio 0x5630dd101700 done in 0.000001 -23> 2016-11-23 23:29:29.635745 7fbea6cf7700 20 bluefs _fsync file metadata was dirty (1415) on file(ino 25 size 0x33f176 mtime 2016-11-23 23:29:29.635598 bdev 1 extents [1:0x1000000+100000,1:0x1500000+300000]), flushing log -22> 2016-11-23 23:29:29.635750 7fbea6cf7700 20 bluefs _flush_and_sync_log 1 dirty_files -21> 2016-11-23 23:29:29.635752 7fbea6cf7700 20 bluefs _flush_and_sync_log op_file_update file(ino 25 size 0x33f176 mtime 2016-11-23 23:29:29.635598 bdev 1 extents [1:0x1000000+100000,1:0x1500000+300000]) -20> 2016-11-23 23:29:29.635760 7fbea6cf7700 10 bluefs _flush_and_sync_log txn(seq 1415 len 0x33 crc 0xbf81f265) -19> 2016-11-23 23:29:29.635765 7fbea6cf7700 10 bluefs _pad_bl padding with 0xfa7 zeros -18> 2016-11-23 23:29:29.635769 7fbea6cf7700 20 bluefs flush_bdev -17> 2016-11-23 23:29:29.635770 7fbea6cf7700 10 bdev(/var/lib/ceph/osd/ceph-4/block) flush start -16> 2016-11-23 23:29:29.635782 7fbea6cf7700 5 bdev(/var/lib/ceph/osd/ceph-4/block) flush in 0.000010 -15> 2016-11-23 23:29:29.635784 7fbea6cf7700 10 bluefs _flush 0x5630dd101200 0x586000~1000 to file(ino 1 size 0x586000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000]) -14> 2016-11-23 23:29:29.635788 7fbea6cf7700 10 bluefs _flush_range 0x5630dd101200 pos 0x586000 0x586000~1000 to file(ino 1 size 0x586000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000]) -13> 2016-11-23 23:29:29.635792 7fbea6cf7700 20 bluefs _flush_range file now file(ino 1 size 0x587000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000]) -12> 2016-11-23 23:29:29.635794 7fbea6cf7700 20 bluefs _flush_range in 1:0x1100000+400000 x_off 0x186000 -11> 2016-11-23 23:29:29.635797 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x1286000~1000 (direct) -10> 2016-11-23 23:29:29.635802 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write rebuilding buffer to be aligned -9> 2016-11-23 23:29:29.635803 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_log_start 0x1286000~1000 -8> 2016-11-23 23:29:29.635806 7fbea6cf7700 5 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x1286000~1000 aio 0x5630e05d8e20 -7> 2016-11-23 23:29:29.635808 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit ioc 0x5630dd107950 pending 1 running 0 -6> 2016-11-23 23:29:29.635808 7fbeadd05700 10 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_thread finished aio 0x5630e05d8e20 r -1000 ioc 0x5630dd107d40 with 0 aios left ...ok *now* it finished..
Updated by Sage Weil over 7 years ago
- Status changed from In Progress to Fix Under Review
- Priority changed from Urgent to Immediate
Updated by Sage Weil over 7 years ago
- Status changed from Fix Under Review to Resolved
Actions