Project

General

Profile

Bug #17824

bluestore: multiple aios allocated at same address?

Added by Sage Weil 9 months ago. Updated 8 months ago.

Status:
Resolved
Priority:
Immediate
Assignee:
Category:
-
Target version:
-
Start date:
11/08/2016
Due date:
% Done:

0%

Source:
Q/A
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Release:
Needs Doc:
No

Description

  -284> 2016-11-08 08:39:58.754161 7fcb14d71700  5 bdev(/var/lib/ceph/osd/ceph-0/block) aio_write 0x3b8b000~2000 aio 0x559e30f904a0
  -283> 2016-11-08 08:39:58.754163 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit ioc 0x559e29f6fdd0 pending 1 running 0
  -282> 2016-11-08 08:39:58.754164 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit  aio <b>0x559e30f904a0</b> fd 24 0x3b8b000~2000
...
  -260> 2016-11-08 08:39:58.754827 7fcb14d71700  5 bdev(/var/lib/ceph/osd/ceph-0/block) aio_write 0x241d8000~1000 aio 0x559e30f904a0
  -259> 2016-11-08 08:39:58.754829 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit ioc 0x559e29f70490 pending 1 running 0
  -258> 2016-11-08 08:39:58.754830 7fcb14d71700 20 bdev(/var/lib/ceph/osd/ceph-0/block) aio_submit  aio <b>0x559e30f904a0</b> fd 24 0x241d8000~1000
...
  -257> 2016-11-08 08:39:58.754854 7fcb1bd7f700 10 bdev(/var/lib/ceph/osd/ceph-0/block) _aio_thread finished aio 0x559e30f904a0 r -1000 ioc 0x559e29f6fdd0 with 0 aios left

/a/sage-2016-11-08_02:52:28-rados-wip-sage-testing---basic-smithi/531641

This is very perplexing. The aio address is coming from a push_back on a list.. in this case two different lists. Somehow they ended up with the same address?

Also surprising (although not impossible) is that the ioc value in the aio_thread finish is the first ioc, not the second... but for hte second one it was just assigned before -258 line was printed in a different thread, so this could be blamed on the CPU cache.

In any case, not sure what to make of it. Heap corruption?

History

#1 Updated by Sage Weil 8 months ago

  • Status changed from Need More Info to Resolved

#2 Updated by Sage Weil 8 months ago

  • Status changed from Resolved to In Progress

It looks like this is the fault of aio_wait not actually waiting? Looking at logs from /a/sage-2016-11-23_20:05:59-rados-wip-sage-testing---basic-smithi/573264

   -50> 2016-11-23 23:29:29.635616 7fbea6cf7700 20 bluefs _flush_range caching tail of 0x176 and padding block with 0xe8a
   -49> 2016-11-23 23:29:29.635620 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x173d000~3000 (direct)
   -48> 2016-11-23 23:29:29.635626 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write rebuilding buffer to be aligned
..
   -46> 2016-11-23 23:29:29.635627 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_log_start 0x173d000~3000
..
   -44> 2016-11-23 23:29:29.635631 7fbea6cf7700  5 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x173d000~3000 aio 0x5630e05d8e20
..
   -42> 2016-11-23 23:29:29.635633 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit ioc 0x5630dd107d40 pending 1 running 0
   -41> 2016-11-23 23:29:29.635636 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit  aio 0x5630e05d8e20 fd 25 0x173d000~3000
..
   -28> 2016-11-23 23:29:29.635729 7fbea6cf7700 20 bluefs _flush_range h 0x5630dd101700 pos now 0x33f176
..look at this..
   -27> 2016-11-23 23:29:29.635737 7fbea6cf7700 10 bluefs _claim_completed_aios got 1 aios
..
   -26> 2016-11-23 23:29:29.635739 7fbea6cf7700 10 bluefs wait_for_aio 0x5630dd101700
   -25> 2016-11-23 23:29:29.635740 7fbea6cf7700 20 bdev aio_wait 0x5630dd107d40 done
.. it didn't wait! ..
   -24> 2016-11-23 23:29:29.635742 7fbea6cf7700 10 bluefs wait_for_aio 0x5630dd101700 done in 0.000001
   -23> 2016-11-23 23:29:29.635745 7fbea6cf7700 20 bluefs _fsync file metadata was dirty (1415) on file(ino 25 size 0x33f176 mtime 2016-11-23 23:29:29.635598 bdev 1 extents [1:0x1000000+100000,1:0x1500000+300000]), flushing log
   -22> 2016-11-23 23:29:29.635750 7fbea6cf7700 20 bluefs _flush_and_sync_log 1 dirty_files
   -21> 2016-11-23 23:29:29.635752 7fbea6cf7700 20 bluefs _flush_and_sync_log   op_file_update file(ino 25 size 0x33f176 mtime 2016-11-23 23:29:29.635598 bdev 1 extents [1:0x1000000+100000,1:0x1500000+300000])
   -20> 2016-11-23 23:29:29.635760 7fbea6cf7700 10 bluefs _flush_and_sync_log txn(seq 1415 len 0x33 crc 0xbf81f265)
   -19> 2016-11-23 23:29:29.635765 7fbea6cf7700 10 bluefs _pad_bl padding with 0xfa7 zeros
   -18> 2016-11-23 23:29:29.635769 7fbea6cf7700 20 bluefs flush_bdev
   -17> 2016-11-23 23:29:29.635770 7fbea6cf7700 10 bdev(/var/lib/ceph/osd/ceph-4/block) flush start
   -16> 2016-11-23 23:29:29.635782 7fbea6cf7700  5 bdev(/var/lib/ceph/osd/ceph-4/block) flush in 0.000010
   -15> 2016-11-23 23:29:29.635784 7fbea6cf7700 10 bluefs _flush 0x5630dd101200 0x586000~1000 to file(ino 1 size 0x586000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000])
   -14> 2016-11-23 23:29:29.635788 7fbea6cf7700 10 bluefs _flush_range 0x5630dd101200 pos 0x586000 0x586000~1000 to file(ino 1 size 0x586000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000])
   -13> 2016-11-23 23:29:29.635792 7fbea6cf7700 20 bluefs _flush_range file now file(ino 1 size 0x587000 mtime 0.000000 bdev 0 extents [1:0x100000+400000,1:0x1100000+400000])
   -12> 2016-11-23 23:29:29.635794 7fbea6cf7700 20 bluefs _flush_range in 1:0x1100000+400000 x_off 0x186000
   -11> 2016-11-23 23:29:29.635797 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x1286000~1000 (direct)
   -10> 2016-11-23 23:29:29.635802 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write rebuilding buffer to be aligned
    -9> 2016-11-23 23:29:29.635803 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_log_start 0x1286000~1000
    -8> 2016-11-23 23:29:29.635806 7fbea6cf7700  5 bdev(/var/lib/ceph/osd/ceph-4/block) aio_write 0x1286000~1000 aio 0x5630e05d8e20
    -7> 2016-11-23 23:29:29.635808 7fbea6cf7700 20 bdev(/var/lib/ceph/osd/ceph-4/block) aio_submit ioc 0x5630dd107950 pending 1 running 0
    -6> 2016-11-23 23:29:29.635808 7fbeadd05700 10 bdev(/var/lib/ceph/osd/ceph-4/block) _aio_thread finished aio 0x5630e05d8e20 r -1000 ioc 0x5630dd107d40 with 0 aios left
...ok *now* it finished..

#3 Updated by Sage Weil 8 months ago

  • Status changed from In Progress to Need Review
  • Priority changed from Urgent to Immediate

#4 Updated by Sage Weil 8 months ago

  • Status changed from Need Review to Resolved

Also available in: Atom PDF