Bug #48696

closed

OSD asserts because the number of aios is truncated.

Added by hongsong wu over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

  • 1. Anomalies
    The OSD crashes after it is restarted, as shown in the following log:
    2020-12-15 10:32:19.008476 7fec4fc5dec0 5 bdev(0x5567f772afc0 /var/lib/ceph/osd/ceph-5/block) aio_write 0x1891e7c000~1000 aio 0x55686d70b690
    2020-12-15 10:32:19.008477 7fec4fc5dec0 20 bdev(0x5567f772afc0 /var/lib/ceph/osd/ceph-5/block) aio_submit ioc 0x5567f7c02058 pending 133794 running 0
    2020-12-15 10:32:19.009711 7fec4fc5dec0 -1 *** Caught signal (Segmentation fault) **
    in thread 7fec4fc5dec0 thread_name:ceph-osd
    
  • 2. Reason
    After the OSD restarts, it calls _deferred_replay, which in turn calls submit_batch. However, the aios_size parameter of submit_batch is a uint16_t, so if the number of aios is greater than 65535, the value is truncated and the OSD crashes. The log above shows "pending 133794", well above that limit.
    int submit_batch(aio_iter begin, aio_iter end, uint16_t aios_size, void *priv, int *retries);
    
  • 3. Steps to reproduce:
    1) Change the OSD configuration as follows and restart the OSDs:
    bluestore_throttle_bytes = 67108864000
    bluestore_throttle_deferred_bytes = 134217728000
    bluestore_deferred_batch_ops = 64000000
    bluestore_max_deferred_txc = 32000000
    

    2) Create a pool on these OSDs.
    3) Run fio against the pool for 30 seconds, then kill the OSD with `kill -9`.
    4) Restart the OSDs; the crash shown above occurs.
#1

Updated by Kefu Chai over 3 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 38709
#2

Updated by Kefu Chai over 3 years ago

  • Status changed from Fix Under Review to Resolved