Project

General

Profile

Bug #20653

bluestore: aios don't complete on very large writes on xenial

Added by Sage Weil over 6 years ago. Updated over 6 years ago.

Status:
Can't reproduce
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

cpach:ceph-deploy (master) 03:06 PM $ dpkg -l linux-image-generic
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  linux-image-ge 4.4.0.83.89  amd64        Generic Linux kernel image
cpach:ceph-deploy (master) 03:07 PM $ uname -a
Linux cpach 4.4.0-65-generic #86-Ubuntu SMP Thu Feb 23 17:49:58 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
cpach:ceph-deploy (master) 03:07 PM $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.2 LTS
Release:        16.04
Codename:       xenial

40x ~12MB osdmap writes, resulting in ~9000 (?) aios, but we stop draining them about halfway through.

(worked around by reducing osd_map_message_max to 5.)

History

#1 Updated by Sage Weil over 6 years ago

2017-07-16 14:57:18.653415 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c8810 r 65536 ioc 0x55dedbc2ee00 with 4183 aios left
2017-07-16 14:57:18.653417 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827bda000~10000
2017-07-16 14:57:18.653419 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c8950 r 65536 ioc 0x55dedbc2ee00 with 4182 aios left
2017-07-16 14:57:18.653421 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827bea000~10000
2017-07-16 14:57:18.653423 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c8a90 r 65536 ioc 0x55dedbc2ee00 with 4181 aios left
2017-07-16 14:57:18.653425 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827bfa000~10000
2017-07-16 14:57:18.653427 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c8bd0 r 65536 ioc 0x55dedbc2ee00 with 4180 aios left
2017-07-16 14:57:18.653429 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827c0a000~10000
2017-07-16 14:57:18.653430 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c8d10 r 65536 ioc 0x55dedbc2ee00 with 4179 aios left
2017-07-16 14:57:18.653432 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827c1a000~10000
2017-07-16 14:57:18.653434 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c8e50 r 65536 ioc 0x55dedbc2ee00 with 4178 aios left
2017-07-16 14:57:18.653436 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827c2a000~10000
2017-07-16 14:57:18.653438 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c8f90 r 65536 ioc 0x55dedbc2ee00 with 4177 aios left
2017-07-16 14:57:18.653440 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827c3a000~10000
2017-07-16 14:57:18.653442 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c90d0 r 65536 ioc 0x55dedbc2ee00 with 4176 aios left
2017-07-16 14:57:18.653443 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827c4a000~10000
2017-07-16 14:57:18.653445 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c9210 r 65536 ioc 0x55dedbc2ee00 with 4175 aios left
2017-07-16 14:57:18.653447 7f846a967700 20 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_log_finish 1 0x6827c5a000~10000
2017-07-16 14:57:18.653448 7f846a967700 10 bdev(0x55dec129a900 /var/lib/ceph/osd/ceph-11/block) _aio_thread finished aio 0x55defc6c9350 r 65536 ioc 0x55dedbc2ee00 with 4174 aios left

then nothing from that thread.

#2 Updated by Josh Durgin over 6 years ago

  • Priority changed from Urgent to Immediate

#4 Updated by Sage Weil over 6 years ago

  • Priority changed from Immediate to Urgent

Those last two failures are due to #20771 fixed by dfab9d9b5d75d0f87053b1a3727f62da72af6c91

I haven't been able to reproduce the large hangs with a ceph_test_objectstore tool on my xenial box (the same one where i saw the original hang). :(

#5 Updated by Sage Weil over 6 years ago

  • Status changed from 12 to Need More Info

#6 Updated by Sage Weil over 6 years ago

  • Priority changed from Urgent to High

#7 Updated by Sage Weil over 6 years ago

  • Status changed from Need More Info to Can't reproduce

I'm going to assume this was #21171

Also available in: Atom PDF