Bug #22115

OSD SIGABRT on bluestore_prefer_deferred_size = 104857600: assert(_buffers.size() <= 1024)

Added by Марк Коренберг over 6 years ago. Updated about 6 years ago.

Status: Duplicate
Priority: Normal
Assignee: Shinobu Kinjo
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

1. How to reproduce:

ceph.conf on host A:

[global]
osd_recovery_max_active = 1
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 7
osd_scrub_during_recovery = false

bluestore_cache_size = 536870912

bluestore_prefer_deferred_size_hdd = 104857600
bluestore_prefer_deferred_size_ssd = 104857600
bluestore_prefer_deferred_size = 104857600

ceph.conf on host B/C/D:

[global]
osd_recovery_max_active = 1
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 7
osd_scrub_during_recovery = false

bluestore_cache_size = 536870912

bluestore_prefer_deferred_size_hdd = 104857600

All OSDs are set up with BlueStore, with the WAL and DB placed on a separate SSD:
DB 1G
WAL 576M
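
To make the difference between the two configurations easier to see, here is a rough sketch that diffs the [global] sections quoted above and converts the byte values to MiB. It assumes the files contain only simple key = value lines that Python's configparser can read; a real ceph.conf may use syntax it does not handle. The helper names load_global and to_mib are made up for this illustration.

```python
import configparser

HOST_A = """\
[global]
osd_recovery_max_active = 1
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 7
osd_scrub_during_recovery = false
bluestore_cache_size = 536870912
bluestore_prefer_deferred_size_hdd = 104857600
bluestore_prefer_deferred_size_ssd = 104857600
bluestore_prefer_deferred_size = 104857600
"""

HOST_BCD = """\
[global]
osd_recovery_max_active = 1
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 7
osd_scrub_during_recovery = false
bluestore_cache_size = 536870912
bluestore_prefer_deferred_size_hdd = 104857600
"""


def load_global(text):
    """Return the [global] section of a ceph.conf-like string as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["global"])


def to_mib(value):
    """Convert a byte count (str or int) to MiB."""
    return int(value) / (1024 * 1024)


a = load_global(HOST_A)
bcd = load_global(HOST_BCD)

# Options present on host A but missing on hosts B/C/D.
only_on_a = sorted(set(a) - set(bcd))
for key in only_on_a:
    print(f"{key} = {a[key]} ({to_mib(a[key]):.0f} MiB)")
print(f"bluestore_cache_size = {to_mib(a['bluestore_cache_size']):.0f} MiB")
```

With the values from this report, host A differs from B/C/D only by the _ssd and the unsuffixed prefer_deferred_size options, both set to 100 MiB.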

=====

The OSD gets SIGABRT after running the following command (from a separate host):

ceph -f plain tell osd.10 bench $((1024*1024*700)) $((4096*100))
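
For reference, the shell arithmetic in that command works out as in the sketch below. It assumes the usual argument order of the OSD bench command (total bytes first, then bytes per write) and that the deferred-write threshold is compared against the size of each individual write; neither is spelled out in this report. The function name bench_summary is made up for this illustration.

```python
TOTAL_BYTES = 1024 * 1024 * 700  # first bench argument
BLOCK_BYTES = 4096 * 100         # second bench argument
PREFER_DEFERRED = 104857600      # bluestore_prefer_deferred_size from ceph.conf


def bench_summary(total, block, threshold):
    """Summarize a bench run: sizes, write count, and threshold comparison."""
    return {
        "total_mib": total / 2**20,
        "block_kib": block / 2**10,
        "writes": total // block,
        # Assumption: the threshold applies to each write's size.
        "block_below_threshold": block < threshold,
    }


summary = bench_summary(TOTAL_BYTES, BLOCK_BYTES, PREFER_DEFERRED)
for name, value in summary.items():
    print(f"{name}: {value}")
```

So the run writes 700 MiB in 1792 writes of 400 KiB each, and every write is far below the 100 MiB threshold configured on host A.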

Logs from journald:

ноя 13 13:51:52 node4 ceph-osd[30116]: starting osd.10 at - osd_data /var/lib/ceph/osd/ceph-10 /var/lib/ceph/osd/ceph-10/journal
ноя 13 13:52:22 node4 ceph-osd[30116]: 2017-11-13 13:52:22.140075 7f6496d74e00 -1 osd.10 12632 log_to_monitors {default=true}
ноя 13 13:52:22 node4 ceph-osd[30116]: 2017-11-13 13:52:22.152344 7f6496d74e00 -1 osd.10 12632 mon_cmd_maybe_osd_create fail: 'osd.10 has already bound to class 'hdd', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <osd>' to remove old class first': (16) Device or resource busy
ноя 13 13:53:22 node4 ceph-osd[30116]: /build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f64891b1700 time 2017-11-13 13:53:22.026982
ноя 13 13:53:22 node4 ceph-osd[30116]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:53:22 node4 ceph-osd[30116]:  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
ноя 13 13:53:22 node4 ceph-osd[30116]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55cff1b0bd82]
ноя 13 13:53:22 node4 ceph-osd[30116]:  2: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x15e6) [0x55cff1aac2a6]
ноя 13 13:53:22 node4 ceph-osd[30116]:  3: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x72c) [0x55cff195679c]
ноя 13 13:53:22 node4 ceph-osd[30116]:  4: (BlueStore::deferred_try_submit()+0x699) [0x55cff1962539]
ноя 13 13:53:22 node4 ceph-osd[30116]:  5: (BlueStore::_kv_finalize_thread()+0xbc1) [0x55cff1980521]
ноя 13 13:53:22 node4 ceph-osd[30116]:  6: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55cff19da10d]
ноя 13 13:53:22 node4 ceph-osd[30116]:  7: (()+0x7494) [0x7f649457e494]
ноя 13 13:53:22 node4 ceph-osd[30116]:  8: (clone()+0x3f) [0x7f6493605aff]
ноя 13 13:53:22 node4 ceph-osd[30116]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ноя 13 13:53:22 node4 ceph-osd[30116]: 2017-11-13 13:53:22.031270 7f64891b1700 -1 /build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f64891b1700 time 2017-11-13 13:53:22.026982
ноя 13 13:53:22 node4 ceph-osd[30116]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:53:22 node4 ceph-osd[30116]:  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
ноя 13 13:53:22 node4 ceph-osd[30116]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55cff1b0bd82]
ноя 13 13:53:22 node4 ceph-osd[30116]:  2: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x15e6) [0x55cff1aac2a6]
ноя 13 13:53:22 node4 ceph-osd[30116]:  3: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x72c) [0x55cff195679c]
ноя 13 13:53:22 node4 ceph-osd[30116]:  4: (BlueStore::deferred_try_submit()+0x699) [0x55cff1962539]
ноя 13 13:53:22 node4 ceph-osd[30116]:  5: (BlueStore::_kv_finalize_thread()+0xbc1) [0x55cff1980521]
ноя 13 13:53:22 node4 ceph-osd[30116]:  6: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55cff19da10d]
ноя 13 13:53:22 node4 ceph-osd[30116]:  7: (()+0x7494) [0x7f649457e494]
ноя 13 13:53:22 node4 ceph-osd[30116]:  8: (clone()+0x3f) [0x7f6493605aff]
ноя 13 13:53:22 node4 ceph-osd[30116]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ноя 13 13:53:22 node4 ceph-osd[30116]:      0> 2017-11-13 13:53:22.031270 7f64891b1700 -1 /build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f64891b1700 time 2017-11-13 13:53:22.026982
ноя 13 13:53:22 node4 ceph-osd[30116]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:53:22 node4 ceph-osd[30116]:  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
ноя 13 13:53:22 node4 ceph-osd[30116]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55cff1b0bd82]
ноя 13 13:53:22 node4 ceph-osd[30116]:  2: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x15e6) [0x55cff1aac2a6]
ноя 13 13:53:22 node4 ceph-osd[30116]:  3: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x72c) [0x55cff195679c]
ноя 13 13:53:22 node4 ceph-osd[30116]:  4: (BlueStore::deferred_try_submit()+0x699) [0x55cff1962539]
ноя 13 13:53:22 node4 ceph-osd[30116]:  5: (BlueStore::_kv_finalize_thread()+0xbc1) [0x55cff1980521]
ноя 13 13:53:22 node4 ceph-osd[30116]:  6: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55cff19da10d]
ноя 13 13:53:22 node4 ceph-osd[30116]:  7: (()+0x7494) [0x7f649457e494]
ноя 13 13:53:22 node4 ceph-osd[30116]:  8: (clone()+0x3f) [0x7f6493605aff]
ноя 13 13:53:22 node4 ceph-osd[30116]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ноя 13 13:53:22 node4 ceph-osd[30116]: *** Caught signal (Aborted) **
ноя 13 13:53:22 node4 ceph-osd[30116]:  in thread 7f64891b1700 thread_name:bstore_kv_final
ноя 13 13:53:22 node4 ceph-osd[30116]:  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
ноя 13 13:53:22 node4 ceph-osd[30116]:  1: (()+0xa0c554) [0x55cff1ac4554]
ноя 13 13:53:22 node4 ceph-osd[30116]:  2: (()+0x110c0) [0x7f64945880c0]
ноя 13 13:53:22 node4 ceph-osd[30116]:  3: (gsignal()+0xcf) [0x7f649354ffcf]
ноя 13 13:53:22 node4 ceph-osd[30116]:  4: (abort()+0x16a) [0x7f64935513fa]
ноя 13 13:53:22 node4 ceph-osd[30116]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x55cff1b0bf0e]
ноя 13 13:53:22 node4 ceph-osd[30116]:  6: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x15e6) [0x55cff1aac2a6]
ноя 13 13:53:22 node4 ceph-osd[30116]:  7: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x72c) [0x55cff195679c]
ноя 13 13:53:22 node4 ceph-osd[30116]:  8: (BlueStore::deferred_try_submit()+0x699) [0x55cff1962539]
ноя 13 13:53:22 node4 ceph-osd[30116]:  9: (BlueStore::_kv_finalize_thread()+0xbc1) [0x55cff1980521]
ноя 13 13:53:22 node4 ceph-osd[30116]:  10: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55cff19da10d]
ноя 13 13:53:22 node4 ceph-osd[30116]:  11: (()+0x7494) [0x7f649457e494]
ноя 13 13:53:22 node4 ceph-osd[30116]:  12: (clone()+0x3f) [0x7f6493605aff]
ноя 13 13:53:22 node4 ceph-osd[30116]: 2017-11-13 13:53:22.071001 7f64891b1700 -1 *** Caught signal (Aborted) **
ноя 13 13:53:22 node4 ceph-osd[30116]:  in thread 7f64891b1700 thread_name:bstore_kv_final
ноя 13 13:53:22 node4 ceph-osd[30116]:  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
ноя 13 13:53:22 node4 ceph-osd[30116]:  1: (()+0xa0c554) [0x55cff1ac4554]
ноя 13 13:53:22 node4 ceph-osd[30116]:  2: (()+0x110c0) [0x7f64945880c0]
ноя 13 13:53:22 node4 ceph-osd[30116]:  3: (gsignal()+0xcf) [0x7f649354ffcf]
ноя 13 13:53:22 node4 ceph-osd[30116]:  4: (abort()+0x16a) [0x7f64935513fa]
ноя 13 13:53:22 node4 ceph-osd[30116]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x55cff1b0bf0e]
ноя 13 13:53:22 node4 ceph-osd[30116]:  6: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x15e6) [0x55cff1aac2a6]
ноя 13 13:53:22 node4 ceph-osd[30116]:  7: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x72c) [0x55cff195679c]
ноя 13 13:53:22 node4 ceph-osd[30116]:  8: (BlueStore::deferred_try_submit()+0x699) [0x55cff1962539]
ноя 13 13:53:22 node4 ceph-osd[30116]:  9: (BlueStore::_kv_finalize_thread()+0xbc1) [0x55cff1980521]
ноя 13 13:53:22 node4 ceph-osd[30116]:  10: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55cff19da10d]
ноя 13 13:53:22 node4 ceph-osd[30116]:  11: (()+0x7494) [0x7f649457e494]
ноя 13 13:53:22 node4 ceph-osd[30116]:  12: (clone()+0x3f) [0x7f6493605aff]
ноя 13 13:53:22 node4 ceph-osd[30116]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
ноя 13 13:53:22 node4 ceph-osd[30116]:      0> 2017-11-13 13:53:22.071001 7f64891b1700 -1 *** Caught signal (Aborted) **
ноя 13 13:53:22 node4 ceph-osd[30116]:  in thread 7f64891b1700 thread_name:bstore_kv_final
ноя 13 13:53:22 node4 ceph-osd[30116]:  ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
ноя 13 13:53:22 node4 ceph-osd[30116]:  1: (()+0xa0c554) [0x55cff1ac4554]
ноя 13 13:53:22 node4 ceph-osd[30116]:  2: (()+0x110c0) [0x7f64945880c0]
ноя 13 13:53:22 node4 ceph-osd[30116]:  3: (gsignal()+0xcf) [0x7f649354ffcf]
ноя 13 13:53:22 node4 ceph-osd[30116]:  4: (abort()+0x16a) [0x7f64935513fa]
ноя 13 13:53:22 node4 ceph-osd[30116]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x55cff1b0bf0e]
ноя 13 13:53:22 node4 ceph-osd[30116]:  6: (KernelDevice::aio_write(unsigned long, ceph::buffer::list&, IOContext*, bool)+0x15e6) [0x55cff1aac2a6]
ноя 13 13:53:22 node4 ceph-osd[30116]:  7: (BlueStore::_deferred_submit_unlock(BlueStore::OpSequencer*)+0x72c) [0x55cff195679c]
ноя 13 13:53:22 node4 ceph-osd[30116]:  8: (BlueStore::deferred_try_submit()+0x699) [0x55cff1962539]
ноя 13 13:53:22 node4 ceph-osd[30116]:  9: (BlueStore::_kv_finalize_thread()+0xbc1) [0x55cff1980521]
ноя 13 13:53:22 node4 ceph-osd[30116]:  10: (BlueStore::KVFinalizeThread::entry()+0xd) [0x55cff19da10d]
ноя 13 13:53:22 node4 ceph-osd[30116]:  11: (()+0x7494) [0x7f649457e494]
ноя 13 13:53:22 node4 ceph-osd[30116]:  12: (clone()+0x3f) [0x7f6493605aff]
ноя 13 13:53:22 node4 ceph-osd[30116]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
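
The failed assertion says that a single buffer list handed to prepare_iov must not contain more than 1024 segments. The sketch below only illustrates what that bound means, by splitting a list of segments into batches that each respect it. It is not Ceph code and not necessarily how the issue was fixed; the report does not say. The segment count of 2500 and the name split_segments are hypothetical.

```python
IOV_LIMIT = 1024  # the bound asserted at buffer.h line 882 in the log above


def split_segments(segments, limit=IOV_LIMIT):
    """Split a list of buffer segments into batches of at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [segments[i:i + limit] for i in range(0, len(segments), limit)]


# Hypothetical write made of 2500 separate 4 KiB segments.
segments = [bytes(4096)] * 2500
batches = split_segments(segments)

print([len(batch) for batch in batches])
```

Each batch would fit within the asserted limit, whereas the unsplit list of 2500 segments would not.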


Related issues

Related to Ceph - Bug #21932: OSD crash on boot with assert caused by Bluefs on flush write (Resolved, 10/26/2017)

History

#1 Updated by Марк Коренберг over 6 years ago

I also see this:

# journalctl | fgrep -Fi 'assert('
ноя 13 13:29:04 node4 ceph-osd[29294]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:29:04 node4 ceph-osd[29294]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:29:04 node4 ceph-osd[29294]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:51:30 node4 ceph-osd[29922]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:51:30 node4 ceph-osd[29922]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:51:30 node4 ceph-osd[29922]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:53:22 node4 ceph-osd[30116]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:53:22 node4 ceph-osd[30116]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:53:22 node4 ceph-osd[30116]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 14:33:53 node4 ceph-osd[30881]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:33:53 node4 ceph-osd[30881]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:33:53 node4 ceph-osd[30881]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:34:37 node4 ceph-osd[1344]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:34:37 node4 ceph-osd[1344]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:34:37 node4 ceph-osd[1344]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:35:18 node4 ceph-osd[1528]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:35:18 node4 ceph-osd[1528]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:35:18 node4 ceph-osd[1528]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:36:03 node4 ceph-osd[1603]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:36:03 node4 ceph-osd[1603]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:36:03 node4 ceph-osd[1603]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:36:40 node4 ceph-osd[1660]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:36:40 node4 ceph-osd[1660]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:36:40 node4 ceph-osd[1660]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:37:18 node4 ceph-osd[1712]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:37:18 node4 ceph-osd[1712]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:37:18 node4 ceph-osd[1712]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
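
Each crash is logged three times, so the list above is easier to read when grouped by assertion and process ID. Here is a rough sketch of that grouping, run on a few of the lines above. The regular expression is written for exactly this line format and is an assumption, not something provided by Ceph or journald; the name summarize is made up for this illustration.

```python
import re
from collections import defaultdict

SAMPLE = """\
ноя 13 13:29:04 node4 ceph-osd[29294]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:29:04 node4 ceph-osd[29294]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 13:53:22 node4 ceph-osd[30116]: /build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)
ноя 13 14:33:53 node4 ceph-osd[30881]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
ноя 13 14:34:37 node4 ceph-osd[1344]: /build/ceph-12.2.1/src/os/bluestore/KernelDevice.cc: 521: FAILED assert(r == 0)
"""

# Matches: ceph-osd[PID]: /path/file: LINE: FAILED assert(...)
LINE_RE = re.compile(
    r"ceph-osd\[(?P<pid>\d+)\]: (?P<file>\S+): (?P<line>\d+): "
    r"FAILED (?P<expr>assert\(.*\))$"
)


def summarize(text):
    """Map (file name, line, assertion) to the sorted PIDs that hit it."""
    pids_by_assert = defaultdict(set)
    for row in text.splitlines():
        match = LINE_RE.search(row)
        if match:
            file_name = match["file"].rsplit("/", 1)[-1]
            key = (file_name, int(match["line"]), match["expr"])
            pids_by_assert[key].add(int(match["pid"]))
    return {key: sorted(pids) for key, pids in pids_by_assert.items()}


for (file_name, line, expr), pids in summarize(SAMPLE).items():
    print(f"{file_name}:{line} {expr} -> PIDs {pids}")
```

Applied to the full list, this shows the buffer.h assertion in three OSD processes (29294, 29922, 30116), followed by the KernelDevice.cc assertion in six later ones.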

#2 Updated by Sage Weil over 6 years ago

  • Project changed from Ceph to bluestore

#3 Updated by Shinobu Kinjo over 6 years ago

  • Assignee set to Shinobu Kinjo

#4 Updated by Shinobu Kinjo over 6 years ago

Could you elaborate on the conditions that caused the assertion? Did it happen on node A, on nodes B/C/D, or on all of them?

#5 Updated by Марк Коренберг over 6 years ago

Unfortunately, no. It no longer reproduces. This happened on only one node and, by some miracle, disappeared after a number of OSD restarts.

#6 Updated by Shinobu Kinjo about 6 years ago

  • Status changed from New to Need More Info

#7 Updated by Sage Weil about 6 years ago

  • Status changed from Need More Info to Duplicate

See #21932.

#8 Updated by Sage Weil about 6 years ago

  • Related to Bug #21932: OSD crash on boot with assert caused by Bluefs on flush write added
