Bug #48849

open

BlueStore.cc: 11380: FAILED ceph_assert(r == 0)

Added by Christian Rohmann over 3 years ago. Updated almost 3 years ago.

Status:
Need More Info
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We experienced a few OSD crashes all with the same signature in the logs:

--- cut ---
2021-01-08 06:13:54.946 7f37289d0700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 1858706858, got 291516022 in db/146430.sst offset 32863300 size 538979 code = 2 Rocksdb transaction:
Put( Prefix = P key = 0x0000000000000974'.0001234154.00000000000472581834' Value size = 184)
Put( Prefix = P key = 0x0000000000000974'._fastinfo' Value size = 186)
Put( Prefix = O key = 0x7f800000000000000205e3b390217262'd_data.e46d9e7c3dbd3d.0000000000007c24!='0xfffffffffffffffeffffffffffffffff6f00000000'x' Value size = 468)
Put( Prefix = O key = 0x7f800000000000000205e3b390217262'd_data.e46d9e7c3dbd3d.0000000000007c24!='0xfffffffffffffffeffffffffffffffff'o' Value size = 541)
Merge( Prefix = b key = 0x0000012a02680000 Value size = 16)
Merge( Prefix = b key = 0x0000012ca1980000 Value size = 16)
Merge( Prefix = b key = 0x0000012ca1980000 Value size = 16)
Merge( Prefix = b key = 0x0000017981680000 Value size = 16)
2021-01-08 06:13:54.974 7f37289d0700 -1 /build/ceph-14.2.16/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7f37289d0700 time 2021-01-08 06:13:54.954115
/build/ceph-14.2.16/src/os/bluestore/BlueStore.cc: 11380: FAILED ceph_assert(r == 0)

ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55fd49604fba]
2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55fd49605195]
3: (BlueStore::_kv_sync_thread()+0x1144) [0x55fd49b78bb4]
4: (BlueStore::KVSyncThread::entry()+0xd) [0x55fd49b9bd8d]
5: (()+0x76db) [0x7f37381136db]
6: (clone()+0x3f) [0x7f3736eb071f]

2021-01-08 06:13:54.978 7f37289d0700 -1 ** Caught signal (Aborted) *
in thread 7f37289d0700 thread_name:bstore_kv_sync

ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
1: (()+0x12980) [0x7f373811e980]
2: (gsignal()+0xc7) [0x7f3736dcdfb7]
3: (abort()+0x141) [0x7f3736dcf921]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x55fd4960500b]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55fd49605195]
6: (BlueStore::_kv_sync_thread()+0x1144) [0x55fd49b78bb4]
7: (BlueStore::KVSyncThread::entry()+0xd) [0x55fd49b9bd8d]
8: (()+0x76db) [0x7f37381136db]
9: (clone()+0x3f) [0x7f3736eb071f]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- cut ---

Although several OSDs on the same machine hit this error in a burst, it has also occurred on other nodes.
I attached a longer excerpt of the ceph-osd log as osd_log__assert_failure.log.gz and would be more than happy to provide more details to help narrow this down.
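
To compare crashes across OSDs and nodes, the checksum-mismatch line quoted above could be extracted from the logs mechanically. The following is only a minimal sketch and not part of the original report: the regular expressions are derived solely from the single log line shown in this description, and the function name find_checksum_mismatches is hypothetical.

```python
import re

# Leading timestamp as it appears in the quoted ceph-osd log line.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")

# RocksDB "block checksum mismatch" message, modelled on the one line
# quoted in this report (assumption: other occurrences share this layout).
CHECKSUM_RE = re.compile(
    r"block checksum mismatch: expected (?P<expected>\d+), got (?P<got>\d+)\s+"
    r"in (?P<sst>\S+) offset (?P<offset>\d+) size (?P<size>\d+)"
)


def find_checksum_mismatches(lines):
    """Yield one dict per log line that reports a block checksum mismatch."""
    for line in lines:
        match = CHECKSUM_RE.search(line)
        if match is None:
            continue
        ts = TIMESTAMP_RE.match(line)
        yield {
            "timestamp": ts.group(1) if ts else None,
            "sst": match["sst"],
            "offset": int(match["offset"]),
            "size": int(match["size"]),
            "expected": int(match["expected"]),
            "got": int(match["got"]),
        }
```

Passing the lines of each OSD log to this function (for example an open file object) would show at a glance whether the same .sst file and offset recur or whether each crash points at a different block.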


Files

osd_log__assert_failure.log.gz (204 KB), Christian Rohmann, 01/12/2021 12:39 PM
osd-59-fsck.output (7.9 KB), fsck --deep on crashed OSD, Christian Rohmann, 01/13/2021 02:45 PM
all_crash_reports.txt (8.33 KB), ceph crash info for all crashes, Christian Rohmann, 01/17/2021 03:31 PM
crash_stacktraces.log (65.2 KB), Christian Rohmann, 01/19/2021 10:11 AM
journald_output_all_osd.tar.gz (714 KB), Christian Rohmann, 01/19/2021 06:26 PM
perf_dump_osds.tar.gz (22 KB), ceph osd perf dump of all OSDs, Christian Rohmann, 02/02/2021 09:56 AM