Bug #51133

open

OSDs failing to start: rocksdb: submit_common error: Corruption: block checksum mismatch

Added by Ben G almost 3 years ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 2 - major
Reviewed:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):

03648e85bd54b069c13692282b035d0f030786bfcaab3b1221178fd42f1130b8
109ab3ee85a3bc3337746eb0e056f21b397fcbad99e97dfa608e3080667f8744
83841b3bf546fd7e26065342f78c40b0d62a52eee87abbb2d08784c462a16920
89731bf93a69fe6a0abb4311dc16af93b46f1fae4949ca7654c5e30dfccc595b
a3b141a7ff14019694d6551ae1bff756bc7fb55f7dda44d04b022ae42c1be7cb
b10d16e2ecdc42f40eab364c6086bede7a0c39621f1d62c8302a9731ded47727
ee27598d524a96776e424624e7972be2c01b4ee6121ac2eacc836aa63d7cff1b


Description

After a period of heavy usage on my cluster, an OSD fails to start with this error:

--- begin dump of recent events ---
debug    -83> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command assert hook 0x5624112e6540
debug    -82> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command abort hook 0x5624112e6540
debug    -81> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command leak_some_memory hook 0x5624112e6540
debug    -80> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command perfcounters_dump hook 0x5624112e6540
debug    -79> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command 1 hook 0x5624112e6540
debug    -78> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command perf dump hook 0x5624112e6540
debug    -77> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command perfcounters_schema hook 0x5624112e6540
debug    -76> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command perf histogram dump hook 0x5624112e6540
debug    -75> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command 2 hook 0x5624112e6540
debug    -74> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command perf schema hook 0x5624112e6540
debug    -73> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command perf histogram schema hook 0x5624112e6540
debug    -72> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command perf reset hook 0x5624112e6540
debug    -71> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command config show hook 0x5624112e6540
debug    -70> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command config help hook 0x5624112e6540
debug    -69> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command config set hook 0x5624112e6540
debug    -68> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command config unset hook 0x5624112e6540
debug    -67> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command config get hook 0x5624112e6540
debug    -66> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command config diff hook 0x5624112e6540
debug    -65> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command config diff get hook 0x5624112e6540
debug    -64> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command injectargs hook 0x5624112e6540
debug    -63> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command log flush hook 0x5624112e6540
debug    -62> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command log dump hook 0x5624112e6540
debug    -61> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command log reopen hook 0x5624112e6540
debug    -60> 2021-06-07T17:20:59.814+0000 7fe0ee9d4080  5 asok(0x56241139c000) register_command dump_mempools hook 0x56241138c328
debug    -59> 2021-06-07T17:20:59.887+0000 7fe0ee9d4080  0 set uid:gid to 167:167 (ceph:ceph)
debug    -58> 2021-06-07T17:20:59.887+0000 7fe0ee9d4080  0 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable), process ceph-osd, pid 1
debug    -57> 2021-06-07T17:20:59.887+0000 7fe0ee9d4080  0 pidfile_write: ignore empty --pid-file
debug    -56> 2021-06-07T17:21:00.418+0000 7fe0ee9d4080  0 starting osd.7 osd_data /var/lib/ceph/osd/ceph-7 /var/lib/ceph/osd/ceph-7/journal
debug    -55> 2021-06-07T17:21:00.419+0000 7fe0ee9d4080 -1 Falling back to public interface
debug    -54> 2021-06-07T17:21:00.453+0000 7fe0ee9d4080  0 load: jerasure load: lrc load: isa
debug    -53> 2021-06-07T17:21:00.726+0000 7fe0ee9d4080  0 osd.7:0.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
debug    -52> 2021-06-07T17:21:00.989+0000 7fe0ee9d4080  0 osd.7:1.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
debug    -51> 2021-06-07T17:21:01.250+0000 7fe0ee9d4080  0 osd.7:2.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
debug    -50> 2021-06-07T17:21:01.510+0000 7fe0ee9d4080  0 osd.7:3.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
debug    -49> 2021-06-07T17:21:01.520+0000 7fe0ee9d4080  0 osd.7:4.OSDShard using op scheduler ClassedOpQueueScheduler(queue=WeightedPriorityQueue, cutoff=196)
debug    -48> 2021-06-07T17:21:01.521+0000 7fe0ee9d4080  0 bluestore(/var/lib/ceph/osd/ceph-7) _open_db_and_around read-only:0 repair:0
debug    -47> 2021-06-07T17:21:01.616+0000 7fe0ee9d4080  1  set rocksdb option max_total_wal_size = 1073741824
debug    -46> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option compaction_readahead_size = 2097152
debug    -45> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option max_write_buffer_number = 4
debug    -44> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option max_background_compactions = 2
debug    -43> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option compression = kNoCompression
debug    -42> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option writable_file_max_buffer_size = 0
debug    -41> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option min_write_buffer_number_to_merge = 1
debug    -40> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option recycle_log_file_num = 4
debug    -39> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option write_buffer_size = 268435456
debug    -38> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option max_total_wal_size = 1073741824
debug    -37> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option compaction_readahead_size = 2097152
debug    -36> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option max_write_buffer_number = 4
debug    -35> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option max_background_compactions = 2
debug    -34> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option compression = kNoCompression
debug    -33> 2021-06-07T17:21:01.617+0000 7fe0ee9d4080  1  set rocksdb option writable_file_max_buffer_size = 0
debug    -32> 2021-06-07T17:21:01.618+0000 7fe0ee9d4080  1  set rocksdb option min_write_buffer_number_to_merge = 1
debug    -31> 2021-06-07T17:21:01.618+0000 7fe0ee9d4080  1  set rocksdb option recycle_log_file_num = 4
debug    -30> 2021-06-07T17:21:01.618+0000 7fe0ee9d4080  1  set rocksdb option write_buffer_size = 268435456
debug    -29> 2021-06-07T17:21:02.843+0000 7fe0ee9d4080  1  set rocksdb option max_total_wal_size = 1073741824
debug    -28> 2021-06-07T17:21:02.843+0000 7fe0ee9d4080  1  set rocksdb option compaction_readahead_size = 2097152
debug    -27> 2021-06-07T17:21:02.843+0000 7fe0ee9d4080  1  set rocksdb option max_write_buffer_number = 4
debug    -26> 2021-06-07T17:21:02.843+0000 7fe0ee9d4080  1  set rocksdb option max_background_compactions = 2
debug    -25> 2021-06-07T17:21:02.843+0000 7fe0ee9d4080  1  set rocksdb option compression = kNoCompression
debug    -24> 2021-06-07T17:21:02.843+0000 7fe0ee9d4080  1  set rocksdb option writable_file_max_buffer_size = 0
debug    -23> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option min_write_buffer_number_to_merge = 1
debug    -22> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option recycle_log_file_num = 4
debug    -21> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option write_buffer_size = 268435456
debug    -20> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option max_total_wal_size = 1073741824
debug    -19> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option compaction_readahead_size = 2097152
debug    -18> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option max_write_buffer_number = 4
debug    -17> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option max_background_compactions = 2
debug    -16> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option compression = kNoCompression
debug    -15> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option writable_file_max_buffer_size = 0
debug    -14> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option min_write_buffer_number_to_merge = 1
debug    -13> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option recycle_log_file_num = 4
debug    -12> 2021-06-07T17:21:02.844+0000 7fe0ee9d4080  1  set rocksdb option write_buffer_size = 268435456
debug    -11> 2021-06-07T17:21:05.107+0000 7fe0ee9d4080  0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/cls/cephfs/cls_cephfs.cc:201: loading cephfs
debug    -10> 2021-06-07T17:21:05.109+0000 7fe0ee9d4080  0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/cls/hello/cls_hello.cc:316: loading cls_hello
debug     -9> 2021-06-07T17:21:05.111+0000 7fe0ee9d4080  0 _get_class not permitted to load kvs
debug     -8> 2021-06-07T17:21:05.114+0000 7fe0ee9d4080  0 _get_class not permitted to load lua
debug     -7> 2021-06-07T17:21:05.128+0000 7fe0ee9d4080  0 _get_class not permitted to load sdk
debug     -6> 2021-06-07T17:21:05.131+0000 7fe0ee9d4080  0 osd.7 236728 crush map has features 288514119978713088, adjusting msgr requires for clients
debug     -5> 2021-06-07T17:21:05.131+0000 7fe0ee9d4080  0 osd.7 236728 crush map has features 288514119978713088 was 8705, adjusting msgr requires for mons
debug     -4> 2021-06-07T17:21:05.131+0000 7fe0ee9d4080  0 osd.7 236728 crush map has features 3314933069571702784, adjusting msgr requires for osds
debug     -3> 2021-06-07T17:21:08.664+0000 7fe0ee9d4080  0 osd.7 236728 load_pgs
debug     -2> 2021-06-07T17:21:08.908+0000 7fe0d7c9f700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 4021751865, got 3381438153  in db/007491.sst offset 57636 size 741633 code = 2 Rocksdb transaction:
PutCF( prefix = P key = 0x0000000000008284'.can_rollback_to' value size = 12)
PutCF( prefix = P key = 0x0000000000008284'.rollback_info_trimmed_to' value size = 12)
PutCF( prefix = O key = 0x82800000000000001AB0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value size = 30)
PutCF( prefix = S key = 'nid_max' value size = 8)
PutCF( prefix = S key = 'blobid_max' value size = 8)
debug     -1> 2021-06-07T17:21:08.919+0000 7fe0d7c9f700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7fe0d7c9f700 time 2021-06-07T17:21:08.909473+0000
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/os/bluestore/BlueStore.cc: 11601: FAILED ceph_assert(r == 0)

 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x562406b3c064]
 2: ceph-osd(+0x56927e) [0x562406b3c27e]
 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x45f) [0x56240716382f]
 4: (BlueStore::_kv_sync_thread()+0x16dc) [0x56240719c6fc]
 5: (BlueStore::KVSyncThread::entry()+0x11) [0x5624071c4b91]
 6: /lib64/libpthread.so.0(+0x814a) [0x7fe0ec73114a]
 7: clone()

debug      0> 2021-06-07T17:21:08.927+0000 7fe0d7c9f700 -1 *** Caught signal (Aborted) **
 in thread 7fe0d7c9f700 thread_name:bstore_kv_sync

 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7fe0ec73bb20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x562406b3c0b5]
 5: ceph-osd(+0x56927e) [0x562406b3c27e]
 6: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x45f) [0x56240716382f]
 7: (BlueStore::_kv_sync_thread()+0x16dc) [0x56240719c6fc]
 8: (BlueStore::KVSyncThread::entry()+0x11) [0x5624071c4b91]
 9: /lib64/libpthread.so.0(+0x814a) [0x7fe0ec73114a]
 10: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
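
The corruption message in the dump above names the SST file, the block offset, and the block size. When collecting such reports from several OSDs, it can help to pull those fields out mechanically. The following is a minimal sketch, not part of Ceph: the function name `parse_mismatch` and the returned dictionary layout are hypothetical, and the pattern only covers the two message layouts that appear in this ticket ("expected ..., got ..." and "stored = ..., computed = ...").

```python
import re

# Covers the two layouts seen in this ticket:
#   "... expected 4021751865, got 3381438153  in db/007491.sst offset 57636 size 741633"
#   "... stored = 3145323779, computed = 3833377101, type = 1  in db/000417.sst offset 11792861 size 3699"
MISMATCH_RE = re.compile(
    r"block checksum mismatch: "
    r"(?:expected (?P<exp>\d+), got (?P<got>\d+)"
    r"|stored = (?P<stored>\d+), computed = (?P<computed>\d+)(?:, type = \d+)?)"
    r"\s+in (?P<sst>\S+) offset (?P<offset>\d+) size (?P<size>\d+)"
)


def parse_mismatch(line):
    """Return the SST file, offset, size and checksum pair, or None if no match."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    # Keep the two checksums in the order the message prints them; this sketch
    # does not claim which of "expected"/"got" is the stored value.
    first = m.group("exp") or m.group("stored")
    second = m.group("got") or m.group("computed")
    return {
        "sst": m.group("sst"),
        "offset": int(m.group("offset")),
        "size": int(m.group("size")),
        "checksums": (int(first), int(second)),
    }


if __name__ == "__main__":
    sample = (
        "rocksdb: submit_common error: Corruption: block checksum mismatch: "
        "expected 4021751865, got 3381438153  in db/007491.sst offset 57636 "
        "size 741633 code = 2 Rocksdb transaction:"
    )
    print(parse_mismatch(sample))
```

Comparing the extracted file names and offsets across restarts shows whether the same block fails every time or whether the failures move around.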

This seems to be the only relevant part of my config:

    [osd]
    # Needed to bypass default value failure (896M is the lowest)
    osd memory base = 100663296
    osd memory target = 939524096
    osd memory target cgroup limit ratio = 0.0

    bluestore cache autotune = false
    bluestore cache size hdd = 134217728
    bluestore cache size ssd = 67108864

    # https://tracker.ceph.com/issues/50656
    bluestore allocator = bitmap
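
The byte values in the config above are easier to read in MiB. A small sketch of the conversion follows; the `settings` dictionary simply copies the four numeric values from the config, and the helper name `to_mib` is made up for illustration.

```python
MIB = 1024 * 1024

# Values copied verbatim from the [osd] section above.
settings = {
    "osd memory base": 100663296,
    "osd memory target": 939524096,
    "bluestore cache size hdd": 134217728,
    "bluestore cache size ssd": 67108864,
}


def to_mib(value_bytes):
    """Convert a byte count to whole MiB, rejecting values that are not MiB-aligned."""
    quotient, remainder = divmod(value_bytes, MIB)
    if remainder:
        raise ValueError(f"{value_bytes} is not a whole number of MiB")
    return quotient


in_mib = {name: to_mib(value) for name, value in settings.items()}

if __name__ == "__main__":
    for name, value in in_mib.items():
        print(f"{name} = {value} MiB")
```

This gives 96 MiB for the memory base, 896 MiB for the memory target (the "896M" mentioned in the config comment), and 128 MiB and 64 MiB for the HDD and SSD cache sizes.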

There were also a bunch of crashes before this appeared:

{
    "archived": "2021-06-08 12:44:51.333438",
    "assert_condition": "!is_scrubbing()",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/PG.cc",
    "assert_func": "bool PG::sched_scrub()",
    "assert_line": 1339,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/PG.cc: In function 'bool PG::sched_scrub()' thread 7fd905a7e700 time 2021-06-07T16:51:16.856046+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/osd/PG.cc: 1339: FAILED ceph_assert(!is_scrubbing())\n",
    "assert_thread_name": "safe_timer",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7fd90eaf4b20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x55c3d221a0b5]",
        "ceph-osd(+0x56927e) [0x55c3d221a27e]",
        "(PG::sched_scrub()+0x561) [0x55c3d23ca7d1]",
        "(OSD::sched_scrub()+0x8e3) [0x55c3d2314633]",
        "(OSD::tick_without_osd_lock()+0x678) [0x55c3d2325fa8]",
        "(Context::complete(int)+0xd) [0x55c3d235974d]",
        "(SafeTimer::timer_thread()+0x1b7) [0x55c3d299fb07]",
        "(SafeTimerThread::entry()+0x11) [0x55c3d29a10e1]",
        "/lib64/libpthread.so.0(+0x814a) [0x7fd90eaea14a]",
        "clone()" 
    ],
    "ceph_version": "16.2.4",
    "crash_id": "2021-06-07T16:51:17.910297Z_cc0c9987-61ba-45f2-b9d2-a4c9a180a283",
    "entity_name": "osd.7",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "e5c6203c14b6621da9fda8f2bdd2ee6b8585023de8b70ffeef68b075342749cf",
    "timestamp": "2021-06-07T16:51:17.910297Z",
    "utsname_hostname": "rook-ceph-osd-7-97684b598-d8zkz",
    "utsname_machine": "x86_64",
    "utsname_release": "5.8.12-200.fc32.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Mon Sep 28 12:17:31 UTC 2020" 
}

And then a lot of these:

{
    "archived": "2021-06-08 12:44:51.509393",
    "assert_condition": "r == 0",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/os/bluestore/BlueStore.cc",
    "assert_func": "void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)",
    "assert_line": 11601,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7f8513d4b700 time 2021-06-07T16:58:26.313334+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/os/bluestore/BlueStore.cc: 11601: FAILED ceph_assert(r == 0)\n",
    "assert_thread_name": "bstore_kv_sync",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7f85287e7b20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x563a580960b5]",
        "ceph-osd(+0x56927e) [0x563a5809627e]",
        "(BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x45f) [0x563a586bd82f]",
        "(BlueStore::_kv_sync_thread()+0x16dc) [0x563a586f66fc]",
        "(BlueStore::KVSyncThread::entry()+0x11) [0x563a5871eb91]",
        "/lib64/libpthread.so.0(+0x814a) [0x7f85287dd14a]",
        "clone()" 
    ],
    "ceph_version": "16.2.4",
    "crash_id": "2021-06-07T16:58:26.505453Z_1a9cbbf3-9906-4110-916f-f6b183900189",
    "entity_name": "osd.7",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "a3b141a7ff14019694d6551ae1bff756bc7fb55f7dda44d04b022ae42c1be7cb",
    "timestamp": "2021-06-07T16:58:26.505453Z",
    "utsname_hostname": "rook-ceph-osd-7-97684b598-d8zkz",
    "utsname_machine": "x86_64",
    "utsname_release": "5.8.12-200.fc32.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Mon Sep 28 12:17:31 UTC 2020" 
}
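
With many crash reports of the shape shown above, a quick tally by daemon and failed assertion makes it clear which failure dominates. The sketch below assumes each report has already been parsed from JSON into a Python dict; the function name `summarize_crashes` is hypothetical, and only fields that appear in the reports above are read.

```python
from collections import Counter


def summarize_crashes(crashes):
    """Count crash reports per (entity_name, assert_func, assert_condition)."""
    counts = Counter()
    for crash in crashes:
        key = (
            crash.get("entity_name", "unknown"),
            crash.get("assert_func", "unknown"),
            crash.get("assert_condition", "unknown"),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Trimmed copies of the two reports quoted in this ticket.
    reports = [
        {
            "entity_name": "osd.7",
            "assert_func": "bool PG::sched_scrub()",
            "assert_condition": "!is_scrubbing()",
        },
        {
            "entity_name": "osd.7",
            "assert_func": "void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)",
            "assert_condition": "r == 0",
        },
    ]
    for (entity, func, cond), n in summarize_crashes(reports).most_common():
        print(f"{n}x {entity}: {func} failed {cond}")
```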

I deploy Ceph with rook-ceph on Kubernetes. The affected OSD belongs to a 4+1 EC pool.


Files

ceph-osd.1442.log.gz (254 KB), Neha Ojha, 07/09/2021 10:44 PM

Related issues 1 (0 open, 1 closed)

Related to RADOS - Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>()) (Duplicate)

Actions #1

Updated by Neha Ojha almost 3 years ago

Similar crash:

2021-07-03T06:51:08.746+0200 7fba950a3080 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/kv/RocksDBStore.cc: In function 'virtual int RocksDBStore::get(const string&, const string&, ceph::bufferlist*)' thread 7fba950a3080 time 2021-07-03T06:51:08.744960+0200
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.4/rpm/el8/BUILD/ceph-16.2.4/src/kv/RocksDBStore.cc: 1840: ceph_abort_msg("block checksum mismatch: expected 4252592570, got 1819148153  in db/120756.sst offset 1020683 size 3952")

 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe5) [0x55837d9697a4]
 2: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list*)+0x3ec) [0x55837e4c4a1c]
 3: (BlueStore::omap_get_values(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > >*)+0x2f1) [0x55837df83641]
 4: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*)+0x3d5) [0x55837db0c735]
 5: (OSD::load_pgs()+0x90f) [0x55837da721cf]
 6: (OSD::init()+0x26f7) [0x55837da9f1b7]
 7: main()
 8: __libc_start_main()
 9: _start()
Actions #2

Updated by Neha Ojha almost 3 years ago

  • Related to Bug #51338: osd/scrub_machine.cc: FAILED ceph_assert(state_cast<const NotActive*>()) added
Actions #3

Updated by Telemetry Bot over 2 years ago

  • Crash signature (v1) updated
  • Crash signature (v2) updated
  • Affected Versions v16.2.0, v16.2.1, v16.2.5 added

http://telemetry.front.sepia.ceph.com:4000/d/jByk5HaMz/crash-spec-x-ray?orgId=1&var-sig_v2=109ab3ee85a3bc3337746eb0e056f21b397fcbad99e97dfa608e3080667f8744

Assert condition: r == 0
Assert function: void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)

Sanitized backtrace:

    /lib64/libpthread.so.0(
    ceph-osd(
    BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)
    BlueStore::_kv_sync_thread()
    BlueStore::KVSyncThread::entry()
    /lib64/libpthread.so.0(
    clone()

Crash dump sample:
{
    "assert_condition": "r == 0",
    "assert_file": "os/bluestore/BlueStore.cc",
    "assert_func": "void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)",
    "assert_line": 11603,
    "assert_msg": "os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7f3d0ef32700 time 2021-08-09T20:33:07.148828+0000\nos/bluestore/BlueStore.cc: 11603: FAILED ceph_assert(r == 0)",
    "assert_thread_name": "bstore_kv_sync",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12b20) [0x7f3d221d3b20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x560e8d2d3f0b]",
        "ceph-osd(+0x56a0d4) [0x560e8d2d40d4]",
        "(BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x45f) [0x560e8d8fb31f]",
        "(BlueStore::_kv_sync_thread()+0x16dc) [0x560e8d93450c]",
        "(BlueStore::KVSyncThread::entry()+0x11) [0x560e8d95cdf1]",
        "/lib64/libpthread.so.0(+0x814a) [0x7f3d221c914a]",
        "clone()" 
    ],
    "ceph_version": "16.2.5",
    "crash_id": "2021-08-09T20:33:07.162057Z_da07fd6b-ce84-4986-8939-99bc874a2108",
    "entity_name": "osd.72c850397be22043c1b0330cc36a398953933930",
    "os_id": "centos",
    "os_name": "CentOS Linux",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "03648e85bd54b069c13692282b035d0f030786bfcaab3b1221178fd42f1130b8",
    "timestamp": "2021-08-09T20:33:07.162057Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-80-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021" 
}

Actions #4

Updated by Telemetry Bot about 2 years ago

  • Crash signature (v1) updated
  • Affected Versions v16.2.6, v16.2.7 added
Actions #5

Updated by Telemetry Bot about 2 years ago

  • Crash signature (v1) updated
  • Crash signature (v2) updated
Actions #6

Updated by jianwei zhang 3 months ago

Similar crash on 16.2.6:

/usr/bin/ceph-osd --cluster ceph -f -i 699 --setuser ceph --setgroup ceph
2024-01-31T13:34:39.802+0800 7fdb64dc5140 -1 Falling back to public interface log_submit_lat=0.000000 last_log_flush_lat=0.000052
2024-01-31T13:34:45.443+0800 7fdb64dc5140 -1 osd.699 0 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory log_submit_lat=0.000001 last_log_flush_lat=0.000004
2024-01-31T13:34:52.900+0800 7fdb4b8a8700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: stored = 3145323779, computed = 3833377101, type = 1  in db/000417.sst offset 11792861 size 3699 code = 2 Rocksdb transaction:
PutCF( prefix = O key = 0x7F8000000000001B9EF0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value size = 30)
PutCF( prefix = S key = 'nid_max' value size = 8)
PutCF( prefix = S key = 'blobid_max' value size = 8) log_submit_lat=0.000000 last_log_flush_lat=0.000039
/home/gitlab-runner/builds/ppn7NR4N/0/eos/slicer/slicer-src/rpmbuild/BUILD/ceph-16.2.6-31/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7fdb4b8a8700 time 2024-01-31T13:34:52.901575+0800
/home/gitlab-runner/builds/ppn7NR4N/0/eos/slicer/slicer-src/rpmbuild/BUILD/ceph-16.2.6-31/src/os/bluestore/BlueStore.cc: 11732: FAILED ceph_assert(r == 0)
 ceph version 16.2.6-31 (91120fbd2fd68d60ec50f9b33eeccb09ddb6500b) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x55f76ccfd47e]
 2: /usr/bin/ceph-osd(+0x639698) [0x55f76ccfd698]
 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x4af) [0x55f76d36cc4f]
 4: (BlueStore::_kv_sync_thread()+0x1787) [0x55f76d3a6d17]
 5: (BlueStore::KVSyncThread::entry()+0x11) [0x55f76d3d02b1]
 6: /lib64/libpthread.so.0(+0x814a) [0x7fdb628b814a]
 7: clone()
*** Caught signal (Aborted) **
 in thread 7fdb4b8a8700 thread_name:bstore_kv_sync
2024-01-31T13:34:52.903+0800 7fdb4b8a8700 -1 /home/gitlab-runner/builds/ppn7NR4N/0/eos/slicer/slicer-src/rpmbuild/BUILD/ceph-16.2.6-31/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7fdb4b8a8700 time 2024-01-31T13:34:52.901575+0800
/home/gitlab-runner/builds/ppn7NR4N/0/eos/slicer/slicer-src/rpmbuild/BUILD/ceph-16.2.6-31/src/os/bluestore/BlueStore.cc: 11732: FAILED ceph_assert(r == 0)

 ceph version 16.2.6-31 (91120fbd2fd68d60ec50f9b33eeccb09ddb6500b) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x55f76ccfd47e]
 2: /usr/bin/ceph-osd(+0x639698) [0x55f76ccfd698]
 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x4af) [0x55f76d36cc4f]
 4: (BlueStore::_kv_sync_thread()+0x1787) [0x55f76d3a6d17]
 5: (BlueStore::KVSyncThread::entry()+0x11) [0x55f76d3d02b1]
 6: /lib64/libpthread.so.0(+0x814a) [0x7fdb628b814a]
 7: clone()
 log_submit_lat=0.000000 last_log_flush_lat=0.000031
 ceph version 16.2.6-31 (91120fbd2fd68d60ec50f9b33eeccb09ddb6500b) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7fdb628c2b20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x55f76ccfd4cf]
 5: /usr/bin/ceph-osd(+0x639698) [0x55f76ccfd698]
 6: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x4af) [0x55f76d36cc4f]
 7: (BlueStore::_kv_sync_thread()+0x1787) [0x55f76d3a6d17]
 8: (BlueStore::KVSyncThread::entry()+0x11) [0x55f76d3d02b1]
 9: /lib64/libpthread.so.0(+0x814a) [0x7fdb628b814a]
 10: clone()
2024-01-31T13:34:52.905+0800 7fdb4b8a8700 -1 *** Caught signal (Aborted) **
 in thread 7fdb4b8a8700 thread_name:bstore_kv_sync

 ceph version 16.2.6-31 (91120fbd2fd68d60ec50f9b33eeccb09ddb6500b) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7fdb628c2b20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x55f76ccfd4cf]
 5: /usr/bin/ceph-osd(+0x639698) [0x55f76ccfd698]
 6: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x4af) [0x55f76d36cc4f]
 7: (BlueStore::_kv_sync_thread()+0x1787) [0x55f76d3a6d17]
 8: (BlueStore::KVSyncThread::entry()+0x11) [0x55f76d3d02b1]
 9: /lib64/libpthread.so.0(+0x814a) [0x7fdb628b814a]
 10: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
 log_submit_lat=0.000000 last_log_flush_lat=0.000026
  -102> 2024-01-31T13:34:39.802+0800 7fdb64dc5140 -1 Falling back to public interface log_submit_lat=0.000000 last_log_flush_lat=0.000005
   -73> 2024-01-31T13:34:45.443+0800 7fdb64dc5140 -1 osd.699 0 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory log_submit_lat=0.000001 last_log_flush_lat=0.000003
    -2> 2024-01-31T13:34:52.900+0800 7fdb4b8a8700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: stored = 3145323779, computed = 3833377101, type = 1  in db/000417.sst offset 11792861 size 3699 code = 2 Rocksdb transaction:
PutCF( prefix = O key = 0x7F8000000000001B9EF0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value size = 30)
PutCF( prefix = S key = 'nid_max' value size = 8)
PutCF( prefix = S key = 'blobid_max' value size = 8) log_submit_lat=0.000000 last_log_flush_lat=0.000005
    -1> 2024-01-31T13:34:52.903+0800 7fdb4b8a8700 -1 /home/gitlab-runner/builds/ppn7NR4N/0/eos/slicer/slicer-src/rpmbuild/BUILD/ceph-16.2.6-31/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7fdb4b8a8700 time 2024-01-31T13:34:52.901575+0800
/home/gitlab-runner/builds/ppn7NR4N/0/eos/slicer/slicer-src/rpmbuild/BUILD/ceph-16.2.6-31/src/os/bluestore/BlueStore.cc: 11732: FAILED ceph_assert(r == 0)

 ceph version 16.2.6-31 (91120fbd2fd68d60ec50f9b33eeccb09ddb6500b) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x55f76ccfd47e]
 2: /usr/bin/ceph-osd(+0x639698) [0x55f76ccfd698]
 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x4af) [0x55f76d36cc4f]
 4: (BlueStore::_kv_sync_thread()+0x1787) [0x55f76d3a6d17]
 5: (BlueStore::KVSyncThread::entry()+0x11) [0x55f76d3d02b1]
 6: /lib64/libpthread.so.0(+0x814a) [0x7fdb628b814a]
 7: clone()
 log_submit_lat=0.000000 last_log_flush_lat=0.000008
     0> 2024-01-31T13:34:52.905+0800 7fdb4b8a8700 -1 *** Caught signal (Aborted) **
 in thread 7fdb4b8a8700 thread_name:bstore_kv_sync

 ceph version 16.2.6-31 (91120fbd2fd68d60ec50f9b33eeccb09ddb6500b) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7fdb628c2b20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x55f76ccfd4cf]
 5: /usr/bin/ceph-osd(+0x639698) [0x55f76ccfd698]
 6: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x4af) [0x55f76d36cc4f]
 7: (BlueStore::_kv_sync_thread()+0x1787) [0x55f76d3a6d17]
 8: (BlueStore::KVSyncThread::entry()+0x11) [0x55f76d3d02b1]
 9: /lib64/libpthread.so.0(+0x814a) [0x7fdb628b814a]
 10: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
 log_submit_lat=0.000000 last_log_flush_lat=0.000008
  -102> 2024-01-31T13:34:39.802+0800 7fdb64dc5140 -1 Falling back to public interface log_submit_lat=0.000000 last_log_flush_lat=0.000011
   -73> 2024-01-31T13:34:45.443+0800 7fdb64dc5140 -1 osd.699 0 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory log_submit_lat=0.000001 last_log_flush_lat=0.000003
    -2> 2024-01-31T13:34:52.900+0800 7fdb4b8a8700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: stored = 3145323779, computed = 3833377101, type = 1  in db/000417.sst offset 11792861 size 3699 code = 2 Rocksdb transaction:
PutCF( prefix = O key = 0x7F8000000000001B9EF0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value size = 30)
PutCF( prefix = S key = 'nid_max' value size = 8)
PutCF( prefix = S key = 'blobid_max' value size = 8) log_submit_lat=0.000000 last_log_flush_lat=0.000003
    -1> 2024-01-31T13:34:52.903+0800 7fdb4b8a8700 -1 /home/gitlab-runner/builds/ppn7NR4N/0/eos/slicer/slicer-src/rpmbuild/BUILD/ceph-16.2.6-31/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7fdb4b8a8700 time 2024-01-31T13:34:52.901575+0800
/home/gitlab-runner/builds/ppn7NR4N/0/eos/slicer/slicer-src/rpmbuild/BUILD/ceph-16.2.6-31/src/os/bluestore/BlueStore.cc: 11732: FAILED ceph_assert(r == 0)

 ceph version 16.2.6-31 (91120fbd2fd68d60ec50f9b33eeccb09ddb6500b) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x55f76ccfd47e]
 2: /usr/bin/ceph-osd(+0x639698) [0x55f76ccfd698]
 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x4af) [0x55f76d36cc4f]
 4: (BlueStore::_kv_sync_thread()+0x1787) [0x55f76d3a6d17]
 5: (BlueStore::KVSyncThread::entry()+0x11) [0x55f76d3d02b1]
 6: /lib64/libpthread.so.0(+0x814a) [0x7fdb628b814a]
 7: clone()
 log_submit_lat=0.000000 last_log_flush_lat=0.000007
     0> 2024-01-31T13:34:52.905+0800 7fdb4b8a8700 -1 *** Caught signal (Aborted) **
 in thread 7fdb4b8a8700 thread_name:bstore_kv_sync

 ceph version 16.2.6-31 (91120fbd2fd68d60ec50f9b33eeccb09ddb6500b) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12b20) [0x7fdb628c2b20]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x55f76ccfd4cf]
 5: /usr/bin/ceph-osd(+0x639698) [0x55f76ccfd698]
 6: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x4af) [0x55f76d36cc4f]
 7: (BlueStore::_kv_sync_thread()+0x1787) [0x55f76d3a6d17]
 8: (BlueStore::KVSyncThread::entry()+0x11) [0x55f76d3d02b1]
 9: /lib64/libpthread.so.0(+0x814a) [0x7fdb628b814a]
 10: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.