Project

General

Profile

Actions

Bug #57507

closed

rocksdb crushed due to checksum mismatch

Added by Satoru Takeuchi over 1 year ago. Updated 12 months ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
container
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

OSD crushed with the following log.

```
2022-09-09 03:03:05 debug 2022-09-08T18:03:05.665+0000 7f236763e080 1 osd.64 148517 check_osdmap_features require_osd_release unknown -> pacific
2022-09-09 03:03:05 debug 2022-09-08T18:03:05.868+0000 7f236763e080 0 osd.64 148517 load_pgs
2022-09-09 03:03:09 debug 2022-09-08T18:03:09.050+0000 7f235cb35700 4 rocksdb: [compaction/compaction_job.cc:1344] [P] [JOB 8] Generated table #6894: 314528 keys, 68954507 bytes
2022-09-09 03:03:09 debug 2022-09-08T18:03:09.050+0000 7f235cb35700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1662660189052367, "cf_name": "P", "job": 8, "event": "table_file_creation", "file_number": 6894, "file_size": 68954507, "table_properties": {"data_size": 67111733, "index_size": 1055545, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 786373, "raw_key_size": 15472504, "raw_average_key_size": 49, "raw_value_size": 61783006, "raw_average_value_size": 196, "num_data_blocks": 17354, "num_entries": 314528, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "P", "column_family_id": 11, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1645012412, "oldest_key_time": 0, "file_creation_time": 1662660185}}
2022-09-09 03:03:11 debug 2022-09-08T18:03:11.367+0000 7f235cb35700 3 rocksdb: [db_impl/db_impl_compaction_flush.cc:2807] Compaction error: Corruption: block checksum mismatch: expected 3279067714, got 2324967102 in db/006737.sst offset 11608690 size 4045
2022-09-09 03:03:11 debug 2022-09-08T18:03:11.367+0000 7f235cb35700 4 rocksdb: (Original Log Time 2022/09/08-18:03:11.367788) [compaction/compaction_job.cc:743] [P] compacted to: files[9 2 0 0 0 0 0] max score 0.00, MB/sec: 34.8 rd, 11.7 wr, level 1, files in(9, 2) out(2) MB in(82.1, 113.8) out(65.8), read-write-amplify(3.2) write-amplify(0.8) Corruption: block checksum mismatch: expected 3279067714, got 2324967102 in db/006737.sst offset 11608690 size 4045, records in: 2169837, records dropped: 1759227 output_compression: NoCompression
2022-09-09 03:03:11 2022-09-09 03:03:11 debug 2022-09-08T18:03:11.367+0000 7f235cb35700 4 rocksdb: (Original Log Time 2022/09/08-18:03:11.367858) EVENT_LOG_v1 {"time_micros": 1662660191367832, "job": 8, "event": "compaction_finished", "compaction_time_micros": 5902082, "compaction_time_cpu_micros": 2322121, "output_level": 1, "num_output_files": 2, "total_output_size": 89042848, "num_input_records": 2169837, "num_output_records": 410610, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [9, 2, 0, 0, 0, 0, 0]}
2022-09-09 03:03:11 debug 2022-09-08T18:03:11.367+0000 7f235cb35700 2 rocksdb: [db_impl/db_impl_compaction_flush.cc:2341] Waiting after background compaction error: Corruption: block checksum mismatch: expected 3279067714, got 2324967102 in db/006737.sst offset 11608690 size 4045, Accumulated background error counts: 1
2022-09-09 03:03:11 debug 2022-09-08T18:03:11.367+0000 7f2357b1c700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 3279067714, got 2324967102 in db/006737.sst offset 11608690 size 4045 code = 2 Rocksdb transaction:
2022-09-09 03:03:11 PutCF( prefix = O key = 0x7F800000000000000921600000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value size = 31)
2022-09-09 03:03:11 /root/project/src/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7f2357b1c700 time 2022-09-08T18:03:11.368168+0000
2022-09-09 03:03:11 /root/project/src/ceph/src/os/bluestore/BlueStore.cc: 11942: FAILED ceph_assert(r 0)
2022-09-09 03:03:11 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
2022-09-09 03:03:11 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x5602458fcb95]
2022-09-09 03:03:11 2: ceph-osd(+0xacdd9d) [0x5602458fcd9d]
2022-09-09 03:03:11 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x5d5) [0x560245f87d65]
2022-09-09 03:03:11 4: (BlueStore::_kv_sync_thread()+0x12e3) [0x560245febed3]
2022-09-09 03:03:11 5: (BlueStore::KVSyncThread::entry()+0x11) [0x560246019181]
2022-09-09 03:03:11 6: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f2367cda609]
2022-09-09 03:03:11 7: clone()
2022-09-09 03:03:11 debug 2022-09-08T18:03:11.373+0000 7f2357b1c700 -1 /root/project/src/ceph/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7f2357b1c700 time 2022-09-08T18:03:11.368168+0000
2022-09-09 03:03:11 /root/project/src/ceph/src/os/bluestore/BlueStore.cc: 11942: FAILED ceph_assert(r 0)
2022-09-09 03:03:11 2022-09-09 03:03:11 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
2022-09-09 03:03:11 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x5602458fcb95]
2022-09-09 03:03:11 2: ceph-osd(+0xacdd9d) [0x5602458fcd9d]
2022-09-09 03:03:11 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x5d5) [0x560245f87d65]
2022-09-09 03:03:11 4: (BlueStore::_kv_sync_thread()+0x12e3) [0x560245febed3]
2022-09-09 03:03:11 5: (BlueStore::KVSyncThread::entry()+0x11) [0x560246019181]
2022-09-09 03:03:11 6: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f2367cda609]
2022-09-09 03:03:11 7: clone()
2022-09-09 03:03:11 2022-09-09 03:03:11 * Caught signal (Aborted) *
2022-09-09 03:03:11 in thread 7f2357b1c700 thread_name:bstore_kv_sync
2022-09-09 03:03:11 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
2022-09-09 03:03:11 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f2367ce6420]
2022-09-09 03:03:11 2: gsignal()
2022-09-09 03:03:11 3: abort()
2022-09-09 03:03:11 4: (ceph::__ceph_assert_fail(char const
, char const*, int, char const*)+0x1ad) [0x5602458fcbf0]
2022-09-09 03:03:11 5: ceph-osd(+0xacdd9d) [0x5602458fcd9d]
2022-09-09 03:03:11 6: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x5d5) [0x560245f87d65]
2022-09-09 03:03:11 7: (BlueStore::_kv_sync_thread()+0x12e3) [0x560245febed3]
2022-09-09 03:03:11 8: (BlueStore::KVSyncThread::entry()+0x11) [0x560246019181]
2022-09-09 03:03:11 9: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f2367cda609]
2022-09-09 03:03:11 10: clone()
2022-09-09 03:03:11 debug 2022-09-08T18:03:11.379+0000 7f2357b1c700 -1
Caught signal (Aborted) *
2022-09-09 03:03:11 in thread 7f2357b1c700 thread_name:bstore_kv_sync
2022-09-09 03:03:11 2022-09-09 03:03:11 ceph version 16.2.10 (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
2022-09-09 03:03:11 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f2367ce6420]
2022-09-09 03:03:11 2: gsignal()
2022-09-09 03:03:11 3: abort()
2022-09-09 03:03:11 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ad) [0x5602458fcbf0]
2022-09-09 03:03:11 5: ceph-osd(+0xacdd9d) [0x5602458fcd9d]
2022-09-09 03:03:11 6: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x5d5) [0x560245f87d65]
2022-09-09 03:03:11 7: (BlueStore::_kv_sync_thread()+0x12e3) [0x560245febed3]
2022-09-09 03:03:11 8: (BlueStore::KVSyncThread::entry()+0x11) [0x560246019181]
2022-09-09 03:03:11 9: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f2367cda609]
2022-09-09 03:03:11 10: clone()
2022-09-09 03:03:11 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
```

I attached more logs in this ticket.

I hit this problem in rook/ceph.

- rook: v1.9.6
- ceph: v16.2.10


Files

Explore-logs-2022-09-12 15_34_00.zip (58 KB) Explore-logs-2022-09-12 15_34_00.zip Satoru Takeuchi, 09/13/2022 01:18 AM
Actions

Also available in: Atom PDF