Bug #24968

closed

Compaction error: Corruption: block checksum mismatch

Added by Markus Stockhausen almost 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I have been running Ceph Luminous 12.2.5 for a few weeks now, so far with only very light usage. Today we started our first CephFS test and several OSDs were kicked out. The error in the logs is always the same:

2018-07-17 11:58:07.241772 7f91a1fd6700  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction start summary: Base version 4 Base level 0, inputs: [18(4795KB) 16(38MB) 10(1391B) 4(1456B)]

2018-07-17 11:58:07.241788 7f91a1fd6700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1531821487241776, "job": 5, "event": "compaction_started", "files_L0": [18, 16, 10, 4], "score": 1, "input_data_size": 45278547}
2018-07-17 11:58:07.272825 7f91a1fd6700  3 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch
2018-07-17 11:58:07.272837 7f91a1fd6700  4 rocksdb: (Original Log Time 2018/07/17-11:58:07.272790) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[4 0 0 0 0 0 0] max score 0.00, MB/sec: 1461.6 rd, 20.0 wr, level 1, files in(4, 0) out(1) MB in(43.2, 0.0) out(0.6), read-write-amplify(1.0) write-amplify(0.0) Corruption: block checksum mismatch, records in: 26
2018-07-17 11:58:07.272841 7f91a1fd6700  4 rocksdb: (Original Log Time 2018/07/17-11:58:07.272818) EVENT_LOG_v1 {"time_micros": 1531821487272805, "job": 5, "event": "compaction_finished", "compaction_time_micros": 30978, "output_level": 1, "num_output_files": 1, "total_output_size": 619802, "num_input_records": 1586, "num_output_records": 1008, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [4, 0, 0, 0, 0, 0, 0]}
2018-07-17 11:58:07.272843 7f91a1fd6700  2 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/rocksdb/db/db_impl_compaction_flush.cc:1275] Waiting after background compaction error: Corruption: block checksum mismatch, Accumulated background error counts: 1
2018-07-17 11:58:08.077835 7f91b1ff6700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = M key = 0x00000000000005e5'.0000000480.00000000000000000523' Value size = 164)
Put( Prefix = M key = 0x00000000000005e5'._fastinfo' Value size = 186)
Delete( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00000000'x')
Delete( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00080000'x')
Delete( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00100000'x')
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00000000'x' Value size = 755)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00070000'x' Value size = 11)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f000a0000'x' Value size = 14)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f000b0000'x' Value size = 12)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00120000'x' Value size = 13)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00180000'x' Value size = 228)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f001b0000'x' Value size = 9)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff'o' Value size = 3056)
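
For anyone triaging a similar OSD log, the excerpt above can be summarized mechanically. The following is only a minimal sketch, assuming the log format shown here (an `EVENT_LOG_v1` marker followed by a JSON object, and the literal message `Corruption: block checksum mismatch`). The function names `parse_event` and `summarize` are hypothetical helpers, not part of Ceph or RocksDB, and the two sample lines are copied from the log above.

```python
import json

MARKER = "EVENT_LOG_v1 "
CORRUPTION = "Corruption: block checksum mismatch"

# Two lines copied verbatim from the OSD log in this report.
SAMPLE = [
    '2018-07-17 11:58:07.241788 7f91a1fd6700  4 rocksdb: EVENT_LOG_v1 '
    '{"time_micros": 1531821487241776, "job": 5, "event": "compaction_started", '
    '"files_L0": [18, 16, 10, 4], "score": 1, "input_data_size": 45278547}',
    '2018-07-17 11:58:08.077835 7f91b1ff6700 -1 rocksdb: submit_transaction error: '
    'Corruption: block checksum mismatch code = 2 Rocksdb transaction:',
]


def parse_event(line):
    """Return the JSON payload of an EVENT_LOG_v1 line, or None if absent or invalid."""
    idx = line.find(MARKER)
    if idx < 0:
        return None
    try:
        return json.loads(line[idx + len(MARKER):])
    except json.JSONDecodeError:
        return None


def summarize(lines):
    """Count corruption messages and group event names by RocksDB job id."""
    corruption_lines = 0
    jobs = {}
    for line in lines:
        if CORRUPTION in line:
            corruption_lines += 1
        event = parse_event(line)
        if event is not None and "job" in event:
            jobs.setdefault(event["job"], []).append(event.get("event", "?"))
    return corruption_lines, jobs


count, jobs = summarize(SAMPLE)
print("corruption messages:", count)
print("events per job:", jobs)
```

To scan a full log, pass an open file object to `summarize` instead of `SAMPLE`; the attached ceph-osd.5.log would be the natural input.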

These are still test servers, so this is no problem at all. Nevertheless, I have no idea what is going wrong. I tested the OSD with ceph-bluestore-tool --log-level 30 --path /var/lib/ceph/osd/ceph-5 fsck. The result is:

Maintenance mode [root@cfiler102 ceph]# ceph-bluestore-tool --log-level 30 --path /var/lib/ceph/osd/ceph-5 fsck
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fefcae79700 time 2018-07-17 21:04:23.282436
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/BlueStore.cc: 8626: FAILED assert(r == 0)
2018-07-17 21:04:23.282367 7fefcae79700 -1 rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = S key = 'nid_max' Value size = 8)
Put( Prefix = S key = 'blobid_max' Value size = 8)
SingleDelete(Prefix = L Key = 0x0000000000002a18)
SingleDelete(Prefix = L Key = 0x0000000000002a19)
SingleDelete(Prefix = L Key = 0x0000000000002a1a)
SingleDelete(Prefix = L Key = 0x0000000000002a1b)
SingleDelete(Prefix = L Key = 0x0000000000002a1c)
SingleDelete(Prefix = L Key = 0x0000000000002a1d)
SingleDelete(Prefix = L Key = 0x0000000000002a1e)
SingleDelete(Prefix = L Key = 0x0000000000002a1f)
SingleDelete(Prefix = L Key = 0x0000000000002a20)
SingleDelete(Prefix = L Key = 0x0000000000002a21)
SingleDelete(Prefix = L Key = 0x0000000000002a22)
SingleDelete(Prefix = L Key = 0x0000000000002a23)
SingleDelete(Prefix = L Key = 0x0000000000002a24)
SingleDelete(Prefix = L Key = 0x0000000000002a25)
SingleDelete(Prefix = L Key = 0x0000000000002a26)
SingleDelete(Prefix = L Key = 0x0000000000002a27)
SingleDelete(Prefix = L Key = 0x0000000000002a28)
SingleDelete(Prefix = L Key = 0x0000000000002a29)
 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fefd2238d50]
 2: (BlueStore::_kv_sync_thread()+0x34bf) [0x55d279c0e2ff]
 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55d279c5c4ad]
 4: (()+0x7e25) [0x7fefd0afee25]
 5: (clone()+0x6d) [0x7fefcf5b9bad]
...

Details of this OSD:

  • OSD.5 is the device in question
  • /dev/sdh is the real disk
  • /dev/fiob6 is the Rocksdb FusionIO PCIe SSD
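
The device mapping listed above can be double-checked from the OSD directory itself. This is a minimal sketch, assuming the usual BlueStore layout in which `block`, `block.db` and `block.wal` are symlinks inside the OSD directory; the helper name `bluestore_devices` is hypothetical and the actual layout on this host has not been verified.

```python
import os


def bluestore_devices(osd_dir):
    """Map BlueStore link names found in osd_dir to their resolved targets."""
    devices = {}
    # Assumption: these are the link names a BlueStore OSD directory may contain.
    for name in ("block", "block.db", "block.wal"):
        link = os.path.join(osd_dir, name)
        if os.path.lexists(link):
            devices[name] = os.path.realpath(link)
    return devices


if os.path.isdir("/var/lib/ceph/osd/ceph-5"):
    print(bluestore_devices("/var/lib/ceph/osd/ceph-5"))
```

On the affected host, `block.db` would be expected to resolve to the FusionIO partition, but that is an inference from the list above, not something the sketch guarantees.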

Looking at the strace output, it seems that the check/access of the externally stored RocksDB is failing.

...
open("/var/lib/ceph/osd/ceph-5/block", O_RDWR|O_DIRECT) = 5
...
open("/dev/fiob6", O_RDWR|O_DIRECT)     = 7
...
pread64(7, "\2\0\0027\2\22\2\10\2\36\2\0\2\0\2(\2\0\2\3766\2\6Y\305\321\2\0.\32\3\2"..., 12288, 1462272) = 12288
pread64(7, "b\362/\27\6\7\2\25\2\10\2\16\2\376/\2\0\4\f\303\377%A\2\2\0\5\263\2\0\255{"..., 8192, 1470464) = 8192
pread64(7, "\33\2\245\2\7\21\306]\10\2\0\10EH\n\2\0I\4\2\0\31\2\0\2\1\0\fL\246\371\203"..., 8192, 1474560) = 8192
pread64(7, "ginfo\0B\10\0\0\0\0\0\f\r\0epoch\0C\10\0\0\0\0\0\f\f\0"..., 8192, 11841536) = 8192
pread64(7, "\0ver\0k\7\0\0\0\0\0\v&\0may_include_delet"..., 8192, 11845632) = 8192
io_submit(140568734662656, 17, [{pwritev, fildes=5, iovec=[{"h\2544\0l\2544\0p\2544\0t\2544\0x\2544\0\204\2544\0\210\2544\0\214\2544\0"..., 28672}], offset=110595211264}, {pwritev, fildes=5, iovec=[{"r\253\0Lq\253\0\314s\253\0\315s\253\0\315t\253\0\315m\253\0\315t\253\0Mm\253\0\315"..., 24576}], offset=110596825088}, {pwritev, fildes=5, iovec=[{"\365g\1\314\364g\1\315Bh\1L;h\1\314\375g\1\314\376g\1\315\376g\1\315\373g\1L"..., 20480}], offset=110597091328}, {pwritev, fildes=5, iovec=[{"\335\1\2\315\335\1\2\315\335\1\2\315\332\1\2L\333\1\2L\334\1\2\314\337\1\2L\340\1\2\314"..., 65536}], offset=110597308416}, {pwritev, fildes=5, iovec=[{"\357\252\6\315\360\252\6M\357\252\6\315\356\252\6L\355\252\6L\360\252\6L\360\252\6\315\356\252\6L"..., 65536}], offset=110597570560}, {pwritev, fildes=5, iovec=[{"Hq\10\0Lq\10\0Pq\10\0Tq\10\0Xq\10\0\\q\10\0`q\10\0|q\10\0"..., 28672}], offset=110597869568}, {pwritev, fildes=5, iovec=[{"\\\313\20\0`\313\20\0d\313\20\0h\313\20\0\230\313\20\0\234\313\20\0\240\313\20\0\244\313\20\0"..., 24576}], offset=110598135808}, {pwritev, fildes=5, iovec=[{"\304\306\27\0\314\306\27\0\324\306\27\0\334\306\27\0\344\306\27\0\354\306\27\0\364\306\27\0\374\306\27\0"..., 28672}], offset=110598356992}, {pwritev, fildes=5, iovec=[{"\220\177\2\0\224\177\2\0\230\177\2\0\244\177\2\0\250\177\2\0\254\177\2\0\260\177\2\0\264\177\2\0"..., 65536}], offset=110598422528}, {pwritev, fildes=5, iovec=[{"\274\330\10\0\300\330\10\0\304\330\10\0\310\330\10\0\314\330\10\0\320\330\10\0\324\330\10\0\330\330\10\0"..., 24576}], offset=110598725632}, {pwritev, fildes=5, iovec=[{"MZ\220\0\3\0\0\0\4\0\0\0\377\377\0\0\270\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 16384}], offset=110598930432}, {pwritev, fildes=5, iovec=[{"$\372\37\0@\372\37\0D\372\37\0H\372\37\0P\372\37\0T\372\37\0X\372\37\0\200\372\37\0"..., 65536}], offset=110599143424}, {pwritev, fildes=5, iovec=[{"Dq)\0Hq)\0\200q)\0\204q)\0\210q)\0\214q)\0\220q)\0\224q)\0"..., 28672}], offset=110599639040}, {pwritev, fildes=5, 
iovec=[{"\354\3030\0\360\3030\0\364\3030\0\370\3030\0\374\3030\0\4\3040\0\10\3040\0\f\3040\0"..., 24576}], offset=110599905280}, {pwritev, fildes=5, iovec=[{"\230D\1\246\363\256\300\2\0\214+\376\3\4\276\253\223\7\16-\6\v\23\6\1\314\2\0\213\216\342\4"..., 12288}], offset=110600310784}, {pwritev, fildes=5, iovec=[{"\213\2\374X}\2w\370\2\2\0\37(\0069\2*\10\4\376\r\4\0052j9\2\35/\2\306\36"..., 8192}], offset=110600577024}, {pwritev, fildes=5, iovec=[{"\2\364\300\376@\t\215\2436\v\r\27X(\2/\2\3661\313\v\32\n\2\0\t\3N\5\3K\2"..., 4096}], offset=110600843264}]) = 17
futex(0x7ffd76da6c3c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7ffd76da6c38, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7ffd76da6e5c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7ffd76da6e58, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x562fafe31eec, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x562fafe31ec0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x562fafe31eec, FUTEX_WAIT_PRIVATE, 3, NULL/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fd8a4968700 time 2018-07-17 21:07:19.635580
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/BlueStore.cc: 8626: FAILED assert(r == 0)
2018-07-17 21:07:19.635526 7fd8a4968700 -1 rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
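
The reasoning that the failing reads hit the external RocksDB device rests on matching file descriptors in the strace output to the paths they were opened with. A minimal sketch of that matching is shown below, assuming strace's default single-line format for `open` and `pread64`; the helper `reads_per_path` and the regular expressions are illustrative, and the sample lines are copied from the trace above.

```python
import re

# Assumed strace line shapes: open("PATH", FLAGS) = FD
# and pread64(FD, "BUF"..., LENGTH, OFFSET) = RETURNED
OPEN_RE = re.compile(r'^open\("([^"]+)",.*\)\s*=\s*(\d+)')
PREAD_RE = re.compile(r'^pread64\((\d+),.*,\s*(\d+),\s*(\d+)\)\s*=\s*(-?\d+)')

SAMPLE = [
    r'open("/var/lib/ceph/osd/ceph-5/block", O_RDWR|O_DIRECT) = 5',
    r'open("/dev/fiob6", O_RDWR|O_DIRECT)     = 7',
    r'pread64(7, "ginfo\0B\10\0\0\0\0\0\f\r\0epoch\0C\10\0\0\0\0\0\f\f\0"..., 8192, 11841536) = 8192',
    r'pread64(7, "\0ver\0k\7\0\0\0\0\0\v&\0may_include_delet"..., 8192, 11845632) = 8192',
]


def reads_per_path(lines):
    """Group pread64 calls as (offset, requested, returned) under the path of their fd."""
    fd_to_path = {}
    reads = {}
    for line in lines:
        m = OPEN_RE.match(line)
        if m:
            fd_to_path[int(m.group(2))] = m.group(1)
            continue
        m = PREAD_RE.match(line)
        if m:
            fd, length, offset, returned = (int(g) for g in m.groups())
            path = fd_to_path.get(fd, "fd %d" % fd)
            reads.setdefault(path, []).append((offset, length, returned))
    return reads


for path, calls in reads_per_path(SAMPLE).items():
    for offset, length, returned in calls:
        status = "ok" if returned == length else "short or failed"
        print(path, offset, length, status)
```

In the excerpt above, every `pread64` on fd 7 returns the full requested length, so the reads themselves complete at the syscall level; the checksum mismatch is reported on the data that was returned.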

Full logs attached. Don't hesitate to contact me for further details.


Files

fschk.txt (7.51 KB): Bluestore check, Markus Stockhausen, 07/17/2018 07:15 PM
ceph-osd.5.log.zip (589 KB): OSD log, Markus Stockhausen, 07/17/2018 07:17 PM
out (841 KB): detail check log, Markus Stockhausen, 07/17/2018 07:23 PM
osd24.fsck.zip (227 KB): 2nd error fsck, Markus Stockhausen, 07/21/2018 07:29 PM
ceph-osd.24.log.txt (77.7 KB): 2nd error osd start, Markus Stockhausen, 07/21/2018 07:29 PM