Bug #24968 (closed)

Compaction error: Corruption: block checksum mismatch

Added by Markus Stockhausen almost 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

I've been running Ceph Luminous 12.2.5 for a few weeks now, until now with only very light usage. Today we started our first CephFS test and several OSDs were kicked out. The error in the logs is always the same:

2018-07-17 11:58:07.241772 7f91a1fd6700  4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction start summary: Base version 4 Base level 0, inputs: [18(4795KB) 16(38MB) 10(1391B) 4(1456B)]

2018-07-17 11:58:07.241788 7f91a1fd6700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1531821487241776, "job": 5, "event": "compaction_started", "files_L0": [18, 16, 10, 4], "score": 1, "input_data_size": 45278547}
2018-07-17 11:58:07.272825 7f91a1fd6700  3 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/rocksdb/db/db_impl_compaction_flush.cc:1591] Compaction error: Corruption: block checksum mismatch
2018-07-17 11:58:07.272837 7f91a1fd6700  4 rocksdb: (Original Log Time 2018/07/17-11:58:07.272790) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[4 0 0 0 0 0 0] max score 0.00, MB/sec: 1461.6 rd, 20.0 wr, level 1, files in(4, 0) out(1) MB in(43.2, 0.0) out(0.6), read-write-amplify(1.0) write-amplify(0.0) Corruption: block checksum mismatch, records in: 26
2018-07-17 11:58:07.272841 7f91a1fd6700  4 rocksdb: (Original Log Time 2018/07/17-11:58:07.272818) EVENT_LOG_v1 {"time_micros": 1531821487272805, "job": 5, "event": "compaction_finished", "compaction_time_micros": 30978, "output_level": 1, "num_output_files": 1, "total_output_size": 619802, "num_input_records": 1586, "num_output_records": 1008, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [4, 0, 0, 0, 0, 0, 0]}
2018-07-17 11:58:07.272843 7f91a1fd6700  2 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/rocksdb/db/db_impl_compaction_flush.cc:1275] Waiting after background compaction error: Corruption: block checksum mismatch, Accumulated background error counts: 1
2018-07-17 11:58:08.077835 7f91b1ff6700 -1 rocksdb: submit_transaction error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = M key = 0x00000000000005e5'.0000000480.00000000000000000523' Value size = 164)
Put( Prefix = M key = 0x00000000000005e5'._fastinfo' Value size = 186)
Delete( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00000000'x')
Delete( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00080000'x')
Delete( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00100000'x')
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00000000'x' Value size = 755)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00070000'x' Value size = 11)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f000a0000'x' Value size = 14)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f000b0000'x' Value size = 12)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00120000'x' Value size = 13)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f00180000'x' Value size = 228)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff6f001b0000'x' Value size = 9)
Put( Prefix = O key = 0x7f80000000000000026e0da9' !10000000055.0000065c!='0xfffffffffffffffeffffffffffffffff'o' Value size = 3056)

We are still working on test servers, so this is no problem at all. Nevertheless, I have no idea what is going wrong. I tested the OSD with ceph-bluestore-tool --log-level 30 --path /var/lib/ceph/osd/ceph-5 fsck. The result is:

Maintenance mode [root@cfiler102 ceph]# ceph-bluestore-tool --log-level 30 --path /var/lib/ceph/osd/ceph-5 fsck
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fefcae79700 time 2018-07-17 21:04:23.282436
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/BlueStore.cc: 8626: FAILED assert(r == 0)
2018-07-17 21:04:23.282367 7fefcae79700 -1 rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = S key = 'nid_max' Value size = 8)
Put( Prefix = S key = 'blobid_max' Value size = 8)
SingleDelete(Prefix = L Key = 0x0000000000002a18)
SingleDelete(Prefix = L Key = 0x0000000000002a19)
SingleDelete(Prefix = L Key = 0x0000000000002a1a)
SingleDelete(Prefix = L Key = 0x0000000000002a1b)
SingleDelete(Prefix = L Key = 0x0000000000002a1c)
SingleDelete(Prefix = L Key = 0x0000000000002a1d)
SingleDelete(Prefix = L Key = 0x0000000000002a1e)
SingleDelete(Prefix = L Key = 0x0000000000002a1f)
SingleDelete(Prefix = L Key = 0x0000000000002a20)
SingleDelete(Prefix = L Key = 0x0000000000002a21)
SingleDelete(Prefix = L Key = 0x0000000000002a22)
SingleDelete(Prefix = L Key = 0x0000000000002a23)
SingleDelete(Prefix = L Key = 0x0000000000002a24)
SingleDelete(Prefix = L Key = 0x0000000000002a25)
SingleDelete(Prefix = L Key = 0x0000000000002a26)
SingleDelete(Prefix = L Key = 0x0000000000002a27)
SingleDelete(Prefix = L Key = 0x0000000000002a28)
SingleDelete(Prefix = L Key = 0x0000000000002a29)
 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fefd2238d50]
 2: (BlueStore::_kv_sync_thread()+0x34bf) [0x55d279c0e2ff]
 3: (BlueStore::KVSyncThread::entry()+0xd) [0x55d279c5c4ad]
 4: (()+0x7e25) [0x7fefd0afee25]
 5: (clone()+0x6d) [0x7fefcf5b9bad]
...

Details of this OSD:

  • OSD.5 is the device in question
  • /dev/sdh is the real disk
  • /dev/fiob6 is the RocksDB device (a FusionIO PCIe SSD)

Looking at the strace output, it seems as if the check/access of the externally stored RocksDB is failing.

...
open("/var/lib/ceph/osd/ceph-5/block", O_RDWR|O_DIRECT) = 5
...
open("/dev/fiob6", O_RDWR|O_DIRECT)     = 7
...
pread64(7, "\2\0\0027\2\22\2\10\2\36\2\0\2\0\2(\2\0\2\3766\2\6Y\305\321\2\0.\32\3\2"..., 12288, 1462272) = 12288
pread64(7, "b\362/\27\6\7\2\25\2\10\2\16\2\376/\2\0\4\f\303\377%A\2\2\0\5\263\2\0\255{"..., 8192, 1470464) = 8192
pread64(7, "\33\2\245\2\7\21\306]\10\2\0\10EH\n\2\0I\4\2\0\31\2\0\2\1\0\fL\246\371\203"..., 8192, 1474560) = 8192
pread64(7, "ginfo\0B\10\0\0\0\0\0\f\r\0epoch\0C\10\0\0\0\0\0\f\f\0"..., 8192, 11841536) = 8192
pread64(7, "\0ver\0k\7\0\0\0\0\0\v&\0may_include_delet"..., 8192, 11845632) = 8192
io_submit(140568734662656, 17, [{pwritev, fildes=5, iovec=[{"h\2544\0l\2544\0p\2544\0t\2544\0x\2544\0\204\2544\0\210\2544\0\214\2544\0"..., 28672}], offset=110595211264}, {pwritev, fildes=5, iovec=[{"r\253\0Lq\253\0\314s\253\0\315s\253\0\315t\253\0\315m\253\0\315t\253\0Mm\253\0\315"..., 24576}], offset=110596825088}, {pwritev, fildes=5, iovec=[{"\365g\1\314\364g\1\315Bh\1L;h\1\314\375g\1\314\376g\1\315\376g\1\315\373g\1L"..., 20480}], offset=110597091328}, {pwritev, fildes=5, iovec=[{"\335\1\2\315\335\1\2\315\335\1\2\315\332\1\2L\333\1\2L\334\1\2\314\337\1\2L\340\1\2\314"..., 65536}], offset=110597308416}, {pwritev, fildes=5, iovec=[{"\357\252\6\315\360\252\6M\357\252\6\315\356\252\6L\355\252\6L\360\252\6L\360\252\6\315\356\252\6L"..., 65536}], offset=110597570560}, {pwritev, fildes=5, iovec=[{"Hq\10\0Lq\10\0Pq\10\0Tq\10\0Xq\10\0\\q\10\0`q\10\0|q\10\0"..., 28672}], offset=110597869568}, {pwritev, fildes=5, iovec=[{"\\\313\20\0`\313\20\0d\313\20\0h\313\20\0\230\313\20\0\234\313\20\0\240\313\20\0\244\313\20\0"..., 24576}], offset=110598135808}, {pwritev, fildes=5, iovec=[{"\304\306\27\0\314\306\27\0\324\306\27\0\334\306\27\0\344\306\27\0\354\306\27\0\364\306\27\0\374\306\27\0"..., 28672}], offset=110598356992}, {pwritev, fildes=5, iovec=[{"\220\177\2\0\224\177\2\0\230\177\2\0\244\177\2\0\250\177\2\0\254\177\2\0\260\177\2\0\264\177\2\0"..., 65536}], offset=110598422528}, {pwritev, fildes=5, iovec=[{"\274\330\10\0\300\330\10\0\304\330\10\0\310\330\10\0\314\330\10\0\320\330\10\0\324\330\10\0\330\330\10\0"..., 24576}], offset=110598725632}, {pwritev, fildes=5, iovec=[{"MZ\220\0\3\0\0\0\4\0\0\0\377\377\0\0\270\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 16384}], offset=110598930432}, {pwritev, fildes=5, iovec=[{"$\372\37\0@\372\37\0D\372\37\0H\372\37\0P\372\37\0T\372\37\0X\372\37\0\200\372\37\0"..., 65536}], offset=110599143424}, {pwritev, fildes=5, iovec=[{"Dq)\0Hq)\0\200q)\0\204q)\0\210q)\0\214q)\0\220q)\0\224q)\0"..., 28672}], offset=110599639040}, {pwritev, fildes=5, iovec=[{"\354\3030\0\360\3030\0\364\3030\0\370\3030\0\374\3030\0\4\3040\0\10\3040\0\f\3040\0"..., 24576}], offset=110599905280}, {pwritev, fildes=5, iovec=[{"\230D\1\246\363\256\300\2\0\214+\376\3\4\276\253\223\7\16-\6\v\23\6\1\314\2\0\213\216\342\4"..., 12288}], offset=110600310784}, {pwritev, fildes=5, iovec=[{"\213\2\374X}\2w\370\2\2\0\37(\0069\2*\10\4\376\r\4\0052j9\2\35/\2\306\36"..., 8192}], offset=110600577024}, {pwritev, fildes=5, iovec=[{"\2\364\300\376@\t\215\2436\v\r\27X(\2/\2\3661\313\v\32\n\2\0\t\3N\5\3K\2"..., 4096}], offset=110600843264}]) = 17
futex(0x7ffd76da6c3c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7ffd76da6c38, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7ffd76da6e5c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7ffd76da6e58, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x562fafe31eec, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x562fafe31ec0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x562fafe31eec, FUTEX_WAIT_PRIVATE, 3, NULL/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_kv_sync_thread()' thread 7fd8a4968700 time 2018-07-17 21:07:19.635580
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.5/rpm/el7/BUILD/ceph-12.2.5/src/os/bluestore/BlueStore.cc: 8626: FAILED assert(r == 0)
2018-07-17 21:07:19.635526 7fd8a4968700 -1 rocksdb: submit_transaction_sync error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:

Full logs attached. Don't hesitate to contact me for further details.


Files

fschk.txt (7.51 KB) - Bluestore check - Markus Stockhausen, 07/17/2018 07:15 PM
ceph-osd.5.log.zip (589 KB) - OSD log - Markus Stockhausen, 07/17/2018 07:17 PM
out (841 KB) - detail check log - Markus Stockhausen, 07/17/2018 07:23 PM
osd24.fsck.zip (227 KB) - 2nd error fsck - Markus Stockhausen, 07/21/2018 07:29 PM
ceph-osd.24.log.txt (77.7 KB) - 2nd error osd start - Markus Stockhausen, 07/21/2018 07:29 PM
Actions #1

Updated by Markus Stockhausen almost 6 years ago

Attached ("out") is the log of:

CEPH_ARGS="--debug-bluestore 20 --debug-bluefs 20 --err-to-stderr --log-file out" ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-5

Actions #2

Updated by Markus Stockhausen almost 6 years ago

The system is CentOS 7.5 with long-term kernel 4.14.52 (kernel-ml spinoff from ELRepo).

Actions #3

Updated by Markus Stockhausen almost 6 years ago

ceph.conf:

[global]
fsid = 2b3dab08-1a83-41d7-bd91-808725d406e4
mon_initial_members = cfiler102
mon_host = yyy.yyy.yyy.yyy
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public network = xxx.xxx.xxx.xxx/21
cluster network = xxx.xxx.xxx.xxx/21

[mon]
mon_allow_pool_delete = false

[osd]
osd crush update on start = false

Actions #4

Updated by Markus Stockhausen almost 6 years ago

One strange thing from the detail log (out) is a size mismatch for bdev 2:

At the beginning of the log we see:

2018-07-17 21:21:28.905591 7f1a9ac42d80 10 bluefs add_block_device bdev 1 path /dev/fiob6
2018-07-17 21:21:28.905598 7f1a9ac42d80  1 bdev create path /dev/fiob6 type kernel
2018-07-17 21:21:28.905604 7f1a9ac42d80  1 bdev(0x560bd1301e00 /dev/fiob6) open path /dev/fiob6
2018-07-17 21:21:28.905994 7f1a9ac42d80  1 bdev(0x560bd1301e00 /dev/fiob6) open size 19999490048 (0x4a8100000, 19073 MB) block_size 4096 (4096 B) non-rotational
2018-07-17 21:21:28.906013 7f1a9ac42d80  1 bluefs add_block_device bdev 1 path /dev/fiob6 size 19073 MB
2018-07-17 21:21:28.906020 7f1a9ac42d80 10 bluestore(/dev/fiob6) _read_bdev_label
2018-07-17 21:21:28.906243 7f1a9ac42d80 10 bluestore(/dev/fiob6) _read_bdev_label got bdev(osd_uuid af6c4241-1ae2-4f11-9e10-2b8c5ec573a9, size 0x4a8100000, btime 2018-07-10 16:50:02.208763, desc bluefs db, 0 meta)
2018-07-17 21:21:28.906287 7f1a9ac42d80 10 bluestore(/var/lib/ceph/osd/ceph-5/block) _read_bdev_label
2018-07-17 21:21:28.906353 7f1a9ac42d80 10 bluestore(/var/lib/ceph/osd/ceph-5/block) _read_bdev_label got bdev(osd_uuid af6c4241-1ae2-4f11-9e10-2b8c5ec573a9, size 0x1a307c00000, btime 2018-07-10 16:50:02.208014, desc main, 10 meta)
2018-07-17 21:21:28.906400 7f1a9ac42d80 10 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-5/block
2018-07-17 21:21:28.906412 7f1a9ac42d80  1 bdev create path /var/lib/ceph/osd/ceph-5/block type kernel
2018-07-17 21:21:28.906418 7f1a9ac42d80  1 bdev(0x560bd1301200 /var/lib/ceph/osd/ceph-5/block) open path /var/lib/ceph/osd/ceph-5/block
2018-07-17 21:21:28.906758 7f1a9ac42d80  1 bdev(0x560bd1301200 /var/lib/ceph/osd/ceph-5/block) open size 1799721320448 (0x1a307c00000, 1676 GB) block_size 4096 (4096 B) rotational
2018-07-17 21:21:28.906776 7f1a9ac42d80  1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-5/block size 1676 GB
2018-07-17 21:21:28.906782 7f1a9ac42d80 10 bluestore(/var/lib/ceph/osd/ceph-5/block) _read_bdev_label
2018-07-17 21:21:28.906829 7f1a9ac42d80 10 bluestore(/var/lib/ceph/osd/ceph-5/block) _read_bdev_label got bdev(osd_uuid af6c4241-1ae2-4f11-9e10-2b8c5ec573a9, size 0x1a307c00000, btime 2018-07-10 16:50:02.208014, desc main, 10 meta)
2018-07-17 21:21:28.906851 7f1a9ac42d80 10 bluefs add_block_device bdev 0 path /dev/fiob5
2018-07-17 21:21:28.906857 7f1a9ac42d80  1 bdev create path /dev/fiob5 type kernel
2018-07-17 21:21:28.906863 7f1a9ac42d80  1 bdev(0x560bd1300600 /dev/fiob5) open path /dev/fiob5
2018-07-17 21:21:28.907270 7f1a9ac42d80  1 bdev(0x560bd1300600 /dev/fiob5) open size 1999634432 (0x77300000, 1907 MB) block_size 4096 (4096 B) non-rotational
2018-07-17 21:21:28.907289 7f1a9ac42d80  1 bluefs add_block_device bdev 0 path /dev/fiob5 size 1907 MB

That is ~20 GB for RocksDB, ~2 GB for WAL, and 1676 GB for the OSD data. Later on, the log prints:

2018-07-17 21:21:29.041233 7f1a89dd1700 10 bluefs get_usage bdev 0 free 1995436032 (1902 MB) / 1999630336 (1906 MB), used 0%
2018-07-17 21:21:29.041241 7f1a89dd1700 10 bluefs get_usage bdev 1 free 19941810176 (19017 MB) / 19999481856 (19072 MB), used 0%
2018-07-17 21:21:29.041248 7f1a89dd1700 10 bluefs get_usage bdev 2 free 71988936704 (68654 MB) / 71988936704 (68654 MB), used 0%

bdev 2 with only ~70 GB total size is definitely wrong.

Actions #5

Updated by Igor Fedotov almost 6 years ago

Markus,
first of all, I think the 'improper' device size reporting is unrelated to this issue. That report shows just the fraction of the device's space that has been given (dynamically!) to BlueFS. In fact, most of that disk's space holds raw user data and is therefore not reported by bluefs get_usage. So I believe that part is OK.
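For reference, one way to see how much space BlueFS currently holds on each device is to query the OSD's perf counters over the admin socket (just a sketch; the exact counter names can differ between releases):

ceph daemon osd.5 perf dump

The "bluefs" section of that output (db_total_bytes, slow_total_bytes, ...) reflects the space handed to BlueFS so far, not the full device size.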

As for the issue itself, it looks somewhat similar to a number of other tickets:
https://tracker.ceph.com/issues/22102
https://tracker.ceph.com/issues/22464
https://tracker.ceph.com/issues/24901

Some of them mention high memory pressure as one of the potential root causes. Could that be the case for you as well?
Can you try to reduce bluestore_cache_size to verify that assumption?
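For example, something along these lines in ceph.conf would shrink the caches (the values are purely illustrative; bluestore_cache_size = 0 just means the per-device-type values apply):

[osd]
bluestore_cache_size = 0
bluestore_cache_size_hdd = 209715200    # ~200 MB per HDD-backed OSD
bluestore_cache_size_ssd = 419430400    # ~400 MB per SSD-backed OSD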

Actions #6

Updated by Markus Stockhausen almost 6 years ago

Hi Igor,

thanks for your feedback. The Ceph servers each have 128 GB RAM with 10 x 1.8 TB HDDs and 3 x 1.92 TB SSDs plus a 640 GB FusionIO drive. With the default settings this should amount to 10 x 1 GB + 3 x 3 GB of cache (plus overhead) and easily fit into RAM. I'm not yet familiar with the memory sizing, so I'll take your advice. The new parameters are:

ceph -n osd.22 --show-config | grep cache_size
  bluestore_cache_size = 0
  bluestore_cache_size_hdd = 209715200
  bluestore_cache_size_ssd = 419430400

As I have no idea how to repair the cluster, I will remove all OSDs and create new ones. I will keep you informed.

Actions #7

Updated by Igor Fedotov almost 6 years ago

Markus,
before redeploying the OSDs, could you monitor the current memory usage with the top and/or free tools for a while?
Just to see what's happening now...
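For example (just a sketch), something like

watch -n 30 'free -m; ps -o pid,rss,cmd -C ceph-osd'

would show the overall memory situation together with the resident set size of every ceph-osd process over time.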

Actions #8

Updated by Markus Stockhausen over 5 years ago

Sorry Igor, I started rebuilding the cluster shortly before your answer.

Nevertheless, I was able to reproduce the error even with the lower memory settings (see above).

Current state is:
  • fsck is clean (log attached)
  • start of OSD fails (log attached)

The initial error log got lost during the analysis.

Actions #9

Updated by Igor Fedotov over 5 years ago

Markus,
does it refuse to start from scratch, or only after some load?

Is it just a single OSD that crashes, or multiple ones?
Is there any correlation between the previously failing nodes and the new ones? I mean, do they use the same drives or different ones?

Also, I'd suggest verifying the drives that back the failing OSDs, if you haven't done that already.
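For example, non-destructive checks could look like this (assuming smartmontools is installed; the device names are from the description above):

smartctl -a /dev/sdh                            # SMART health and error counters of the data disk
dd if=/dev/sdh of=/dev/null bs=1M iflag=direct  # full read pass to surface media errors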

Actions #10

Updated by Markus Stockhausen over 5 years ago

Hi Igor,

checking the hardware was a good clue. I guess the reason has been identified: the BlueStore WAL/DB does not play nicely with FusionIO drives and newer kernels. Building a cluster on the same nodes without the FusionIO drives worked fine under medium load last night. To sum up my (crude) test setup:

- CentOS 7.5
- Kernel 4.14.52 (kernel-ml, derived from ELRepo)
- FusionIO drives (with the totally unofficial driver from https://github.com/snuf/iomemory-vsl)
- WAL/DB on /dev/fioXX
- BlueStore data on normal 1.8 TB SAS disks
- Only using CephFS.

As soon as I put pressure on the array or start rebalancing, the OSDs crash randomly with the above-mentioned error.

The drives are listed at https://ceph.com/geen-categorie/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/, so I gave the setup a try. It is obvious that FusionIO cards are no longer supported and that I'm using a totally unofficial driver build. Nevertheless, I'm willing to run tests or provide additional info for you. Maybe this can help others. Just tell me what I can do.

Actions #11

Updated by Markus Stockhausen over 5 years ago

I did some further tests. The iomemory-vsl driver seems to scramble data under some conditions, so there is no need to search for interoperability problems with Ceph.
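A minimal sketch of the kind of write/read-back test that can expose such scrambling (destructive for the target area; /dev/fiobX stands for a scratch partition on the FusionIO card):

dd if=/dev/urandom of=/tmp/pattern bs=1M count=1024
dd if=/tmp/pattern of=/dev/fiobX bs=1M oflag=direct
dd if=/dev/fiobX of=/tmp/readback bs=1M count=1024 iflag=direct
sha256sum /tmp/pattern /tmp/readback            # the two checksums should be identical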

The ticket can be closed if you like.

Actions #12

Updated by Igor Fedotov over 5 years ago

  • Status changed from New to Closed

Hardware issues were the root cause, hence closing.
