Actions
Bug #50017
closedOSDs broken after nautilus->octopus upgrade: rocksdb Corruption: unknown WriteBatch tag
Status:
Resolved
Priority:
High
Assignee:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
pacific, octopus, nautilus
Regression:
Yes
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
I just started upgrading a cluster to octopus. MON and MGR are all on 15.2.10 and everything looks nice.
Then I upgraded the OSDs of one host: ceph-osd:amd64 (14.2.16-1bionic, 15.2.10-1bionic).
It found a lot of zombie spanning blobs, and then crashed. Now it refuses to start. And all the other OSDs on that host, too.
[... many more zombie blobs ...] -33> 2021-03-26T16:58:06.263+0100 7f284e679700 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck error: 6#31:210b8e42:datensicherungsserver::rbd_data.30.c2dfccc083f516.0000000000006852:head# - 1 zombie spanning blob(s) found, the first one: Blob(0x55d1c19c1100 spanning 0 blob([!~30000] csum crc32c/0x1000) use_tracker(0x3*0x10000 0x[0,0,0]) SharedBlob(0x55d1c19bf420 sbid 0x0)) -32> 2021-03-26T16:58:06.675+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2631933952 unmapped: 750092288 heap: 3382026240 old mem: 134217728 new mem: 134217728 -31> 2021-03-26T16:58:06.711+0100 7f2860674d80 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck error: 6#31:2aeb130f:datensicherungsserver::rbd_data.30.c2dfccc083f516.0000000000006016:head# - 1 zombie spanning blob(s) found, the first one: Blob(0x55d162b94bc0 spanning 0 blob([!~10000] csum crc32c/0x1000) use_tracker(0x10000 0x0) SharedBlob(0x55d162b952c0 sbid 0x0)) -30> 2021-03-26T16:58:06.967+0100 7f2860674d80 0 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_check_objects partial offload, done myself 572127 of 4822744objects, threads 2 -29> 2021-03-26T16:58:06.983+0100 7f2860674d80 1 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open checking shared_blobs -28> 2021-03-26T16:58:07.675+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2612781056 unmapped: 769245184 heap: 3382026240 old mem: 134217728 new mem: 134217728 -27> 2021-03-26T16:58:08.179+0100 7f284f67b700 5 bluestore.MempoolThread(0x55d136754a98) _resize_shards cache_size: 134217728 kv_alloc: 67108864 kv_used: 66050416 meta_alloc: 67108864 meta_used: 376573864 data_alloc: 67108864 data_used: 0 -26> 2021-03-26T16:58:08.679+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2605285376 unmapped: 776740864 heap: 3382026240 old mem: 134217728 new mem: 134217728 -25> 2021-03-26T16:58:09.679+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2598166528 unmapped: 783859712 heap: 3382026240 old mem: 134217728 new mem: 134217728 -24> 2021-03-26T16:58:10.683+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2596691968 unmapped: 785334272 heap: 3382026240 old mem: 134217728 new mem: 134217728 -23> 2021-03-26T16:58:11.683+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2602442752 unmapped: 779583488 heap: 3382026240 old mem: 134217728 new mem: 134217728 -22> 2021-03-26T16:58:12.682+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2603786240 unmapped: 778240000 heap: 3382026240 old mem: 134217728 new mem: 134217728 -21> 2021-03-26T16:58:13.186+0100 7f284f67b700 5 bluestore.MempoolThread(0x55d136754a98) _resize_shards cache_size: 134217728 kv_alloc: 67108864 kv_used: 66978032 meta_alloc: 67108864 meta_used: 320983536 data_alloc: 67108864 data_used: 0 -20> 2021-03-26T16:58:13.686+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2603868160 unmapped: 778158080 heap: 3382026240 old mem: 134217728 new mem: 134217728 -19> 2021-03-26T16:58:14.686+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 2496552960 unmapped: 885473280 heap: 3382026240 old mem: 134217728 new mem: 134217728 -18> 2021-03-26T16:58:15.686+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 1615413248 unmapped: 1766612992 heap: 3382026240 old mem: 134217728 new mem: 353696767 -17> 2021-03-26T16:58:15.702+0100 7f2860674d80 1 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open checking pool_statfs -16> 2021-03-26T16:58:15.702+0100 7f2860674d80 5 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open marking per_pool_omap=1 -15> 2021-03-26T16:58:15.702+0100 7f2860674d80 5 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open applying repair results -14> 2021-03-26T16:58:16.070+0100 7f2860674d80 -1 rocksdb: submit_common error: Corruption: unknown WriteBatch tag code = 2 Rocksdb transaction: Put( Prefix = O key = 0x80800000000000000a19400f'>dumpsite!rbd_data.6.28423ad8f48ca1.00000000023037b7!='0xfffffffffffffffeffffffffffffffff'o' Value size = 1388) Put( Prefix = O key = 0x80800000000000000a194023'8dumpsite!rbd_data.6.28423ad8f48ca1.000000000294787d!='0xfffffffffffffffeffffffffffffffff'o' Value size = 1593) Put( Prefix = O key = 0x80800000000000000a194027'Rdumpsite!rbd_data.6.28423ad8f48ca1.0000000001b366ff!='0xfffffffffffffffeffffffffffffffff'o' Value size = 2964) Put( Prefix = O key = 0x80800000000000000a1940f69264756d'psite!rbd_data.6.28423ad8f48ca1.00000000011bdd0c!='0xfffffffffffffffeffffffffffffffff'o' Value size = 2435) Put( Prefix = O key = 0x80800000000000000a194109e164756d'psite!rbd_data.6.28423ad8f48ca1.0000000001817e76!='0xfffffffffffffffeffffffffffffffff'o' Value size = 442) Put( Prefix = O key = 0x80800000000000000a19411a9864756d'psite!rbd_data.6.28423ad8f48ca1.000000000185cc48!='0xfffffffffffffffeffffffffffffffff'o' Value size = 4745) [.........] -13> 2021-03-26T16:58:16.070+0100 7f2860674d80 5 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open repair applied -12> 2021-03-26T16:58:16.070+0100 7f2860674d80 2 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open 4822744 objects, 1280857 of them sharded. -11> 2021-03-26T16:58:16.070+0100 7f2860674d80 2 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open 27050327 extents to 25858545 blobs, 1119809 spanning, 1803008 shared. -10> 2021-03-26T16:58:16.070+0100 7f2860674d80 1 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open <<<FINISH>>> with 14849 errors, 2 warnings, 14851 repaired, 0 remaining in 460.305791 seconds -9> 2021-03-26T16:58:16.690+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 1263484928 unmapped: 2118541312 heap: 3382026240 old mem: 353696767 new mem: 627999013 -8> 2021-03-26T16:58:16.786+0100 7f2860674d80 2 osd.80 0 journal looks like hdd -7> 2021-03-26T16:58:16.786+0100 7f2860674d80 2 osd.80 0 boot -6> 2021-03-26T16:58:16.874+0100 7f2860674d80 1 osd.80 489295 init upgrade snap_mapper (first start as octopus) -5> 2021-03-26T16:58:17.690+0100 7f284f67b700 5 prioritycache tune_memory target: 2147483648 mapped: 1118093312 unmapped: 2263932928 heap: 3382026240 old mem: 627999013 new mem: 815929835 -4> 2021-03-26T16:58:18.190+0100 7f284f67b700 5 bluestore.MempoolThread(0x55d136754a98) _resize_shards cache_size: 815929835 kv_alloc: 339738624 kv_used: 66977632 meta_alloc: 297795584 meta_used: 22052828 data_alloc: 171966464 data_used: 0 -3> 2021-03-26T16:58:18.450+0100 7f284ae72700 5 bluestore(/var/lib/ceph/osd/ceph-80) _kv_sync_thread utilization: idle 462.729599522s of 462.807117764s, submitted: 0 -2> 2021-03-26T16:58:18.450+0100 7f284ae72700 -1 rocksdb: submit_common error: Corruption: unknown WriteBatch tag code = 2 Rocksdb transaction: Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000BB_' Value size = 84) Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000BD_' Value size = 91) Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000C0_' Value size = 84) Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000C2_' Value size = 91) Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000C5_' Value size = 84) [......] Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_9_00000000000000E7_' Value size = 73) Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_9_00000000000000EC_' Value size = 73) -1> 2021-03-26T16:58:18.454+0100 7f284ae72700 -1 /build/ceph-15.2.10/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7f284ae72700 time 2021-03-26T16:58:18.457201+0100 /build/ceph-15.2.10/src/os/bluestore/BlueStore.cc: 11849: FAILED ceph_assert(r == 0) ceph version 15.2.10 (27917a557cca91e4da407489bbaa64ad4352cc02) octopus (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x154) [0x55d12c437c02] 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55d12c437ddd] 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x3a0) [0x55d12c970c00] 4: (BlueStore::_kv_sync_thread()+0x12d3) [0x55d12c9ca3f3] 5: (BlueStore::KVSyncThread::entry()+0xd) [0x55d12c9ef54d] 6: (()+0x76db) [0x7f285e7ca6db] 7: (clone()+0x3f) [0x7f285d56a71f] [ OSD REBOOT by systemd ] 2021-03-26T16:58:29.198+0100 7fb6154e7d80 0 set uid:gid to 64045:64045 (ceph:ceph) 2021-03-26T16:58:29.198+0100 7fb6154e7d80 0 ceph version 15.2.10 (27917a557cca91e4da407489bbaa64ad4352cc02) octopus (stable), process ceph-osd, pid 2044931 2021-03-26T16:58:29.198+0100 7fb6154e7d80 0 pidfile_write: ignore empty --pid-file 2021-03-26T16:58:29.198+0100 7fb6154e7d80 1 bdev create path /var/lib/ceph/osd/ceph-80/block type kernel 2021-03-26T16:58:29.198+0100 7fb6154e7d80 1 bdev(0x557a59578000 /var/lib/ceph/osd/ceph-80/block) open path /var/lib/ceph/osd/ceph-80/block 2021-03-26T16:58:29.198+0100 7fb6154e7d80 1 bdev(0x557a59578000 /var/lib/ceph/osd/ceph-80/block) open size 8001524072448 (0x74700000000, 7.3 TiB) block_size 4096 (4 KiB) rotational discard not supported 2021-03-26T16:58:29.198+0100 7fb6154e7d80 1 bluestore(/var/lib/ceph/osd/ceph-80) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2 [...] 2021-03-26T16:58:30.070+0100 7fb6154e7d80 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-80/block size 7.3 TiB 2021-03-26T16:58:30.070+0100 7fb6154e7d80 1 bluefs mount 2021-03-26T16:58:30.070+0100 7fb6154e7d80 1 bluefs _init_alloc id 1 alloc_size 0x10000 size 0x74700000000 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option compaction_readahead_size = 2097152 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option compression = kNoCompression 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option max_background_compactions = 2 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option max_write_buffer_number = 4 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option min_write_buffer_number_to_merge = 1 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option recycle_log_file_num = 4 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option writable_file_max_buffer_size = 0 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option write_buffer_size = 268435456 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option compaction_readahead_size = 2097152 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option compression = kNoCompression 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option max_background_compactions = 2 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option max_write_buffer_number = 4 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option min_write_buffer_number_to_merge = 1 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option recycle_log_file_num = 4 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option writable_file_max_buffer_size = 0 2021-03-26T16:58:30.238+0100 7fb6154e7d80 0 set rocksdb option write_buffer_size = 268435456 2021-03-26T16:58:42.438+0100 7fb6154e7d80 -1 rocksdb: Corruption: unknown WriteBatch tag 2021-03-26T16:58:42.438+0100 7fb6154e7d80 -1 bluestore(/var/lib/ceph/osd/ceph-80) _open_db erroring opening db: 2021-03-26T16:58:42.438+0100 7fb6154e7d80 1 bluefs umount 2021-03-26T16:58:42.438+0100 7fb6154e7d80 1 fbmap_alloc 0x557a58888900 shutdown 2021-03-26T16:58:42.462+0100 7fb6154e7d80 1 bdev(0x557a59578380 /var/lib/ceph/osd/ceph-80/block) close 2021-03-26T16:58:42.642+0100 7fb6154e7d80 1 bdev(0x557a59578000 /var/lib/ceph/osd/ceph-80/block) close 2021-03-26T16:58:42.910+0100 7fb6154e7d80 -1 osd.80 0 OSD:init: unable to mount object store 2021-03-26T16:58:42.910+0100 7fb6154e7d80 -1 ** ERROR: osd init failed: (5) Input/output error
And that's it, it now refuses to start.
Full log is attached.
What do?
Files
Actions