Project

General

Profile

Actions

Bug #50017

closed

OSDs broken after nautilus->octopus upgrade: rocksdb Corruption: unknown WriteBatch tag

Added by Jonas Jelten about 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
pacific, octopus, nautilus
Regression:
Yes
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I just started upgrading a cluster to octopus. MON and MGR are all on 15.2.10 and everything looks nice.

Then I upgraded the OSDs of one host: ceph-osd:amd64 (14.2.16-1bionic, 15.2.10-1bionic).
It found a lot of zombie spanning blobs, and then crashed. Now it refuses to start. And all the other OSDs on that host, too.

[... many more zombie blobs ...]
   -33> 2021-03-26T16:58:06.263+0100 7f284e679700 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck error: 6#31:210b8e42:datensicherungsserver::rbd_data.30.c2dfccc083f516.0000000000006852:head# - 1 zombie spanning blob(s) found, the first one: Blob(0x55d1c19c1100 spanning 0 blob([!~30000] csum crc32c/0x1000) use_tracker(0x3*0x10000 0x[0,0,0]) SharedBlob(0x55d1c19bf420 sbid 0x0))
   -32> 2021-03-26T16:58:06.675+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2631933952 unmapped: 750092288 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -31> 2021-03-26T16:58:06.711+0100 7f2860674d80 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck error: 6#31:2aeb130f:datensicherungsserver::rbd_data.30.c2dfccc083f516.0000000000006016:head# - 1 zombie spanning blob(s) found, the first one: Blob(0x55d162b94bc0 spanning 0 blob([!~10000] csum crc32c/0x1000) use_tracker(0x10000 0x0) SharedBlob(0x55d162b952c0 sbid 0x0))
   -30> 2021-03-26T16:58:06.967+0100 7f2860674d80  0 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_check_objects partial offload, done myself 572127 of 4822744objects, threads 2
   -29> 2021-03-26T16:58:06.983+0100 7f2860674d80  1 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open checking shared_blobs
   -28> 2021-03-26T16:58:07.675+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2612781056 unmapped: 769245184 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -27> 2021-03-26T16:58:08.179+0100 7f284f67b700  5 bluestore.MempoolThread(0x55d136754a98) _resize_shards cache_size: 134217728 kv_alloc: 67108864 kv_used: 66050416 meta_alloc: 67108864 meta_used: 376573864 data_alloc: 67108864 data_used: 0
   -26> 2021-03-26T16:58:08.679+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2605285376 unmapped: 776740864 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -25> 2021-03-26T16:58:09.679+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2598166528 unmapped: 783859712 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -24> 2021-03-26T16:58:10.683+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2596691968 unmapped: 785334272 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -23> 2021-03-26T16:58:11.683+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2602442752 unmapped: 779583488 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -22> 2021-03-26T16:58:12.682+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2603786240 unmapped: 778240000 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -21> 2021-03-26T16:58:13.186+0100 7f284f67b700  5 bluestore.MempoolThread(0x55d136754a98) _resize_shards cache_size: 134217728 kv_alloc: 67108864 kv_used: 66978032 meta_alloc: 67108864 meta_used: 320983536 data_alloc: 67108864 data_used: 0
   -20> 2021-03-26T16:58:13.686+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2603868160 unmapped: 778158080 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -19> 2021-03-26T16:58:14.686+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 2496552960 unmapped: 885473280 heap: 3382026240 old mem: 134217728 new mem: 134217728
   -18> 2021-03-26T16:58:15.686+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 1615413248 unmapped: 1766612992 heap: 3382026240 old mem: 134217728 new mem: 353696767
   -17> 2021-03-26T16:58:15.702+0100 7f2860674d80  1 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open checking pool_statfs
   -16> 2021-03-26T16:58:15.702+0100 7f2860674d80  5 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open marking per_pool_omap=1
   -15> 2021-03-26T16:58:15.702+0100 7f2860674d80  5 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open applying repair results
   -14> 2021-03-26T16:58:16.070+0100 7f2860674d80 -1 rocksdb: submit_common error: Corruption: unknown WriteBatch tag code = 2 Rocksdb transaction: 
Put( Prefix = O key = 0x80800000000000000a19400f'>dumpsite!rbd_data.6.28423ad8f48ca1.00000000023037b7!='0xfffffffffffffffeffffffffffffffff'o' Value size = 1388)
Put( Prefix = O key = 0x80800000000000000a194023'8dumpsite!rbd_data.6.28423ad8f48ca1.000000000294787d!='0xfffffffffffffffeffffffffffffffff'o' Value size = 1593)
Put( Prefix = O key = 0x80800000000000000a194027'Rdumpsite!rbd_data.6.28423ad8f48ca1.0000000001b366ff!='0xfffffffffffffffeffffffffffffffff'o' Value size = 2964)
Put( Prefix = O key = 0x80800000000000000a1940f69264756d'psite!rbd_data.6.28423ad8f48ca1.00000000011bdd0c!='0xfffffffffffffffeffffffffffffffff'o' Value size = 2435)
Put( Prefix = O key = 0x80800000000000000a194109e164756d'psite!rbd_data.6.28423ad8f48ca1.0000000001817e76!='0xfffffffffffffffeffffffffffffffff'o' Value size = 442)
Put( Prefix = O key = 0x80800000000000000a19411a9864756d'psite!rbd_data.6.28423ad8f48ca1.000000000185cc48!='0xfffffffffffffffeffffffffffffffff'o' Value size = 4745)
[.........]
   -13> 2021-03-26T16:58:16.070+0100 7f2860674d80  5 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open repair applied
   -12> 2021-03-26T16:58:16.070+0100 7f2860674d80  2 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open 4822744 objects, 1280857 of them sharded.  
   -11> 2021-03-26T16:58:16.070+0100 7f2860674d80  2 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open 27050327 extents to 25858545 blobs, 1119809 spanning, 1803008 shared.
   -10> 2021-03-26T16:58:16.070+0100 7f2860674d80  1 bluestore(/var/lib/ceph/osd/ceph-80) _fsck_on_open <<<FINISH>>> with 14849 errors, 2 warnings, 14851 repaired, 0 remaining in 460.305791 seconds
    -9> 2021-03-26T16:58:16.690+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 1263484928 unmapped: 2118541312 heap: 3382026240 old mem: 353696767 new mem: 627999013
    -8> 2021-03-26T16:58:16.786+0100 7f2860674d80  2 osd.80 0 journal looks like hdd
    -7> 2021-03-26T16:58:16.786+0100 7f2860674d80  2 osd.80 0 boot
    -6> 2021-03-26T16:58:16.874+0100 7f2860674d80  1 osd.80 489295 init upgrade snap_mapper (first start as octopus)
    -5> 2021-03-26T16:58:17.690+0100 7f284f67b700  5 prioritycache tune_memory target: 2147483648 mapped: 1118093312 unmapped: 2263932928 heap: 3382026240 old mem: 627999013 new mem: 815929835
    -4> 2021-03-26T16:58:18.190+0100 7f284f67b700  5 bluestore.MempoolThread(0x55d136754a98) _resize_shards cache_size: 815929835 kv_alloc: 339738624 kv_used: 66977632 meta_alloc: 297795584 meta_used: 22052828 data_alloc: 171966464 data_used: 0
    -3> 2021-03-26T16:58:18.450+0100 7f284ae72700  5 bluestore(/var/lib/ceph/osd/ceph-80) _kv_sync_thread utilization: idle 462.729599522s of 462.807117764s, submitted: 0
    -2> 2021-03-26T16:58:18.450+0100 7f284ae72700 -1 rocksdb: submit_common error: Corruption: unknown WriteBatch tag code = 2 Rocksdb transaction: 
Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000BB_' Value size = 84)
Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000BD_' Value size = 91)
Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000C0_' Value size = 84)
Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000C2_' Value size = 91)
Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_14_00000000000000C5_' Value size = 84)
[......]
Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_9_00000000000000E7_' Value size = 73)
Put( Prefix = m key = 0x00000000000000000000000000000402'.SNA_9_00000000000000EC_' Value size = 73)
    -1> 2021-03-26T16:58:18.454+0100 7f284ae72700 -1 /build/ceph-15.2.10/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7f284ae72700 time 2021-03-26T16:58:18.457201+0100
/build/ceph-15.2.10/src/os/bluestore/BlueStore.cc: 11849: FAILED ceph_assert(r == 0)

 ceph version 15.2.10 (27917a557cca91e4da407489bbaa64ad4352cc02) octopus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x154) [0x55d12c437c02]
 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55d12c437ddd]
 3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x3a0) [0x55d12c970c00]
 4: (BlueStore::_kv_sync_thread()+0x12d3) [0x55d12c9ca3f3]
 5: (BlueStore::KVSyncThread::entry()+0xd) [0x55d12c9ef54d]
 6: (()+0x76db) [0x7f285e7ca6db]
 7: (clone()+0x3f) [0x7f285d56a71f]

[ OSD REBOOT by systemd ]

2021-03-26T16:58:29.198+0100 7fb6154e7d80  0 set uid:gid to 64045:64045 (ceph:ceph)
2021-03-26T16:58:29.198+0100 7fb6154e7d80  0 ceph version 15.2.10 (27917a557cca91e4da407489bbaa64ad4352cc02) octopus (stable), process ceph-osd, pid 2044931
2021-03-26T16:58:29.198+0100 7fb6154e7d80  0 pidfile_write: ignore empty --pid-file
2021-03-26T16:58:29.198+0100 7fb6154e7d80  1 bdev create path /var/lib/ceph/osd/ceph-80/block type kernel
2021-03-26T16:58:29.198+0100 7fb6154e7d80  1 bdev(0x557a59578000 /var/lib/ceph/osd/ceph-80/block) open path /var/lib/ceph/osd/ceph-80/block
2021-03-26T16:58:29.198+0100 7fb6154e7d80  1 bdev(0x557a59578000 /var/lib/ceph/osd/ceph-80/block) open size 8001524072448 (0x74700000000, 7.3 TiB) block_size 4096 (4 KiB) rotational discard not supported
2021-03-26T16:58:29.198+0100 7fb6154e7d80  1 bluestore(/var/lib/ceph/osd/ceph-80) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
[...]
2021-03-26T16:58:30.070+0100 7fb6154e7d80  1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-80/block size 7.3 TiB
2021-03-26T16:58:30.070+0100 7fb6154e7d80  1 bluefs mount
2021-03-26T16:58:30.070+0100 7fb6154e7d80  1 bluefs _init_alloc id 1 alloc_size 0x10000 size 0x74700000000
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option compaction_readahead_size = 2097152
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option compression = kNoCompression
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option max_background_compactions = 2
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option max_write_buffer_number = 4
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option min_write_buffer_number_to_merge = 1
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option recycle_log_file_num = 4
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option writable_file_max_buffer_size = 0
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option write_buffer_size = 268435456
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option compaction_readahead_size = 2097152
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option compression = kNoCompression
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option max_background_compactions = 2
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option max_write_buffer_number = 4
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option min_write_buffer_number_to_merge = 1
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option recycle_log_file_num = 4
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option writable_file_max_buffer_size = 0
2021-03-26T16:58:30.238+0100 7fb6154e7d80  0  set rocksdb option write_buffer_size = 268435456
2021-03-26T16:58:42.438+0100 7fb6154e7d80 -1 rocksdb: Corruption: unknown WriteBatch tag
2021-03-26T16:58:42.438+0100 7fb6154e7d80 -1 bluestore(/var/lib/ceph/osd/ceph-80) _open_db erroring opening db: 
2021-03-26T16:58:42.438+0100 7fb6154e7d80  1 bluefs umount
2021-03-26T16:58:42.438+0100 7fb6154e7d80  1 fbmap_alloc 0x557a58888900 shutdown
2021-03-26T16:58:42.462+0100 7fb6154e7d80  1 bdev(0x557a59578380 /var/lib/ceph/osd/ceph-80/block) close
2021-03-26T16:58:42.642+0100 7fb6154e7d80  1 bdev(0x557a59578000 /var/lib/ceph/osd/ceph-80/block) close
2021-03-26T16:58:42.910+0100 7fb6154e7d80 -1 osd.80 0 OSD:init: unable to mount object store
2021-03-26T16:58:42.910+0100 7fb6154e7d80 -1  ** ERROR: osd init failed: (5) Input/output error

And that's it, it now refuses to start.

Full log is attached.

What do?


Files

convert-osd.80.log.xz (321 KB) convert-osd.80.log.xz Jonas Jelten, 03/26/2021 04:31 PM

Related issues 4 (1 open3 closed)

Related to bluestore - Bug #48216: Spanning blobs list might have zombie blobs that aren't of use any moreNew

Actions
Copied to bluestore - Backport #50938: pacific: OSDs broken after nautilus->octopus upgrade: rocksdb Corruption: unknown WriteBatch tagResolvedIgor FedotovActions
Copied to bluestore - Backport #50939: nautilus: OSDs broken after nautilus->octopus upgrade: rocksdb Corruption: unknown WriteBatch tagResolvedIgor FedotovActions
Copied to bluestore - Backport #50940: octopus: OSDs broken after nautilus->octopus upgrade: rocksdb Corruption: unknown WriteBatch tagResolvedCory SnyderActions
Actions

Also available in: Atom PDF