Bug #48443

open

rocksdb: Corruption: missing start of fragmented record(2)

Added by Gabriel Goes over 3 years ago.

Status: New
Priority: Normal
Assignee: -
% Done: 0%
Regression: No
Severity: 3 - minor

Description

Hi, Guys!

This happened after a power failure.

Unfortunately, it seems that a simple RocksDB corruption throws away ALL BLUESTORE DATA!

2020-11-23 02:20:16.948 7fc63cc22c80  4 rocksdb: [db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-000416 succeeded,manifest_file_number is 416, next_file_number is 429, last_sequence is 155326408, log_number is 426,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0

2020-11-23 02:20:16.948 7fc63cc22c80  4 rocksdb: [db/version_set.cc:3766] Column family [default] (ID 0), log number is 426

2020-11-23 02:20:16.948 7fc63cc22c80  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1606108816949386, "job": 1, "event": "recovery_started", "log_files": [424, 426]}
2020-11-23 02:20:16.948 7fc63cc22c80  4 rocksdb: [db/db_impl_open.cc:583] Recovering log #424 mode 0
2020-11-23 02:20:18.144 7fc63cc22c80  4 rocksdb: [db/db_impl_open.cc:583] Recovering log #426 mode 0
2020-11-23 02:20:19.524 7fc63cc22c80  3 rocksdb: [db/db_impl_open.cc:518] db.wal/000426.log: dropping 380 bytes; Corruption: missing start of fragmented record(2)
2020-11-23 02:20:19.524 7fc63cc22c80  4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
2020-11-23 02:20:19.524 7fc63cc22c80  4 rocksdb: [db/db_impl.cc:563] Shutdown complete
2020-11-23 02:20:19.524 7fc63cc22c80 -1 rocksdb: Corruption: missing start of fragmented record(2)
2020-11-23 02:20:19.524 7fc63cc22c80 -1 bluestore(/var/lib/ceph/osd/ceph-6) _open_db erroring opening db:
2020-11-23 02:20:19.524 7fc63cc22c80  1 bluefs umount
2020-11-23 02:20:19.524 7fc63cc22c80  1 fbmap_alloc 0x563dafe12a00 shutdown
2020-11-23 02:20:19.524 7fc63cc22c80  1 bdev(0x563db0ab0e00 /var/lib/ceph/osd/ceph-6/block) close
2020-11-23 02:20:19.704 7fc63cc22c80  1 bluestore(/var/lib/ceph/osd/ceph-6) _upgrade_super from 0, latest 2
2020-11-23 02:20:19.704 7fc63cc22c80 -1 /build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_upgrade_super()' thread 7fc63cc22c80 time 2020-11-23 02:20:19.705208
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: 10197: FAILED ceph_assert(ondisk_format > 0)

 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x563da556d3f4]
 2: (()+0x5085cc) [0x563da556d5cc]
 3: (BlueStore::_upgrade_super()+0x4b6) [0x563da5aa95e6]
 4: (BlueStore::_mount(bool, bool)+0x5bc) [0x563da5af217c]
 5: (OSD::init()+0x31f) [0x563da56753cf]
 6: (main()+0x37b2) [0x563da55d83a2]
 7: (__libc_start_main()+0xeb) [0x7fc63d19b09b]
 8: (_start()+0x2a) [0x563da5609d8a]
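
For context on the error text: as far as I understand the LevelDB-style log reader that RocksDB uses, large WAL records are split into FIRST, MIDDLE and LAST fragments, and the "(2)" suffix is reported when a LAST fragment shows up without a preceding FIRST (the "(1)" variant is the same check for a MIDDLE fragment). The Python sketch below is only a simplified model of that check, not RocksDB code. The `RecordType` values follow the on-disk type numbers as I recall them, `check_fragments` is a hypothetical helper name, and the other reader checks (checksums, "partial record without end") are deliberately left out.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class RecordType(IntEnum):
    # Fragment types of the LevelDB-style log format (assumed numbering).
    FULL = 1
    FIRST = 2
    MIDDLE = 3
    LAST = 4


def check_fragments(
    fragments: Iterable[Tuple[RecordType, int]]
) -> List[Tuple[int, str]]:
    """Return (index, message) for every fragment that lacks a FIRST.

    `fragments` is a sequence of (type, payload size in bytes) pairs,
    in the order they would be read from the WAL.
    """
    errors: List[Tuple[int, str]] = []
    in_fragmented_record = False
    for index, (rtype, size) in enumerate(fragments):
        if rtype == RecordType.FULL:
            in_fragmented_record = False
        elif rtype == RecordType.FIRST:
            in_fragmented_record = True
        elif rtype == RecordType.MIDDLE:
            if not in_fragmented_record:
                errors.append((index, f"dropping {size} bytes; Corruption: "
                                      "missing start of fragmented record(1)"))
        elif rtype == RecordType.LAST:
            if not in_fragmented_record:
                errors.append((index, f"dropping {size} bytes; Corruption: "
                                      "missing start of fragmented record(2)"))
            in_fragmented_record = False
    return errors


if __name__ == "__main__":
    # A LAST fragment of 380 bytes with no FIRST before it, as in the log above.
    for index, message in check_fragments(
        [(RecordType.FULL, 100), (RecordType.LAST, 380)]
    ):
        print(index, message)
```

If that reading is right, the 380 dropped bytes in db.wal/000426.log would be the tail of a record whose beginning never reached the disk, which is consistent with the power failure.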

I couldn't find any way to recover this OSD:

# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-6/
2020-11-26 19:50:51.034 7f4a7ec41e00 -1 rocksdb: Corruption: missing start of fragmented record(2)
2020-11-26 19:50:51.034 7f4a7ec41e00 -1 bluestore(/var/lib/ceph/osd/ceph-6) _open_db erroring opening db:
error from fsck: (5) Input/output error

Nor can I even access its data:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6/ --op info --pgid 7.d
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_upgrade_super()' thread 7f2020d5d740 time 2020-11-26 19:52:31.139690
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: 10197: FAILED ceph_assert(ondisk_format > 0)
 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f2021ff6db2]
 2: (()+0x276f8a) [0x7f2021ff6f8a]
 3: (BlueStore::_upgrade_super()+0x4b6) [0x55c492a8ec66]
 4: (BlueStore::_mount(bool, bool)+0x5bc) [0x55c492ad77fc]
 5: (main()+0x2cb1) [0x55c492581e41]
 6: (__libc_start_main()+0xeb) [0x7f20215a509b]
 7: (_start()+0x2a) [0x55c4925ae63a]
*** Caught signal (Aborted) **
 in thread 7f2020d5d740 thread_name:ceph-objectstor
 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
 1: (()+0x12730) [0x7f2021754730]
 2: (gsignal()+0x10b) [0x7f20215b87bb]
 3: (abort()+0x121) [0x7f20215a3535]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f2021ff6e03]
 5: (()+0x276f8a) [0x7f2021ff6f8a]
 6: (BlueStore::_upgrade_super()+0x4b6) [0x55c492a8ec66]
 7: (BlueStore::_mount(bool, bool)+0x5bc) [0x55c492ad77fc]
 8: (main()+0x2cb1) [0x55c492581e41]
 9: (__libc_start_main()+0xeb) [0x7f20215a509b]
 10: (_start()+0x2a) [0x55c4925ae63a]
Aborted

Nor export a PG:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6/ --op export --pgid 7.d --file /destino/CEPH/ceph-6/export/pg-7.d
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_upgrade_super()' thread 7f629647c740 time 2020-11-26 19:55:43.443660
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: 10197: FAILED ceph_assert(ondisk_format > 0)
 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f6297715db2]
 2: (()+0x276f8a) [0x7f6297715f8a]
 3: (BlueStore::_upgrade_super()+0x4b6) [0x5564cf83fc66]
 4: (BlueStore::_mount(bool, bool)+0x5bc) [0x5564cf8887fc]
 5: (main()+0x2cb1) [0x5564cf332e41]
 6: (__libc_start_main()+0xeb) [0x7f6296cc409b]
 7: (_start()+0x2a) [0x5564cf35f63a]
*** Caught signal (Aborted) **
 in thread 7f629647c740 thread_name:ceph-objectstor
 ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
 1: (()+0x12730) [0x7f6296e73730]
 2: (gsignal()+0x10b) [0x7f6296cd77bb]
 3: (abort()+0x121) [0x7f6296cc2535]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f6297715e03]
 5: (()+0x276f8a) [0x7f6297715f8a]
 6: (BlueStore::_upgrade_super()+0x4b6) [0x5564cf83fc66]
 7: (BlueStore::_mount(bool, bool)+0x5bc) [0x5564cf8887fc]
 8: (main()+0x2cb1) [0x5564cf332e41]
 9: (__libc_start_main()+0xeb) [0x7f6296cc409b]
 10: (_start()+0x2a) [0x5564cf35f63a]
Aborted

Hardware details:

./inxi -v5
System:    Host: ceph-node03 Kernel: 4.19.0-9-amd64 x86_64 bits: 64 compiler: gcc v: 8.3.0 Console: tty 1
           Distro: Debian GNU/Linux 10 (buster)
Machine:   Type: Desktop Mobo: N/A model: Intel X79 serial: N/A BIOS: American Megatrends v: 4.6.5 date: 07/17/2019
Memory:    RAM: total: 31.34 GiB used: 2.28 GiB (7.3%)
           Array-1: capacity: 96 GiB slots: 4 EC: Multi-bit ECC max module size: 24 GiB note: est.
           Device-1: Node0_Dimm0 size: 8 GiB speed: 1333 MT/s type: DDR3
           Device-2: Node0_Dimm1 size: 8 GiB speed: 1333 MT/s type: DDR3
           Device-3: Node0_Dimm2 size: 8 GiB speed: 1333 MT/s type: DDR3
           Device-4: Node0_Dimm3 size: 8 GiB speed: 1333 MT/s type: DDR3
CPU:       Info: 6-Core model: Intel Xeon E5-2640 0 bits: 64 type: MT MCP arch: Sandy Bridge rev: 7 L2 cache: 15.0 MiB
           flags: avx lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 59861
           Speed: 1202 MHz min/max: 1200/3000 MHz Core speeds (MHz): 1: 1202 2: 1197 3: 1212 4: 1289 5: 1197 6: 1197 7: 1201
           8: 1197 9: 1262 10: 1262 11: 1232 12: 1348
Graphics:  Device-1: NVIDIA G72 [GeForce 7200 GS / 7300 SE] vendor: ZOTAC driver: nouveau v: kernel bus ID: 02:00.0
           Display: server: No display server data found. Headless machine? resolution: <missing: xdpyinfo>
           Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:     Device-1: Intel 7 Series/C216 Family High Definition Audio driver: snd_hda_intel v: kernel bus ID: 00:1b.0
           Sound Server: ALSA v: k4.19.0-9-amd64
Network:   Device-1: Intel 82599ES 10-Gigabit SFI/SFP+ Network driver: ixgbe v: 5.1.0-k port: e000 bus ID: 03:00.0
           IF: enp3s0 state: up speed: 10000 Mbps duplex: full mac: 00:1b:21:bb:e8:ca
           Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169 v: kernel port: d000 bus ID: 07:00.0
           IF: enp7s0 state: down mac: 00:e0:4c:01:83:74
           IF-ID-1: bond0 state: up speed: 10000 Mbps duplex: full mac: 00:1b:21:bb:e8:ca
           IF-ID-2: bonding_masters state: N/A speed: N/A duplex: N/A mac: N/A
Drives:    Local Storage: total: 1.93 TiB used: 3.79 GiB (0.2%)
           ID-1: /dev/sda vendor: A-Data model: SU650 size: 111.79 GiB temp: 40 C
           ID-2: /dev/sdb model: SSD 1TB size: 931.51 GiB temp: 40 C
           ID-3: /dev/sdc model: SSD 1TB size: 931.51 GiB temp: 40 C
           Message: No Optical or Floppy data was found.
RAID:      Device-1: md2 type: mdraid status: active Components: online: sda2~c1
           Info: raid: mirror blocks: 20971456 report: 2/1 _U chunk size: N/A
Partition: ID-1: / size: 19.56 GiB used: 3.79 GiB (19.4%) fs: ext4 dev: /dev/md2 label: ROOT
           uuid: 2ec99cbc-75ff-4c20-a126-b61bfdb03716
Swap:      ID-1: swap-1 type: partition size: 8.00 GiB used: 0 KiB (0.0%) dev: /dev/sda1 label: SWAP_A
           uuid: 49266ce6-71eb-4a07-8799-f339cdca4e33
Sensors:   System Temperatures: cpu: 38.0 C mobo: N/A gpu: nouveau temp: 53.0 C
           Fan Speeds (RPM): N/A
Info:      Processes: 174 Uptime: 3d 18h 42m Init: systemd runlevel: 5 Compilers: gcc: N/A Packages: 653 Shell: Bash v: 5.0.3
           inxi: 3.1.09

The full log, generated with:

ceph-bluestore-tool fsck --path <path> --debug-bluestore=20 --log-file=c --no-log-to-stderr

is available here (~1.3 GB uncompressed, ~81 MB compressed):
https://drive.google.com/file/d/1-KvWeyWJaX_9lRYne26lg7Cc03VBMBif/view?usp=sharing
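
Since the full log is large, a small filter can help when reading it. The sketch below is a minimal example, assuming the log lines follow the same "timestamp thread level message" layout as the excerpts above. `LINE_RE`, `Entry`, `parse_lines` and `errors_only` are hypothetical names chosen for illustration, and lines that do not match (such as backtrace frames) are simply skipped.

```python
import re
import sys
from typing import Iterable, Iterator, List, NamedTuple

# Matches lines such as:
# 2020-11-23 02:20:19.524 7fc63cc22c80 -1 rocksdb: Corruption: ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+) (?P<msg>.*)$"
)


class Entry(NamedTuple):
    ts: str
    thread: str
    level: int
    msg: str


def parse_lines(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per line that matches the assumed layout."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            yield Entry(match["ts"], match["thread"],
                        int(match["level"]), match["msg"])


def errors_only(lines: Iterable[str]) -> List[Entry]:
    """Keep only entries logged with a negative level (shown as -1 above)."""
    return [entry for entry in parse_lines(lines) if entry.level < 0]


if __name__ == "__main__":
    # Usage: python filter_log.py < logfile
    for entry in errors_only(sys.stdin):
        print(entry.ts, entry.msg)
```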


Files

ceph-osd.6.log.1.gz (31.7 KB): Entire normal log. Gabriel Goes, 12/03/2020 12:47 AM
