Bug #48443
openrocksdb: Corruption: missing start of fragmented record(2)
% Done: 0%
Regression: No
Severity: 3 - minor
Description
Hi, Guys!
This happened after a power failure.
It seems that a single RocksDB WAL corruption, unfortunately, makes ALL BLUESTORE DATA on the OSD inaccessible!
2020-11-23 02:20:16.948 7fc63cc22c80 4 rocksdb: [db/version_set.cc:3757] Recovered from manifest file:db/MANIFEST-000416 succeeded,manifest_file_number is 416, next_file_number is 429, last_sequence is 155326408, log_number is 426,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
2020-11-23 02:20:16.948 7fc63cc22c80 4 rocksdb: [db/version_set.cc:3766] Column family [default] (ID 0), log number is 426
2020-11-23 02:20:16.948 7fc63cc22c80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1606108816949386, "job": 1, "event": "recovery_started", "log_files": [424, 426]}
2020-11-23 02:20:16.948 7fc63cc22c80 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #424 mode 0
2020-11-23 02:20:18.144 7fc63cc22c80 4 rocksdb: [db/db_impl_open.cc:583] Recovering log #426 mode 0
2020-11-23 02:20:19.524 7fc63cc22c80 3 rocksdb: [db/db_impl_open.cc:518] db.wal/000426.log: dropping 380 bytes; Corruption: missing start of fragmented record(2)
2020-11-23 02:20:19.524 7fc63cc22c80 4 rocksdb: [db/db_impl.cc:390] Shutdown: canceling all background work
2020-11-23 02:20:19.524 7fc63cc22c80 4 rocksdb: [db/db_impl.cc:563] Shutdown complete
2020-11-23 02:20:19.524 7fc63cc22c80 -1 rocksdb: Corruption: missing start of fragmented record(2)
2020-11-23 02:20:19.524 7fc63cc22c80 -1 bluestore(/var/lib/ceph/osd/ceph-6) _open_db erroring opening db:
2020-11-23 02:20:19.524 7fc63cc22c80 1 bluefs umount
2020-11-23 02:20:19.524 7fc63cc22c80 1 fbmap_alloc 0x563dafe12a00 shutdown
2020-11-23 02:20:19.524 7fc63cc22c80 1 bdev(0x563db0ab0e00 /var/lib/ceph/osd/ceph-6/block) close
2020-11-23 02:20:19.704 7fc63cc22c80 1 bluestore(/var/lib/ceph/osd/ceph-6) _upgrade_super from 0, latest 2
2020-11-23 02:20:19.704 7fc63cc22c80 -1 /build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_upgrade_super()' thread 7fc63cc22c80 time 2020-11-23 02:20:19.705208
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: 10197: FAILED ceph_assert(ondisk_format > 0)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x563da556d3f4]
2: (()+0x5085cc) [0x563da556d5cc]
3: (BlueStore::_upgrade_super()+0x4b6) [0x563da5aa95e6]
4: (BlueStore::_mount(bool, bool)+0x5bc) [0x563da5af217c]
5: (OSD::init()+0x31f) [0x563da56753cf]
6: (main()+0x37b2) [0x563da55d83a2]
7: (__libc_start_main()+0xeb) [0x7fc63d19b09b]
8: (_start()+0x2a) [0x563da5609d8a]
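For context, the "missing start of fragmented record" message comes from RocksDB's WAL reader. The WAL stores each logical record as one FULL fragment, or as a FIRST fragment followed by zero or more MIDDLE fragments and a LAST fragment. The sketch below is a simplified Python model of that reassembly step, not RocksDB's actual C++ code. It assumes the fragments have already been parsed and checksum-verified, and the names `RecordType` and `reassemble` are hypothetical. The `(2)` reason string is taken from the log above and appears to correspond to a LAST fragment with no preceding FIRST. The `(1)` string for MIDDLE and the "partial record without end" string are assumptions added for illustration.

```python
from enum import IntEnum


class RecordType(IntEnum):
    """Fragment types of the RocksDB WAL record format."""
    ZERO = 0
    FULL = 1
    FIRST = 2
    MIDDLE = 3
    LAST = 4


def reassemble(fragments):
    """Rebuild logical records from (RecordType, bytes) fragments.

    Returns (records, corruptions), where corruptions is a list of
    (dropped_bytes, reason) tuples. Simplified model for illustration only.
    """
    records = []
    corruptions = []
    scratch = bytearray()      # payload collected for the current record
    in_fragmented = False      # True between a FIRST and its LAST

    for rtype, payload in fragments:
        if rtype in (RecordType.FULL, RecordType.FIRST):
            if in_fragmented:
                # Previous record never saw its LAST fragment (assumed message).
                corruptions.append((len(scratch), "partial record without end"))
            scratch = bytearray()
            if rtype == RecordType.FULL:
                in_fragmented = False
                records.append(bytes(payload))
            else:
                in_fragmented = True
                scratch += payload
        elif rtype == RecordType.MIDDLE:
            if not in_fragmented:
                # Assumed by analogy with the (2) message seen in the log.
                corruptions.append(
                    (len(payload), "missing start of fragmented record(1)"))
            else:
                scratch += payload
        elif rtype == RecordType.LAST:
            if not in_fragmented:
                # The case reported in the log: a tail with no head.
                corruptions.append(
                    (len(payload), "missing start of fragmented record(2)"))
            else:
                scratch += payload
                records.append(bytes(scratch))
                scratch = bytearray()
                in_fragmented = False
    return records, corruptions
```

Under this model, a lone 380-byte LAST fragment, as might be left behind when a power failure loses the block holding the matching FIRST fragment, would be dropped and reported in the same way as the "dropping 380 bytes" line above.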
I couldn't find any way to recover this OSD:
# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-6/
2020-11-26 19:50:51.034 7f4a7ec41e00 -1 rocksdb: Corruption: missing start of fragmented record(2)
2020-11-26 19:50:51.034 7f4a7ec41e00 -1 bluestore(/var/lib/ceph/osd/ceph-6) _open_db erroring opening db:
error from fsck: (5) Input/output error
Or even access its data:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6/ --op info --pgid 7.d
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_upgrade_super()' thread 7f2020d5d740 time 2020-11-26 19:52:31.139690
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: 10197: FAILED ceph_assert(ondisk_format > 0)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f2021ff6db2]
2: (()+0x276f8a) [0x7f2021ff6f8a]
3: (BlueStore::_upgrade_super()+0x4b6) [0x55c492a8ec66]
4: (BlueStore::_mount(bool, bool)+0x5bc) [0x55c492ad77fc]
5: (main()+0x2cb1) [0x55c492581e41]
6: (__libc_start_main()+0xeb) [0x7f20215a509b]
7: (_start()+0x2a) [0x55c4925ae63a]
*** Caught signal (Aborted) **
in thread 7f2020d5d740 thread_name:ceph-objectstor
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (()+0x12730) [0x7f2021754730]
2: (gsignal()+0x10b) [0x7f20215b87bb]
3: (abort()+0x121) [0x7f20215a3535]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f2021ff6e03]
5: (()+0x276f8a) [0x7f2021ff6f8a]
6: (BlueStore::_upgrade_super()+0x4b6) [0x55c492a8ec66]
7: (BlueStore::_mount(bool, bool)+0x5bc) [0x55c492ad77fc]
8: (main()+0x2cb1) [0x55c492581e41]
9: (__libc_start_main()+0xeb) [0x7f20215a509b]
10: (_start()+0x2a) [0x55c4925ae63a]
Aborted
Or export:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6/ --op export --pgid 7.d --file /destino/CEPH/ceph-6/export/pg-7.d
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_upgrade_super()' thread 7f629647c740 time 2020-11-26 19:55:43.443660
/build/ceph-Ad9zhX/ceph-14.2.9/src/os/bluestore/BlueStore.cc: 10197: FAILED ceph_assert(ondisk_format > 0)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7f6297715db2]
2: (()+0x276f8a) [0x7f6297715f8a]
3: (BlueStore::_upgrade_super()+0x4b6) [0x5564cf83fc66]
4: (BlueStore::_mount(bool, bool)+0x5bc) [0x5564cf8887fc]
5: (main()+0x2cb1) [0x5564cf332e41]
6: (__libc_start_main()+0xeb) [0x7f6296cc409b]
7: (_start()+0x2a) [0x5564cf35f63a]
*** Caught signal (Aborted) **
in thread 7f629647c740 thread_name:ceph-objectstor
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (()+0x12730) [0x7f6296e73730]
2: (gsignal()+0x10b) [0x7f6296cd77bb]
3: (abort()+0x121) [0x7f6296cc2535]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7f6297715e03]
5: (()+0x276f8a) [0x7f6297715f8a]
6: (BlueStore::_upgrade_super()+0x4b6) [0x5564cf83fc66]
7: (BlueStore::_mount(bool, bool)+0x5bc) [0x5564cf8887fc]
8: (main()+0x2cb1) [0x5564cf332e41]
9: (__libc_start_main()+0xeb) [0x7f6296cc409b]
10: (_start()+0x2a) [0x5564cf35f63a]
Aborted
Hardware details below:
./inxi -v5
System:
  Host: ceph-node03 Kernel: 4.19.0-9-amd64 x86_64 bits: 64 compiler: gcc v: 8.3.0 Console: tty 1
  Distro: Debian GNU/Linux 10 (buster)
Machine:
  Type: Desktop Mobo: N/A model: Intel X79 serial: N/A
  BIOS: American Megatrends v: 4.6.5 date: 07/17/2019
Memory:
  RAM: total: 31.34 GiB used: 2.28 GiB (7.3%)
  Array-1: capacity: 96 GiB slots: 4 EC: Multi-bit ECC max module size: 24 GiB note: est.
  Device-1: Node0_Dimm0 size: 8 GiB speed: 1333 MT/s type: DDR3
  Device-2: Node0_Dimm1 size: 8 GiB speed: 1333 MT/s type: DDR3
  Device-3: Node0_Dimm2 size: 8 GiB speed: 1333 MT/s type: DDR3
  Device-4: Node0_Dimm3 size: 8 GiB speed: 1333 MT/s type: DDR3
CPU:
  Info: 6-Core model: Intel Xeon E5-2640 0 bits: 64 type: MT MCP arch: Sandy Bridge rev: 7 L2 cache: 15.0 MiB
  flags: avx lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 59861
  Speed: 1202 MHz min/max: 1200/3000 MHz
  Core speeds (MHz): 1: 1202 2: 1197 3: 1212 4: 1289 5: 1197 6: 1197 7: 1201 8: 1197 9: 1262 10: 1262 11: 1232 12: 1348
Graphics:
  Device-1: NVIDIA G72 [GeForce 7200 GS / 7300 SE] vendor: ZOTAC driver: nouveau v: kernel bus ID: 02:00.0
  Display: server: No display server data found. Headless machine? resolution: <missing: xdpyinfo>
  Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
  Device-1: Intel 7 Series/C216 Family High Definition Audio driver: snd_hda_intel v: kernel bus ID: 00:1b.0
  Sound Server: ALSA v: k4.19.0-9-amd64
Network:
  Device-1: Intel 82599ES 10-Gigabit SFI/SFP+ Network driver: ixgbe v: 5.1.0-k port: e000 bus ID: 03:00.0
  IF: enp3s0 state: up speed: 10000 Mbps duplex: full mac: 00:1b:21:bb:e8:ca
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169 v: kernel port: d000 bus ID: 07:00.0
  IF: enp7s0 state: down mac: 00:e0:4c:01:83:74
  IF-ID-1: bond0 state: up speed: 10000 Mbps duplex: full mac: 00:1b:21:bb:e8:ca
  IF-ID-2: bonding_masters state: N/A speed: N/A duplex: N/A mac: N/A
Drives:
  Local Storage: total: 1.93 TiB used: 3.79 GiB (0.2%)
  ID-1: /dev/sda vendor: A-Data model: SU650 size: 111.79 GiB temp: 40 C
  ID-2: /dev/sdb model: SSD 1TB size: 931.51 GiB temp: 40 C
  ID-3: /dev/sdc model: SSD 1TB size: 931.51 GiB temp: 40 C
  Message: No Optical or Floppy data was found.
RAID:
  Device-1: md2 type: mdraid status: active Components: online: sda2~c1
  Info: raid: mirror blocks: 20971456 report: 2/1 _U chunk size: N/A
Partition:
  ID-1: / size: 19.56 GiB used: 3.79 GiB (19.4%) fs: ext4 dev: /dev/md2 label: ROOT uuid: 2ec99cbc-75ff-4c20-a126-b61bfdb03716
Swap:
  ID-1: swap-1 type: partition size: 8.00 GiB used: 0 KiB (0.0%) dev: /dev/sda1 label: SWAP_A uuid: 49266ce6-71eb-4a07-8799-f339cdca4e33
Sensors:
  System Temperatures: cpu: 38.0 C mobo: N/A gpu: nouveau temp: 53.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 174 Uptime: 3d 18h 42m Init: systemd runlevel: 5 Compilers: gcc: N/A Packages: 653 Shell: Bash v: 5.0.3 inxi: 3.1.09
The full log, generated with:
ceph-bluestore-tool fsck --path <path> --debug-bluestore=20 --log-file=c --no-log-to-stderr
is available here (~1.3 GB uncompressed, ~81 MB compressed):
https://drive.google.com/file/d/1-KvWeyWJaX_9lRYne26lg7Cc03VBMBif/view?usp=sharing
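Since the full log is large, a small filter may help when looking for the relevant lines. The following is a minimal sketch in Python, assuming the log uses the same "timestamp thread level message" layout as the excerpts in this report. The names `Entry`, `LINE_RE` and `error_entries` are hypothetical, and the default threshold of -1 simply matches the level shown on the corruption lines above.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Layout assumed from the excerpts above: date, time, thread id, level, message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) ([0-9a-f]+) +(-?\d+) (.*)$"
)


class Entry(NamedTuple):
    timestamp: str
    thread: str
    level: int
    message: str


def error_entries(lines: Iterable[str], max_level: int = -1) -> Iterator[Entry]:
    """Yield parsed entries whose level is <= max_level, one line at a time.

    Lines that do not match the assumed layout (for example stack frames)
    are skipped. The input is consumed lazily, so a file object can be
    passed directly without loading the whole log into memory.
    """
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue
        level = int(match.group(3))
        if level <= max_level:
            yield Entry(match.group(1), match.group(2), level, match.group(4))
```

A typical use would be to open the uncompressed log with `open(path, errors="replace")` and iterate over `error_entries(f)`, printing each entry's `timestamp` and `message`.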