Project

General

Profile

Bug #48855

OSD_SUPERBLOCK Checksum failed after node restart

Added by Stellar Wang 3 months ago. Updated 3 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

【scene】
After the OSD node is restarted, the OSD power on process reads OSD_SUPERBLOCK,checksum verification failed, resulting in the OSD can not start.

【ceph log】
2021-01-12 13:13:19.501663 7f6086e72d00 0 set uid:gid to 167:167 (ceph:ceph)
2021-01-12 13:13:19.501693 7f6086e72d00 0 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable), process ceph-osd, pid 3306
2021-01-12 13:13:19.510198 7f6086e72d00 -1 Public network was set, but cluster network was not set
2021-01-12 13:13:19.510216 7f6086e72d00 -1 Using public network also for cluster network
2021-01-12 13:13:19.582479 7f6086e72d00 0 pidfile_write: ignore empty --pid-file
2021-01-12 13:13:19.677972 7f6086e72d00 0 load: jerasure load: lrc load: isa
2021-01-12 13:13:19.678165 7f6086e72d00 1 bdev create path /var/lib/ceph/osd/ceph-4/block type kernel
2021-01-12 13:13:19.678209 7f6086e72d00 1 bdev(0x55947679f200 /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
2021-01-12 13:13:19.678568 7f6086e72d00 1 bdev(0x55947679f200 /var/lib/ceph/osd/ceph-4/block) open size 8001016561664 (0x746e1c00000, 7.28TiB) block_size 4096 (4KiB) rotational
2021-01-12 13:13:19.679142 7f6086e72d00 1 bluestore(/var/lib/ceph/osd/ceph-4) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2021-01-12 13:13:19.679350 7f6086e72d00 1 bdev(0x55947679f200 /var/lib/ceph/osd/ceph-4/block) close
2021-01-12 13:13:19.711266 7f6086e72d00 1 bluestore(/var/lib/ceph/osd/ceph-4) _mount path /var/lib/ceph/osd/ceph-4
2021-01-12 13:13:19.711606 7f6086e72d00 1 bdev create path /var/lib/ceph/osd/ceph-4/block type kernel
2021-01-12 13:13:19.711613 7f6086e72d00 1 bdev(0x55947679c400 /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
2021-01-12 13:13:19.711811 7f6086e72d00 1 bdev(0x55947679c400 /var/lib/ceph/osd/ceph-4/block) open size 8001016561664 (0x746e1c00000, 7.28TiB) block_size 4096 (4KiB) rotational
2021-01-12 13:13:19.712469 7f6086e72d00 1 bluestore(/var/lib/ceph/osd/ceph-4) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2021-01-12 13:13:19.712612 7f6086e72d00 1 bdev create path /var/lib/ceph/osd/ceph-4/block type kernel
2021-01-12 13:13:19.712654 7f6086e72d00 1 bdev(0x55947679d200 /var/lib/ceph/osd/ceph-4/block) open path /var/lib/ceph/osd/ceph-4/block
2021-01-12 13:13:19.712842 7f6086e72d00 1 bdev(0x55947679d200 /var/lib/ceph/osd/ceph-4/block) open size 8001016561664 (0x746e1c00000, 7.28TiB) block_size 4096 (4KiB) rotational
.........
.........
2021-01-12 13:13:38.135302 7f6086e72d00 1 bluestore(/var/lib/ceph/osd/ceph-4) _open_alloc loaded 3.14TiB in 1601116 extents
2021-01-12 13:13:38.309556 7f6086e72d00 -1 bluestore(/var/lib/ceph/osd/ceph-4) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x54197411, expected 0x533a0c92, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2021-01-12 13:13:38.309926 7f6086e72d00 -1 bluestore(/var/lib/ceph/osd/ceph-4) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x54197411, expected 0x533a0c92, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2021-01-12 13:13:38.310219 7f6086e72d00 -1 bluestore(/var/lib/ceph/osd/ceph-4) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x54197411, expected 0x533a0c92, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2021-01-12 13:13:38.310488 7f6086e72d00 -1 bluestore(/var/lib/ceph/osd/ceph-4) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x54197411, expected 0x533a0c92, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2021-01-12 13:13:38.310527 7f6086e72d00 -1 osd.4 0 OSD::init() : unable to read osd superblock
2021-01-12 13:13:38.310534 7f6086e72d00 1 bluestore(/var/lib/ceph/osd/ceph-4) umount
2021-01-12 13:13:38.311015 7f6086e72d00 1 stupidalloc 0x0x55947665d300 shutdown
2021-01-12 13:13:38.335889 7f6086e72d00 1 freelist shutdown
2021-01-12 13:13:38.335951 7f6086e72d00 4 rocksdb: [/work/Product/rpmbuild/BUILD/infinity-3.2.12/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
2021-01-12 13:13:38.363639 7f6071fbe700 4 rocksdb: (Original Log Time 2021/01/12-13:13:38.358086) [/work/Product/rpmbuild/BUILD/infinity-3.2.12/src/rocksdb/db/compaction_job.cc:621] [default] compacted to: base level 1 max bytes base 268435456 files[4 2 31 102 0 0 0] max score 1.00, MB/sec: 107.5 rd, 65.1 wr, level 2, files in(1, 26) out(16) MB in(29.9, 1605.7) out(990.2), read-write-amplify(87.8) write-amplify(33.1) Shutdown in progress: Database shutdown or Column family drop during compaction, records in: 4228299, records dropped: 2778

2021-01-12 13:13:38.363663 7f6071fbe700 4 rocksdb: (Original Log Time 2021/01/12-13:13:38.363576) EVENT_LOG_v1 {"time_micros": 1610428418358104, "job": 3, "event": "compaction_finished", "compaction_time_micros": 15955187, "output_level": 2, "num_output_files": 16, "total_output_size": 1038275758, "num_input_records": 3090789, "num_output_records": 3088011, "num_subcompactions": 1, "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [4, 2, 31, 102, 0, 0, 0]}
2021-01-12 13:13:38.363895 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418363890, "job": 4, "event": "table_file_deletion", "file_number": 343974}
2021-01-12 13:13:38.363911 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418363909, "job": 4, "event": "table_file_deletion", "file_number": 343973}
2021-01-12 13:13:38.363922 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418363919, "job": 4, "event": "table_file_deletion", "file_number": 343972}
2021-01-12 13:13:38.363935 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418363932, "job": 4, "event": "table_file_deletion", "file_number": 343971}
2021-01-12 13:13:38.363945 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418363943, "job": 4, "event": "table_file_deletion", "file_number": 343970}
2021-01-12 13:13:38.363958 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418363956, "job": 4, "event": "table_file_deletion", "file_number": 343969}
2021-01-12 13:13:38.363983 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418363981, "job": 4, "event": "table_file_deletion", "file_number": 343968}
2021-01-12 13:13:38.363995 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418363993, "job": 4, "event": "table_file_deletion", "file_number": 343967}
2021-01-12 13:13:38.364014 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418364010, "job": 4, "event": "table_file_deletion", "file_number": 343966}
2021-01-12 13:13:38.364029 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418364027, "job": 4, "event": "table_file_deletion", "file_number": 343965}
2021-01-12 13:13:38.364045 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418364043, "job": 4, "event": "table_file_deletion", "file_number": 343964}
2021-01-12 13:13:38.364061 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418364058, "job": 4, "event": "table_file_deletion", "file_number": 343963}
2021-01-12 13:13:38.364077 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418364075, "job": 4, "event": "table_file_deletion", "file_number": 343962}
2021-01-12 13:13:38.364092 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418364090, "job": 4, "event": "table_file_deletion", "file_number": 343961}
2021-01-12 13:13:38.364104 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418364102, "job": 4, "event": "table_file_deletion", "file_number": 343960}
2021-01-12 13:13:38.364128 7f6086e72d00 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1610428418364124, "job": 4, "event": "table_file_deletion", "file_number": 343959}
2021-01-12 13:13:38.365366 7f6086e72d00 4 rocksdb: [/work/Product/rpmbuild/BUILD/infinity-3.2.12/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2021-01-12 13:13:38.382333 7f6086e72d00 1 bluefs umount
2021-01-12 13:13:38.382527 7f6086e72d00 1 stupidalloc 0x0x55947665c3a0 shutdown
2021-01-12 13:13:38.382627 7f6086e72d00 1 bdev(0x55947679d200 /var/lib/ceph/osd/ceph-4/block) close
2021-01-12 13:13:38.651879 7f6086e72d00 1 bdev(0x55947679c400 /var/lib/ceph/osd/ceph-4/block) close
2021-01-12 13:13:38.750232 7f6086e72d00 -1 ERROR: osd init failed: (22) Invalid argument

History

#1 Updated by Stellar Wang 3 months ago

CRC verification failure also occurred in another OSD, the detailed information as follows:
【scene】
CRC verification failed when obtaining OSDMAP information during the power on of OSD.

【ceph log】
2021-01-12 13:10:43.985326 7f853d853700 -1 received signal: Terminated from PID: 1 task name: /usr/lib/systemd/systemd --switched-root --system --deserialize 21 UID: 0
2021-01-12 13:10:43.985349 7f853d853700 -1 osd.5 10280 Got signal Terminated
2021-01-12 13:10:43.985355 7f853d853700 0 osd.5 10280 prepare_to_stop telling mon we are shutting down
2021-01-12 13:10:44.248782 7f855b052700 0 ip:6831/3525 >> ip:6832/3591 conn(0x55f4f5e66000 :6831 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 9 vs existing csq=9 existing_state=STATE_STANDBY
2021-01-12 13:10:44.248967 7f855b052700 0 ip:6831/3525 >> ip:6832/3591 conn(0x55f4f5e66000 :6831 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 10 vs existing csq=9 existing_state=STATE_STANDBY
2021-01-12 13:10:44.920341 7f8551109700 0 osd.5 10281 got_stop_ack starting shutdown
2021-01-12 13:10:44.920374 7f853d853700 0 osd.5 10281 prepare_to_stop starting shutdown
2021-01-12 13:10:44.920381 7f853d853700 -1 osd.5 10281 shutdown
2021-01-12 13:13:19.509199 7fcd85af1d00 0 set uid:gid to 167:167 (ceph:ceph)
2021-01-12 13:13:19.509230 7fcd85af1d00 0 ceph version 12.2.* (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable), process ceph-osd, pid 3269
2021-01-12 13:13:19.515039 7fcd85af1d00 -1 Public network was set, but cluster network was not set
2021-01-12 13:13:19.515052 7fcd85af1d00 -1 Using public network also for cluster network
2021-01-12 13:13:19.528713 7fcd85af1d00 0 pidfile_write: ignore empty --pid-file
2021-01-12 13:13:19.594220 7fcd85af1d00 0 load: jerasure load: lrc load: isa
2021-01-12 13:13:19.594315 7fcd85af1d00 1 bdev create path /var/lib/ceph/osd/ceph-5/block type kernel
2021-01-12 13:13:19.597615 7fcd85af1d00 1 bdev(0x5593a0895200 /var/lib/ceph/osd/ceph-5/block) open path /var/lib/ceph/osd/ceph-5/block
2021-01-12 13:13:19.597923 7fcd85af1d00 1 bdev(0x5593a0895200 /var/lib/ceph/osd/ceph-5/block) open size 8001016561664 (0x746e1c00000, 7.28TiB) block_size 4096 (4KiB) rotational
2021-01-12 13:13:19.599008 7fcd85af1d00 1 bluestore(/var/lib/ceph/osd/ceph-5) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2021-01-12 13:13:19.599042 7fcd85af1d00 1 bdev(0x5593a0895200 /var/lib/ceph/osd/ceph-5/block) close
2021-01-12 13:13:19.637833 7fcd85af1d00 1 bluestore(/var/lib/ceph/osd/ceph-5) _mount path /var/lib/ceph/osd/ceph-5
2021-01-12 13:13:19.638302 7fcd85af1d00 1 bdev create path /var/lib/ceph/osd/ceph-5/block type kernel
2021-01-12 13:13:19.638315 7fcd85af1d00 1 bdev(0x5593a0895a00 /var/lib/ceph/osd/ceph-5/block) open path /var/lib/ceph/osd/ceph-5/block
2021-01-12 13:13:19.638543 7fcd85af1d00 1 bdev(0x5593a0895a00 /var/lib/ceph/osd/ceph-5/block) open size 8001016561664 (0x746e1c00000, 7.28TiB) block_size 4096 (4KiB) rotational
2021-01-12 13:13:19.638621 7fcd85af1d00 1 bluestore(/var/lib/ceph/osd/ceph-5) _set_cache_sizes cache_size 1073741824 meta 0.4 kv 0.4 data 0.2
2021-01-12 13:13:19.639303 7fcd85af1d00 1 bdev create path /var/lib/ceph/osd/ceph-5/block type kernel
2021-01-12 13:13:19.639313 7fcd85af1d00 1 bdev(0x5593a0893200 /var/lib/ceph/osd/ceph-5/block) open path /var/lib/ceph/osd/ceph-5/block
2021-01-12 13:13:19.639546 7fcd85af1d00 1 bdev(0x5593a0893200 /var/lib/ceph/osd/ceph-5/block) open size 8001016561664 (0x746e1c00000, 7.28TiB) block_size 4096 (4KiB) rotational
2021-01-12 13:13:19.639559 7fcd85af1d00 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-5/block size 7.28TiB
2021-01-12 13:13:19.640305 7fcd85af1d00 1 bluefs mount

2021-01-12 13:13:35.352799 7fcd85af1d00 0 _get_class not permitted to load kvs
2021-01-12 13:13:35.353258 7fcd85af1d00 0 _get_class not permitted to load lua
2021-01-12 13:13:35.364314 7fcd85af1d00 0 _get_class not permitted to load sdk
2021-01-12 13:13:35.374677 7fcd85af1d00 -1 bluestore(/var/lib/ceph/osd/ceph-5) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xba798fcf, expected 0x56169afb, device location [0x27f76b0000~1000], logical extent 0x0~1000, object #-1:9000ae72:::osdmap.10281:0#
2021-01-12 13:13:35.375057 7fcd85af1d00 -1 bluestore(/var/lib/ceph/osd/ceph-5) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xba798fcf, expected 0x56169afb, device location [0x27f76b0000~1000], logical extent 0x0~1000, object #-1:9000ae72:::osdmap.10281:0#
2021-01-12 13:13:35.375388 7fcd85af1d00 -1 bluestore(/var/lib/ceph/osd/ceph-5) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xba798fcf, expected 0x56169afb, device location [0x27f76b0000~1000], logical extent 0x0~1000, object #-1:9000ae72:::osdmap.10281:0#
2021-01-12 13:13:35.375711 7fcd85af1d00 -1 bluestore(/var/lib/ceph/osd/ceph-5) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xba798fcf, expected 0x56169afb, device location [0x27f76b0000~1000], logical extent 0x0~1000, object #-1:9000ae72:::osdmap.10281:0#
2021-01-12 13:13:35.375729 7fcd85af1d00 -1 osd.5 0 failed to load OSD map for epoch 10281, got 0 bytes
2021-01-12 13:13:35.377684 7fcd85af1d00 -1 ceph/src/osd/OSD.h: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7fcd85af1d00 time 2021-01-12 13:13:35.375743
/src/osd/OSD.h: 988: FAILED assert(ret)

ceph version 12.2.* (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x559397037a50]
2: (OSDService::get_map(unsigned int)+0x3d) [0x559396b0317d]
3: (OSD::init()+0x2072) [0x559396ab3692]
4: (main()+0x2d07) [0x5593969b4dd7]
5: (__libc_start_main()+0xf5) [0x7fcd82155c05]
6: (()+0x4c0ca3) [0x559396a54ca3]

Also available in: Atom PDF