Bug #57537

closed

unable to read osd superblock on AArch64 with page size 64K

Added by Rixin Luo over 1 year ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee: Rixin Luo
Target version: -
% Done: 0%
Tags: backport_processed
Backport: quincy, pacific
Regression: No
Severity: 3 - minor
Pull request ID: 48092

Description

On AArch64 with a 64K page size, deploying an OSD occasionally fails with "OSD::init(): unable to read osd superblock".
The log for writing the superblock is:

2022-09-14 15:23:31.745 fffe96070040 30 bluestore.OnodeSpace(0xaaae04c5b170 in 0xaaae04cdd7a0) add #-1:7b3f43c4:::osd_superblock:0# 0xaaae04c7c600
2022-09-14 15:23:31.745 fffe96070040 15 bluestore(/var/lib/ceph/osd/ceph-0/) _write meta #-1:7b3f43c4:::osd_superblock:0# 0x0~1ff
2022-09-14 15:23:31.745 fffe96070040 20 bluestore(/var/lib/ceph/osd/ceph-0/) _assign_nid 1
2022-09-14 15:23:31.745 fffe96070040 20 bluestore(/var/lib/ceph/osd/ceph-0/) _do_write #-1:7b3f43c4:::osd_superblock:0# 0x0~1ff - have 0x0 (0) bytes fadvise_flags 0x0
2022-09-14 15:23:31.745 fffe96070040 30 _dump_onode 0xaaae04c7c600 #-1:7b3f43c4:::osd_superblock:0# nid 1 size 0x0 (0) expected_object_size 0 expected_write_size 0 in 0 shards, 0 spanning blobs
2022-09-14 15:23:31.745 fffe96070040 20 bluestore(/var/lib/ceph/osd/ceph-0/) _choose_write_options prefer csum_order 12 target_blob_size 0x10000 compress=0 buffered=0
2022-09-14 15:23:31.745 fffe96070040 30 bluestore.extentmap(0xaaae04c7c750) fault_range 0x0~1ff
2022-09-14 15:23:31.745 fffe96070040 10 bluestore(/var/lib/ceph/osd/ceph-0/) _do_write_small 0x0~1ff
2022-09-14 15:23:31.745 fffe96070040 30 bluestore.extentmap(0xaaae04c7c750) fault_range 0x0~10000
2022-09-14 15:23:31.745 fffe96070040 30 bluestore(/var/lib/ceph/osd/ceph-0/) _pad_zeros 0x0~1ff chunk_size 0x1000
2022-09-14 15:23:31.745 fffe96070040 40 bluestore(/var/lib/ceph/osd/ceph-0/) before:
2022-09-14 15:23:31.745 fffe96070040 20 bluestore(/var/lib/ceph/osd/ceph-0/) _pad_zeros pad 0x0 + 0xe01 on front/back, now 0x0~1000
2022-09-14 15:23:31.745 fffe96070040 40 bluestore(/var/lib/ceph/osd/ceph-0/) after:
2022-09-14 15:23:31.745 fffe96070040 20 bluestore(/var/lib/ceph/osd/ceph-0/) _do_alloc_write txc 0xaaae059e8e00 1 blobs
2022-09-14 15:23:31.745 fffe96070040 10 fbmap_alloc 0xaaae04c15900 allocate 0x1000/1000,1000,0
2022-09-14 15:23:31.745 fffe96070040 10 fbmap_alloc 0xaaae04c15900 allocate extent: 0x2000~1000/1000,1000,0
2022-09-14 15:23:31.745 fffe96070040 20 bluestore(/var/lib/ceph/osd/ceph-0/) _do_alloc_write prealloc [0x2000~1000]
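
The `_pad_zeros` lines above show the 0x1ff-byte superblock write being padded with zeros to one full 0x1000-byte checksum chunk before a single 0x1000 extent is allocated at 0x2000. The arithmetic can be sketched as below; `pad_to_chunk` is a hypothetical helper written only to reproduce the logged numbers, not BlueStore's actual `_pad_zeros` implementation.

```python
from typing import Tuple


def pad_to_chunk(offset: int, length: int, chunk_size: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: expand offset~length to whole chunk_size units.

    Returns (front_pad, back_pad, new_offset, new_length), mirroring the
    values printed by BlueStore's _pad_zeros log line.
    """
    assert chunk_size > 0
    front_pad = offset % chunk_size      # bytes needed to reach the previous chunk boundary
    end = offset + length
    back_pad = (-end) % chunk_size       # bytes needed to reach the next chunk boundary
    return front_pad, back_pad, offset - front_pad, length + front_pad + back_pad


front, back, off, length = pad_to_chunk(0x0, 0x1FF, 0x1000)
print(f"pad {front:#x} + {back:#x} on front/back, now {off:#x}~{length:x}")
# prints: pad 0x0 + 0xe01 on front/back, now 0x0~1000
```

Because the blob is checksummed in 0x1000 units, any later read of this object must fetch and verify the whole 0x1000 chunk, which is what the read log shows (`reading 0x0~1000` for a 0x1ff request).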

The log for reading the superblock is:
2022-09-14 15:23:38.165 fffef8560040 20 bluestore(/var/lib/ceph/osd/ceph-0) _do_read 0x0~1ff size 0x1ff (511)
2022-09-14 15:23:38.165 fffef8560040 30 bluestore.extentmap(0xaaade0131050) fault_range 0x0~1ff
2022-09-14 15:23:38.165 fffef8560040 30 _dump_onode 0xaaade0130f00 #-1:7b3f43c4:::osd_superblock:0# nid 1 size 0x1ff (511) expected_object_size 0 expected_write_size 0 in 0 shards, 0 spanning blobs
2022-09-14 15:23:38.165 fffef8560040 30 _dump_extent_map  0x0~1ff: 0x0~1ff Blob(0xaaaddf517180 blob([0x2000~1000] csum crc32c/0x1000) use_tracker(0x1000 0x1ff) SharedBlob(0xaaaddf5172d0 sbid 0x0))
2022-09-14 15:23:38.165 fffef8560040 30 _dump_extent_map      csum: [cc119710]
2022-09-14 15:23:38.165 fffef8560040 20 bluestore(/var/lib/ceph/osd/ceph-0) _do_read  blob Blob(0xaaaddf517180 blob([0x2000~1000] csum crc32c/0x1000) use_tracker(0x1000 0x1ff) SharedBlob(0xaaaddf5172d0 sbid 0x0)) need 0x0~1ff cache has 0x[]
2022-09-14 15:23:38.165 fffef8560040 30 bluestore(/var/lib/ceph/osd/ceph-0) _do_read    will read 0x0: 0x0~1ff
2022-09-14 15:23:38.165 fffef8560040 20 bluestore(/var/lib/ceph/osd/ceph-0) _do_read  blob Blob(0xaaaddf517180 blob([0x2000~1000] csum crc32c/0x1000) use_tracker(0x1000 0x1ff) SharedBlob(0xaaaddf5172d0 sbid 0x0)) need {<0x0, 0x1000> : [0x0:0~1ff]}
2022-09-14 15:23:38.165 fffef8560040 20 bluestore(/var/lib/ceph/osd/ceph-0) _do_read    region 0x0: 0x0 reading 0x0~1000
2022-09-14 15:23:38.165 fffef8560040  5 bdev(0xaaaddf43f200 /var/lib/ceph/osd/ceph-0/block) read 0x2000~1000 (direct)
2022-09-14 15:23:38.165 fffef8560040 20 bdev(0xaaaddf43f200 /var/lib/ceph/osd/ceph-0/block) _aio_log_start 0x2000~1000
2022-09-14 15:23:38.165 fffef8560040 20 bdev(0xaaaddf43f200 /var/lib/ceph/osd/ceph-0/block) _aio_log_finish 1 0x2000~1000
2022-09-14 15:23:38.165 fffef8560040 20 bluestore(/var/lib/ceph/osd/ceph-0) _do_read  blob Blob(0xaaaddf517180 blob([0x2000~1000] csum crc32c/0x1000) use_tracker(0x1000 0x1ff) SharedBlob(0xaaaddf5172d0 sbid 0x0)) need 0x{<0x0, 0x1000> : [0x0:0~1ff]}
2022-09-14 15:23:38.165 fffef8560040 -1 bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x6706be76, expected 0xcc119710, device location [0x2000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2022-09-14 15:23:38.165 fffef8560040 10 bluestore(/var/lib/ceph/osd/ceph-0) read meta #-1:7b3f43c4:::osd_superblock:0# 0x0~1ff = -5
2022-09-14 15:23:38.165 fffef8560040 -1 osd.0 0 OSD::init() : unable to read osd superblock
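
The failure is raised by `_verify_csum`: the 0x1000-byte chunk read back from device offset 0x2000 hashes to 0x6706be76 instead of the 0xcc119710 recorded in the blob at write time, so the read returns -5 (EIO) and `OSD::init()` gives up. The check can be sketched in pure Python as below. This is an illustrative bitwise CRC-32C (Castagnoli) plus a hypothetical `verify_chunks` helper, not Ceph's code. It assumes, as we understand Ceph's `ceph_crc32c` to behave, that the register is seeded with -1 (0xFFFFFFFF) and returned without a final XOR; if that assumption is off, the values differ from BlueStore's by a constant inversion, but the mismatch logic is unchanged.

```python
from typing import List

CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reversed form


def crc32c_raw(data: bytes, seed: int = 0xFFFFFFFF) -> int:
    """Bitwise reflected CRC-32C, seeded with -1 and with no final XOR (assumed Ceph-style)."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc


def verify_chunks(blob: bytes, csum_chunk_size: int, expected: List[int]) -> List[int]:
    """Hypothetical helper: return blob offsets of chunks whose checksum mismatches."""
    bad = []
    for i, want in enumerate(expected):
        chunk = blob[i * csum_chunk_size:(i + 1) * csum_chunk_size]
        if crc32c_raw(chunk) != want:
            bad.append(i * csum_chunk_size)
    return bad


# Hypothetical 0x1ff-byte payload followed by the 0xe01 bytes of zero padding.
written = b"\x01" * 0x1FF + b"\x00" * 0xE01
expected = [crc32c_raw(written)]                          # checksum stored at write time
print(verify_chunks(written, 0x1000, expected))           # → []  (data intact)
print(verify_chunks(bytes(0x1000), 0x1000, expected))     # → [0] (device returned zeros)
```

Under the same assumptions, `hex(crc32c_raw(bytes(0x1000)))` is one way to check the observation in #1 that 0x6706be76 looks like the checksum of an all-zero 4K block.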


Related issues 2 (0 open, 2 closed)

Copied to bluestore - Backport #57687: pacific: unable to read osd superblock on AArch64 with page size 64K (Resolved, Igor Fedotov)
Copied to bluestore - Backport #57688: quincy: unable to read osd superblock on AArch64 with page size 64K (Resolved, Igor Fedotov)
#1

Updated by Igor Fedotov over 1 year ago

Hi @luo - could you please answer the following questions:
- what Ceph release are you using?
- Is this a containerized or bare metal deployment?
- IIRC the indicated checksum (0x6706be76) matches an all-zero block, so I'm curious what the content of the 4K block at offset 0x2000 is after that unsuccessful initialization. Could you please read it with the dd tool and check the content? Is it really all zeros? And is the reported checksum the same in every failure case?
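
The block check asked about above can be done with `dd` (for example `dd if=/var/lib/ceph/osd/ceph-0/block bs=4096 skip=2 count=1 | hexdump -C`, since 2 * 4096 = 0x2000) or with a small script. The sketch below uses hypothetical helpers, not a Ceph tool; it assumes a Linux host, read access to the OSD's block device (normally root), and that the OSD is stopped so nothing rewrites the block while it is inspected.

```python
import os


def read_block(path: str, offset: int, length: int = 0x1000) -> bytes:
    """Hypothetical helper: read `length` bytes at byte `offset` of a file or block device."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def is_all_zero(block: bytes) -> bool:
    """True if every byte in the block is 0x00."""
    return not any(block)


if __name__ == "__main__":
    blk = read_block("/var/lib/ceph/osd/ceph-0/block", 0x2000)
    print(len(blk), is_all_zero(blk))
```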

#2

Updated by Igor Fedotov over 1 year ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 48092
#3

Updated by Igor Fedotov over 1 year ago

Well, I can see your PR, so the root cause is apparently more or less clear. Not much sense in the above questions, then...

#4

Updated by Igor Fedotov over 1 year ago

  • Backport set to quincy, pacific
#5

Updated by Igor Fedotov over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
#6

Updated by Backport Bot over 1 year ago

  • Copied to Backport #57687: pacific: unable to read osd superblock on AArch64 with page size 64K added
#7

Updated by Backport Bot over 1 year ago

  • Copied to Backport #57688: quincy: unable to read osd superblock on AArch64 with page size 64K added
#8

Updated by Backport Bot over 1 year ago

  • Tags set to backport_processed
#9

Updated by Konstantin Shalygin over 1 year ago

  • Assignee set to Rixin Luo
#10

Updated by Igor Fedotov 8 months ago

  • Status changed from Pending Backport to Resolved