Bug #52138
os/bluestore/BlueStore.cc: FAILED ceph_assert(lcl_extnt_map[offset] == length)
Description
2021-08-04T11:05:33.959 INFO:teuthology.orchestra.run.smithi115.stderr:2021-08-04T11:05:33.966+0000 7fd3d5145ec0 -1 bluestore::NCB::restore_allocator::No Valid allocation info on disk (empty file)
2021-08-04T11:05:33.971 INFO:teuthology.orchestra.run.smithi115.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-6641-g626e0d0d/rpm/el8/BUILD/ceph-17.0.0-6641-g626e0d0d/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::read_allocation_from_single_onode(Allocator*, BlueStore::OnodeRef&, BlueStore::read_alloc_stats_t&)' thread 7fd3d5145ec0 time 2021-08-04T11:05:33.980161+0000
2021-08-04T11:05:33.972 INFO:teuthology.orchestra.run.smithi115.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-6641-g626e0d0d/rpm/el8/BUILD/ceph-17.0.0-6641-g626e0d0d/src/os/bluestore/BlueStore.cc: 17446: FAILED ceph_assert(lcl_extnt_map[offset] == length)
2021-08-04T11:05:33.976 INFO:teuthology.orchestra.run.smithi115.stderr: ceph version 17.0.0-6641-g626e0d0d (626e0d0d5c988bd8a05cd7d0a41c0b5e1a20d68b) quincy (dev)
2021-08-04T11:05:33.977 INFO:teuthology.orchestra.run.smithi115.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x139) [0x7fd3d3159b52]
2021-08-04T11:05:33.977 INFO:teuthology.orchestra.run.smithi115.stderr: 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7fd3d3159da2]
2021-08-04T11:05:33.977 INFO:teuthology.orchestra.run.smithi115.stderr: 3: (BlueStore::read_allocation_from_single_onode(Allocator*, boost::intrusive_ptr<BlueStore::Onode>&, BlueStore::read_alloc_stats_t&)+0x1ba) [0x55a283b31596]
2021-08-04T11:05:33.977 INFO:teuthology.orchestra.run.smithi115.stderr: 4: (BlueStore::read_allocation_from_onodes(Allocator*, BlueStore::read_alloc_stats_t&)+0xdd6) [0x55a283b8e32e]
2021-08-04T11:05:33.978 INFO:teuthology.orchestra.run.smithi115.stderr: 5: (BlueStore::reconstruct_allocations(Allocator*, BlueStore::read_alloc_stats_t&)+0x6e9) [0x55a283b8f98d]
2021-08-04T11:05:33.978 INFO:teuthology.orchestra.run.smithi115.stderr: 6: (BlueStore::read_allocation_from_drive_on_startup()+0x1cb) [0x55a283b8fcc7]
2021-08-04T11:05:33.978 INFO:teuthology.orchestra.run.smithi115.stderr: 7: (BlueStore::_init_alloc()+0xb93) [0x55a283b90fcb]
2021-08-04T11:05:33.979 INFO:teuthology.orchestra.run.smithi115.stderr: 8: (BlueStore::_open_db_and_around(bool, bool)+0x615) [0x55a283bd78eb]
2021-08-04T11:05:33.979 INFO:teuthology.orchestra.run.smithi115.stderr: 9: (BlueStore::_mount()+0x7cf) [0x55a283bda6af]
2021-08-04T11:05:33.979 INFO:teuthology.orchestra.run.smithi115.stderr: 10: (BlueStore::mount()+0xd) [0x55a283c13899]
2021-08-04T11:05:33.979 INFO:teuthology.orchestra.run.smithi115.stderr: 11: main()
2021-08-04T11:05:33.980 INFO:teuthology.orchestra.run.smithi115.stderr: 12: __libc_start_main()
2021-08-04T11:05:33.980 INFO:teuthology.orchestra.run.smithi115.stderr: 13: _start()
/a/benhanokh-2021-08-04_06:12:22-rados-wip_gbenhano_ncbz-distro-basic-smithi/6310694/
This was exposed by https://github.com/ceph/ceph/pull/39871
Updated by Kefu Chai over 2 years ago
/a/kchai-2021-08-17_04:49:07-rados-wip-kefu-testing-2021-08-17-0902-distro-basic-smithi/6343511
Updated by Igor Fedotov over 2 years ago
- Status changed from New to Triaged
Looks like a bug in that new NCB stuff. I managed to repro the issue and here is the relevant onode dump (see offset 0x67b000).
-57> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_onode 0x55badaf7aa00 #555:05000000:::OBJ_1002:33c0e35e#1 nid 7636 size 0x30400 (197632) expected_object_size 0 expected_write_size 0 in 0 shards, 0 spanning blobs
-56> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0x6200~e00: 0x200~e00 Blob(0x55bada1e55e0 blob([0x678000~1000] csum+shared crc32c/0x1000) use_tracker(0x1000 0xe00) SharedBlob(0x55bada1e57a0 sbid 0x44e2))
-55> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [ea75a93b]
-54> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0x7000~2000: 0x0~2000 Blob(0x55bada1e5490 blob([0x679000~2000] csum+shared crc32c/0x1000) use_tracker(0x2*0x1000 0x[1000,1000]) SharedBlob(0x55bada1e5650 sbid 0x44e3))
-53> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [21d7c34,a37a85b6]
-52> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0x9000~1000: 0x0~1000 Blob(0x55bada15c690 blob([0x67b000~1000,!~1000] csum+shared crc32c/0x1000) use_tracker(0x2*0x1000 0x[1000,0]) SharedBlob(0x55bada15c770 sbid 0x4426))
-51> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [64c328cb,6651e142]
-50> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0xa000~e00: 0x2000~e00 Blob(0x55bada15c850 blob([0x8d8000~8000] clen 0x10000 -> 0x7d44 compressed+csum+shared crc32c/0x8000) use_tracker(0x10000 0xe00) SharedBlob(0x55bada15c7e0 sbid 0x4466))
-49> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [ba64aa3c]
-48> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0xae00~200: 0x1e00~200 Blob(0x55bada1e4230 blob([0x67b000~2000] csum+shared crc32c/0x1000) use_tracker(0x2*0x1000 0x[0,200]) SharedBlob(0x55bada15c770 sbid 0x4426))
-47> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [64c328cb,6651e142]
...
This means that multiple blobs with the same offset but different lengths are a valid case: in the dump above, two blobs sharing sbid 0x4426 both start at 0x67b000, one listing 0x67b000~1000 and the other 0x67b000~2000.
Hence the current allocation map rebuild implementation apparently needs to be revised for shared blobs. See how fsck deals with that: for each unique sbid it builds an instance of bluestore_extent_ref_map_t, which merges shared blob references among all the relevant objects. Once all the onodes are traversed, these maps are used to adjust the expected allocator state.
/// building extent_ref_maps
BlueStore::_fsck_check_objects_shallow
...
if (blob.is_shared()) {
...
/// final checking on the resulting extent_ref_map:
BlueStore::_fsck_on_open()
dout(1) << __func__ << " checking shared_blobs" << dendl;
...
errors += _fsck_check_extents(sbi.cid,
sbi.oids.front(),
extents,
sbi.compressed,
used_blocks,
fm->get_alloc_size(),
repair ? &repairer : nullptr,
*expected_statfs,
depth);
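To illustrate the idea, here is a minimal, self-contained sketch of merging shared-blob extents per sbid before touching the allocator. It is not the BlueStore code and does not use bluestore_extent_ref_map_t; the names Extent, MergedExtents, merge_extent, SharedBlobExtents and demo_from_dump are hypothetical, and reference counting is left out on purpose. It only shows why two extents that start at the same offset with different lengths should be coalesced rather than asserted equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for a physical extent (offset~length).
struct Extent {
  uint64_t offset;
  uint64_t length;
};

// Hypothetical merged view: range start -> range end (exclusive),
// holding non-overlapping, non-adjacent ranges.
using MergedExtents = std::map<uint64_t, uint64_t>;

// Hypothetical per-sbid collection, filled while traversing onodes.
using SharedBlobExtents = std::map<uint64_t, MergedExtents>;

// Merge one extent into the set, coalescing overlapping or adjacent ranges.
void merge_extent(MergedExtents& m, const Extent& e) {
  if (e.length == 0) {
    return;
  }
  uint64_t start = e.offset;
  uint64_t end = e.offset + e.length;

  // First range that starts strictly after 'start'.
  auto it = m.upper_bound(start);
  if (it != m.begin()) {
    auto prev = std::prev(it);
    if (prev->second >= start) {
      // The previous range overlaps or touches the new one: absorb it.
      start = prev->first;
      end = std::max(end, prev->second);
      it = m.erase(prev);
    }
  }
  // Absorb every following range that overlaps or touches [start, end).
  while (it != m.end() && it->first <= end) {
    end = std::max(end, it->second);
    it = m.erase(it);
  }
  m[start] = end;
}

// Replays the two extents seen for sbid 0x4426 in the onode dump above.
MergedExtents demo_from_dump() {
  SharedBlobExtents by_sbid;
  merge_extent(by_sbid[0x4426], Extent{0x67b000, 0x1000});
  merge_extent(by_sbid[0x4426], Extent{0x67b000, 0x2000});
  return by_sbid[0x4426];
}
```

With this approach the allocator would be updated once per merged range after all onodes have been visited, instead of once per blob, so the same offset appearing with lengths 0x1000 and 0x2000 collapses into a single 0x67b000~2000 range.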
Updated by Aishwarya Mathuria over 2 years ago
Kefu Chai wrote:
/a/kchai-2021-08-17_04:49:07-rados-wip-kefu-testing-2021-08-17-0902-distro-basic-smithi/6343511
Updated by Neha Ojha over 2 years ago
- Status changed from Triaged to Fix Under Review
- Assignee set to Gabriel BenHanokh
- Pull request ID set to 42991
Updated by Sebastian Wagner over 2 years ago
- Related to Bug #52502: src/os/bluestore/BlueStore.cc: FAILED ceph_assert(collection_ref) added
Updated by Neha Ojha over 2 years ago
- Status changed from Fix Under Review to Resolved