Bug #52138: os/bluestore/BlueStore.cc: FAILED ceph_assert(lcl_extnt_map[offset] == length)

Added by Neha Ojha over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Target version: -
% Done: 0%
Regression: No
Severity: 3 - minor
Pull request ID: 42991

Description

2021-08-04T11:05:33.959 INFO:teuthology.orchestra.run.smithi115.stderr:2021-08-04T11:05:33.966+0000 7fd3d5145ec0 -1 bluestore::NCB::restore_allocator::No Valid allocation info on disk (empty file)
2021-08-04T11:05:33.971 INFO:teuthology.orchestra.run.smithi115.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-6641-g626e0d0d/rpm/el8/BUILD/ceph-17.0.0-6641-g626e0d0d/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::read_allocation_from_single_onode(Allocator*, BlueStore::OnodeRef&, BlueStore::read_alloc_stats_t&)' thread 7fd3d5145ec0 time 2021-08-04T11:05:33.980161+0000
2021-08-04T11:05:33.972 INFO:teuthology.orchestra.run.smithi115.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-6641-g626e0d0d/rpm/el8/BUILD/ceph-17.0.0-6641-g626e0d0d/src/os/bluestore/BlueStore.cc: 17446: FAILED ceph_assert(lcl_extnt_map[offset] == length)
2021-08-04T11:05:33.976 INFO:teuthology.orchestra.run.smithi115.stderr: ceph version 17.0.0-6641-g626e0d0d (626e0d0d5c988bd8a05cd7d0a41c0b5e1a20d68b) quincy (dev)
2021-08-04T11:05:33.977 INFO:teuthology.orchestra.run.smithi115.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x139) [0x7fd3d3159b52]
2021-08-04T11:05:33.977 INFO:teuthology.orchestra.run.smithi115.stderr: 2: (ceph::register_assert_context(ceph::common::CephContext*)+0) [0x7fd3d3159da2]
2021-08-04T11:05:33.977 INFO:teuthology.orchestra.run.smithi115.stderr: 3: (BlueStore::read_allocation_from_single_onode(Allocator*, boost::intrusive_ptr<BlueStore::Onode>&, BlueStore::read_alloc_stats_t&)+0x1ba) [0x55a283b31596]
2021-08-04T11:05:33.977 INFO:teuthology.orchestra.run.smithi115.stderr: 4: (BlueStore::read_allocation_from_onodes(Allocator*, BlueStore::read_alloc_stats_t&)+0xdd6) [0x55a283b8e32e]
2021-08-04T11:05:33.978 INFO:teuthology.orchestra.run.smithi115.stderr: 5: (BlueStore::reconstruct_allocations(Allocator*, BlueStore::read_alloc_stats_t&)+0x6e9) [0x55a283b8f98d]
2021-08-04T11:05:33.978 INFO:teuthology.orchestra.run.smithi115.stderr: 6: (BlueStore::read_allocation_from_drive_on_startup()+0x1cb) [0x55a283b8fcc7]
2021-08-04T11:05:33.978 INFO:teuthology.orchestra.run.smithi115.stderr: 7: (BlueStore::_init_alloc()+0xb93) [0x55a283b90fcb]
2021-08-04T11:05:33.979 INFO:teuthology.orchestra.run.smithi115.stderr: 8: (BlueStore::_open_db_and_around(bool, bool)+0x615) [0x55a283bd78eb]
2021-08-04T11:05:33.979 INFO:teuthology.orchestra.run.smithi115.stderr: 9: (BlueStore::_mount()+0x7cf) [0x55a283bda6af]
2021-08-04T11:05:33.979 INFO:teuthology.orchestra.run.smithi115.stderr: 10: (BlueStore::mount()+0xd) [0x55a283c13899]
2021-08-04T11:05:33.979 INFO:teuthology.orchestra.run.smithi115.stderr: 11: main()
2021-08-04T11:05:33.980 INFO:teuthology.orchestra.run.smithi115.stderr: 12: __libc_start_main()
2021-08-04T11:05:33.980 INFO:teuthology.orchestra.run.smithi115.stderr: 13: _start()

/a/benhanokh-2021-08-04_06:12:22-rados-wip_gbenhano_ncbz-distro-basic-smithi/6310694/

This was exposed by https://github.com/ceph/ceph/pull/39871.
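
For illustration, here is a minimal, self-contained sketch of the failing invariant (hypothetical code, not the actual BlueStore implementation; the map name is borrowed from the assert text and the offset/length values from the onode dump in comment #2 below): a naive offset-to-length map can hold only one length per physical offset, so a second reference to the same offset with a different length trips the check.

// Hypothetical sketch, NOT the real BlueStore code: a naive
// offset -> length map cannot represent two shared-blob references
// to the same physical offset with different lengths.
#include <cassert>
#include <cstdint>
#include <map>

int main() {
  std::map<uint64_t, uint32_t> lcl_extnt_map;  // offset -> length

  // First reference: offset 0x67b000, length 0x1000.
  const uint64_t offset = 0x67b000;
  lcl_extnt_map[offset] = 0x1000;

  // Second reference to the same offset with length 0x2000: the check
  // mirroring ceph_assert(lcl_extnt_map[offset] == length) now fails.
  const uint32_t length = 0x2000;
  assert(lcl_extnt_map[offset] == length);
  return 0;
}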


Related issues: 1 (0 open, 1 closed)

Related to bluestore - Bug #52502: src/os/bluestore/BlueStore.cc: FAILED ceph_assert(collection_ref) (status: Can't reproduce; assignee: Gabriel BenHanokh)

Actions #1

Updated by Kefu Chai over 2 years ago

/a/kchai-2021-08-17_04:49:07-rados-wip-kefu-testing-2021-08-17-0902-distro-basic-smithi/6343511

Actions #2

Updated by Igor Fedotov over 2 years ago

  • Status changed from New to Triaged

Looks like a bug in the new NCB code. I managed to reproduce the issue; here is the relevant onode dump (note offset 0x67b000).

-57> 2021-08-25T13:55:37.933+0300 7fe4c2e54200  0 _dump_onode 0x55badaf7aa00 #555:05000000:::OBJ_1002:33c0e35e#1 nid 7636 size 0x30400 (197632) expected_object_size 0 expected_write_size 0 in 0 shards, 0 spanning blobs
-56> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0x6200~e00: 0x200~e00 Blob(0x55bada1e55e0 blob([0x678000~1000] csum+shared crc32c/0x1000) use_tracker(0x1000 0xe00) SharedBlob(0x55bada1e57a0 sbid 0x44e2))
-55> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [ea75a93b]
-54> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0x7000~2000: 0x0~2000 Blob(0x55bada1e5490 blob([0x679000~2000] csum+shared crc32c/0x1000) use_tracker(0x2*0x1000 0x[1000,1000]) SharedBlob(0x55bada1e5650 sbid 0x44e3))
-53> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [21d7c34,a37a85b6]
-52> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0x9000~1000: 0x0~1000 Blob(0x55bada15c690 blob([0x67b000~1000,!~1000] csum+shared crc32c/0x1000) use_tracker(0x2*0x1000 0x[1000,0]) SharedBlob(0x55bada15c770 sbid 0x4426))
-51> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [64c328cb,6651e142]
-50> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0xa000~e00: 0x2000~e00 Blob(0x55bada15c850 blob([0x8d8000~8000] clen 0x10000 -> 0x7d44 compressed+csum+shared crc32c/0x8000) use_tracker(0x10000 0xe00) SharedBlob(0x55bada15c7e0 sbid 0x4466))
-49> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [ba64aa3c]
-48> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map 0xae00~200: 0x1e00~200 Blob(0x55bada1e4230 blob([0x67b000~2000] csum+shared crc32c/0x1000) use_tracker(0x2*0x1000 0x[0,200]) SharedBlob(0x55bada15c770 sbid 0x4426))
-47> 2021-08-25T13:55:37.933+0300 7fe4c2e54200 0 _dump_extent_map csum: [64c328cb,6651e142]
...

This means that multiple blobs referencing the same offset with different lengths is a valid case.
Hence the current allocation-map rebuild implementation apparently needs to be revised for shared blobs. See how fsck deals with this: for each unique sbid it builds a bluestore_extent_ref_map_t instance that merges the shared-blob references from all the relevant objects, and once all the onodes have been traversed these maps are used to adjust the expected allocator state (a simplified sketch follows the code excerpt below).

/// Building the extent_ref_maps (one per unique sbid):

BlueStore::_fsck_check_objects_shallow()
  ...
  if (blob.is_shared()) {
  ...

/// Final checking on the resulting extent_ref_map:

BlueStore::_fsck_on_open()
  dout(1) << __func__ << " checking shared_blobs" << dendl;
  ...
  errors += _fsck_check_extents(sbi.cid,
                                sbi.oids.front(),
                                extents,
                                sbi.compressed,
                                used_blocks,
                                fm->get_alloc_size(),
                                repair ? &repairer : nullptr,
                                *expected_statfs,
                                depth);
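
To make this concrete, here is a simplified, hypothetical sketch of the fsck-style rebuild described above (RefMap is only an illustrative stand-in for bluestore_extent_ref_map_t, and the sample sbid/offset/length values are taken from the dump above): references are first merged per sbid across all onodes, so the same offset may legitimately appear with different lengths, and the expected allocator state is adjusted only after the traversal completes.

// Simplified, hypothetical sketch of per-sbid reference merging;
// RefMap only approximates what bluestore_extent_ref_map_t does.
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Extent {
  uint64_t offset;  // physical offset on disk
  uint32_t length;  // extent length in bytes
};

// Count references per (offset, length) range instead of asserting a
// single length per offset.
using RefMap = std::map<std::pair<uint64_t, uint32_t>, unsigned>;

int main() {
  // sbid -> references merged from every onode sharing that blob.
  std::map<uint64_t, RefMap> shared_refs;

  // Two blobs of the same shared blob (sbid 0x4426 in the dump above)
  // reference offset 0x67b000 with lengths 0x1000 and 0x2000.
  const uint64_t sbid = 0x4426;
  const std::vector<Extent> refs = {{0x67b000, 0x1000}, {0x67b000, 0x2000}};
  for (const auto& e : refs) {
    ++shared_refs[sbid][{e.offset, e.length}];
  }

  // Only once all onodes are traversed is the expected allocator state
  // adjusted; each distinct range is accounted for exactly once.
  uint64_t allocated = 0;
  for (const auto& [id, rm] : shared_refs) {
    (void)id;
    for (const auto& [range, count] : rm) {
      (void)count;
      allocated += range.second;  // conceptual stand-in for marking
                                  // the range as used in the allocator
    }
  }
  return allocated == 0x3000 ? 0 : 1;  // 0x1000 + 0x2000 from the two ranges
}
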
Actions #3

Updated by Aishwarya Mathuria over 2 years ago

Kefu Chai wrote:

/a/kchai-2021-08-17_04:49:07-rados-wip-kefu-testing-2021-08-17-0902-distro-basic-smithi/6343511

https://pulpito.ceph.com/yuriw-2021-08-27_21:19:22-rados-wip-yuri6-testing-2021-08-27-1207-distro-basic-smithi/6363378/

Actions #4

Updated by Neha Ojha over 2 years ago

  • Status changed from Triaged to Fix Under Review
  • Assignee set to Gabriel BenHanokh
  • Pull request ID set to 42991
Actions #5

Updated by Sebastian Wagner over 2 years ago

  • Related to Bug #52502: src/os/bluestore/BlueStore.cc: FAILED ceph_assert(collection_ref) added
Actions #6

Updated by Neha Ojha over 2 years ago

  • Status changed from Fix Under Review to Resolved
