Bug #54226

closed

bluestore crash and unrepairable scrub errors

Added by Manuel Lausch about 2 years ago. Updated about 2 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

An object with the name "c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000" causes trouble when stored on BlueStore. With FileStore everything is fine.

I get scrub errors saying the object is missing on the BlueStore OSD. Repair reports everything as fixed, but a further deep-scrub brings back the same missing-object error:

2022-02-08 13:39:28.095 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff shard 3 1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:head : missing
2022-02-08 13:39:28.095 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 1 missing, 0 inconsistent objects
2022-02-08 13:39:28.095 7f600dfec700 -1 log_channel(cluster) log [ERR] : 1.7fff deep-scrub 1 errors

I mounted the bluestore device with ceph-objectstore-tool --op fuse --data-path /var/lib/ceph/osd/ceph-3 --mountpoint /mnt.
The object is indeed missing in the corresponding PG folder.
But I can cd into the object directory named "#1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:head#". The data in there has the correct content.

This all happened on one of our production clusters running Ceph Nautilus 14.2.22.

Just for fun, I put an object with the same object name on a test cluster running Ceph Pacific 16.2.6. The issue is exactly the same on that cluster.

I assume the bit field "ffffffff" on the object is derived from the object name, and that something goes wrong in the handling of this maximum value.
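If that assumption is right, here is a minimal sketch of why 0xffffffff would be special (my own illustration, not the actual BlueStore code): as far as I understand, the hash shown in the "#1:ffffffff:::…" name is the object's 32-bit placement hash stored bit-reversed, and 0xffffffff is its own bit-reversal, so such an object sorts at the extreme end of the collection's key range.

```python
def reverse_bits32(x: int) -> int:
    """Bit-reverse a 32-bit value -- roughly how bluestore orders
    objects by placement hash within a collection keyspace."""
    result = 0
    for _ in range(32):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result

# 0xffffffff is its own bit-reversal, so an object with this hash
# sorts at the very end of the PG's key range, right on the
# collection boundary.
assert reverse_bits32(0xffffffff) == 0xffffffff
assert reverse_bits32(0x00000001) == 0x80000000
assert reverse_bits32(0x80000000) == 0x00000001
```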

Prior to this scrub error, I had some backfilling running as part of a FileStore-to-BlueStore migration. During this, two of the four OSDs containing the affected PG crashed repeatedly with the following error (BlueStore debug level 20):

   -40> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600)  r 0 v.len 512
   -39> 2022-02-07 16:28:20.489 7f550723a700 15 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:ffffffeb:::9b6886fa3639e64c892813ba7c9da9f4411f0a5fb73c89517b5f3f68acdaa388_00400000:head#
   -38> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:ffffffeb:::9b6886fa3639e64c892813ba7c9da9f4411f0a5fb73c89517b5f3f68acdaa388_00400000:head# = 0
   -37> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) stat 1.7fff_head #1:ffffffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head#
   -36> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) get_onode oid #1:ffffffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head# key 0x7f8000000000000001ffffffef216264'a22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d!='0xfffffffffffffffeffffffffffffffff'o'
   -35> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600)  r 0 v.len 843
   -34> 2022-02-07 16:28:20.489 7f550723a700 15 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:ffffffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head#
   -33> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:ffffffef:::bda22ca861e6999694841deb44bce5d37d7c35d0ffc9387d649d80acf818c341_0014f39d:head# = 0
   -32> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) stat 1.7fff_head #1:fffffffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000:head#
   -31> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600) get_onode oid #1:fffffffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000:head# key 0x7f8000000000000001fffffffb213938'c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000!='0xfffffffffffffffeffffffffffffffff'o'
   -30> 2022-02-07 16:28:20.489 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410).collection(1.7fff_head 0x564161314600)  r 0 v.len 512
   -29> 2022-02-07 16:28:20.489 7f550723a700 15 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:fffffffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000:head#
   -28> 2022-02-07 16:28:20.489 7f550723a700 10 bluestore(/var/lib/ceph/osd/ceph-410) getattrs 1.7fff_head #1:fffffffb:::98c8a3708cceb042f5ec0d5dd49416968adc95cf6019796fdf6ae1a1f7fd929d_00400000:head# = 0
   -27> 2022-02-07 16:28:20.494 7f550723a700 15 bluestore(/var/lib/ceph/osd/ceph-410) collection_list 1.7fff_head start #1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:0# end #MAX# max 2147483647
   -26> 2022-02-07 16:28:20.494 7f550723a700 20 bluestore(/var/lib/ceph/osd/ceph-410) _collection_list range #-3:fffe0000::::0#0 to #-3:ffffffff::::0#0 and #1:fffe0000::::0#0 to #1:ffffffff::::0#0 start #1:ffffffff:::c76c7ac2014adb9f0f0837ac1e85fd1e241af225908b6a0c3d3a44d6b866e732_00400000:0#
   -25> 2022-02-07 16:28:20.506 7f550723a700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/14.2.22/rpm/el8/BUILD/ceph-14.2.22/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_collection_list(BlueStore::Collection*, const ghobject_t&, const ghobject_t&, int, bool, std::vector<ghobject_t>*, ghobject_t*)' thread 7f550723a700 time 2022-02-07 16:28:20.495642
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/14.2.22/rpm/el8/BUILD/ceph-14.2.22/src/os/bluestore/BlueStore.cc: 10157: FAILED ceph_assert(start >= coll_range_start && start < coll_range_end)
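Reading the failed assertion together with the range logged a few lines earlier (a sketch of my interpretation, not the actual BlueStore code): the listing treats [coll_range_start, coll_range_end) as a half-open interval, and an object whose reversed hash is exactly ffffffff lands on coll_range_end itself, so the strict upper-bound check trips:

```python
# Values taken from the crash log above: the collection covers the
# reversed-hash range #1:fffe0000 to #1:ffffffff, and the listing
# starts at an object whose reversed hash is exactly ffffffff.
coll_range_start = 0xfffe0000
coll_range_end = 0xffffffff
start = 0xffffffff

# This mirrors the failed check in BlueStore::_collection_list:
#   ceph_assert(start >= coll_range_start && start < coll_range_end)
# With the maximum hash value, start == coll_range_end, so the
# strict "<" fails even though the object belongs to this PG.
assert start >= coll_range_start
assert not (start < coll_range_end)  # this is what trips the assert
```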

The crash didn't happen again, but the pseudo-missing object is still an issue. And I am a bit concerned that this issue seems to be related to the object name itself.


Related issues 1 (0 open, 1 closed)

Is duplicate of bluestore - Bug #52705: pg scrub stat mismatch with special objects that have hash 'ffffffff' (Resolved, Kefu Chai)

Actions #1

Updated by Igor Fedotov about 2 years ago

  • Project changed from Ceph to bluestore
Actions #2

Updated by Igor Fedotov about 2 years ago

  • Is duplicate of Bug #52705: pg scrub stat mismatch with special objects that have hash 'ffffffff' added
Actions #3

Updated by Igor Fedotov about 2 years ago

  • Status changed from New to Duplicate
