Bug #65185

OSD_SCRUB_ERROR, inconsistent pg in upgrade tests

Added by Laura Flores about 1 month ago. Updated 9 days ago.

Status:
Fix Under Review
Priority:
Immediate
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616025/remote/smithi098/log/b1f19696-e81a-11ee-95cd-87774f69a715/ceph.log.gz

2024-03-22T09:20:00.000187+0000 mon.a (mon.0) 7863 : cluster 4 [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
2024-03-22T09:20:00.000194+0000 mon.a (mon.0) 7864 : cluster 4 [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
2024-03-22T09:19:59.897409+0000 mon.a (mon.0) 7860 : cluster 0 osdmap e3595: 8 total, 8 up, 8 in
2024-03-22T09:20:00.000202+0000 mon.a (mon.0) 7865 : cluster 4     pg 103.14 is active+clean+inconsistent, acting [5,1,2]
2024-03-22T09:20:00.000151+0000 mon.a (mon.0) 7861 : cluster 4 Health detail: HEALTH_ERR noscrub flag(s) set; 1 scrub errors; Possible data damage: 1 pg inconsistent

More in this run: https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/


Related issues 1 (1 open, 0 closed)

Copied to RADOS - Backport #65593: squid: OSD_SCRUB_ERROR, inconsistent pg in upgrade tests (New, Radoslaw Zarzynski)
Actions #1

Updated by Laura Flores about 1 month ago

Laura Flores wrote:

/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616025/remote/smithi098/log/b1f19696-e81a-11ee-95cd-87774f69a715/ceph.log.gz
[...]

More in this run: https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/

Note the "failed to encode expected crc" log messages right before this. Could it be related?

2024-03-22T09:19:46.882532+0000 osd.4 (osd.4) 3037 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.883679+0000 osd.5 (osd.5) 1588 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.884492+0000 osd.4 (osd.4) 3038 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.884932+0000 osd.7 (osd.7) 929 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.885034+0000 osd.5 (osd.5) 1589 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:46.941945+0000 osd.6 (osd.6) 310 : cluster 3 failed to encode map e3582 with expected crc
2024-03-22T09:19:47.452680+0000 mgr.y (mgr.25274) 3825 : cluster 0 pgmap v7002: 129 pgs: 1 active+undersized+remapped, 1 active+clean+inconsistent, 127 active+clean; 308 MiB data, 2.3 GiB used, 713 GiB / 715 GiB avail
2024-03-22T09:19:47.887150+0000 mon.a (mon.0) 7841 : cluster 0 osdmap e3583: 8 total, 8 up, 8 in
2024-03-22T09:19:48.262121+0000 mon.a (mon.0) 7843 : cluster 4 Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
2024-03-22T09:19:48.262163+0000 mon.a (mon.0) 7844 : cluster 4 Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)

Actions #2

Updated by Laura Flores about 1 month ago

  • Related to Bug #63389: Failed to encode map X with expected CRC added
Actions #3

Updated by Radoslaw Zarzynski about 1 month ago

  • Assignee set to Ronen Friedman

The OSDMap CRC issue is clearly there, but I'm not sure it can explain the scrub error -- I doubt it.
Let's ask Ronen to take a look.
In the meantime we can rerun with the squid backport of the CRC fix included (https://github.com/ceph/ceph/pull/56553).

Actions #4

Updated by Ronen Friedman 29 days ago

  • Status changed from New to In Progress

There is a problem with most 'upgrade' tests, which may be the culprit here:
The command to initiate a deep scrub was modified between Reef & Squid:
from 'deep_scrub' to 'deep-scrub'. Thus, in a mixed (upgrade) environment,
both formats fail for some OSDs...

I am looking into it.

Actions #5

Updated by Laura Flores 20 days ago

  • Related to deleted (Bug #63389: Failed to encode map X with expected CRC)
Actions #6

Updated by Radoslaw Zarzynski 20 days ago

This tracker seems to be another one about the deep_scrub story. If so, it should be fixed with the upcoming fix.

Actions #7

Updated by Laura Flores 18 days ago

Laura Flores wrote:

/a/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/7616025/remote/smithi098/log/b1f19696-e81a-11ee-95cd-87774f69a715/ceph.log.gz
[...]

More in this run: https://pulpito.ceph.com/teuthology-2024-03-22_02:08:13-upgrade-squid-distro-default-smithi/

The original issue also had this in the cluster log:

2024-03-22T09:19:45.700815+0000 osd.5 (osd.5) 1581 : cluster 4 103.14 soid 103:2a6ba587:::smithi098868219-21:150 : object info inconsistent , attr value mismatch '__header'
2024-03-22T09:19:45.700818+0000 osd.5 (osd.5) 1582 : cluster 4 103.14 soid 103:2a6ba587:::smithi098868219-21:150 : object info inconsistent , attr value mismatch '__header'

Actions #8

Updated by Laura Flores 18 days ago

/a/yuriw-2024-04-09_14:58:21-upgrade-wip-yuri4-testing-2024-04-08-1432-distro-default-smithi/7648938/remote/smithi122/log/274b2744-f6e5-11ee-b64a-cb9ed24678a4/ceph.log.gz

2024-04-10T04:28:53.349252+0000 osd.2 (osd.2) 40 : cluster 4 61.6 soid 61:60b1a869:::smithi092684856-187:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349331+0000 osd.2 (osd.2) 41 : cluster 4 61.6 soid 61:610d2ff8:::smithi092684856-86:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349400+0000 osd.2 (osd.2) 42 : cluster 4 61.6 soid 61:632fe97c:::smithi092684856-451:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349468+0000 osd.2 (osd.2) 43 : cluster 4 61.6 soid 61:63e36c90:::smithi092684856-78:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349534+0000 osd.2 (osd.2) 44 : cluster 4 61.6 soid 61:64271e8a:::smithi092684856-320:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349600+0000 osd.2 (osd.2) 45 : cluster 4 61.6 soid 61:64bb76df:::smithi092684856-488:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349665+0000 osd.2 (osd.2) 46 : cluster 4 61.6 soid 61:64fcca3a:::smithi092684856-350:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349727+0000 osd.2 (osd.2) 47 : cluster 4 61.6 soid 61:6525a1fb:::smithi092684856-98:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349791+0000 osd.2 (osd.2) 48 : cluster 4 61.6 soid 61:660a2a17:::smithi092684856-347:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349856+0000 osd.2 (osd.2) 49 : cluster 4 61.6 soid 61:660fe2df:::smithi092684856-233:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349916+0000 osd.2 (osd.2) 50 : cluster 4 61.6 soid 61:66a4894e:::smithi092684856-304:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.349976+0000 osd.2 (osd.2) 51 : cluster 4 61.6 soid 61:66fa1e2f:::smithi092684856-166:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.350035+0000 osd.2 (osd.2) 52 : cluster 4 61.6 soid 61:67a0aa39:::smithi092684856-418:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.350093+0000 osd.2 (osd.2) 53 : cluster 4 61.6 soid 61:67f04970:::smithi092684856-390:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'
2024-04-10T04:28:53.350709+0000 osd.2 (osd.2) 54 : cluster 4 61.6 deep-scrub 0 missing, 14 inconsistent objects
2024-04-10T04:28:53.350712+0000 osd.2 (osd.2) 55 : cluster 4 61.6 deep-scrub 14 errors
2024-04-10T04:28:53.491937+0000 mgr.y (mgr.34104) 2767 : cluster 0 pgmap v4256: 129 pgs: 1 active+clean+inconsistent, 1 remapped+peering, 2 peering, 8 unknown, 20 active+clean+remapped, 5 active+undersized+degraded, 28 active+undersized, 64 active+clean; 1.2 GiB data, 7.1 GiB used, 619 GiB / 626 GiB avail; 50/1353 objects degraded (3.695%); 78/1353 objects misplaced (5.765%)
2024-04-10T04:28:54.066550+0000 osd.1 (osd.1) 9 : cluster 0 61.9 scrub starts
2024-04-10T04:28:54.068714+0000 osd.1 (osd.1) 10 : cluster 0 61.9 scrub ok
2024-04-10T04:28:54.096089+0000 mon.a (mon.0) 5801 : cluster 4 Health check failed: 10 scrub errors (OSD_SCRUB_ERRORS)
2024-04-10T04:28:54.096140+0000 mon.a (mon.0) 5802 : cluster 4 Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)

Actions #9

Updated by Laura Flores 18 days ago

/a/yuriw-2024-04-09_14:58:21-upgrade-wip-yuri4-testing-2024-04-08-1432-distro-default-smithi/7648890
/a/yuriw-2024-04-09_14:58:21-upgrade-wip-yuri4-testing-2024-04-08-1432-distro-default-smithi/7648926
/a/yuriw-2024-04-09_14:58:21-upgrade-wip-yuri4-testing-2024-04-08-1432-distro-default-smithi/7648938

Actions #10

Updated by Samuel Just 15 days ago · Edited

I created a test branch with some extra logging and managed to reproduce the issue with slightly more info.

ceph-osd.1.log.gz:2024-04-13T23:05:39.137+0000 7fd18cdba700  0 osd.1 pg_epoch: 1806 pg[59.10( v 1806'74 (0'0,1806'74] local-lis/les=1805/1806 n=4 ec=1796/1757 lis/c=1805/1805 les/c/f=1806/1806/0 sis=1805) [1,6]/[1,6,4] r=0 lpr=1805 crt=1806'74 mlcod 1806'73 active+clean+remapped+scrubbing+deep [ 59.10:  ]  TIME_FOR_DEEP ps=[2~1,d~1,f~3,13~4]] scrubber<Session/Act/WaitReplicas>:  b.e.: compare_obj_details: oi inconsistent, obj : 59:081fdcbd:::smithi059796643-41:15, auth_oi: 59:081fdcbd:::smithi059796643-41:15(1801'69 client.51719.0:163 dirty s 2672280 uv 9 alloc_hint [0 0 0]), cand_oi: 59:081fdcbd:::smithi059796643-41:15(1801'69 client.51719.0:163 dirty s 2672280 uv 9 alloc_hint [0 0 0]), can_bl.length(): 250, auth_bl.length(): 1500
ceph-osd.1.log.gz:2024-04-13T23:05:39.137+0000 7fd18cdba700  0 osd.1 pg_epoch: 1806 pg[59.10( v 1806'74 (0'0,1806'74] local-lis/les=1805/1806 n=4 ec=1796/1757 lis/c=1805/1805 les/c/f=1806/1806/0 sis=1805) [1,6]/[1,6,4] r=0 lpr=1805 crt=1806'74 mlcod 1806'73 active+clean+remapped+scrubbing+deep [ 59.10:  ]  TIME_FOR_DEEP ps=[2~1,d~1,f~3,13~4]] scrubber<Session/Act/WaitReplicas>:  b.e.: compare_obj_details: oi inconsistent, obj : 59:081fdcbd:::smithi059796643-41:15, auth_oi: 59:081fdcbd:::smithi059796643-41:15(1801'69 client.51719.0:163 dirty s 2672280 uv 9 alloc_hint [0 0 0]), cand_oi: 59:081fdcbd:::smithi059796643-41:15(1801'69 client.51719.0:163 dirty s 2672280 uv 9 alloc_hint [0 0 0]), can_bl.length(): 250, auth_bl.length(): 1500
ceph-osd.1.log.gz:2024-04-13T23:05:39.137+0000 7fd18cdba700  0 osd.1 pg_epoch: 1806 pg[59.10( v 1806'74 (0'0,1806'74] local-lis/les=1805/1806 n=4 ec=1796/1757 lis/c=1805/1805 les/c/f=1806/1806/0 sis=1805) [1,6]/[1,6,4] r=0 lpr=1805 crt=1806'74 mlcod 1806'73 active+clean+remapped+scrubbing+deep [ 59.10:  ]  TIME_FOR_DEEP ps=[2~1,d~1,f~3,13~4]] scrubber<Session/Act/WaitReplicas>:  b.e.: compare_obj_details: oi inconsistent, obj : 59:081fdcbd:::smithi059796643-41:head, auth_oi: 59:081fdcbd:::smithi059796643-41:head(1801'70 client.51719.0:656 dirty s 2672280 uv 64 alloc_hint [0 0 0]), cand_oi: 59:081fdcbd:::smithi059796643-41:head(1801'70 client.51719.0:656 dirty s 2672280 uv 64 alloc_hint [0 0 0]), can_bl.length(): 273, auth_bl.length(): 1638
ceph-osd.1.log.gz:2024-04-13T23:05:39.137+0000 7fd18cdba700  0 osd.1 pg_epoch: 1806 pg[59.10( v 1806'74 (0'0,1806'74] local-lis/les=1805/1806 n=4 ec=1796/1757 lis/c=1805/1805 les/c/f=1806/1806/0 sis=1805) [1,6]/[1,6,4] r=0 lpr=1805 crt=1806'74 mlcod 1806'73 active+clean+remapped+scrubbing+deep [ 59.10:  ]  TIME_FOR_DEEP ps=[2~1,d~1,f~3,13~4]] scrubber<Session/Act/WaitReplicas>:  b.e.: compare_obj_details: oi inconsistent, obj : 59:081fdcbd:::smithi059796643-41:head, auth_oi: 59:081fdcbd:::smithi059796643-41:head(1801'70 client.51719.0:656 dirty s 2672280 uv 64 alloc_hint [0 0 0]), cand_oi: 59:081fdcbd:::smithi059796643-41:head(1801'70 client.51719.0:656 dirty s 2672280 uv 64 alloc_hint [0 0 0]), can_bl.length(): 273, auth_bl.length(): 1638

Looks a bit like the object_info has the same contents, but the buffer is longer. That's pretty weird.

pulpito link: https://pulpito.ceph.com/sjust-2024-04-13_21:17:11-upgrade:reef-x:stress-split-wip-sjust-squid-testing-65185-2024-04-12-distro-default-smithi/7655166
branch: wip-sjust-squid-testing-65185-2024-04-12 (the snapset logging is a bit off, prints whether it's inconsistent or not)

For others looking to reproduce:
/virtualenv/bin/teuthology-suite -m smithi -s upgrade/reef-x/stress-split --priority 51 -c <branch> --limit 5

Actions #11

Updated by Ronen Friedman 14 days ago

Analysis (WIP)

  • the following test run is a sure way to create the ‘__header’ failure in ‘main’:
    ./teuthology-suite --seed=7341 -l40 -m smithi -t main -c wip-rf-comb1 -s upgrade --filter-all ' 3-stress-tasks/{radosbench rbd-cls rbd-import-export rbd_api'
  • the failure did not appear in my ‘reef’ run.

Comparing ‘main’ and Reef:

wip-rf-comb1 (‘main’ + some fixes):

  • tested specifying ‘main’ as the test suite
  • all 40 jobs run using :./teuthology-suite --seed=7341 -l40 -m smithi -t main -c wip-rf-comb1 -s upgrade --filter-all ' 3-stress-tasks/{radosbench rbd-cls rbd-import-export rbd_api'
  • https://pulpito.ceph.com/rfriedma-2024-04-12_09:02:49-upgrade-wip-rf-comb1-distro-default-smithi/
  • The important failure: e.g. 7653757: "2024-04-12T10:57:38.744468+0000 osd.1 (osd.1) 20 : cluster [ERR] 61.9 soid 61:90b0d9bc:::smithi099682301-188:head : object info inconsistent , snapset inconsistent , attr value mismatch '__header'" in cluster log
  • Also: OSD_DOWN, POOL_APP_NOT_ENABLED, MDS_ALL_DOWN, 1/3 mons down - but these all appear in the successful Reef runs, i.e. it’s a whitelist issue.
  • And 7653761 - timeout - that deserves a look.

wip-rf-testing-65185-reef (== reef):

  • tested specifying ‘main’ as the test suite
  • test 1: rfriedma-2024-04-13_11:33:33-upgrade-wip-rf-testing-65185-reef-distro-default-smithi:
  • --seed=7341 -l40 -m smithi -t main -c .... -s upgrade --filter-all ' 3-stress-tasks/{radosbench rbd-cls rbd-import-export rbd_api'
  • 16 failed. None with ‘__header’. Only 1 failure of interest (7655002). TBD.
  • Successful tests show and override OSD_DOWN, POOL_APP_NOT_ENABLED, MDS_ALL_DOWN, ...
Actions #12

Updated by Ronen Friedman 12 days ago · Edited

I am considering the following suspect(s):
PR #54930 modified ScrubMap::object::attrs (where we see the problem) from a map of buffer pointers to a map of bufferlists.
There are two conceivable issues here, to be investigated:
1) a slight difference in the way the two constructs are encoded/decoded (as we've only seen the problem in mixed clusters), and
2) some bug (esp. using a buffer while it is being modified) in the code that was changed in Squid to handle the new 'attrs' type.
@Radoslaw Zarzynski - any insight? I am already suspecting this PR re the 'counting modified shards' bug
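
For reference, a rough sketch of the shape of the suspected change (placeholder types only; the map value types are inferred from the getattrs overloads quoted in Actions #14 below, not copied from PR #54930):

  #include <functional>
  #include <map>
  #include <string>

  // Placeholders standing in for ceph::buffer::ptr / ceph::buffer::list.
  struct buffer_ptr_t  { std::string data; };
  struct buffer_list_t { std::string data; };

  // Reef-era shape of ScrubMap::object::attrs (approximation):
  struct scrub_object_before {
    std::map<std::string, buffer_ptr_t, std::less<>> attrs;
  };

  // Squid shape after PR #54930 (approximation):
  struct scrub_object_after {
    std::map<std::string, buffer_list_t, std::less<>> attrs;
  };

  int main() { scrub_object_before before; scrub_object_after after; (void)before; (void)after; return 0; }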

Actions #13

Updated by Radoslaw Zarzynski 12 days ago

Yeah, there is a change in attrs processing. Already prepared a commit: https://github.com/rzarzynski/ceph/commit/c2635dcd3457f9c24cddc36d3fdfde9bdfb93017.

Actions #14

Updated by Samuel Just 10 days ago · Edited

https://github.com/rzarzynski/ceph/commit/1a4d3f01816cedb15106fe2cdb52322029482827 changed ScrubMap::object::attrs to use a bufferlist rather than a bufferptr. A quick scan of the patch in question doesn't demonstrate any obvious problems. However, ObjectStore.h happens to have an extra overload for getattrs:

  /**
   * getattrs -- get all of the xattrs of an object
   *
   * @param cid collection for object
   * @param oid oid of object
   * @param aset place to put output result.
   * @returns 0 on success, negative error code on failure.
   */
  virtual int getattrs(CollectionHandle &c, const ghobject_t& oid,
               std::map<std::string,ceph::buffer::ptr, std::less<>>& aset) = 0;

  /**
   * getattrs -- get all of the xattrs of an object
   *
   * @param cid collection for object
   * @param oid oid of object
   * @param aset place to put output result.
   * @returns 0 on success, negative error code on failure.
   */
  int getattrs(CollectionHandle &c, const ghobject_t& oid,
           std::map<std::string,ceph::buffer::list,std::less<>>& aset) {
    std::map<std::string,ceph::buffer::ptr,std::less<>> bmap;
    int r = getattrs(c, oid, bmap);
    for (auto i = bmap.begin(); i != bmap.end(); ++i) {
      aset[i->first].append(i->second);
    }
    return r;
  }

PGBackend::be_scan_list (which is how both primary and replica actually perform scrub scans) uses getattrs as follows:

    ScrubMap::object &o = map.objects[poid];
    o.size = st.st_size;
    ceph_assert(!o.negative);
    store->getattrs(
      ch,
      ghobject_t(
    poid, ghobject_t::NO_GEN, get_parent()->whoami_shard().shard),
      o.attrs);

This wasn't modified by the above patch as it didn't cause a compile issue (and, honestly, should be fine). However,

      aset[i->first].append(i->second);

seems like a plausible handle on why we're seeing attributes which are 5-7x (always an integer multiple from what I've seen) the size of the actual encoded structure if we are somehow invoking be_scan_list on the same entry several times. The other overload in BlueStore.cc

  for (auto& i : o->onode.attrs) {
    aset.emplace(i.first.c_str(), i.second);
  }

would be a noop for any keys that already exist, so if there were such a bug causing us to getattrs on the same object multiple times it would have been hidden.

I'm looking for evidence that be_scan_list is somehow populating the same entry multiple times.
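
To make the suspected interaction concrete, here is a minimal, self-contained sketch (plain std::map/std::string standing in for the Ceph buffer types; this is not Ceph code): the append-based overload stacks the attribute value on every call, while the emplace-based BlueStore path silently keeps only the first copy, which is why the old bufferptr type hid any repeated calls.

  #include <cassert>
  #include <map>
  #include <string>

  // bufferlist-style overload: appends into the output map on every call,
  // mirroring aset[i->first].append(i->second) above.
  void getattrs_append(const std::map<std::string, std::string>& onode_attrs,
                       std::map<std::string, std::string>& aset) {
    for (const auto& [key, val] : onode_attrs)
      aset[key] += val;
  }

  // bufferptr-style overload (BlueStore): emplace is a no-op when the key exists.
  void getattrs_emplace(const std::map<std::string, std::string>& onode_attrs,
                        std::map<std::string, std::string>& aset) {
    for (const auto& [key, val] : onode_attrs)
      aset.emplace(key, val);
  }

  int main() {
    const std::map<std::string, std::string> onode_attrs{{"__header", "HDR"}};
    std::map<std::string, std::string> bl_attrs, ptr_attrs;

    // Simulate be_scan_list being re-entered several times for the same object.
    for (int pass = 0; pass < 7; ++pass) {
      getattrs_append(onode_attrs, bl_attrs);
      getattrs_emplace(onode_attrs, ptr_attrs);
    }

    assert(bl_attrs["__header"].size() == 7 * 3);  // 7x the real value -> attr mismatch
    assert(ptr_attrs["__header"].size() == 3);     // repeated calls were invisible before
    return 0;
  }

The 7 passes here are just an example count; any re-entry count greater than one yields an integer multiple of the real attribute size, matching the 5-7x sizes noted above.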

Actions #15

Updated by Samuel Just 10 days ago · Edited

ceph-osd.2.log.gz:2024-04-16T02:55:26.081+0000 7f73fa56d700  0 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] scrubber<Session/Act/WaitReplicas>:  b.e.: compare_obj_details: oi inconsistent, obj : 62:07286d5f:::smithi045520396-5:5b, auth_oi: 62:07286d5f:::smithi045520396-5:5b(1916'252 client.39480.0:1163 dirty s 3277365 uv 136 alloc_hint [0 0 0]), cand_oi: 62:07286d5f:::smithi045520396-5:5b(1916'252 client.39480.0:1163 dirty s 3277365 uv 136 alloc_hint [0 0 0]), can_bl.length(): 249, auth_bl.length(): 1743

For 62:07286d5f:::smithi045520396-5:5b, the attribute on the primary has length 1743 compared with 249 elsewhere. That's exactly 7x what it should be. Earlier in that chunk:

2024-04-16T02:55:26.061+0000 7f73f6565700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_scan_list (0/3 62:07286d5f:::smithi045520396-5:5b deep)
2024-04-16T02:55:26.061+0000 7f73f6565700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_deep_scrub 62:07286d5f:::smithi045520396-5:5b pos (0/3 62:07286d5f:::smithi045520396-5:5b deep)
2024-04-16T02:55:26.061+0000 7f73f6565700 20 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_deep_scrub  62:07286d5f:::smithi045520396-5:5b more data, digest so far 0x16849bf1
2024-04-16T02:55:26.061+0000 7f73f6565700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] scrubber<Session/Act/BuildMap>: build_scrub_map_chunk [62:00000000::::0,MAX)  pos (0/3 62:07286d5f:::smithi045520396-5:5b deep) Deep: 1
2024-04-16T02:55:26.061+0000 7f73f6565700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_scan_list (0/3 62:07286d5f:::smithi045520396-5:5b deep)
...
2024-04-16T02:55:26.065+0000 7f73fa56d700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_scan_list (0/3 62:07286d5f:::smithi045520396-5:5b deep)
...
2024-04-16T02:55:26.065+0000 7f73f6565700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_scan_list (0/3 62:07286d5f:::smithi045520396-5:5b deep)
...
2024-04-16T02:55:26.065+0000 7f73fa56d700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_scan_list (0/3 62:07286d5f:::smithi045520396-5:5b deep)
...
2024-04-16T02:55:26.065+0000 7f73f6565700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_scan_list (0/3 62:07286d5f:::smithi045520396-5:5b deep)
...
2024-04-16T02:55:26.065+0000 7f73fa56d700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_scan_list (0/3 62:07286d5f:::smithi045520396-5:5b deep)
2024-04-16T02:55:26.065+0000 7f73fa56d700 10 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_deep_scrub 62:07286d5f:::smithi045520396-5:5b pos (0/3 62:07286d5f:::smithi045520396-5:5b deep)
2024-04-16T02:55:26.065+0000 7f73fa56d700 20 osd.2 pg_epoch: 1921 pg[62.0( v 1921'255 (0'0,1921'255] local-lis/les=1915/1916 n=2 ec=1731/1731 lis/c=1915/1915 les/c/f=1916/1916/0 sis=1915) [2,3,6] r=0 lpr=1915 crt=1921'255 lcod 1916'253 mlcod 1916'253 active+clean+scrubbing+deep [ 62.0:  ]  TIME_FOR_DEEP ps=[21~1,25~2,2d~1,32~1,3b~1,3f~1,42~d,50~1,52~4,58~3,5e~2]] be_deep_scrub  62:07286d5f:::smithi045520396-5:5b done with data, digest 0xb1799809

It seems we call PGBackend::be_scan_list 7 times in order to scrub this object. This is because each call into be_scan_list also invokes be_deep_scrub which only scrubs a limited portion of the object before returning EINPROGRESS. The final one returns 0 indicating completion. So far so reasonable. The problem is simply that we call getattrs unconditionally each time. Combined with the behavior I describe above we have our bug.

Fix will be simple since this hasn't made it into a real release yet -- I'll have a PR out this evening.

If all of the members are on main/squid, all of them will end up calling be_scan_list the same number of times resulting in attributes repeated the same number of times -- that's why we haven't seen this in our normal runs.
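
A minimal sketch of the kind of guard that removes the accumulation (illustrative only; the names and the chunked-scan loop are simplified stand-ins, not the actual change in PR 56995): either fetch the attrs only once the object's scan has completed, or clear the destination before fetching so re-entry cannot stack values.

  #include <cassert>
  #include <map>
  #include <string>

  struct ScanObject {
    std::map<std::string, std::string> attrs;  // plays the role of ScrubMap::object::attrs
  };

  // Append-style attrs fetch, mirroring the bufferlist getattrs overload.
  void fetch_attrs(ScanObject& o) { o.attrs["__header"] += "HDR"; }

  // One chunk of a resumable deep scrub; pretend this object needs 7 chunks.
  // Returns true while the object's scan is still in progress (the EINPROGRESS case).
  bool scrub_one_chunk(int& chunks_done) { return ++chunks_done < 7; }

  // Buggy shape: attrs fetched unconditionally on every re-entry, so they stack up.
  void scan_buggy(ScanObject& o) {
    int chunks = 0;
    bool in_progress;
    do {
      fetch_attrs(o);                 // runs once per chunk -> attribute value repeated
      in_progress = scrub_one_chunk(chunks);
    } while (in_progress);
  }

  // Guarded shape: fetch the attrs exactly once, after the object's scan completes
  // (clearing first also makes any repeated fetch harmless).
  void scan_fixed(ScanObject& o) {
    int chunks = 0;
    while (scrub_one_chunk(chunks)) {
      // keep scrubbing data chunks; no attrs work here
    }
    o.attrs.clear();
    fetch_attrs(o);
  }

  int main() {
    ScanObject a, b;
    scan_buggy(a);
    scan_fixed(b);
    assert(a.attrs["__header"].size() == 7 * 3);  // inflated 7x -> mismatch vs. peers
    assert(b.attrs["__header"].size() == 3);      // same result whatever the chunk count
    return 0;
  }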

Actions #16

Updated by Samuel Just 10 days ago

  • Assignee changed from Ronen Friedman to Samuel Just
Actions #17

Updated by Samuel Just 10 days ago

  • Priority changed from Normal to Immediate
Actions #18

Updated by Radoslaw Zarzynski 9 days ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56995
Actions #19

Updated by Radoslaw Zarzynski 9 days ago

  • Copied to Backport #65593: squid: OSD_SCRUB_ERROR, inconsistent pg in upgrade tests added