Feature #13505: scrub/repair: persist scrub results. - Ceph - Ceph

Feature #13505

open

scrub/repair: persist scrub results.

Added by Kefu Chai over 8 years ago. Updated almost 4 years ago.

Status: In Progress
Priority: Normal
Source: other
Description


- write out a temp object as the scrub goes, keyed by object name; the value describes what's wrong with the object:
 object name => whats_wrong: inconsistency_t
   inconsistency_t:
    most recent log version, prior version
    osd_id => shard_info_t
      shard_info_t
      - exists
      - omap_sha1
      - data_sha1
      - size
      - xattrs -> useronly
      missing on clone -> snapset
      - object_info_t
      - data error?
      - metadata error?
- use pagination when querying the scrub result.
- the scrub APIs should always be passed the epoch of the beginning of the interval. if that epoch has passed, EAGAIN is returned.
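The outline above could be sketched in C++ roughly as follows. The field names follow the description; the concrete types, the helper functions `note_shard` and `shards_disagree`, and the omission of `object_info_t` and the snapset are assumptions made for illustration, not the eventual Ceph types.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical mirror of the outline; types are assumptions.
struct shard_info_t {
  bool exists = false;
  std::string omap_sha1;                      // digest of the omap
  std::string data_sha1;                      // digest of the object data
  uint64_t size = 0;
  std::map<std::string, std::string> xattrs;  // user xattrs only
  bool data_error = false;
  bool metadata_error = false;
};

struct inconsistency_t {
  uint64_t most_recent_version = 0;    // most recent log version
  uint64_t prior_version = 0;
  std::map<int, shard_info_t> shards;  // osd_id => shard_info_t
};

// object name => what is wrong with it
using scrub_results_t = std::map<std::string, inconsistency_t>;

// Record one shard's state for an object while the scrub walks the PG.
inline void note_shard(scrub_results_t& results, const std::string& object,
                       int osd_id, const shard_info_t& info) {
  results[object].shards[osd_id] = info;
}

// True if a shard is missing, or if shards differ in size or data digest.
inline bool shards_disagree(const inconsistency_t& inc) {
  const shard_info_t* first = nullptr;
  for (const auto& kv : inc.shards) {
    const shard_info_t& s = kv.second;
    if (!s.exists) return true;
    if (!first) { first = &s; continue; }
    if (s.size != first->size || s.data_sha1 != first->data_sha1) return true;
  }
  return false;
}
```

Keying the map by object name keeps the results sorted, which makes it straightforward to page through them later.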

1. dump the above scrub/repair metadata in the form of a temp object (it is already in the scrub map)
2. add a simple pg command to dump it
3. add a teuthology test accordingly
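A dump command that honours the pagination and epoch requirements from the description might look like the sketch below. `scrub_store`, `list_inconsistent`, and the rule that the caller's epoch must equal the interval's starting epoch are assumptions made for illustration; the ticket only says that EAGAIN is returned once the epoch has passed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using epoch_t = uint32_t;

// Hypothetical in-memory stand-in for the persisted scrub results.
struct scrub_store {
  epoch_t interval_start = 0;                 // epoch at which the interval began
  std::map<std::string, std::string> errors;  // object name => encoded error
};

// Fill *out with up to max_entries object names following start_after
// (exclusive; pass "" to start from the beginning).
// Returns 0 on success, -EAGAIN if the caller's epoch is stale.
inline int list_inconsistent(const scrub_store& store, epoch_t caller_epoch,
                             const std::string& start_after,
                             std::size_t max_entries,
                             std::vector<std::string>* out) {
  if (caller_epoch != store.interval_start) return -EAGAIN;
  // Find the start position before clearing *out, in case start_after
  // refers to an element of *out.
  auto it = start_after.empty() ? store.errors.begin()
                                : store.errors.upper_bound(start_after);
  out->clear();
  for (; it != store.errors.end() && out->size() < max_entries; ++it)
    out->push_back(it->first);
  return 0;
}
```

A caller would pass the last name of each page as `start_after` for the next call, and restart the query with a fresh epoch on `-EAGAIN`.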

Related issues 4 (3 open — 1 closed)


Updated by Kefu Chai over 8 years ago

Description updated (diff)



Updated by Kefu Chai over 8 years ago

Subject changed from "new scrub and repair" to "scrub/repair: persist scrub results."


Updated by David Zafman over 8 years ago

There are some scrub errors which are not related to a specific object or involve multiple objects.

1. The pg_stat_t (object_stat_sum_t) contains stats for the pg as a whole. Needs to be fixed last.
2. A missing SnapSet in a head object requires rebuilding the SnapSet or removing all clones. Are the clones in error or the head object?
3. A corruption of the clone_overlap requires clone_size to be repaired first. We could use a hierarchy of inconsistencies.

For the first stage of this change, we should worry about object data and omap inconsistencies, keeping in mind that some of these more complex error types will be handled later. For pg_stat_t, we could just have repair run after the last object is repaired.
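One way to picture the hierarchy of inconsistencies mentioned above is to sort pending repairs by stage. The enum values and names below are hypothetical. The thread only states that clone_size must be repaired before clone_overlap and that pg_stat_t comes last; placing object data first is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical repair stages; lower values are repaired first.
enum class repair_stage {
  object_data = 0,    // object data and omap (assumed to go first)
  clone_size = 1,     // must precede clone_overlap
  clone_overlap = 2,
  pg_stats = 3        // pg_stat_t is fixed last
};

struct pending_repair {
  std::string target;  // object name, or the PG itself for pg_stats
  repair_stage stage;
};

// Order repairs by stage, keeping the discovery order within each stage.
inline void order_repairs(std::vector<pending_repair>& repairs) {
  std::stable_sort(repairs.begin(), repairs.end(),
                   [](const pending_repair& a, const pending_repair& b) {
                     return static_cast<int>(a.stage) < static_cast<int>(b.stage);
                   });
}
```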


Updated by Kefu Chai over 8 years ago

Status changed from New to In Progress
Assignee set to Kefu Chai


Updated by Kefu Chai over 8 years ago

2. A missing SnapSet in a head object requires rebuilding the SnapSet or removing all clones. Are the clones in error or the head object?

they will be in the error.


Updated by David Zafman over 8 years ago

Target version set to v10.0.4


Updated by Loïc Dachary over 8 years ago

Draft implementation at https://github.com/ceph/ceph/pull/6898


#10

Updated by Kefu Chai over 8 years ago

wondering how we can fix snapset inconsistencies like

snap missing in snapset,clone_overlap
snapset.clone_size mismatches with snapset.clone_overlap

for the first problem, probably the simplest way is to remove the impacted snap. the second problem is caused either by a bug or by bitrot of the authoritative replica. in the case of bitrot, #13509 would be helpful. otherwise we can hardly tell which replica is the correct copy without using some heuristic magic in PGBackend::be_select_auth_object().

dzafman, i found that ReplicatedPG::_scrub() repeats the check for a missing/corrupted OI_ATTR done by PGBackend::be_select_auth_object(). is this on purpose?
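To make the two snapset inconsistencies above concrete, here is a minimal sketch of a detection pass. `snapset_sketch` is a simplified stand-in loosely modelled on the SnapSet fields named in this comment, not the real structure. Treating an overlap extent that reaches past the clone's recorded size as the "mismatch" is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using snapid_t = uint64_t;
using extents_t = std::vector<std::pair<uint64_t, uint64_t>>;  // (offset, length)

// Simplified, hypothetical stand-in for a SnapSet.
struct snapset_sketch {
  std::vector<snapid_t> clones;
  std::map<snapid_t, uint64_t> clone_size;
  std::map<snapid_t, extents_t> clone_overlap;
};

enum class snapset_error { none, missing_entry, overlap_exceeds_size };

// Report the first inconsistency found, if any.
inline snapset_error check_snapset(const snapset_sketch& ss) {
  for (snapid_t clone : ss.clones) {
    auto size = ss.clone_size.find(clone);
    auto overlap = ss.clone_overlap.find(clone);
    // Case 1: the snap is missing from clone_size or clone_overlap.
    if (size == ss.clone_size.end() || overlap == ss.clone_overlap.end())
      return snapset_error::missing_entry;
    // Case 2: an overlap extent extends past the recorded clone size.
    for (const auto& ext : overlap->second)
      if (ext.first + ext.second > size->second)
        return snapset_error::overlap_exceeds_size;
  }
  return snapset_error::none;
}
```

Detection is the easy half; as the comment notes, choosing which replica to trust for the repair is the harder problem.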


#11

Updated by Kefu Chai over 8 years ago

note to myself: in the last discussion with david, he advised that we should not overwrite the result of a deep scrub with that of a shallow one. consider an OSD with low workload, where the shallow scrub is performed once a day and the deep scrub once a week. by the end of the week, the shallow scrub result has overwritten the deep scrub result, so discrepancies that only a deep scrub detects are overlooked, such as:

data_digest_mismatch
omap_digest_mismatch
read_error

if the content of the object/omap in question is rewritten after the deep scrub and before we do the repair, the error very likely persists.

to implement this feature, we can have two omap entries for each object: one for shallow errors, the other for deep errors. the deep scrub can rewrite both of them, while the shallow scrub can only overwrite the former.
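The two-entry scheme could be sketched as below. The key prefixes, the string encoding of errors, and the `record_scrub` helper are hypothetical; only the rule that a deep scrub may rewrite both entries while a shallow scrub touches only the shallow one comes from the note above.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class scrub_kind { shallow, deep };

// Hypothetical key layout: one omap entry per object per error class.
inline std::string shallow_key(const std::string& obj) { return "shallow/" + obj; }
inline std::string deep_key(const std::string& obj) { return "deep/" + obj; }

using omap_t = std::map<std::string, std::string>;

// Record the outcome of one scrub of one object.
// An empty error string means "no errors" and removes the entry.
// deep_errors is ignored for shallow scrubs, so deep results survive them.
inline void record_scrub(omap_t& omap, scrub_kind kind, const std::string& obj,
                         const std::string& shallow_errors,
                         const std::string& deep_errors = "") {
  auto put = [&omap](const std::string& key, const std::string& value) {
    if (value.empty()) omap.erase(key);
    else omap[key] = value;
  };
  put(shallow_key(obj), shallow_errors);
  if (kind == scrub_kind::deep) put(deep_key(obj), deep_errors);
}
```

With this layout, a daily shallow scrub can refresh its own entry without discarding a data_digest_mismatch found by the last deep scrub.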


#12

Updated by David Zafman over 8 years ago

Kefu Chai wrote:

dzafman, i found that ReplicatedPG::_scrub() repeats the check for a missing/corrupted OI_ATTR done by PGBackend::be_select_auth_object(). is this on purpose?

It is true that ReplicatedPG::_scrub() is called with an authmap selected in PGBackend::be_select_auth_object() that is present and decodes. Since this code is moving into user mode, we don't need to fix it now. Those particular checks should have been asserts. When I fixed _scrub() I was obsessed with not letting a corruption cause an OSD to assert during scrubbing. But in this case it shouldn't be possible.


#13