Feature #13505
open
scrub/repair: persist scrub results.
Added by Kefu Chai over 8 years ago.
Updated almost 4 years ago.
Description
- Write out a temp object as the scrub proceeds. Its key is the object name, and its value describes what is wrong with the object:
object name => whats_wrong: inconsistency_t
inconsistency_t:
most recent log version, prior version
osd_id => shard_info_t
shard_info_t
- exists
- omap_sha1
- data_sha1
- size
- xattrs -> useronly
missing on clone -> snapset
- object_info_t
- data error?
- metadata error?
- Use pagination when querying the scrub results.
- The scrub APIs should always be passed the epoch at the beginning of the interval. If that epoch has passed, EAGAIN is returned.
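The inconsistency_t / shard_info_t layout above can be sketched as plain C++ structs. This is a minimal sketch under assumptions: the field types (strings for the SHA-1 digests, integers for versions and size) are illustrative rather than Ceph's real types (eversion_t, bufferlist, object_info_t), the "missing on clone -> snapset" item is left out, and record_if_inconsistent() is a hypothetical helper. For simplicity each shard is compared against the first shard rather than against the authoritative shard that PGBackend::be_select_auth_object() would pick.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the fields listed above; real Ceph
// code would use its own types (eversion_t, bufferlist, object_info_t).
struct shard_info_t {
  bool exists = false;
  std::string data_sha1;                      // digest of the object data
  std::string omap_sha1;                      // digest of the omap
  uint64_t size = 0;
  std::map<std::string, std::string> xattrs;  // user xattrs only
  bool data_error = false;                    // e.g. a read error on the data
  bool metadata_error = false;                // e.g. object_info_t undecodable
};

struct inconsistency_t {
  uint64_t last_version = 0;           // most recent log version
  uint64_t prior_version = 0;          // prior version
  std::map<int, shard_info_t> shards;  // osd_id => shard_info_t

  // True if any shard is missing, reports an error, or disagrees with the
  // first shard on size, digests, or user xattrs.
  bool is_inconsistent() const {
    if (shards.empty()) return false;
    const shard_info_t& ref = shards.begin()->second;
    for (const auto& entry : shards) {
      const shard_info_t& s = entry.second;
      if (!s.exists || s.data_error || s.metadata_error) return true;
      if (s.size != ref.size || s.data_sha1 != ref.data_sha1 ||
          s.omap_sha1 != ref.omap_sha1 || s.xattrs != ref.xattrs)
        return true;
    }
    return false;
  }
};

// The temp object: object name => inconsistency_t.
using scrub_results_t = std::map<std::string, inconsistency_t>;

// Persist only the objects that actually have something wrong with them.
void record_if_inconsistent(scrub_results_t& results, const std::string& oid,
                            const inconsistency_t& inc) {
  if (inc.is_inconsistent()) results[oid] = inc;
}
```

Keeping only inconsistent objects keeps the temp object small on a healthy PG, which matters once results are listed page by page.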
1. Dump the above scrub/repair metadata in the form of a temp object (it is already in the scrub map).
2. Add a simple pg command to dump it.
3. Add a teuthology test accordingly.
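The query side of such a pg command, combining the pagination and epoch rules from the description, might look roughly like this. It is a sketch under assumptions: list_inconsistent() and its parameters are hypothetical, epoch_t is assumed to be a 32-bit counter, and the stored value is reduced to a string. Returning -EAGAIN follows the description; the cursor-style "start after this name" pagination is one possible choice.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using epoch_t = uint32_t;  // assumption: a 32-bit epoch counter
// Hypothetical: object name => summary of what is wrong with it.
using scrub_results_t = std::map<std::string, std::string>;

// Return up to max_return inconsistent object names sorting strictly after
// start_after ("" starts from the beginning). The caller passes the epoch at
// which the interval it is listing began; if the PG has since moved to a new
// interval, the stored results may be stale, so -EAGAIN tells the caller to
// restart the listing.
int list_inconsistent(const scrub_results_t& results,
                      epoch_t caller_interval_start,
                      epoch_t pg_interval_start,
                      const std::string& start_after,
                      std::size_t max_return,
                      std::vector<std::string>* out) {
  if (caller_interval_start != pg_interval_start) return -EAGAIN;
  // Position the cursor before touching *out, in case start_after aliases it.
  auto it = results.upper_bound(start_after);
  out->clear();
  for (; it != results.end() && out->size() < max_return; ++it)
    out->push_back(it->first);
  return 0;
}
```

A client keeps calling with the last name of the previous page until it receives an empty page, and starts over from "" whenever it gets -EAGAIN.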
- Description updated (diff)
- Subject changed from new scrub and repair to scrub/repair: persist scrub results.
There are some scrub errors which are not related to a specific object or involve multiple objects.
1. The pg_stat_t (object_stat_sum_t) contains stats for the PG as a whole. It needs to be fixed last.
2. A missing SnapSet in a head object requires rebuilding the SnapSet or removing all clones. Are the clones in error or the head object?
3. Corruption of clone_overlap requires clone_size to be repaired first. We could use a hierarchy of inconsistencies.
For the first stage of this change, we should focus on object data and omap inconsistencies, keeping in mind that some of these more complex error types will be handled later. For pg_stat_t, we could simply run repair after the last object is repaired.
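The "hierarchy of inconsistencies" idea can be sketched as an ordering over repair work. This is an illustration only: repair_level, repair_task and order_repairs() are hypothetical names, and the levels encode just the dependencies stated above (per-object data/omap first, clone_size before clone_overlap, pg_stat_t last).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical ordering of repair work; a lower value is repaired first.
// clone_size must be fixed before clone_overlap, and the PG-wide
// pg_stat_t summary is fixed only after every object.
enum class repair_level {
  object_data_omap = 0,  // first stage: per-object data/omap inconsistencies
  clone_size = 1,
  clone_overlap = 2,
  pg_stat = 3,           // stats for the PG as a whole
};

struct repair_task {
  repair_level level;
  std::string target;  // object name, or the PG id for pg_stat
};

// Stable sort, so tasks at the same level keep their discovery order.
void order_repairs(std::vector<repair_task>& tasks) {
  std::stable_sort(tasks.begin(), tasks.end(),
                   [](const repair_task& a, const repair_task& b) {
                     return a.level < b.level;
                   });
}
```

Later error types (for example a missing SnapSet) could be slotted in as new levels without changing the sort.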
- Status changed from New to In Progress
- Assignee set to Kefu Chai
2. A missing SnapSet in a head object requires rebuilding the SnapSet or removing all clones. Are the clones in error or the head object?
The clones will be the ones in error.
- Target version set to v10.0.4
I am wondering how we can fix SnapSet inconsistencies such as:
- a snap missing in snapset.clone_overlap
- snapset.clone_size not matching snapset.clone_overlap
For the first problem, the simplest way is probably to remove the impacted snap. The second problem is caused either by a bug or by bitrot on the authoritative replica. If it is bitrot, #13509 would be helpful; otherwise we can hardly tell which replica holds the correct copy without some heuristic magic in PGBackend::be_select_auth_object().
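Detecting the two SnapSet problems above could look roughly like this. It is a sketch on a simplified, hypothetical snapset_t, not Ceph's SnapSet class, and it assumes that a "mismatch" means a clone_overlap extent reaching past that clone's clone_size; the real invariants checked during scrub may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using snapid_t = uint64_t;
using extent_t = std::pair<uint64_t, uint64_t>;  // (offset, length)

// Hypothetical, simplified SnapSet.
struct snapset_t {
  std::vector<snapid_t> clones;
  std::map<snapid_t, uint64_t> clone_size;
  std::map<snapid_t, std::vector<extent_t>> clone_overlap;
};

// Report clones missing from clone_size/clone_overlap, and overlaps that
// extend beyond the clone's size (assumed invariant).
std::vector<std::string> check_snapset(const snapset_t& ss) {
  std::vector<std::string> errors;
  for (snapid_t c : ss.clones) {
    const std::string id = "clone " + std::to_string(c);
    auto size_it = ss.clone_size.find(c);
    auto ovl_it = ss.clone_overlap.find(c);
    if (size_it == ss.clone_size.end())
      errors.push_back(id + " missing from clone_size");
    if (ovl_it == ss.clone_overlap.end())
      errors.push_back(id + " missing from clone_overlap");
    if (size_it == ss.clone_size.end() || ovl_it == ss.clone_overlap.end())
      continue;
    for (const extent_t& e : ovl_it->second) {
      if (e.first + e.second > size_it->second) {
        errors.push_back(id + " clone_overlap exceeds clone_size");
        break;  // one report per clone is enough
      }
    }
  }
  return errors;
}
```

Only detection is shown; deciding which replica's SnapSet to trust is the harder part discussed above.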
dzafman, I found that ReplicatedPG::_scrub() repeats the check for a missing/corrupted OI_ATTR that PGBackend::be_select_auth_object() already does. Is this on purpose?
Note to myself: in the last discussion with David, he advised that we should not overwrite the results of a deep scrub with those of a shallow scrub. On an OSD with a low workload, a shallow scrub is performed once a day, while a deep scrub is performed once a week. The deep scrub results would therefore survive only until the next shallow scrub overwrites them, so discrepancies that only a deep scrub detects would be overlooked:
- data_digest_mismatch
- omap_digest_mismatch
- read_error
Unless the content of the object/omap in question is rewritten after the deep scrub and before we do the repair, the error very likely persists.
To implement this, we can keep two omap entries for each object: one for shallow errors and the other for deep errors. A deep scrub can rewrite both of them, while a shallow scrub can only overwrite the former.
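The two-entry scheme can be sketched with an in-memory map standing in for the omap. Assumptions: the ":shallow" / ":deep" key suffixes, store_shallow(), store_deep() and lookup() are hypothetical names, and error kinds are plain strings. The point is only the overwrite rule: a shallow scrub never touches the deep entry.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical layout: one omap entry per (object, scrub kind).
// The key suffixes are illustrative, not an actual on-disk format.
using errors_t = std::set<std::string>;
using omap_t = std::map<std::string, errors_t>;

// A shallow scrub rewrites only its own entry, leaving deep-only results
// (data_digest_mismatch, omap_digest_mismatch, read_error) intact.
void store_shallow(omap_t& omap, const std::string& oid,
                   const errors_t& shallow_errors) {
  omap[oid + ":shallow"] = shallow_errors;
}

// A deep scrub also performs the shallow checks, so it rewrites both entries.
void store_deep(omap_t& omap, const std::string& oid,
                const errors_t& shallow_errors, const errors_t& deep_errors) {
  omap[oid + ":shallow"] = shallow_errors;
  omap[oid + ":deep"] = deep_errors;
}

// Report the union of the latest shallow and the latest deep results.
errors_t lookup(const omap_t& omap, const std::string& oid) {
  errors_t all;
  for (const char* suffix : {":shallow", ":deep"}) {
    auto it = omap.find(oid + suffix);
    if (it != omap.end()) all.insert(it->second.begin(), it->second.end());
  }
  return all;
}
```

A weekly deep finding such as data_digest_mismatch stays visible through the daily shallow scrubs and is cleared only by the next deep scrub.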
Kefu Chai wrote:
dzafman, i found that ReplicatedPG::_scrub() are repeating the check for missing/corrupted OI_ATTR done by PGBackend::be_select_auth_object(), is this on purpose?
It is true that ReplicatedPG::_scrub() is called with an authmap, selected in PGBackend::be_select_auth_object(), whose OI_ATTR is present and decodes. Since this code is moving into user mode, we don't need to fix it now. Those particular checks should have been asserts. When I fixed _scrub(), I was obsessed with not letting a corruption cause an OSD to assert during scrubbing, but in this case it shouldn't be possible.
We should not return the scrub results while the scrub is still in progress. We can either:
- check the status of the PG before serving the scrubls pg command, or
- add a sentinel scrub object at the end of the scrub.
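The sentinel option can be sketched as follows. This is an illustration under assumptions: scrub_store and the "_scrub_complete_" key are hypothetical (the sketch assumes no real object uses that name), and answering with -EAGAIN while the scrub runs is borrowed from the epoch rule in the description rather than decided anywhere in this ticket.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <vector>

// Hypothetical sentinel key, assumed never to collide with an object name.
const std::string kScrubDoneKey = "_scrub_complete_";

struct scrub_store {
  std::map<std::string, std::string> entries;  // object name => what's wrong

  // Starting a scrub drops the old results together with the sentinel.
  void begin_scrub() { entries.clear(); }
  void add_error(const std::string& oid, const std::string& what) {
    entries[oid] = what;
  }
  // Written last, so its presence means the result set is complete.
  void end_scrub() { entries[kScrubDoneKey] = ""; }

  // Serve results only once the sentinel exists; a partial listing taken
  // mid-scrub would be misleading.
  int list(std::vector<std::string>* out) const {
    if (entries.count(kScrubDoneKey) == 0) return -EAGAIN;
    out->clear();
    for (const auto& e : entries)
      if (e.first != kScrubDoneKey) out->push_back(e.first);
    return 0;
  }
};
```

Compared with checking the PG state, the sentinel travels with the persisted results themselves, so it stays correct even if the query is served after the PG state has changed.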
- Copied to Feature #14860: scrub/repair: persist scrub results (do not overwrite deep scrub results with non-deep scrub) added
- Copied to deleted (Feature #14860: scrub/repair: persist scrub results (do not overwrite deep scrub results with non-deep scrub))
- Related to Feature #14860: scrub/repair: persist scrub results (do not overwrite deep scrub results with non-deep scrub) added
- Assignee deleted (Kefu Chai)
- Target version deleted (v10.0.4)
Unsetting old target version for open tickets.