Feature #13505

scrub/repair: persist scrub results.

Added by Kefu Chai over 3 years ago. Updated over 1 year ago.

Status: In Progress
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 10/16/2015
Due date:
% Done: 0%
Source: other
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description


- write out a temp object as the scrub goes. the key is the object name, and the value represents what's wrong with the object:
 object name => whats_wrong: inconsistency_t
   inconsistency_t:
    most recent log version, prior version
    osd_id => shard_info_t
      shard_info_t
      - exists
      - omap_sha1
      - data_sha1
      - size
      - xattrs -> useronly
      missing on clone -> snapset
      - object_info_t
      - data error?
      - metadata error?
- use pagination when querying the scrub result.
- should always pass the epoch of the beginning of the interval in the scrub APIs. if the epoch has passed, EAGAIN is returned.
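
As an illustration of the layout outlined above, here is a minimal Python sketch of the per-object record. The class names `ShardInfo` and `Inconsistency`, the helper `mismatched_fields()`, and the sample object name and digests are hypothetical stand-ins for `shard_info_t` and `inconsistency_t`; the `object_info_t` and snapset fields are left out for brevity. It is not the encoding Ceph uses, only a way to read the outline.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ShardInfo:
    """Per-OSD view of one object (shard_info_t in the outline above)."""
    exists: bool = True
    omap_sha1: Optional[str] = None
    data_sha1: Optional[str] = None
    size: Optional[int] = None
    xattrs: Dict[str, bytes] = field(default_factory=dict)  # user xattrs only
    data_error: bool = False
    metadata_error: bool = False


@dataclass
class Inconsistency:
    """What is wrong with one object (inconsistency_t in the outline above)."""
    version: int          # most recent log version
    prior_version: int
    shards: Dict[int, ShardInfo] = field(default_factory=dict)  # osd_id => shard

    def mismatched_fields(self):
        """Names of the ShardInfo fields whose values differ between shards."""
        names = ("exists", "omap_sha1", "data_sha1", "size", "xattrs")
        shards = list(self.shards.values())
        return [n for n in names
                if any(getattr(s, n) != getattr(shards[0], n) for s in shards[1:])]


# The temp object is modelled as a plain dict: object name => Inconsistency.
scrub_results = {
    "obj1": Inconsistency(
        version=42, prior_version=41,
        shards={0: ShardInfo(data_sha1="aaa", size=4096),
                1: ShardInfo(data_sha1="bbb", size=4096, data_error=True)}),
}
print(scrub_results["obj1"].mismatched_fields())  # ['data_sha1']
```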

1. dump the above scrub/repair metadata in the form of a temp object (it is already in the scrub map)
2. add a simple pg command to dump it
3. add teuthology test accordingly
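
The pagination and epoch bullets in the description could be combined roughly as follows. This is a sketch under assumptions: `ScrubStore`, `list_inconsistent()`, `list_all()`, the cursor-by-object-name scheme and the sample data are all hypothetical, and "the epoch has passed" is modelled simply as the caller's epoch no longer matching the start of the current interval.

```python
import errno


class ScrubStore:
    """Hypothetical in-memory stand-in for the temp object holding scrub results."""

    def __init__(self, interval_start_epoch, results):
        self.interval_start_epoch = interval_start_epoch
        self.results = dict(results)  # object name => description of the problem

    def list_inconsistent(self, epoch, start_after="", max_return=2):
        """Return (ret, page, next_cursor); ret is 0 or a negative errno."""
        if epoch != self.interval_start_epoch:
            # the caller's view of the interval is stale: ask it to retry
            return -errno.EAGAIN, [], None
        names = sorted(n for n in self.results if n > start_after)
        page = names[:max_return]
        more = len(names) > max_return
        next_cursor = page[-1] if more else None
        return 0, [(n, self.results[n]) for n in page], next_cursor


def list_all(store, epoch):
    """Walk every page, resuming after the last object name seen."""
    found, cursor = [], ""
    while cursor is not None:
        ret, page, cursor = store.list_inconsistent(epoch, start_after=cursor)
        if ret < 0:
            raise OSError(-ret, "interval changed, retry with the new epoch")
        found.extend(page)
    return found


store = ScrubStore(10, {"a": "size", "b": "data_sha1", "c": "omap_sha1"})
print([name for name, _ in list_all(store, 10)])  # ['a', 'b', 'c']
```

Passing the last returned object name as the cursor keeps the server stateless between calls, which is one reason to prefer it over a numeric offset.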


Related issues

Related to Ceph - Feature #13506: scrub/repair: add librados APIs New 10/16/2015
Related to Ceph - Feature #13507: scrub APIs to read replica New 10/16/2015
Related to Ceph - Feature #13508: scrub/repair: repair corrupted/missing objects In Progress 01/13/2016
Related to Ceph - Feature #14860: scrub/repair: persist scrub results (do not overwrite deep scrub results with non-deep scrub) Duplicate 10/16/2015

Associated revisions

Revision cb4efbd7 (diff)
Added by Kefu Chai almost 3 years ago

librados: add get_inconsistent_pgs() to librados

to list the inconsistent PGs of given pool, it's a wrapper
around the "ceph pg ls" command.

Fixes: #13505
Signed-off-by: Kefu Chai <>

Revision b43d4809 (diff)
Added by Kefu Chai almost 3 years ago

librados: add `inconsistent_obj_t` types

which present the inconsistent objects found in scrub

Fixes: #13505
Signed-off-by: Kefu Chai <>

Revision 3dea4f1f (diff)
Added by Kefu Chai almost 3 years ago

osd: add CEPH_OSD_OP_SCRUBLS pg op

it is a new pg op which returns the encoded objects stored when
scrubbing.

Fixes: #13505
Signed-off-by: Kefu Chai <>

Revision dfc2f482 (diff)
Added by Kefu Chai almost 3 years ago

librados: add get_inconsistent_objects() API

Fixes: #13505
Signed-off-by: Kefu Chai <>

Revision c9b593d2 (diff)
Added by Kefu Chai almost 3 years ago

librados: add get_inconsistent_snapsets() API

Fixes: #13505
Signed-off-by: Kefu Chai <>

Revision e2374c43 (diff)
Added by Kefu Chai almost 3 years ago

rados: add "list-inconsistent-snapset" cmd

to list inconsistent snapsets of a given PG, this command exposes
get_inconsistent_snapsets() rados API to user.

Fixes: #13505
Signed-off-by: Kefu Chai <>

History

#1 Updated by Kefu Chai over 3 years ago

  • Description updated (diff)

#2 Updated by Kefu Chai over 3 years ago

  • Description updated (diff)

#3 Updated by Kefu Chai over 3 years ago

  • Description updated (diff)

#4 Updated by Kefu Chai over 3 years ago

  • Subject changed from new scrub and repair to scrub/repair: persist scrub results.

#5 Updated by David Zafman about 3 years ago

There are some scrub errors which are not related to a specific object or involve multiple objects.

1. The pg_stat_t (object_stat_sum_t) contains stats for the pg as a whole. Needs to be fixed last.
2. A missing SnapSet in a head object requires rebuilding the SnapSet or removing all clones. Are the clones in error or the head object?
3. A corruption of the clone_overlap requires clone_size to be repaired first. We could use a hierarchy of inconsistencies.

For the first stage of this change, we should worry about object data and omap inconsistencies, keeping in mind that some of these more complex error types will be handled later. For pg_stat_t we could just have repair run after the last object is repaired.

#6 Updated by Kefu Chai about 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Kefu Chai

#7 Updated by Kefu Chai about 3 years ago

2. A missing SnapSet in a head object requires rebuilding the SnapSet or removing all clones. Are the clones in error or the head object?

they will be in the error.

#8 Updated by David Zafman about 3 years ago

  • Target version set to v10.0.4

#9 Updated by Loic Dachary about 3 years ago

#10 Updated by Kefu Chai about 3 years ago

wondering how we can fix snapset inconsistencies like

  1. snap missing in snapset.clone_overlap
  2. snapset.clone_size mismatches with snapset.clone_overlap

for the first problem, probably the simplest way is to remove the impacted snap. the second problem is caused either by a bug or by bitrot of the authoritative replica. in the case of bitrot, #13509 would be helpful. otherwise we can hardly tell which replica is the correct copy without some heuristic magic in PGBackend::be_select_auth_object().

dzafman, i found that ReplicatedPG::_scrub() is repeating the check for missing/corrupted OI_ATTR done by PGBackend::be_select_auth_object(). is this on purpose?

#11 Updated by Kefu Chai about 3 years ago

note to myself: in the last discussion with david, he advised that we should not overwrite the result of a deep scrub with that of a shallow one. consider an OSD with a low workload, where the shallow scrub is performed once a day and the deep scrub once a week. the day after a deep scrub, the shallow scrub result overwrites the deep scrub result, so the discrepancies that only a deep scrub detects are overlooked:

  • data_digest_mismatch
  • omap_digest_mismatch
  • read_error

if the content of the object/omap in question is rewritten after the deep scrub and before we do the repair, the error very likely persists.

to implement this feature, we can have two omap entries for each object: one for shallow errors, the other for deep errors. the deep scrub can rewrite both of them, while the shallow scrub can only overwrite the former.
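
a minimal sketch of the two-entry scheme, assuming a plain dict in place of the omap. `ScrubErrors`, `record()`, the key layout and the `size_mismatch` error name are hypothetical and only illustrate which scrub type may touch which entry.

```python
SHALLOW, DEEP = "shallow", "deep"


class ScrubErrors:
    """Hypothetical store with one shallow and one deep entry per object."""

    def __init__(self):
        self.entries = {}  # (object name, SHALLOW or DEEP) => set of error names

    def record(self, obj, shallow_errors, deep_errors=None):
        """Store one object's result; deep_errors is None for a shallow scrub."""
        self._put(obj, SHALLOW, shallow_errors)
        if deep_errors is not None:
            # only a deep scrub is allowed to rewrite the deep entry
            self._put(obj, DEEP, deep_errors)

    def _put(self, obj, kind, errors):
        if errors:
            self.entries[(obj, kind)] = set(errors)
        else:
            self.entries.pop((obj, kind), None)

    def errors(self, obj):
        """Union of the shallow and deep errors currently recorded."""
        return (self.entries.get((obj, SHALLOW), set())
                | self.entries.get((obj, DEEP), set()))


errs = ScrubErrors()
errs.record("obj1", {"size_mismatch"}, {"data_digest_mismatch"})  # deep scrub
errs.record("obj1", set())                                        # later shallow scrub
print(errs.errors("obj1"))  # {'data_digest_mismatch'}
```

the later shallow scrub clears only its own entry, so the digest mismatch found by the deep scrub is still reported.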

#12 Updated by David Zafman about 3 years ago

Kefu Chai wrote:

dzafman, i found that ReplicatedPG::_scrub() is repeating the check for missing/corrupted OI_ATTR done by PGBackend::be_select_auth_object(). is this on purpose?

It is true that ReplicatedPG::_scrub() is called with an authmap selected in PGBackend::be_select_auth_object() that is present and decodes. Since this code is moving into user mode, we don't need to fix it now. Those particular checks should have been asserts. When I fixed _scrub() I was obsessed with not letting a corruption cause an OSD to assert during scrubbing. But in this case it shouldn't be possible.

#13 Updated by Kefu Chai about 3 years ago

we should not return the scrub result if the scrub is still in progress. we can either

  • check the status of pg before serving the scrubls pg command, or
  • add a sentry scrub object at the end of scrub
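
the sentry option could look roughly like this. the key name `_scrub_done`, the helper functions and the choice of EAGAIN as the error are assumptions made for illustration; the comment above does not specify them.

```python
import errno

SENTRY = "_scrub_done"  # hypothetical key written once the scrub has finished


def begin_scrub(store):
    """Drop the previous results, including the old sentry."""
    store.clear()


def record_error(store, obj, what):
    store[obj] = what


def finish_scrub(store):
    """Write the sentry last, so its presence implies a complete result set."""
    store[SENTRY] = True


def scrubls(store):
    """Return (ret, results); refuse to serve a partially written result set."""
    if SENTRY not in store:
        return -errno.EAGAIN, {}
    return 0, {k: v for k, v in store.items() if k != SENTRY}


store = {}
begin_scrub(store)
record_error(store, "obj1", "read_error")
in_progress = scrubls(store)   # scrub still running: no results served
finish_scrub(store)
finished = scrubls(store)
print(in_progress[0] == -errno.EAGAIN, finished)
```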

#14 Updated by Kefu Chai almost 3 years ago

  • Copied to Feature #14860: scrub/repair: persist scrub results (do not overwrite deep scrub results with non-deep scrub) added

#15 Updated by Kefu Chai almost 3 years ago

  • Copied to deleted (Feature #14860: scrub/repair: persist scrub results (do not overwrite deep scrub results with non-deep scrub))

#16 Updated by Kefu Chai almost 3 years ago

  • Related to Feature #14860: scrub/repair: persist scrub results (do not overwrite deep scrub results with non-deep scrub) added

#17 Updated by Kefu Chai over 1 year ago

  • Assignee deleted (Kefu Chai)
