Osd - Scrub and Repair » History » Version 5

Shinobu Kinjo, 05/20/2016 06:02 AM

h3. Osd - Scrub and Repair

h3. Summary

Current scrub and repair is fairly primitive.  There are several improvements which need to be made:
1) There needs to be a way to query the results of the most recent scrub on a pg.
2) The user should be able to query the contents of the replica objects in the event of an inconsistency (including data payload, xattrs, and omap).  This can probably co-opt the existing replica read machinery.
3) The user should be able to specify which replica to use for repair using the above information.

h3. Owners

* Samuel Just (Red Hat)
* Name (Affiliation)
* Name

h3. Interested Parties

* Guang Yang (Yahoo!)
* Loic Dachary (Red Hat)
* Danny Al-Gaaf (Deutsche Telekom)
* Shinobu Kinjo (Red Hat)
* Name

h3. Current Status

Scrub and repair mechanisms already exist; this blueprint aims to expand and improve them.

h3. Detailed Description

On the OSD side, the first change is that the primary needs to track the inconsistency information as scrub progresses.  Because this might involve a large number of objects (though probably not), we do not want to keep it in memory.  I suggest storing the information in a per-pg scratch object which is cleared during peering reset.  The machinery used for the SnapMapper object can be reused to maintain a cache of unstable keys.
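
As a rough illustration of the bookkeeping described above, the sketch below models the per-pg scratch object as a sorted key/value map. Every name in it (ScrubScratch, scrub_error_t, record, on_peering_reset, list_after) is hypothetical and does not come from the Ceph tree, and the map lives in memory only to keep the example self-contained; the proposal is to persist it in an object instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names exist in Ceph.
using object_key_t = std::pair<std::string, std::string>;  // (locator, object)

struct scrub_error_t {
  int32_t shard;            // replica on which the problem was seen
  std::string description;  // free-form, e.g. "digest mismatch"
};

// Models the per-pg scratch object as a sorted key/value map.
class ScrubScratch {
  std::map<object_key_t, std::vector<scrub_error_t>> entries_;

public:
  // Called by the primary each time scrub finds a discrepancy.
  void record(const object_key_t& obj, scrub_error_t err) {
    entries_[obj].push_back(std::move(err));
  }

  // Called on peering reset: results from the old interval are dropped.
  void on_peering_reset() { entries_.clear(); }

  std::size_t size() const { return entries_.size(); }

  // List up to max inconsistent objects that sort strictly after last.
  std::vector<object_key_t> list_after(const object_key_t& last,
                                       std::size_t max) const {
    std::vector<object_key_t> out;
    for (auto it = entries_.upper_bound(last);
         it != entries_.end() && out.size() < max; ++it) {
      out.push_back(it->first);
    }
    return out;
  }
};
```

Keeping the keys sorted is what makes a "list objects > last" style of paging cheap: each call resumes from the last key the caller saw.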
 
Next, I suggest adding a librados interface to ferry that information out to the user:
<pre>
/// get currently inconsistent pgs
void get_inconsistent_pgs(
  pg_t last,        ///< [in] list pgs > last
  list<pg_t> *out   ///< [out] listed pgs
  );

/// get information about inconsistent objects in a pg
bool query_inconsistent_pg(
  pg_t to_query,             ///< [in] pg to query
  pair<string, string> last, ///< [in] begin listing objects > last, (locator, object)
  epoch_t *activation_epoch, ///< [out] activation epoch for the interval in which this query was serviced
  list<pair<pair<string, string>, inconsistent_info_t> > *out ///< [out] listed inconsistency information
  ); ///< @return true iff primary has a populated scrub info structure

// inconsistent_info_t will include all relevant information about each inconsistent object.

/// allows directed repair of an object
void repair_inconsistent_object(
  pg_t pg_to_repair,              ///< [in] pg in which we want to perform a repair
  pair<string, string> to_repair, ///< [in] object to repair
  replica_t replica_to_use,       ///< [in] replica to use for repair
  epoch_t activation_epoch,       ///< [in] from query_inconsistent_pg
  ...                             ///< stuff to support async?
  );
</pre>

The return machinery for repair_inconsistent_object will return -EAGAIN (or some other error) if activation_epoch is prior to the primary's current interval activation_epoch.  This ensures that the replica_to_use value is not out of date.
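
The epoch check can be sketched as follows. This is only an illustration under stated assumptions: PrimaryState, check_repair_epoch, and repair_with_retry are hypothetical names, epoch_t is assumed to be a plain unsigned counter, and the two callables stand in for query_inconsistent_pg and repair_inconsistent_object rather than calling any real librados function.

```cpp
#include <cerrno>
#include <cstdint>

using epoch_t = uint32_t;  // assumption: epochs are plain unsigned counters

// Hypothetical view of what the primary knows about one pg.
struct PrimaryState {
  epoch_t interval_activation_epoch;
};

// Returns 0 if the repair may proceed, or -EAGAIN if the caller's epoch
// predates the primary's current interval.
int check_repair_epoch(const PrimaryState& primary, epoch_t request_epoch) {
  if (request_epoch < primary.interval_activation_epoch) {
    return -EAGAIN;  // the caller's view of the replicas may be stale
  }
  return 0;
}

// Client side: refresh the epoch with a new query, then retry the repair,
// giving up after max_attempts tries.
template <typename QueryFn, typename RepairFn>
int repair_with_retry(QueryFn query_epoch, RepairFn repair, int max_attempts) {
  int r = -EAGAIN;
  for (int i = 0; i < max_attempts && r == -EAGAIN; ++i) {
    epoch_t e = query_epoch();  // stands in for query_inconsistent_pg
    r = repair(e);              // stands in for repair_inconsistent_object
  }
  return r;
}
```

In practice a client that sees -EAGAIN would presumably also want to re-examine the refreshed inconsistency information before retrying, since the replica it originally picked may no longer be the right choice.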
 
We'll also need interfaces to allow reading based on replica_t, but we don't want to duplicate all of the current read interfaces.  Possibly an ioctx method to set replica_to_use, pg_to_repair, and activation_epoch?
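
One possible shape for such a method is sketched below. It is purely illustrative: IoCtxSketch, ReplicaReadTarget, pg_id_t, and replica_id_t are hypothetical stand-ins (not librados types), and describe_read only reports where a read would be sent instead of performing any I/O.

```cpp
#include <cstdint>
#include <string>

using epoch_t = uint32_t;      // assumption: plain unsigned counter
using replica_id_t = int32_t;  // hypothetical stand-in for replica_t

struct pg_id_t {               // hypothetical stand-in for pg_t
  int64_t pool;
  uint32_t seed;
};

// The three values the text suggests setting on the ioctx.
struct ReplicaReadTarget {
  pg_id_t pg;
  replica_id_t replica;
  epoch_t activation_epoch;
};

class IoCtxSketch {
  bool has_target_ = false;
  ReplicaReadTarget target_{};

public:
  // Once set, existing read calls would be routed to the chosen replica,
  // so no read interface needs a new parameter.
  void set_replica_read_target(const ReplicaReadTarget& t) {
    target_ = t;
    has_target_ = true;
  }

  void clear_replica_read_target() { has_target_ = false; }

  // Reports where a read of oid would be sent; performs no I/O.
  std::string describe_read(const std::string& oid) const {
    if (!has_target_) {
      return "read " + oid + " from primary";
    }
    return "read " + oid + " from replica " +
           std::to_string(target_.replica) + " at epoch " +
           std::to_string(target_.activation_epoch);
  }
};
```

Holding the target as state on the context keeps the read signatures unchanged, at the cost of making reads depend on hidden state that the caller must remember to clear.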