Subtask #5433

Feature #4929: Erasure encoded placement group

Factor out the ReplicatedPG object replication and client IO logic as a PGBackend interface

Added by Loïc Dachary over 10 years ago. Updated about 10 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
OSD
Target version:
-
% Done:

100%

Spent time:
Source:
Development
Tags:
Backport:
Reviewed:
Affected Versions:
Pull request ID:

Description

work in progress

Moving code PG <=> ReplicatedPG

Before defining the interface, code is moved between PG and ReplicatedPG so that ReplicatedPG only contains replica-specific code.

  • scrubbing: The scrubbing logic (normal and deep) belongs to PG. The tests that decide whether an object is ok belong to PGBackend: bool normal_test(hobject_t) and bool deep_test(hobject_t). The normal test only verifies the size and existence of the replicas or chunks. The deep test computes the checksums of the replicas and compares them. For erasure code, it computes the checksum of each chunk and compares it with the one stored as an attribute of that chunk. Since erasure code only supports full writes and appends, there is no significant penalty in computing the checksum each time the object is modified. Doing the same for replicas would be a problem, because objects can be modified with partial writes.
  • missing objects: Figuring out which objects are missing from which OSDs and updating the pg_missing_t belongs to PG and to PGLog, which is a member of PG. Recovering a missing object is done by a series of operations provided by PGBackend. PG asks PGBackend for the operations to recover a given object and then either pushes (if it is missing from a replica) or pulls (if it is missing from the primary). For the replica backend, a full write to or from a peer that has a copy of the object is enough. For erasure code, the backend will also need to first restore the missing object.
  • backfilling: not sure yet
  • object context: The ObjectContext and SnapSetContext registry is common to both the replica and the erasure code backends, except that the erasure code backend won't use the SnapSetContext because it is not supported there. PG knows nothing about this registry. Once the #5510 pull request is merged, the registry will be ready to be moved to a PGBackend base class.
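To make the scrubbing bullet concrete, here is a minimal sketch of the two tests for the erasure code case. It is an illustration only, not code from the Ceph tree: ObjectId is a stand-in for hobject_t, and ScrubTests, ErasureScrubTests, Chunk, toy_checksum, write_full and corrupt are hypothetical names. The checksum is a toy FNV-1a hash, not the digest an OSD would use.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for Ceph's hobject_t (assumption: an object name is enough here).
using ObjectId = std::string;

// Hypothetical chunk record: payload plus the checksum stored as an attribute.
struct Chunk {
  std::string data;
  std::uint32_t stored_checksum;
};

// Toy FNV-1a hash, used only so the example is self-contained.
std::uint32_t toy_checksum(const std::string& data) {
  std::uint32_t h = 2166136261u;
  for (unsigned char c : data) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// The scrub-related part of the backend interface named in the description.
class ScrubTests {
public:
  virtual ~ScrubTests() = default;
  virtual bool normal_test(const ObjectId& oid) const = 0;
  virtual bool deep_test(const ObjectId& oid) const = 0;
};

// Erasure code flavour: every chunk is checked against its own stored checksum.
class ErasureScrubTests : public ScrubTests {
public:
  explicit ErasureScrubTests(std::size_t expected_chunks)
      : expected_chunks_(expected_chunks) {}

  // Full write: the checksum of each chunk is recomputed on every write.
  void write_full(const ObjectId& oid, const std::vector<std::string>& chunks) {
    std::vector<Chunk> stored;
    for (const std::string& data : chunks) {
      stored.push_back(Chunk{data, toy_checksum(data)});
    }
    objects_[oid] = stored;
  }

  // Simulates silent corruption: the payload changes, the attribute does not.
  void corrupt(const ObjectId& oid, std::size_t index, const std::string& data) {
    objects_.at(oid).at(index).data = data;
  }

  // Normal test: the object exists and has the expected number of chunks.
  bool normal_test(const ObjectId& oid) const override {
    auto it = objects_.find(oid);
    return it != objects_.end() && it->second.size() == expected_chunks_;
  }

  // Deep test: additionally recompute and compare each chunk checksum.
  bool deep_test(const ObjectId& oid) const override {
    if (!normal_test(oid)) return false;
    for (const Chunk& chunk : objects_.at(oid)) {
      if (toy_checksum(chunk.data) != chunk.stored_checksum) return false;
    }
    return true;
  }

private:
  std::size_t expected_chunks_;
  std::map<ObjectId, std::vector<Chunk>> objects_;
};
```

In this sketch a corrupted chunk still passes normal_test (it exists and the chunk count is right) but fails deep_test, which is the distinction the description draws between the two scrub modes.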

Class division

  • class PG ( has a PGBackendInterface* member )
  • class ReplicatedPGBackend : public PGBackend
  • class PGBackend : PGBackendInterface ( has a PG* member )
  • PGBackendInterface ( abstract )
  • PGBackendInterface* pg_backend_factory
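The class division above could look roughly like the following. This is a sketch under assumptions: the name() method, the string argument of the factory and the use of std::unique_ptr instead of the raw PGBackendInterface* from the list are all illustrative choices, not part of the proposal.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

class PG;  // forward declaration: PG and its backend refer to each other

// Abstract interface, the only type PG depends on.
class PGBackendInterface {
public:
  virtual ~PGBackendInterface() = default;
  virtual std::string name() const = 0;  // hypothetical, for illustration only
};

// Common base class for backends; holds the PG* member from the list above.
class PGBackend : public PGBackendInterface {
public:
  explicit PGBackend(PG* pg) : pg_(pg) {}
  PG* pg() const { return pg_; }

private:
  PG* pg_;
};

class ReplicatedPGBackend : public PGBackend {
public:
  using PGBackend::PGBackend;
  std::string name() const override { return "replicated"; }
};

// Factory; selecting the backend by a string is an assumption of this sketch.
std::unique_ptr<PGBackendInterface> pg_backend_factory(const std::string& kind,
                                                       PG* pg) {
  if (kind == "replicated") {
    return std::make_unique<ReplicatedPGBackend>(pg);
  }
  throw std::invalid_argument("unknown backend kind: " + kind);
}

// PG only sees the abstract interface.
class PG {
public:
  explicit PG(const std::string& kind)
      : backend_(pg_backend_factory(kind, this)) {}
  const PGBackendInterface& backend() const { return *backend_; }

private:
  std::unique_ptr<PGBackendInterface> backend_;
};
```

An erasure code backend would be one more class derived from PGBackend plus one more branch in the factory, with PG itself left unchanged.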

TODO

  • Sketch out what should be moved between PG and ReplicatedPG
  • Create a PGBackend interface: this will be the interface used by PG for handling client IO and replication
  • Refactor ReplicatedPG logic in terms of PGBackend
  • Write tests for the ReplicatedPG PGBackend implementation

Obsolete in this context


Related issues

Related to Ceph - Feature #5990: EC: [link] Factor out the ReplicatedPG object replication and client write paths Resolved 08/15/2013

History

#1 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#2 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#3 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#4 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#5 Updated by Loïc Dachary over 10 years ago

  • Parent task changed from #5046 to #4929

Sam's comments:
  • It is better to do the changes within the same file to better read the diffs
  • The changes are in the right direction but they will need to be smaller

#6 Updated by Samuel Just over 10 years ago

  • Subject changed from Factor out PG::RecoveryState to Factor out the ReplicatedPG object replication and client IO logic as a PGBackend interface

I've reconsidered a bit, and I think it would be easier to leave most of the PG implementation where it is and factor out all of the ReplicatedPG logic (handling client IO, recovery) as implementing a PGBackend interface used by PG. This will be similar to the initial approach of cleaning up the PG/ReplicatedPG division and implementing an ErasureCodedPG, except that we will be able to test the replicated and erasure coded implementations of the PGBackend through the PGBackend interface. This will allow us to more easily maintain a single implementation for most of the scrub, peering, and backfill logic (which should be fundamentally similar for both PG implementations).

[15:46:25] <sjustlaptop> the more I think about it, the more it became clear that factoring out the PG RecoveryState logic was going to be a nightmare
[15:46:32] <sjustlaptop> instead, we should essentially factor out everything else
[15:47:01] <sjustlaptop> nearly all of what we've done so far still applies
[15:48:44] <loicd> sjustlaptop: that makes sense to me. So PG.{cc,h} as it is would essentially become the RecoveryState & supporting functions. And what does not belong gets factored out. Is this what you mean ? 
[15:49:41] -*- loicd slightly paraphrasing http://tracker.ceph.com/issues/5433#note-6 :-)
[15:50:09] <sjustlaptop> loicd: yeah, that's the gist of it
[15:50:18] <sjustlaptop> also, backfill, scrub stay where they are
[15:50:31] <sjustlaptop> some replication logic (e.g., what to replicate and when) will move from ReplicatedPG to PG
[15:50:41] <sjustlaptop> how to replicate will remain in ReplicatedPG
[15:51:59] <sjustlaptop> correction: backfill is in ReplicatedPG, some of that logic will float up to PG
[15:52:28] <loicd> sjustlaptop: so ReplicatedPG would no longer exist as a derived class of PG. It would become a PGBackend from which PGReplicatedBackend and PGErasureCodeBackend are derived. And PG would use the PGBackend interface ?
[15:52:40] <sjustlaptop> yeah
[15:52:47] <loicd> I like that :-)
[15:52:50] <sjustlaptop> and hopefully, the PGBackends won't need a PG interface to work with at all
[15:53:38] <loicd> I'll go in this direction sjustlaptop , thanks :-)
[15:53:38] <sjustlaptop> that's my main beef with the current ReplicatedPG/PG setup, ReplicatedPG knows way too much about the contents of PG
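The split discussed in this note (PG decides what to replicate and when, the backend decides how, and the backend needs no PG interface) can be sketched as follows. Everything here is hypothetical: write_full, read_full, client_write, client_read and round_trips are illustrative names, and ChunkedBackend is plain striping without parity that merely stands in for an erasure coded backend.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical backend interface: only the "how". Nothing refers back to PG.
class PGBackend {
public:
  virtual ~PGBackend() = default;
  virtual void write_full(const std::string& oid, const std::string& data) = 0;
  virtual std::string read_full(const std::string& oid) const = 0;
};

// Replicated flavour: keeps identical copies of each object.
class ReplicatedBackend : public PGBackend {
public:
  explicit ReplicatedBackend(std::size_t copies)
      : copies_(copies == 0 ? 1 : copies) {}
  void write_full(const std::string& oid, const std::string& data) override {
    store_[oid] = std::vector<std::string>(copies_, data);
  }
  std::string read_full(const std::string& oid) const override {
    return store_.at(oid).front();
  }

private:
  std::size_t copies_;
  std::map<std::string, std::vector<std::string>> store_;
};

// Chunked flavour: splits the object into k chunks. No parity is computed,
// so this is NOT erasure coding, only a placeholder with a different layout.
class ChunkedBackend : public PGBackend {
public:
  explicit ChunkedBackend(std::size_t k) : k_(k == 0 ? 1 : k) {}
  void write_full(const std::string& oid, const std::string& data) override {
    const std::size_t chunk_size = (data.size() + k_ - 1) / k_;  // ceiling
    std::vector<std::string> chunks;
    for (std::size_t i = 0; i < k_; ++i) {
      const std::size_t begin = std::min(i * chunk_size, data.size());
      chunks.push_back(data.substr(begin, chunk_size));
    }
    store_[oid] = chunks;
  }
  std::string read_full(const std::string& oid) const override {
    std::string out;
    for (const std::string& chunk : store_.at(oid)) out += chunk;
    return out;
  }

private:
  std::size_t k_;
  std::map<std::string, std::vector<std::string>> store_;
};

// PG owns the "what and when" (here: reject unnamed objects) and delegates
// the "how" to whichever backend it was given.
class PG {
public:
  explicit PG(PGBackend& backend) : backend_(backend) {}
  bool client_write(const std::string& oid, const std::string& data) {
    if (oid.empty()) return false;
    backend_.write_full(oid, data);
    return true;
  }
  std::string client_read(const std::string& oid) const {
    return backend_.read_full(oid);
  }

private:
  PGBackend& backend_;
};

// One check that exercises any backend through the shared interface.
bool round_trips(PGBackend& backend, const std::string& payload) {
  PG pg(backend);
  return pg.client_write("obj", payload) && pg.client_read("obj") == payload;
}
```

The same round_trips check runs unchanged against both backends, which is the testing benefit the note expects from going through the PGBackend interface.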

#7 Updated by Samuel Just over 10 years ago

  • Description updated (diff)

#8 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#9 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#10 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#11 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#12 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#13 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#14 Updated by Loïc Dachary over 10 years ago

  • Description updated (diff)

#15 Updated by Loïc Dachary over 10 years ago

  • Assignee deleted (Loïc Dachary)

#16 Updated by Samuel Just over 10 years ago

  • Status changed from In Progress to 12

#17 Updated by Loïc Dachary about 10 years ago

  • Status changed from 12 to Rejected
  • Remaining hours set to 0.0

Done elsewhere

#18 Updated by Loïc Dachary about 10 years ago

  • % Done changed from 0 to 100
  • Estimated time set to 0.00 h
