Subtask #5433

Updated by Loïc Dachary over 10 years ago

"work in progress":https://github.com/dachary/ceph/tree/wip-5433

h3. Moving code PG <=> ReplicatedPG

Prior to defining the interface, code is moved between PG and ReplicatedPG so that ReplicatedPG only contains replica-specific code.

* *scrubbing* : The scrubbing logic (normal & deep) belongs to PG. The tests to figure out whether an object is intact belong to PGBackend : bool normal_test(hobject_t) and bool deep_test(hobject_t). The normal test only verifies the size / existence of the replicas / chunks. The deep test computes and compares replica checksums. In the case of erasure code it computes the checksum of each chunk and compares it with the one stored as an attribute of the chunk. Since erasure code only supports full writes / appends there is no significant penalty in computing the checksum each time the object is modified. It would be a problem for replicas, where objects can be modified with partial writes.
* *missing objects* : Figuring out which objects are missing from which OSDs and updating the pg_missing_t belongs to PG and to PGLog, which is a member of PG. Recovering a missing object is done by a series of operations provided by PGBackend. PG asks PGBackend for the operations to recover a given object and then either pushes them ( if the object is missing from a replica ) or pulls them ( if it is missing from the primary ). A full write to / from a peer that has a copy of the object is enough for the replicated backend. For erasure code the backend will also need to reconstruct the missing object first.
* *backfilling* : "not sure yet":http://dachary.org/?p=2182
* *object context* : The ObjectContext and SnapSetContext registry is common to the replicated and erasure code backends, although the erasure code backend will not use the SnapSetContext because snapshots are not supported. PG knows nothing about this registry.
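The scrub tests and the recovery split described above could look roughly like the following. This is only a sketch: hobject_t, the store, the checksum, and recovery_ops are all stand-ins invented here, only the normal_test / deep_test names come from the notes above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for Ceph types: hobject_t is reduced to a name
// and the object store to a map from name to data plus the checksum
// recorded as an attribute when the chunk was written.
struct hobject_t { std::string name; };

struct StoredChunk {
  std::string data;
  uint32_t checksum_attr;
};

// Toy polynomial checksum; the real backend would use a proper digest.
static uint32_t checksum(const std::string &data) {
  uint32_t sum = 0;
  for (unsigned char c : data) sum = sum * 31u + c;
  return sum;
}

struct RecoveryOp { std::string description; };

struct PGBackend {
  std::map<std::string, StoredChunk> store;

  // Normal scrub test: size / existence only, no data is read back.
  bool normal_test(const hobject_t &oid, size_t expected_size) const {
    auto it = store.find(oid.name);
    return it != store.end() && it->second.data.size() == expected_size;
  }

  // Deep scrub test: recompute the chunk checksum and compare it with
  // the checksum stored as an attribute of the chunk.
  bool deep_test(const hobject_t &oid) const {
    auto it = store.find(oid.name);
    return it != store.end() &&
           checksum(it->second.data) == it->second.checksum_attr;
  }

  // Recovery: the backend only provides the operations; PG decides
  // whether to push them ( object missing from a replica ) or pull
  // them ( object missing from the primary ).
  std::vector<RecoveryOp> recovery_ops(const hobject_t &oid) const {
    return { {"read " + oid.name}, {"write " + oid.name} };
  }
};
```

The point of the split: a silently corrupted chunk still passes normal_test ( the size is unchanged ) and is only caught by deep_test, which is why the deep scrub has to read the data back.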


h3. Class division

* PGBackendInterface ( abstract )
* class PGBackend : PGBackendInterface ( has a PG* member )
* class ReplicatedPGBackend : public PGBackend
* class PG ( has a PGBackendInterface* member )
* PGBackendInterface* pg_backend_factory
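Skeleton of the division above, with the names taken from the list; the constructor signatures, the name() method, and the factory's pool-type flag are assumptions for illustration only.

```cpp
#include <memory>
#include <string>

class PG;  // forward declaration: PG owns a PGBackendInterface*

// Abstract interface used by PG for client IO and replication.
class PGBackendInterface {
public:
  virtual ~PGBackendInterface() = default;
  virtual std::string name() const = 0;
};

// Common backend logic; holds a back-pointer to its PG.
class PGBackend : public PGBackendInterface {
public:
  explicit PGBackend(PG *pg) : pg(pg) {}
protected:
  PG *pg;
};

// Replica-specific backend.
class ReplicatedPGBackend : public PGBackend {
public:
  using PGBackend::PGBackend;
  std::string name() const override { return "replicated"; }
};

// Factory picking the backend for the pool type; only the replicated
// backend exists so far, the erasure coded one comes later.
std::unique_ptr<PGBackendInterface> pg_backend_factory(PG *pg,
                                                       bool erasure_coded) {
  (void)erasure_coded;
  return std::make_unique<ReplicatedPGBackend>(pg);
}

class PG {
public:
  PG() : backend(pg_backend_factory(this, false)) {}
  std::unique_ptr<PGBackendInterface> backend;
};
```

PG only ever sees PGBackendInterface, so the erasure code backend can be added later without touching PG itself.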

h3. TODO

* Sketch out what should be moved between PG and ReplicatedPG
* Create a PGBackend interface: this will be the interface used by PG for handling client IO and replication
* Refactor ReplicatedPG logic in terms of PGBackend
* Write tests for the ReplicatedPG PGBackend implementation

h3. Obsolete in this context

* "Attempt to factor out part of the RecoveryState":https://github.com/dachary/ceph/commit/431147bd59e5bf9f8cb28c51c1442ce61412be0c
* "Separating Peering from PG":http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/15755
* "Ceph Placement Groups peering":http://dachary.org/?p=2061
* "Description of the Peering Process":https://github.com/ceph/ceph/blob/b89d7420e3501247d6ed282d2253c95c758526b1/doc/dev/peering.rst
