Bug #3143
closed
Obsync object verification takes too long
0%
Description
In Summary:
obsync verification takes about 3 seconds per object. With a large number of objects per bucket, this adds a huge amount of time to verification.
obsync is theoretically parallelizable and potential speed gains could be massive.
DHO is happy to collaborate on or test improvements as they come along.
The following is what came in from DHO:
I observed obsync taking 3 seconds per object to verify equality. For a bucket with 700000 objects, which a customer is already trying to migrate, verification alone would take on the order of 3 weeks if it had to be rerun. As it is, just verifying (not even copying) 7000 objects is taking the better part of 10 hours.
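As a rough check on these figures, the serial cost can be estimated directly. This is only a back-of-the-envelope sketch: the helper name `verification_seconds` is made up for illustration, and it assumes a flat 3 seconds per object with no parallelism.

```python
SECONDS_PER_OBJECT = 3  # per-object verification time reported above


def verification_seconds(object_count, seconds_per_object=SECONDS_PER_OBJECT):
    """Estimated wall-clock time for strictly serial verification."""
    return object_count * seconds_per_object


full_bucket = verification_seconds(700_000)
print(f"{full_bucket / 86_400:.1f} days")  # 24.3 days, roughly 3.5 weeks

partial = verification_seconds(7_000)
print(f"{partial / 3_600:.1f} hours")  # 5.8 hours
```

At a flat 3 seconds, 7000 objects would take a little under 6 hours, so the observed run suggests the per-object cost may at times be higher than 3 seconds.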
As to potential improvements, a simple patch might be to accept a marker so that data already known to be synced can be skipped.
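A minimal sketch of the marker idea, assuming keys are processed in ascending order (S3 bucket listings are returned sorted by key). The function `keys_after_marker` is hypothetical and is not part of obsync.

```python
def keys_after_marker(sorted_keys, marker=None):
    """Yield keys that sort strictly after `marker`.

    Assumes `sorted_keys` is in ascending order, so everything up to and
    including the marker is treated as already synced.
    """
    for key in sorted_keys:
        if marker is not None and key <= marker:
            continue  # already verified in a previous run
        yield key


# Resume a run that previously got as far as "photos/0002.jpg".
keys = ["photos/0001.jpg", "photos/0002.jpg", "photos/0003.jpg"]
remaining = list(keys_after_marker(keys, marker="photos/0002.jpg"))
```

Since the S3 GET Bucket request itself accepts a `marker` parameter, the skip could also be pushed down into the listing call so that the skipped keys are never transferred at all.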
Also, all of the information required to establish equality could be obtained from the GET Bucket requests on each end, as opposed to a HEAD request to each service for every object, so it would be possible to greatly speed up equality verification by simply enumerating objects on both services and discarding those that are present and equal in both.
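The listing-based comparison could be sketched roughly as follows. This is an illustration, not obsync's code: the `Entry` type and `objects_needing_sync` are hypothetical names, both listings are assumed to be sorted by key, and equality is assumed to be decided by ETag and size alone (which ignores ACLs and user metadata, and ETags of multipart uploads are not plain MD5 sums).

```python
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    """One row of a bucket listing (hypothetical shape)."""
    key: str
    etag: str
    size: int


def objects_needing_sync(src: Iterable[Entry], dst: Iterable[Entry]) -> Iterator[Entry]:
    """Merge-join two key-sorted listings.

    Yields source entries that are missing from, or different on, the
    destination. No per-object HEAD request is needed.
    """
    dst_iter = iter(dst)
    dst_entry = next(dst_iter, None)
    for src_entry in src:
        # Advance the destination past keys that sort before the source key.
        while dst_entry is not None and dst_entry.key < src_entry.key:
            dst_entry = next(dst_iter, None)
        if dst_entry is not None and dst_entry == src_entry:
            continue  # present and equal on both sides: discard
        yield src_entry


src = [Entry("a", "e1", 10), Entry("b", "e2", 20), Entry("c", "e3", 30)]
dst = [Entry("a", "e1", 10), Entry("b", "changed", 20), Entry("z", "e9", 90)]
pending = [entry.key for entry in objects_needing_sync(src, dst)]
```

Because both listings are consumed as streams, memory use stays constant regardless of bucket size, and the number of requests drops from one per object to one per listing page.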
Further, obsync is theoretically very parallelizable, so massive speed gains could be found there.
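One way the parallelism might look, as a sketch only: per-object checks are I/O-bound, so a thread pool is a plausible fit. `verify_all` and the `verify_one` callback are hypothetical names, and the worker count of 16 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def verify_all(keys, verify_one, max_workers=16):
    """Run verify_one(key) -> bool concurrently; return the keys that failed.

    `verify_one` stands in for whatever per-object check is used,
    for example a pair of HEAD requests.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = pool.map(verify_one, keys)
        return [key for key, ok in zip(keys, results) if not ok]


# Stand-in check: pretend only single-character keys verify successfully.
failed = verify_all(["a", "bb", "c"], lambda key: len(key) == 1)
```

With N workers and a fixed per-request latency, the wall-clock time would shrink by roughly a factor of N until the services begin to throttle.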
In its current state, however, my more recent testing suggests it is not ready for prime time as a migration tool under the loads we are already observing. (I'm fairly certain one customer is trying to migrate a backup of a Linux filesystem stored on S3.) I would be happy to collaborate on or help test improvements as they come.