Bug #24775
rgw-multisite: radosgw-admin misreports sync status
Description
Recently we hit a situation where the data in different zones was far from consistent, but `radosgw-admin sync status` showed that each zone had caught up with the others.
We upload data in zone A, and the other zones only sync data from zone A. After all data had been uploaded to zone A successfully, running `radosgw-admin bucket stats --bucket=<bucket>` in the other zones showed that the data size in those zones was still increasing. However, `radosgw-admin sync status` in those zones showed that they had already caught up with zone A.
History
#1 Updated by Xinying Song over 5 years ago
#2 Updated by Casey Bodley over 5 years ago
- Assignee set to Casey Bodley
#3 Updated by Casey Bodley over 5 years ago
- Status changed from New to 7
#4 Updated by Casey Bodley over 5 years ago
- Backport set to jewel luminous mimic
#5 Updated by Xinying Song over 5 years ago
This problem does not seem as easy to fix as the above PR suggests.
RGWDataSyncSingleEntryCR is spawned in three ways:
1. out-of-band data notify
2. keys recorded in error_repo
3. normal log records read from the peer zone
Only the third way has the opportunity to update the sync marker. However, because of the bucket lock, an RGWDataSyncSingleEntryCR spawned in the third way will fail with retcode -16 (-EBUSY), which means it failed to get the lock. So if we update the sync marker only when the retcode is 0, there will still be a lot of misreports.
The root cause of the misreport is that we don't know whether an out-of-band notify has successfully finished the sync job. I think we should find another way to fix it.
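To make the two failure modes concrete, here is a minimal toy model in Python. It is only a sketch of one reading of the comment above, not Ceph code: `ToyShard`, `sync_entry`, `scenario`, and the `advance_on_failure` flag are all hypothetical names, and the real coroutine and marker tracking are considerably more involved. It assumes a notify-driven sync holds the bucket lock while the log-driven sync for the same entry is attempted.

```python
from dataclasses import dataclass, field

EBUSY = -16  # retcode from the comment above: the lock could not be acquired


@dataclass
class ToyShard:
    """Hypothetical stand-in for one data-log shard (not a Ceph structure)."""
    marker: int = 0                            # last position reported as synced
    synced: set = field(default_factory=set)   # positions whose data really arrived
    lock_held: bool = False                    # bucket lock held by another sync


def sync_entry(shard, pos, source, advance_on_failure):
    """Toy single-entry sync; source is 'notify', 'error_repo', or 'log'."""
    if shard.lock_held:
        ret = EBUSY
    else:
        shard.synced.add(pos)
        ret = 0
    # Only entries read from the peer's log may move the sync marker.
    if source == "log" and (ret == 0 or advance_on_failure):
        shard.marker = max(shard.marker, pos)
    return ret


def scenario(advance_on_failure, notify_succeeds):
    """Return (retcode, reported_caught_up, actually_caught_up) for one entry."""
    shard = ToyShard()
    shard.lock_held = True   # a notify-driven sync is holding the bucket lock
    ret = sync_entry(shard, 1, "log", advance_on_failure)
    shard.lock_held = False  # the notify-driven sync ends
    if notify_succeeds:
        shard.synced.add(1)  # its data arrived, but it never touches the marker
    return ret, shard.marker >= 1, 1 in shard.synced


# Marker advances regardless of retcode: status says caught up, data is missing.
print(scenario(advance_on_failure=True, notify_succeeds=False))
# Marker advances only on retcode 0: data arrived, status still says behind.
print(scenario(advance_on_failure=False, notify_succeeds=True))
```

In this model, both policies can misreport, because the log path cannot tell whether the notify-driven sync that held the lock actually finished its work.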
#6 Updated by Patrick Donnelly over 4 years ago
- Status changed from 7 to Fix Under Review