Bug #55390
rgw-ms/resharding: Observing sync inconsistencies: ~50K out of 20M objects did not sync.
Status: Closed
Updated by Vidushi Mishra about 2 years ago
1. ceph version 17.0.0-10783-ge38464a1 (e38464a10ae9e8c7b43bae5a9a7395eb2cbb2444) quincy (dev)
2. Steps to reproduce:
i. Create a multi-site configuration with 14 RGWs on each site [4 for multisite sync and 10 for client I/O].
ii. The 4 RGWs used for multisite sync are not behind any load balancer.
iii. Create a bucket 'test-sync-no-lb-1' and upload 20M objects [10M from each site].
iv. Wait for the workload to complete.
v. Monitor sync and wait for it to complete (see the command sketch at the end of this comment).
3. Result:
i. We observe 411672 objects that did not sync to the secondary site.
ii. 'sync status' on the secondary site reports 128 shards recovering.
4. Additional info:
i. On both sites, 'ceph -s' shows PGs in the backfilling state.
ii. logs:
- period get: http://magna002.ceph.redhat.com/ceph-qe-logs/vidushi/upstream-dbr-2022/55390/period-get
- primary bucket stats: http://magna002.ceph.redhat.com/ceph-qe-logs/vidushi/upstream-dbr-2022/55390/bucket-stats-pri
- secondary bucket stats: http://magna002.ceph.redhat.com/ceph-qe-logs/vidushi/upstream-dbr-2022/55390/bucket-stats-sec
- ceph status (secondary): http://magna002.ceph.redhat.com/ceph-qe-logs/vidushi/upstream-dbr-2022/55390/ceph-s_sec
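For reference, a minimal sketch of the monitoring commands behind step v and the object-count comparison above (bucket name taken from the steps; the grep filter is only illustrative):

# Overall multisite sync status, run on the secondary site
radosgw-admin sync status

# Per-bucket sync status for the test bucket
radosgw-admin bucket sync status --bucket=test-sync-no-lb-1

# Compare object counts on both sites; num_objects should converge to 20M once sync completes
radosgw-admin bucket stats --bucket=test-sync-no-lb-1 | grep num_objects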
Updated by Vidushi Mishra about 2 years ago
system details and config:
http://magna002.ceph.redhat.com/ceph-qe-logs/vidushi/upstream-dbr-2022/55390/system_details
cosbench workload:
- http://magna002.ceph.redhat.com/ceph-qe-logs/vidushi/upstream-dbr-2022/55390/cosbench/
- workload IDs:
  - http://10.8.128.16:19088/controller/workload.html?id=w54
  - http://10.8.128.100:19088/controller/workload.html?id=w49
Updated by Vidushi Mishra about 2 years ago
We see errors like "failed to sync object(2300) Unknown error 2300" in the 'radosgw-admin sync error list' output on the secondary site.
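For context, a sketch of how such errors can be dumped and counted on the secondary site (the grep pattern is illustrative, based on the message above):

# Dump the sync error log to a file
radosgw-admin sync error list > sync-errors.json
# Count entries matching the reported failure
grep -c "failed to sync object" sync-errors.json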
Updated by Casey Bodley almost 2 years ago
- Status changed from New to Can't reproduce
If the cluster isn't healthy, we can't really treat this as an rgw bug.
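As a follow-up, a minimal sketch of the cluster-health checks worth confirming before re-evaluating sync (assumes a retest once backfill has finished; not part of the original report):

# Confirm backfill/recovery has finished and both clusters report HEALTH_OK
ceph health detail
ceph pg stat
# Only then re-check multisite sync
radosgw-admin sync status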