Bug #52963
pubsub: duplicate events are seen when there are more than 2 zones
Status: closed
Description
When there are N zones in a zonegroup and 1 pubsub zone, it is possible to see N-1 instances of each bucket notification.
This is because a notification is sent for every object sync attempt, even if the object ultimately does not need to be synced.
According to this email thread: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/DPPEPYPAWLQIRPRZAEJAWJ72S2W6INNN/
when there is more than one pubsub zone (2 in that case), there could be many more notifications.
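The N-1 count can be illustrated with a toy model. This is only a sketch of the behaviour described in this report, not RGW code: it assumes the single pubsub zone attempts to sync each object once from each of the other N-1 zones, and the function name `count_notifications` and the flag `notify_on_every_attempt` are hypothetical.

```python
from collections import Counter


def count_notifications(num_zones, object_keys, notify_on_every_attempt=True):
    """Count notifications per object in a simplified sync model.

    Assumption (hypothetical model): the pubsub zone tries to sync every
    object once from each of the other num_zones - 1 zones.
    """
    source_zones = num_zones - 1
    already_synced = set()
    notifications = Counter()
    for key in object_keys:
        for _ in range(source_zones):
            # Only the first attempt actually needs to copy the object.
            needs_sync = key not in already_synced
            if needs_sync:
                already_synced.add(key)
            # Reported behaviour: notify on every attempt.
            # Expected behaviour: notify only when the object was synced.
            if notify_on_every_attempt or needs_sync:
                notifications[key] += 1
    return notifications


if __name__ == "__main__":
    print(count_notifications(3, ["obj1", "obj2"]))
    print(count_notifications(3, ["obj1", "obj2"], notify_on_every_attempt=False))
```

With 3 zones, the first call reports 2 notifications per object, while the second (notifying only on an actual sync) reports 1 per object.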
Updated by Alex Kershaw over 2 years ago
Some more detail re the two pubsub zones.
I ran an OSD database compaction on one side of our cluster and some manual deep scrubs following that, and saw the pubsub events go through the roof again - hit 2.5M or so events before I powered off the second site at which point they stopped. The events seemed to be continuing to grow even after the scrubs were finished.
--- POOLS ---
POOL                           ID  PGS   STORED  OBJECTS      USED  %USED  MAX AVAIL
device_health_metrics           1    1  1.3 KiB        0   3.8 KiB      0    176 GiB
cephfs_data                     2   32  109 GiB    1.58M   328 GiB  38.34    176 GiB
cephfs_metadata                 3   32  3.9 GiB    1.67M    12 GiB   2.20    176 GiB
siteB.rgw.buckets.data          4   32   19 GiB  735.12k    60 GiB  10.17    176 GiB
.rgw.root                       5    4   56 KiB       47   552 KiB      0    176 GiB
siteB.rgw.log                   6    4  397 MiB    1.63k   1.2 GiB   0.22    176 GiB
siteB.rgw.control               7    4      0 B        8       0 B      0    176 GiB
siteB.rgw.meta                  8    4   19 KiB       37   443 KiB      0    176 GiB
siteB.rgw.buckets.index        10    4  818 MiB       31   2.4 GiB   0.45    176 GiB
siteBpubsub.rgw.log            15    4  338 MiB      787  1017 MiB   0.19    176 GiB
siteBpubsub.rgw.control        16    4      0 B        8       0 B      0    176 GiB
siteBpubsub.rgw.meta           17    4   11 KiB       40   452 KiB      0    176 GiB
siteBpubsub.rgw.buckets.index  18    4  4.0 GiB       47    12 GiB   2.23    176 GiB
siteBpubsub.rgw.buckets.data   19    4  770 MiB    2.13M    24 GiB   4.41    176 GiB
The events seem to be the same as in the email trail - lots of old events that I can only presume are S3 objects being moved around as a result of the compaction/scrubbing, with lots of duplicates also.
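For anyone wanting to quantify the duplicates on the consumer side, a rough tally could look like the sketch below. It assumes the received events have already been flattened into dicts with the hypothetical keys `bucket`, `key` and `event`; the real notification payload is nested differently, so the extraction step would need adapting.

```python
from collections import Counter


def tally_duplicates(events):
    """Return {(bucket, key, event): count} for events seen more than once.

    Assumption: each event is a flat dict with hypothetical keys
    "bucket", "key" and "event".
    """
    counts = Counter((e["bucket"], e["key"], e["event"]) for e in events)
    return {ident: n for ident, n in counts.items() if n > 1}


if __name__ == "__main__":
    sample = [
        {"bucket": "b1", "key": "k1", "event": "ObjectCreated"},
        {"bucket": "b1", "key": "k1", "event": "ObjectCreated"},
        {"bucket": "b1", "key": "k2", "event": "ObjectCreated"},
    ]
    print(tally_duplicates(sample))
```

Grouping by bucket, key and event type is a design choice for this sketch; if objects are legitimately rewritten, a version ID or ETag would need to be part of the identity as well.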
Updated by Yuval Lifshitz over 1 year ago
- Backport deleted (pacific, octopus)
- Pull request ID set to 48996
pubsub functionality removed