Bug #52963

closed

pubsub: duplicate events are seen when there are more than 2 zones

Added by Yuval Lifshitz over 2 years ago. Updated over 1 year ago.

Status: Won't Fix
Priority: Normal
Target version: -
% Done: 0%
Source: Community (user)
Tags: pubsub
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

When there are N zones in a zonegroup and 1 pubsub zone, it is possible to see N-1 instances of each bucket notification.
This is because a notification is sent for every object sync attempt, even if the object eventually does not need to be synced.
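
The N-1 figure can be illustrated with a toy model. This is only a sketch of the counting argument, not RGW code: it assumes the pubsub zone is one of the N zones, that it makes one sync attempt per object against each of the other N-1 zones, and that every attempt emits a notification. The function name and zone labels are hypothetical.

```python
def simulate_pubsub_notifications(num_zones, object_keys):
    """Toy model: count notifications seen per object in a single pubsub zone.

    Assumption (not taken from RGW source): the pubsub zone is one of the
    `num_zones` zones and tries to sync each object once from every other zone.
    """
    if num_zones < 2:
        raise ValueError("need at least one data zone plus the pubsub zone")

    # The N-1 zones that the pubsub zone syncs from (hypothetical labels).
    peer_zones = [f"zone{i}" for i in range(1, num_zones)]

    already_synced = set()
    notifications = {}
    for key in object_keys:
        for _peer in peer_zones:
            # A notification is emitted on every sync attempt ...
            notifications[key] = notifications.get(key, 0) + 1
            # ... even though only the first attempt actually needs to fetch.
            already_synced.add(key)
    return notifications


if __name__ == "__main__":
    # 3 zones in total (2 data zones + 1 pubsub zone): N-1 = 2 per object.
    print(simulate_pubsub_notifications(3, ["obj-a", "obj-b"]))
```

Under these assumptions, a zonegroup with exactly 2 zones yields one notification per object, which matches the title: duplicates only appear with more than 2 zones.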

According to this email thread: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/DPPEPYPAWLQIRPRZAEJAWJ72S2W6INNN/
when there is more than one pubsub zone (2 in that case), there could be many more notifications.

Actions #1

Updated by Alex Kershaw over 2 years ago

Some more detail re the two pubsub zones.

I ran an OSD database compaction on one side of our cluster and some manual deep scrubs following that, and saw the pubsub events go through the roof again - hit 2.5M or so events before I powered off the second site at which point they stopped. The events seemed to be continuing to grow even after the scrubs were finished.

--- POOLS ---
POOL                           ID  PGS   STORED  OBJECTS      USED  %USED  MAX AVAIL
device_health_metrics           1    1  1.3 KiB        0   3.8 KiB      0    176 GiB
cephfs_data                     2   32  109 GiB    1.58M   328 GiB  38.34    176 GiB
cephfs_metadata                 3   32  3.9 GiB    1.67M    12 GiB   2.20    176 GiB
siteB.rgw.buckets.data          4   32   19 GiB  735.12k    60 GiB  10.17    176 GiB
.rgw.root                       5    4   56 KiB       47   552 KiB      0    176 GiB
siteB.rgw.log                   6    4  397 MiB    1.63k   1.2 GiB   0.22    176 GiB
siteB.rgw.control               7    4      0 B        8       0 B      0    176 GiB
siteB.rgw.meta                  8    4   19 KiB       37   443 KiB      0    176 GiB
siteB.rgw.buckets.index        10    4  818 MiB       31   2.4 GiB   0.45    176 GiB
siteBpubsub.rgw.log            15    4  338 MiB      787  1017 MiB   0.19    176 GiB
siteBpubsub.rgw.control        16    4      0 B        8       0 B      0    176 GiB
siteBpubsub.rgw.meta           17    4   11 KiB       40   452 KiB      0    176 GiB
siteBpubsub.rgw.buckets.index  18    4  4.0 GiB       47    12 GiB   2.23    176 GiB
siteBpubsub.rgw.buckets.data   19    4  770 MiB    2.13M    24 GiB   4.41    176 GiB

The events seem to be the same as in the email thread: lots of old events, which I can only presume are caused by S3 objects being moved around as a result of the compaction/scrubbing, along with lots of duplicates.

Actions #2

Updated by Yuval Lifshitz over 1 year ago

  • Backport deleted (pacific, octopus)
  • Pull request ID set to 48996

pubsub functionality removed

Actions #3

Updated by Yuval Lifshitz over 1 year ago

  • Status changed from New to Won't Fix