Bug #47129

open

[pubsub] Pubsub event can not be acked

Added by David Piper over 3 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Details of the current setup:
  • ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
  • running in a containerized ceph cluster
  • two 3-node clusters using Ceph RGW Multisite, called "siteA" and "siteB". Each site has a single RGW zone and a separate pubsub zone configured to sync from its local RGW zone. Each zone has three endpoints, one on each node. We use port 7480 for RGW and 7481 for pubsub. See the zonegroup config below.

Pubsub pushes events to a process on our client. Every 5 minutes, this process also checks the pubsub subscription for any events it has missed and acks them.
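The 5-minute catch-up pass can be sketched as below. This is a minimal, hypothetical client, not the reporter's code. It uses the two REST calls visible in the tcpdump (`GET /subscriptions/<sub>?events` and `POST /subscriptions/<sub>?ack&event-id=<id>`). It assumes the listing is JSON with an `events` array whose entries carry an `id`, paged with `next_marker`/`is_truncated`. All function names are made up, and the HTTP transport is passed in so the logic can run without a cluster.

```python
import json
import urllib.parse
import urllib.request


def events_url(base, sub, marker=None):
    """GET /subscriptions/<sub>?events[&marker=<marker>] (paging marker assumed)."""
    url = f"{base}/subscriptions/{urllib.parse.quote(sub)}?events"
    if marker:
        url += "&marker=" + urllib.parse.quote(marker, safe="")
    return url


def ack_url(base, sub, event_id):
    """POST /subscriptions/<sub>?ack&event-id=<id>, as seen in the tcpdump."""
    return (f"{base}/subscriptions/{urllib.parse.quote(sub)}"
            f"?ack&event-id={urllib.parse.quote(event_id, safe='')}")


def list_all_events(fetch_json, base, sub):
    """Follow next_marker while is_truncated is true (response shape assumed)."""
    events, marker = [], None
    while True:
        page = fetch_json(events_url(base, sub, marker))
        events.extend(page.get("events", []))
        truncated = str(page.get("is_truncated", "false")).lower() == "true"
        marker = page.get("next_marker")
        if not truncated or not marker:
            return events


def catch_up_pass(fetch_json, post, base, sub):
    """One periodic pass: ack every listed event; return ids whose ack got 200."""
    acked_ok = []
    for event in list_all_events(fetch_json, base, sub):
        if post(ack_url(base, sub, event["id"])) == 200:
            acked_ok.append(event["id"])
    return acked_ok


# Real transports (plain HTTP to the local haproxy in this setup).
def http_get_json(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def http_post(url):
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A real pass would be `catch_up_pass(http_get_json, http_post, BASE, "albamonssub")`, where `BASE` is a hypothetical URL for the local haproxy listener. Note that `urlopen` raises `HTTPError` on 4xx/5xx, so only 2xx statuses ever reach the `== 200` check.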

Expected

Events disappear from the pubsub queue once they are acked.

Actual

We recently noticed that one event (id="1597311622.580756.3b1ddcc7") cannot be acked on siteBpubsub. Our client sends the ack HTTP request for this event and receives a 200 OK response, but the subscription continues to list the event. See the attached tcpdump.
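The symptom can be stated as a check over successive polling passes: after an ack returns 200 OK, the event id should never be listed again. The sketch below records, for each pass, the ids listed and the ids acked with 200. `relisted_after_ack` is a hypothetical helper that counts how many later passes still list an already-acked id.

```python
from collections import Counter


def relisted_after_ack(passes):
    """Count how often each event id is listed again after a 200 OK ack.

    passes: list of (listed_ids, acked_ok_ids) tuples, one per polling pass,
    oldest first. With working acks the result is an empty dict.
    """
    acked = set()
    relisted = Counter()
    for listed_ids, acked_ok_ids in passes:
        for event_id in listed_ids:
            if event_id in acked:
                relisted[event_id] += 1
        acked.update(acked_ok_ids)
    return dict(relisted)
```

Applied to a week of passes, the reported event would show a count that keeps growing, while every other id would be absent from the result.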

This has been happening for over a week (same event).

Otherwise, the pubsub mechanism generally works well: all other events have been acked successfully and are no longer listed in the subscription.

Diags (pubsub.zip)

RGW pubsub logs from the Ceph nodes. I've raised debug_rgw and debug_rgw_sync to 20/20 on my pubsub instances, but cannot see anything relating to the ack exchange.

  • albamilana_sc0.pubsub.log (10.225.36.197)
  • albamilana_sc1.pubsub.log (10.225.36.198)
  • albamilana_sc2.pubsub.log (10.225.36.199)

TCPdump (from client):

  • pubsub.cap

- The client proxies unencrypted pubsub traffic through a local haproxy process, which handles encryption for the remote hop to Ceph.
- client (10.225.35.254) <-- HTTP --> haproxy (127.3.3.3) <-- HTTPS --> ceph (10.225.36.197-199)
- 09:44:43.761 GET /subscriptions/albamonssub?events (tcp.stream 19 / packet 193)
- 09:44:43.786 200 OK including event 1597311622.580756.3b1ddcc7 (tcp.stream 19 / packet 200)
- 09:44:43.810 POST /subscriptions/albamonssub?ack&event-id=1597311622.580756.3b1ddcc7 (tcp.stream 22 / packet 204)
- 09:44:43.825 200 OK (tcp.stream 22 / packet 217)
- ...
- 09:49:44.047 GET /subscriptions/albamonssub?events (tcp.stream 474 / packet 3882)
- 09:49:44.070 200 OK including event 1597311622.580756.3b1ddcc7 (tcp.stream 474 / packet 3889)
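A quick check of the timing in the trace: the event is still listed about 300 seconds after its ack returned 200 OK. That corresponds to the client's next 5-minute pass, which suggests the re-listing is not simply a read racing the ack. The timestamps are copied from the trace. `gap_seconds` is a hypothetical helper that assumes both timestamps fall on the same day.

```python
from datetime import datetime

TRACE_FMT = "%H:%M:%S.%f"  # timestamps as printed in the trace


def gap_seconds(earlier, later):
    """Seconds between two same-day HH:MM:SS.fff timestamps."""
    delta = datetime.strptime(later, TRACE_FMT) - datetime.strptime(earlier, TRACE_FMT)
    return delta.total_seconds()


ack_ok = "09:44:43.825"    # 200 OK to the ack POST (packet 217)
relisted = "09:49:44.070"  # 200 OK that still lists the event (packet 3889)
print(gap_seconds(ack_ok, relisted))
```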

Config

[qs-admin@albamilana_sc0 ~]$ radosgw-admin zonegroup get
+ sudo docker ps --filter name=ceph-rgw-.*rgw -q
+ sudo docker exec d7baa6bc479c radosgw-admin
{
    "id": "3b89fa78-3c17-4cfd-85a3-d7101fb0e8c3",
    "name": "geored_zg",
    "api_name": "geored_zg",
    "is_master": "true",
    "endpoints": [
        "https://10.225.21.213:7480"
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "c2fbaf9a-f3c5-4212-9a4d-a8cca889dd1a",
    "zones": [
        {
            "id": "321b7e8d-81eb-4dbb-b03e-0984913279e9",
            "name": "siteApubsub",
            "endpoints": [
                "https://10.225.21.213:7481",
                "https://10.225.21.214:7481",
                "https://10.225.21.215:7481"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "pubsub",
            "sync_from_all": "false",
            "sync_from": [
                "siteA"
            ],
            "redirect_zone": ""
        },
        {
            "id": "625b1e76-5f19-41a8-9633-74e5e34ca07c",
            "name": "siteBpubsub",
            "endpoints": [
                "https://10.225.36.197:7481",
                "https://10.225.36.198:7481",
                "https://10.225.36.199:7481"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "pubsub",
            "sync_from_all": "false",
            "sync_from": [
                "siteB"
            ],
            "redirect_zone": ""
        },
        {
            "id": "838e1bdb-a707-4a82-8b2f-4b8620762755",
            "name": "siteB",
            "endpoints": [
                "https://10.225.36.197:7480",
                "https://10.225.36.198:7480",
                "https://10.225.36.199:7480"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        },
        {
            "id": "c2fbaf9a-f3c5-4212-9a4d-a8cca889dd1a",
            "name": "siteA",
            "endpoints": [
                "https://10.225.21.213:7480",
                "https://10.225.21.214:7480",
                "https://10.225.21.215:7480"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false",
            "tier_type": "",
            "sync_from_all": "true",
            "sync_from": [],
            "redirect_zone": ""
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "STANDARD"
            ]
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "f1d79706-ca99-46b5-84d2-b75c48893e4b"
}
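To confirm which zone the ack is hitting, the zonegroup output can be filtered down to the pubsub-tier zones, showing their sync source and endpoint ports. This is a minimal sketch that assumes the JSON shape shown above. `pubsub_zones` is a hypothetical helper, not part of radosgw-admin. It expects the parsed stdout of `radosgw-admin zonegroup get` with the shell trace lines stripped. The excerpt keeps only the fields the helper reads.

```python
import json
from urllib.parse import urlsplit


def pubsub_zones(zonegroup):
    """Map each pubsub-tier zone name to (sync_from list, sorted endpoint ports)."""
    result = {}
    for zone in zonegroup.get("zones", []):
        if zone.get("tier_type") != "pubsub":
            continue  # skip regular RGW zones such as siteA/siteB
        ports = sorted({urlsplit(ep).port for ep in zone.get("endpoints", [])})
        result[zone["name"]] = (zone.get("sync_from", []), ports)
    return result


# Trimmed excerpt of the zonegroup above (only the fields used here).
EXCERPT = json.loads("""
{"zones": [
  {"name": "siteBpubsub", "tier_type": "pubsub", "sync_from": ["siteB"],
   "endpoints": ["https://10.225.36.197:7481", "https://10.225.36.198:7481",
                 "https://10.225.36.199:7481"]},
  {"name": "siteB", "tier_type": "", "sync_from": [],
   "endpoints": ["https://10.225.36.197:7480"]}
]}
""")
print(pubsub_zones(EXCERPT))
```

On the full config, the result should show siteApubsub syncing from siteA and siteBpubsub syncing from siteB, both on port 7481.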


Files

pubsub.zip (951 KB) David Piper, 08/25/2020 09:31 AM
#1

Updated by Greg Farnum almost 3 years ago

  • Project changed from Ceph to rgw