Bug #58375

open

rgw bucket notification failed to reserve notification on queue: . error: -28 when put_obj.

Added by xiaobao wen over 1 year ago. Updated 12 months ago.

Status:
Need More Info
Priority:
Normal
Target version:
-
% Done:

0%

Source:
Community (user)
Tags:
rgw
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

I used rook to deploy bucket notification.
topic config:
apiVersion: ceph.rook.io/v1
kind: CephBucketTopic
metadata:
  name: c-dev-prod-bucket-notification
  namespace: rook-ceph
spec: # kubectl get CephObjectStore -n rook-ceph
  objectStoreName: os-dsglczutvqsgowpz
  objectStoreNamespace: rook-ceph
  opaqueData: xxxxx
  persistent: true
  endpoint:
    kafka:
      uri: kafka://dev-kafka.xxxxx.cn:9092
      disableVerifySSL: true
      ackLevel: broker # none, broker (default)
      useSSL: false

One day after applying put-bucket-notification-configuration to the bucket, put_obj requests started failing and the rgw log looked like this:
debug 2022-12-27T07:31:50.861+0000 7f8339ec7700 1 ====== starting new request req=0x7f83c48ba630 =====
debug 2022-12-27T07:31:50.864+0000 7f8301e57700 1 req 1303171812437185995 0.003000025s s3:put_obj ERROR: failed to reserve notification on queue: c-dev-prod-bucket-notification. error: -28
debug 2022-12-27T07:31:50.864+0000 7f8301e57700 1 ====== req done req=0x7f83c48ba630 op status=-2218 http_status=503 latency=0.003000025s ======
debug 2022-12-27T07:31:50.864+0000 7f8301e57700 1 beast: 0x7f83c48ba630: 10.3.8.33 - mlp [27/Dec/2022:07:31:50.861 +0000] "PUT /mlp-data-warehouse/pioneer_test.sql HTTP/1.1" 503 266 - "aws-sdk-go/1.40.25 (go1.18.3; linux; amd64) S3Manager" - latency=0.003000025s


Files

rgwlog.tar.gz (101 KB), xiaobao wen, 01/04/2023 03:02 AM
topic_info.txt (19.2 KB), xiaobao wen, 01/13/2023 04:02 AM
Actions #1

Updated by xiaobao wen over 1 year ago

log file

Actions #2

Updated by Yuval Lifshitz over 1 year ago

from the log file it seems like the kafka broker is disconnected:

4678:%3|1672125044.018|FAIL|rdkafka#producer-1| [thrd:dev-kafka.deeproute.cn:9092/bootstrap]: dev-kafka.deeproute.cn:9092/bootstrap: Receive failed: Disconnected
4679:%3|1672125044.018|ERROR|rdkafka#producer-1| [thrd:dev-kafka.deeproute.cn:9092/bootstrap]: dev-kafka.deeproute.cn:9092/bootstrap: Receive failed: Disconnected
6070:%3|1672125343.964|FAIL|rdkafka#producer-1| [thrd:10.9.9.115:9092/0]: 10.9.9.115:9092/0: Receive failed: Disconnected
6071:%3|1672125343.964|ERROR|rdkafka#producer-1| [thrd:10.9.9.115:9092/0]: 10.9.9.115:9092/0: Receive failed: Disconnected

this is causing the notification queue to fill up and eventually causes failures in the put_obj operation (we cannot reserve a place in the queue since it is full):
6450:debug 2022-12-27T07:31:48.342+0000 7f82badc9700  1 req 750000557323249480 0.006000049s s3:put_obj ERROR: failed to reserve notification on queue: c-dev-prod-bucket-notification. error: -28
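
For readers decoding the log line: assuming the value follows the usual negative-errno convention on Linux, -28 corresponds to ENOSPC ("No space left on device"), which matches the "queue is full" explanation. A minimal sketch for checking this; the helper name describe_rgw_error is hypothetical and not part of Ceph:

```python
import errno
import os


def describe_rgw_error(code: int) -> str:
    """Map a negative errno-style return code to its symbolic name and message.

    Assumption: the code printed in the rgw log is a negated Linux errno value.
    """
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name}: {os.strerror(num)}"


print(describe_rgw_error(-28))
```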

Actions #3

Updated by xiaobao wen over 1 year ago

Yuval Lifshitz wrote:

from the log file it seems like the kafka broker is disconnected:
[...]
this is causing the notification queue to fill up and eventually causes failures in the put_obj operation (we cannot reserve a place in the queue since it is full):
[...]

The topic for which put_obj failed does not use dev-kafka.xxx. I created three topics, and dev-kafka.xxx was only used earlier, for development.

Does a disconnected kafka broker adversely affect all topics?

Actions #4

Updated by Casey Bodley over 1 year ago

@Yuval is this the expected behavior? if so, should we close this?

Actions #5

Updated by Casey Bodley over 1 year ago

  • Status changed from New to Need More Info
Actions #6

Updated by xiaobao wen over 1 year ago

Added some notification configuration info. We use dev/stg/prod kafka topics. The log shows that dev-kafka is disconnected, but the notification configuration that reports the errors is the prod-kafka one.

Actions #7

Updated by Ilya Dryomov about 1 year ago

  • Target version deleted (v16.2.11)
Actions #8

Updated by Casey Bodley 12 months ago

  • Assignee set to Yuval Lifshitz

Casey Bodley wrote:

@Yuval is this the expected behavior? if so, should we close this?

Actions #9

Updated by Yuval Lifshitz 12 months ago

each topic has its own queue, so, if the dev-kafka broker is down, only notifications sent to its topic should fill up the queue and eventually fail.
the topic (and queue) that points to the kafka broker which is up should not be affected by that (to some extent, the retries to the failed broker use the same kafka thread) and we should not see the queue fill up for the broker which is up.

please keep tracker open. i will investigate
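
The expectation described in this comment (one queue per topic, so only the topic whose broker is unreachable should fill up and start returning -28) can be illustrated with a toy model. This is only a sketch and not RGW code: the TopicQueue class, the capacity of 3, and the topic names "dev" and "prod" are hypothetical, and the shared kafka retry thread mentioned above is deliberately not modeled.

```python
import errno
from collections import deque


class TopicQueue:
    """Toy bounded queue standing in for one persistent topic queue (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items = deque()

    def reserve_and_push(self, event: str) -> int:
        """Return 0 on success, or -ENOSPC when the queue is already full."""
        if len(self.items) >= self.capacity:
            return -errno.ENOSPC
        self.items.append(event)
        return 0

    def drain(self, broker_up: bool) -> None:
        """Deliver all entries if the broker is reachable; otherwise keep them queued."""
        if broker_up:
            self.items.clear()


def simulate(puts: int, capacity: int = 3) -> dict:
    """Send `puts` object events to two topics and return the last result per topic."""
    queues = {"dev": TopicQueue(capacity), "prod": TopicQueue(capacity)}
    broker_up = {"dev": False, "prod": True}  # only the dev broker is down
    last = {}
    for i in range(puts):
        for name, queue in queues.items():
            last[name] = queue.reserve_and_push(f"obj-{i}")
            queue.drain(broker_up[name])
    return last


result = simulate(puts=5)
print(result)
```

In this model only the "dev" queue ends up rejecting reservations, while "prod" keeps accepting them. The report in this tracker is that the real system behaved differently, which is what remains to be investigated.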

Actions #10

Updated by xiaobao wen 12 months ago

Yuval Lifshitz wrote:

each topic has its own queue, so, if the dev-kafka broker is down, only notifications sent to its topic should fill up the queue and eventually fail.
the topic (and queue) that points to the kafka broker which is up should not be affected by that (to some extent, the retries to the failed broker use the same kafka thread) and we should not see the queue fill up for the broker which is up.

please keep tracker open. i will investigate

Thanks a lot.
Too much time has passed to obtain additional information from that environment. I am going to run a simple test on v16.2.12 soon.
