Bug #53330

Ceph client requests a connection with an old, invalid key.

Added by wencong wan over 2 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%
Source:
Tags:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

We have a production ceph cluster with 3 mons and 516 osds.
Ceph version: 14.2.8
CPU: Intel(R) Xeon(R) Gold 5218
MEM: 187 GB
NIC: 10 G
Node mon-2 went down for some reason at 2021-11-17 04:02:00. The rest of the cluster worked well until the ceph-mon daemon on mon-2 was restarted at 2021-11-17 09:05:03.
mon-3 called a new election due to lease_timeout at 09:05:16.
mon-1 became leader at 2021-11-17 09:07:16.
Then many OSDs were marked down by the mon due to heartbeat timeouts with other OSDs.
After all OSDs were restarted, the cluster returned to health.

We can see a large number of "verify_authorizer" failure records in the OSD logs on different nodes:

[root@*****-ceph-1 ceph]# zcat /var/log/ceph/ceph-osd.*.log-20211118.gz | grep "verify_authorizer could not get service secret for service osd secret_id=9082" | wc -l
20057946

[root@*****-ceph-10 ~]# zcat /var/log/ceph/ceph-osd.*.log-20211118.gz | grep "verify_authorizer could not get service secret for service osd secret_id=9082" | wc -l
64982126
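
To see whether other secret_id values were also rejected, the same logs could be tallied per id instead of grepping for a single one. The following is only a minimal sketch: it assumes the log line format shown in the grep commands above, and the helper names count_secret_ids and count_in_gz are hypothetical, not part of any Ceph tooling.

```python
import gzip
import re
from collections import Counter

# Message text copied from the grep commands above; the id is captured.
PATTERN = re.compile(
    r"verify_authorizer could not get service secret "
    r"for service osd secret_id=(\d+)"
)


def count_secret_ids(lines):
    """Count failed-authorizer log lines per secret_id (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def count_in_gz(paths):
    """Apply count_secret_ids to a list of gzip-compressed log files."""
    total = Counter()
    for path in paths:
        # Text mode; undecodable bytes are replaced rather than raising.
        with gzip.open(path, "rt", errors="replace") as handle:
            total.update(count_secret_ids(handle))
    return total
```

For example, count_in_gz(glob.glob("/var/log/ceph/ceph-osd.*.log-20211118.gz")) would return a Counter keyed by secret_id, which shows at a glance whether 9082 is the only stale id in use.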

The secret_id increases by 1 every hour. Judging from the secret_id at the current time, secret_id 9082 appears to be the id that was valid before mon-2 went down at 2021-11-17 04:02:00.
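
The hourly increment makes it possible to estimate which secret_id should have been current at a given time. The sketch below assumes exactly one increment per hour (as observed in this report, not verified against the cluster configuration) and treats 9082 at 04:02:00 as the reference point; since the exact rotation boundary is unknown, the result may be off by one. The function name expected_secret_id is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption taken from the observation in this report: one rotation per hour.
ROTATION_PERIOD = timedelta(hours=1)


def expected_secret_id(ref_id, ref_time, when, period=ROTATION_PERIOD):
    """Estimate the secret_id at `when`, given a known (ref_id, ref_time) pair."""
    elapsed = when - ref_time
    # timedelta // timedelta yields the whole number of elapsed periods.
    return ref_id + elapsed // period


# Reference point from the report: 9082 around the time mon-2 went down.
mon2_down = datetime(2021, 11, 17, 4, 2, 0)
mon2_restart = datetime(2021, 11, 17, 9, 5, 3)
print(expected_secret_id(9082, mon2_down, mon2_restart))  # prints 9087
```

Under these assumptions, roughly five rotations had passed by the time mon-2 was restarted, so a peer still presenting 9082 would be about five ids behind.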

A similar problem was reported six years ago:

https://tracker.ceph.com/issues/4282 #9
