Bug #41065
New OSD added to a cluster upgraded from 13 to 14 goes down after some days
Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hi all.
My Ceph cluster was upgraded from 13.2.5 to 14.2.2.
I have not enabled msgr v2, and I added 2 new mons.
ceph mon dump
dumped monmap epoch 6
epoch 6
fsid 1eaa6824-55fb-4cfd-bdd1-8839296f5cf8
last_changed 2019-07-30 15:23:45.147026
created 2018-12-07 12:46:14.001608
min_mon_release 14 (nautilus)
0: v1:172.25.7.151:6789/0 mon.bat-cinder-1
1: v1:172.25.7.152:6789/0 mon.bat-cinder-2
2: v1:172.25.7.153:6789/0 mon.bat-cinder-3
3: [v2:172.25.7.155:3300/0,v1:172.25.7.155:6789/0] mon.bat-cinder-5
4: [v2:172.25.7.154:3300/0,v1:172.25.7.154:6789/0] mon.bat-cinder-4
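The three original mons in this monmap still advertise only v1 addresses. For reference, a minimal sketch of how msgr v2 is normally enabled on a Nautilus cluster, assuming all mons are already running 14.x:

ceph mon enable-msgr2   # each mon also starts listening on the v2 port 3300
ceph mon dump           # verify every mon now shows both a v2 and a v1 address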
All old OSDs are working fine.
I added a new CRUSH branch with 4 hosts to the cluster.
After 1 day some PGs went down. The OSDs themselves were not down; I restarted the OSDs, but the PGs stayed down.
OSD log:
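A rough sketch of commands that could be used to see which PGs are down and what they are waiting on (the PG id below is just an example taken from the logs that follow):

ceph health detail            # lists the down/inactive PGs
ceph pg dump_stuck inactive   # PGs that are not active
ceph pg 24.28 query           # per-PG state, including which OSDs it is blocked on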
2019-08-05 11:06:04.774 7fa82a5e1700 -1 osd.210 689149 get_health_metrics reporting 1 slow ops, oldest is osd_pg_create(e689149 24.28:678578 24.2f:678578 24.42:678578 24.50:678578 24.6e:678578 24.94:678578 24.9f:678578 24.ff:678578 24.150:678578 24.19d:678578 24.1a6:678578 24.1b8:678578 24.215:678578 24.21e:678578 24.250:678578 24.281:678578 24.2f5:678578 24.2f7:678578 24.344:678578 24.35a:678578 24.36a:678578 24.37c:678578 24.39c:678578 24.3c0:678578 24.3fb:678578 24.40f:678578 24.410:678578 24.422:678578 24.43e:678578 24.463:678578 24.4ba:678578 24.532:678578 24.536:678578 24.569:678578 24.56b:678578 24.56d:678578 24.5ac:678578 24.5bc:678578 24.5c7:678578 24.5d8:678578 24.5db:678578 24.62b:678578 24.62e:678578 24.6c6:678578 24.6e2:678578 24.702:678578 24.72c:678578 24.797:678578 24.7c7:678578)
TING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2: got bad authorizer, auth_reply_len=0
2019-08-05 03:18:01.994 7fcf9fdc9700 0 auth: could not find secret_id=5773
2019-08-05 03:18:01.994 7fcf9fdc9700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=5773
2019-08-05 03:18:01.994 7fcf9fdc9700 0 --1- [v2:172.25.6.61:6828/986104,v1:172.25.6.61:6830/986104] >> v1:172.25.7.153:0/3633830937 conn(0x55f79b33c000 0x55f79b345800 :6830 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2: got bad authorizer, auth_reply_len=0
2019-08-05 03:18:01.995 7fcfa05ca700 0 auth: could not find secret_id=5773
2019-08-05 03:18:01.995 7fcfa05ca700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=5773
2019-08-05 03:18:01.995 7fcfa05ca700 0 --1- [v2:172.25.6.61:6828/986104,v1:172.25.6.61:6830/986104] >> v1:172.25.7.152:0/1671746468 conn(0x55f699524c00 0x55f9ad60b800 :6830 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2: got bad authorizer, auth_reply_len=0
2019-08-05 03:18:01.995 7fcfa05ca700 0 auth: could not find secret_id=5773
2019-08-05 03:18:01.995 7fcfa05ca700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=5773
2019-08-05 03:18:01.995 7fcf9fdc9700 0 auth: could not find secret_id=5773
2019-08-05 03:18:01.995 7fcf9fdc9700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=5773
2019-08-05 03:18:01.995 7fcfa05ca700 0 --1- [v2:172.25.6.61:6828/986104,v1:172.25.6.61:6830/986104] >> v1:172.25.7.153:0/3633830937 conn(0x55f6eb81d400 0x55f5e7cb5800 :6830 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2: got bad authorizer, auth_reply_len=0
2019-08-05 03:18:01.995 7fcf9fdc9700 0 --1- [v2:172.25.6.61:6828/986104,v1:172.25.6.61:6830/986104] >> v1:172.25.7.151:0/3105714850 conn(0x55f78e940800 0x55f79b356800 :6830 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2: got bad authorizer, auth_reply_len=0
2019-08-05 03:18:01.995 7fcf9fdc9700 0 auth: could not find secret_id=5773
2019-08-05 03:18:01.995 7fcf9fdc9700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=5773
2019-08-05 03:18:01.995 7fcf9fdc9700 0 --1- [v2:172.25.6.61:6828/986104,v1:172.25.6.61:6830/986104] >> v1:172.25.7.152:0/1671746468 conn(0x55f69954bc00 0x55fa775eb800 :6830 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_message_2: got bad authorizer, auth_reply_len=0
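The "could not find secret_id=5773" messages mean the OSD does not hold the rotating cephx service key that its peers are presenting, which usually comes down to clock skew or stale rotating keys. A minimal sketch of the checks, assuming chrony is the time source on these hosts:

ceph time-sync-status                         # clock skew as seen by the monitors
chronyc tracking                              # local clock offset on each OSD/mon host
ceph config get osd auth_service_ticket_ttl   # how often the rotating service keys change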
Thanks.
Updated by hoan nv over 4 years ago
Log on each OSD:
2019-08-05 15:13:51.370 7f7b52b44700 -1 osd.210 689629 get_health_metrics reporting 35 slow ops, oldest is osd_pg_create(e689595 24.28:678578 24.2f:678578 24.50:678578 24.6e:678578 24.94:678578 24.9f:678578 24.ff:678578 24.150:678578 24.19d:678578 24.1a6:678578 24.1b8:678578 24.215:678578 24.21e:678578 24.250:678578 24.281:678578 24.2f5:678578 24.2f7:678578 24.344:678578 24.35a:678578 24.36a:678578 24.37c:678578 24.39c:678578 24.3c0:678578 24.3fb:678578 24.410:678578 24.422:678578 24.43e:678578 24.463:678578 24.4ba:678578 24.532:678578 24.536:678578 24.569:678578 24.56b:678578 24.56d:678578 24.5ac:678578 24.5bc:678578 24.5c7:678578 24.5d8:678578 24.5db:678578 24.62b:678578 24.62e:678578 24.6c6:678578 24.6e2:678578 24.702:678578 24.72c:678578 24.797:678578 24.7c7:678578)
2019-08-05 15:13:52.330 7f7b52b44700 -1 osd.210 689629 get_health_metrics reporting 35 slow ops, oldest is osd_pg_create(e689595 24.28:678578 24.2f:678578 24.50:678578 24.6e:678578 24.94:678578 24.9f:678578 24.ff:678578 24.150:678578 24.19d:678578 24.1a6:678578 24.1b8:678578 24.215:678578 24.21e:678578 24.250:678578 24.281:678578 24.2f5:678578 24.2f7:678578 24.344:678578 24.35a:678578 24.36a:678578 24.37c:678578 24.39c:678578 24.3c0:678578 24.3fb:678578 24.410:678578 24.422:678578 24.43e:678578 24.463:678578 24.4ba:678578 24.532:678578 24.536:678578 24.569:678578 24.56b:678578 24.56d:678578 24.5ac:678578 24.5bc:678578 24.5c7:678578 24.5d8:678578 24.5db:678578 24.62b:678578 24.62e:678578 24.6c6:678578 24.6e2:678578 24.702:678578 24.72c:678578 24.797:678578 24.7c7:678578)
2019-08-05 15:13:53.378 7f7b52b44700 -1 osd.210 689629 get_health_metrics reporting 35 slow ops, oldest is osd_pg_create(e689595 24.28:678578 24.2f:678578 24.50:678578 24.6e:678578 24.94:678578 24.9f:678578 24.ff:678578 24.150:678578 24.19d:678578 24.1a6:678578 24.1b8:678578 24.215:678578 24.21e:678578 24.250:678578 24.281:678578 24.2f5:678578 24.2f7:678578 24.344:678578 24.35a:678578 24.36a:678578 24.37c:678578 24.39c:678578 24.3c0:678578 24.3fb:678578 24.410:678578 24.422:678578 24.43e:678578 24.463:678578 24.4ba:678578 24.532:678578 24.536:678578 24.569:678578 24.56b:678578 24.56d:678578 24.5ac:678578 24.5bc:678578 24.5c7:678578 24.5d8:678578 24.5db:678578 24.62b:678578 24.62e:678578 24.6c6:678578 24.6e2:678578 24.702:678578 24.72c:678578 24.797:678578 24.7c7:678578)
2019-08-05 15:13:54.416 7f7b52b44700 -1 osd.210 689629 get_health_metrics reporting 35 slow ops, oldest is osd_pg_create(e689595 24.28:678578 24.2f:678578 24.50:678578 24.6e:678578 24.94:678578 24.9f:678578 24.ff:678578 24.150:678578 24.19d:678578 24.1a6:678578 24.1b8:678578 24.215:678578 24.21e:678578 24.250:678578 24.281:678578 24.2f5:678578 24.2f7:678578 24.344:678578 24.35a:678578 24.36a:678578 24.37c:678578 24.39c:678578 24.3c0:678578 24.3fb:678578 24.410:678578 24.422:678578 24.43e:678578 24.463:678578 24.4ba:678578 24.532:678578 24.536:678578 24.569:678578 24.56b:678578 24.56d:678578 24.5ac:678578 24.5bc:678578 24.5c7:678578 24.5d8:678578 24.5db:678578 24.62b:678578 24.62e:678578 24.6c6:678578 24.6e2:678578 24.702:678578 24.72c:678578 24.797:678578 24.7c7:678578)
The OSD has slow ops of type osd_pg_create.
ceph osd ops
"ops": [ { "description": "osd_pg_create(e689622 24.28:678578 24.2f:678578 24.50:678578 24.6e:678578 24.94:678578 24.9f:678578 24.ff:678578 24.150:678578 24.1a6:678578 24.1b8:678578 24.215:678578 24.21e:678578 24.250:678578 24.281:678578 24.2f5:678578 24.2f7:678578 24.344:678578 24.35a:678578 24.36a:678578 24.37c:678578 24.39c:678578 24.3c0:678578 24.3fb:678578 24.410:678578 24.422:678578 24.43e:678578 24.463:678578 24.4ba:678578 24.532:678578 24.536:678578 24.569:678578 24.56b:678578 24.56d:678578 24.5ac:678578 24.5bc:678578 24.5c7:678578 24.5d8:678578 24.5db:678578 24.62b:678578 24.62e:678578 24.6c6:678578 24.6e2:678578 24.702:678578 24.72c:678578 24.797:678578 24.7c7:678578)", "initiated_at": "2019-08-05 14:55:46.960570", "age": 1174.229103933, "duration": 1174.2291527780001, "type_data": { "flag_point": "delayed", "events": [ { "time": "2019-08-05 14:55:46.960570", "event": "initiated" }, { "time": "2019-08-05 14:55:46.960570", "event": "header_read" }, { "time": "2019-08-05 14:55:46.960573", "event": "throttled" }, { "time": "2019-08-05 14:55:46.960640", "event": "all_read" }, { "time": "2019-08-05 15:08:08.045234", "event": "dispatched" }, { "time": "2019-08-05 15:08:08.045273", "event": "wait for new map" } ] } },
Updated by Greg Farnum over 4 years ago
- Project changed from Ceph to RADOS
- Status changed from New to Closed
It's not clear from these snippets what issue you're actually experiencing. The "bad authorizer" suggests either a clock sync issue or a CephX misconfiguration.
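For reference, a quick sketch of how both possibilities could be checked (the daemon ID and path are only examples from this report, assuming the default keyring layout):

ceph time-sync-status                    # monitor view of clock skew
ceph auth get osd.210                    # key and caps the cluster holds for this OSD
cat /var/lib/ceph/osd/ceph-210/keyring   # key the daemon actually uses; the two must match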
Updated by Neha Ojha over 4 years ago
- Related to Bug #43048: nautilus: upgrade/mimic-x/stress-split: failed to recover before timeout expired added