Actions
Bug #24481
open[mon] can't decode unknown message type 1544 MSG_AUTH=17
Status:
Fix Under Review
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
% Done:
0%
Source:
Community (user)
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
Hi, friends.
I updated cluster from Hammer(0.94.10) to Jewel(10.2.10) recently an all works fine.
Now, i can't update cluster from Jewel to Luminous(12.2.5).
I have 5 MONs.
I updated packages, restarted MON service but MON didn't joined to cluster.
In logs of new MOn i see:
2018-06-09 18:31:58.401195 7f53c4b94e40 0 mon.c@-1(probing).osd e97076 crush map has features 1107558400, adjusting msgr requires
2018-06-09 18:31:58.401199 7f53c4b94e40 0 mon.c@-1(probing).osd e97076 crush map has features 144115189183414272, adjusting msgr requires
2018-06-09 18:31:58.401201 7f53c4b94e40 0 mon.c@-1(probing).osd e97076 crush map has features 1107558400, adjusting msgr requires
2018-06-09 18:31:58.401640 7f53c4b94e40 1 mon.c@-1(probing).paxosservice(auth 27501..27729) refresh upgraded, format 0 -> 1
2018-06-09 18:31:58.405124 7f53c4b94e40 0 mon.c@-1(probing) e14 my rank is now 4 (was -1)
2018-06-09 18:31:58.407637 7f53bcf98700 0 -- 192.168.144.96:6789/0 >> 192.168.144.94:6789/0 conn(0x7f53cf49d800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=5862845 cs=1 l=0).process missed message? skipped from seq 0 to 324975786
2018-06-09 18:31:58.407698 7f53bc797700 0 -- 192.168.144.96:6789/0 >> 192.168.144.95:6789/0 conn(0x7f53cf49f000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=5570472 cs=1 l=0).process missed message? skipped from seq 0 to 581869062
2018-06-09 18:31:58.407885 7f53b833d700 1 mon.c@4(synchronizing) e14 sync_obtain_latest_monmap
2018-06-09 18:31:58.408133 7f53bd799700 0 -- 192.168.144.96:6789/0 >> 192.168.144.72:6789/0 conn(0x7f53cf49c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=6280990 cs=1 l=0).process missed message? skipped from seq 0 to 638861135
2018-06-09 18:31:58.408206 7f53b833d700 1 mon.c@4(synchronizing) e14 sync_obtain_latest_monmap obtained monmap e14
2018-06-09 18:31:58.408212 7f53bc797700 0 -- 192.168.144.96:6789/0 >> 192.168.144.6:6789/0 conn(0x7f53cf48e800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=7342571 cs=1 l=0).process missed message? skipped from seq 0 to 1859752335
2018-06-09 18:31:59.480169 7f53b0c4b700 1 leveldb: Level-0 table #502: started
2018-06-09 18:31:59.611803 7f53b0c4b700 1 leveldb: Level-0 table #502: 33053398 bytes OK
2018-06-09 18:31:59.618460 7f53b0c4b700 1 leveldb: Delete type=0 #500
2018-06-09 18:31:59.866280 7f53b0c4b700 1 leveldb: Level-0 table #504: started
2018-06-09 18:31:59.991979 7f53b0c4b700 1 leveldb: Level-0 table #504: 33792335 bytes OK
2018-06-09 18:31:59.997304 7f53b0c4b700 1 leveldb: Delete type=0 #501
2018-06-09 18:32:04.346362 7f53b833d700 1 mon.c@4(synchronizing).mds e1 Unable to load 'last_metadata'
2018-06-09 18:32:04.400960 7f53b833d700 0 log_channel(cluster) log [INF] : mon.c calling monitor election
2018-06-09 18:32:04.401069 7f53b833d700 1 mon.c@4(electing).elector(3893) init, last seen epoch 3893, mid-election, bumping
2018-06-09 18:32:08.405774 7f53bc797700 0 -- 192.168.144.96:6789/0 >> 192.168.144.6:6789/0 conn(0x7f53cf48e800 :-1 s=STATE_OPEN pgs=7342571 cs=1 l=0).fault initiating reconnect
2018-06-09 18:32:08.407720 7f53bc797700 0 -- 192.168.144.96:6789/0 >> 192.168.144.6:6789/0 conn(0x7f53cf48e800 :-1 s=STATE_OPEN pgs=7342573 cs=3 l=0).fault initiating reconnect
2018-06-09 18:32:08.409374 7f53bc797700 0 -- 192.168.144.96:6789/0 >> 192.168.144.6:6789/0 conn(0x7f53cf48e800 :-1 s=STATE_OPEN pgs=7342574 cs=5 l=0).fault initiating reconnect
2018-06-09 18:32:08.411206 7f53bc797700 0 -- 192.168.144.96:6789/0 >> 192.168.144.6:6789/0 conn(0x7f53cf48e800 :-1 s=STATE_OPEN pgs=7342575 cs=7 l=0).fault initiating reconnect
2018-06-09 18:32:08.413154 7f53bc797700 0 -- 192.168.144.96:6789/0 >> 192.168.144.6:6789/0 conn(0x7f53cf48e800 :-1 s=STATE_OPEN pgs=7342576 cs=9 l=0).fault initiating reconnect
in logs of current active MON i see:
can't decode unknown message type 1544 MSG_AUTH=17
and some time this:
2018-06-08 23:42:02.910601 7fc6ddb8b700 0 mon.e@0(leader) e8 ms_verify_authorizer cephx enabled, but no authorizer (required for mon)
2018-06-08 23:42:02.910613 7fc6ddb8b700 0 -- 192.168.144.6:6789/0 >> 192.168.144.94:6789/0 pipe(0x7fc6ff23e000 sd=12 :6789 s=0 pgs=0 cs=0 l=0 c=0x7fc6fb237800).accept: got bad authorizer
I naven't reproduce this in test environment. I tested update Hammer->Jewel-Luminous many times in different setups and always all was fine.
Any recommendation, please.
Actions