Actions
Bug #22462
closedmon: unknown message type 1537 in luminous->mimic upgrade tests
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Monitor
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
mon.a was mimic, mon.b and c were still luminous. mon.a was out of quorum, with logs including:
2017-12-15 17:05:26.772 7fb0e30ec700 1 -- 172.21.2.165:6789/0 --> 172.21.2.121:6789/0 -- forward(log(1 entries from seq 10 at 2017-12-15 17:05:26.775932) v1 caps allow * tid 14 con_features 2305244844817580027) v3 -- 0x5558549d5600 con 0 2017-12-15 17:05:26.772 7fb0df0e4700 10 _calc_signature seq 67 front_crc_ = 3288517548 middle_crc = 0 data_crc = 0 sig = 4805415437538457764 2017-12-15 17:05:26.772 7fb0df0e4700 20 Putting signature in client message(seq # 67): sig = 4805415437538457764 2017-12-15 17:05:26.860 7fb0df0e4700 10 _calc_signature seq 204841642 front_crc_ = 2129739439 middle_crc = 0 data_crc = 0 sig = 7961375364363924504 2017-12-15 17:05:26.860 7fb0df0e4700 0 can't decode unknown message type 1537 MSG_AUTH=17 2017-12-15 17:05:26.860 7fb0df0e4700 1 -- 172.21.2.165:6789/0 >> 172.21.2.121:6789/0 conn(0x555854e1dc00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=578 cs=1 l=0).process decode message failed 2017-12-15 17:05:26.860 7fb0e30ec700 1 -- 172.21.2.165:6789/0 <== mon.0 172.21.2.121:6789/0 204841642 ==== paxos(begin lc 2093 fc 0 pn 1100 opn 0) v4 ==== 11398+0+0 (2129739439 0 0) 0x5558549d5600 con 0x555854e1dc00 2017-12-15 17:05:26.860 7fb0e30ec700 20 mon.a@1(peon) e1 _ms_dispatch existing session 0x555854b2e4c0 for mon.0 172.21.2.121:6789/0 2017-12-15 17:05:26.860 7fb0e30ec700 20 mon.a@1(peon) e1 caps allow * 2017-12-15 17:05:26.860 7fb0e30ec700 20 is_capable service=mon command= read on cap allow * 2017-12-15 17:05:26.860 7fb0e30ec700 20 allow so far , doing grant allow * 2017-12-15 17:05:26.860 7fb0e30ec700 20 allow all 2017-12-15 17:05:26.860 7fb0e30ec700 20 is_capable service=mon command= exec on cap allow * 2017-12-15 17:05:26.860 7fb0e30ec700 20 allow so far , doing grant allow * 2017-12-15 17:05:26.860 7fb0e30ec700 20 allow all 2017-12-15 17:05:26.860 7fb0e30ec700 10 mon.a@1(peon).paxos(paxos active c 1507..2093) handle_begin paxos(begin lc 2093 fc 0 pn 1100 opn 0) v4 2017-12-15 17:05:26.860 7fb0e30ec700 10 mon.a@1(peon).paxos(paxos updating c 1507..2093) accepting value for 2094 pn 1100 2017-12-15 17:05:26.860 7fb0df0e4700 1 -- 172.21.2.165:6789/0 >> - conn(0x555854e6e000 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=25 - 2017-12-15 17:05:26.860 7fb0df0e4700 10 mon.a@1(peon) e1 ms_verify_authorizer 172.21.2.121:6789/0 mon protocol 2 2017-12-15 17:05:26.860 7fb0df0e4700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615 2017-12-15 17:05:26.860 7fb0df0e4700 10 cephx: verify_authorizer global_id=0 2017-12-15 17:05:26.860 7fb0df0e4700 10 cephx: verify_authorizer ok nonce 2a487cb01d4ed43b reply_bl.length()=36 2017-12-15 17:05:26.860 7fb0df0e4700 0 -- 172.21.2.165:6789/0 >> 172.21.2.121:6789/0 conn(0x555854e6e000 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 2 vs existing csq=1 existing_state=STATE_STANDBY 2017-12-15 17:05:26.860 7fb0df0e4700 1 -- 172.21.2.165:6789/0 >> 172.21.2.121:6789/0 conn(0x555854e6e000 :6789 s=STATE_CLOSED pgs=0 cs=0 l=0).handle_connect_msg stop myself to swap existing 2017-12-15 17:05:26.864 7fb0df0e4700 10 mon.a@1(peon) e1 ms_verify_authorizer 172.21.2.121:6789/0 mon protocol 2 2017-12-15 17:05:26.864 7fb0df0e4700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615 2017-12-15 17:05:26.864 7fb0df0e4700 10 cephx: verify_authorizer global_id=0 2017-12-15 17:05:26.864 7fb0df0e4700 10 cephx: verify_authorizer ok nonce 725a06fb2cd89a32 reply_bl.length()=36 2017-12-15 17:05:26.864 7fb0df0e4700 10 In get_auth_session_handler for protocol 2 2017-12-15 17:05:26.864 7fb0e30ec700 1 -- 172.21.2.165:6789/0 --> 172.21.2.121:6789/0 -- paxos(accept lc 2093 fc 0 pn 1100 opn 0) v4 -- 0x5558550ae600 con 0 2017-12-15 17:05:26.864 7fb0e30ec700 10 mon.a@1(peon) e1 ms_handle_reset 0x555854e6e000 172.21.2.121:6789/0 2017-12-15 17:05:26.864 7fb0df0e4700 10 _calc_signature seq 68 front_crc_ = 1210378228 middle_crc = 0 data_crc = 0 sig = 8133008694137864262 2017-12-15 17:05:26.864 7fb0df0e4700 20 Putting signature in client message(seq # 68): sig = 8133008694137864262 2017-12-15 17:05:26.864 7fb0df0e4700 0 can't decode unknown message type 1537 MSG_AUTH=17
and continuing:
2017-12-15 23:50:36.726 7fb0e00e6700 10 mon.a@1(probing) e1 ms_verify_authorizer 172.21.2.121:6790/0 mon protocol 2 2017-12-15 23:50:36.726 7fb0e00e6700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615 2017-12-15 23:50:36.726 7fb0e00e6700 10 cephx: verify_authorizer global_id=0 2017-12-15 23:50:36.726 7fb0e00e6700 10 cephx: verify_authorizer ok nonce 2c69fc1306aa9050 reply_bl.length()=36 2017-12-15 23:50:36.726 7fb0e00e6700 10 In get_auth_session_handler for protocol 2 2017-12-15 23:50:36.726 7fb0df8e5700 10 mon.a@1(probing) e1 ms_verify_authorizer 172.21.2.121:6789/0 mon protocol 2 2017-12-15 23:50:36.726 7fb0df8e5700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615 2017-12-15 23:50:36.726 7fb0df8e5700 10 cephx: verify_authorizer global_id=0 2017-12-15 23:50:36.726 7fb0df8e5700 10 cephx: verify_authorizer ok nonce 51d54ab542324674 reply_bl.length()=36 2017-12-15 23:50:36.726 7fb0df8e5700 10 In get_auth_session_handler for protocol 2 2017-12-15 23:50:36.726 7fb0df8e5700 0 can't decode unknown message type 1537 MSG_AUTH=17 2017-12-15 23:50:36.726 7fb0df8e5700 1 -- 172.21.2.165:6789/0 >> 172.21.2.121:6789/0 conn(0x555854e1dc00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=11470279 cs=11469271 l=0).process decode message failed 2017-12-15 23:50:36.730 7fb0e00e6700 0 can't decode unknown message type 1537 MSG_AUTH=17 2017-12-15 23:50:36.730 7fb0e00e6700 1 -- 172.21.2.165:6789/0 >> 172.21.2.121:6790/0 conn(0x555854e1e300 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=9317698 cs=9323132 l=0).process decode message failed 2017-12-15 23:50:36.730 7fb0df8e5700 1 -- 172.21.2.165:6789/0 >> - conn(0x555857416a00 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=11 - 2017-12-15 23:50:36.730 7fb0df8e5700 10 mon.a@1(probing) e1 ms_verify_authorizer 172.21.2.121:6789/0 mon protocol 2 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer global_id=0 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer ok nonce 288fc77d655f47a8 reply_bl.length()=36 2017-12-15 23:50:36.730 7fb0df8e5700 0 -- 172.21.2.165:6789/0 >> 172.21.2.121:6789/0 conn(0x555857416a00 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 11469272 vs existing csq=11469271 existing_state=STATE_STANDBY 2017-12-15 23:50:36.730 7fb0df0e4700 1 -- 172.21.2.165:6789/0 reap_dead start 2017-12-15 23:50:36.730 7fb0e30ec700 10 mon.a@1(probing) e1 ms_handle_reset 0x555857416a00 172.21.2.121:6789/0 2017-12-15 23:50:36.730 7fb0df8e5700 1 -- 172.21.2.165:6789/0 >> 172.21.2.121:6789/0 conn(0x555857416a00 :6789 s=STATE_CLOSED pgs=0 cs=0 l=0).handle_connect_msg stop myself to swap existing 2017-12-15 23:50:36.730 7fb0df8e5700 1 -- 172.21.2.165:6789/0 >> - conn(0x55585790ce00 :6789 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=25 - 2017-12-15 23:50:36.730 7fb0df8e5700 10 mon.a@1(probing) e1 ms_verify_authorizer 172.21.2.121:6789/0 mon protocol 2 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer global_id=0 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer ok nonce 2a382f157676ea4 reply_bl.length()=36 2017-12-15 23:50:36.730 7fb0df8e5700 10 In get_auth_session_handler for protocol 2 2017-12-15 23:50:36.730 7fb0df8e5700 10 mon.a@1(probing) e1 ms_verify_authorizer 172.21.2.121:6790/0 mon protocol 2 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer decrypted service mon secret_id=18446744073709551615 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer global_id=0 2017-12-15 23:50:36.730 7fb0df8e5700 10 cephx: verify_authorizer ok nonce 33d1f4010539dc16 reply_bl.length()=36 2017-12-15 23:50:36.730 7fb0df8e5700 0 -- 172.21.2.165:6789/0 >> 172.21.2.121:6790/0 conn(0x55585790ce00 :6789 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg accept connect_seq 9323133 vs existing csq=9323132 existing_state=STATE_STANDBY 2017-12-15 23:50:36.730 7fb0e30ec700 10 mon.a@1(probing) e1 ms_handle_reset 0x55585790ce00 172.21.2.121:6790/0 2017-12-15 23:50:36.730 7fb0df8e5700 1 -- 172.21.2.165:6789/0 >> 172.21.2.121:6790/0 conn(0x55585790ce00 :6789 s=STATE_CLOSED pgs=0 cs=0 l=0).handle_connect_msg stop myself to swap existing 2017-12-15 23:50:36.730 7fb0df8e5700 0 can't decode unknown message type 1537 MSG_AUTH=17 2017-12-15 23:50:36.730 7fb0df8e5700 1 -- 172.21.2.165:6789/0 >> 172.21.2.121:6789/0 conn(0x555854e1dc00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=11470281 cs=11469273 l=0).process decode message failed
versions were: 12.2.2-37-g5bef6fa and 13.0.0-4056-g5043e4b
Actions