Actions
Bug #38247
closedmsg/async: v2 reconnect_seq assert due to interfering connections
% Done:
0%
Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Description
-1> 2019-02-08 20:36:09.240 7f974fe13700 -1 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3532-ga41eb63/rpm/el7/BUILD/ceph-14.0.1-3532-ga41eb63/src/msg/async/ProtocolV2.cc: In function 'Ct<ProtocolV2> * ProtocolV2::handle_message_complete()' thread 7f974fe13700 time 2019-02-08 20:36:09.234149 /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.0.1-3532-ga41eb63/rpm/el7/BUILD/ceph-14.0.1-3532-ga41eb63/src/msg/async/ProtocolV2.cc: 1914: FAILED ceph_assert(0 == "old msgs despite reconnect_seq feature") ceph version 14.0.1-3532-ga41eb63 (a41eb634749f7f6ff52903dcc1051060d132d6f0) nautilus (dev) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55e97358c77e] 2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55e97358c94c] 3: (ProtocolV2::handle_message_complete()+0x1664) [0x55e974018c84] 4: (ProtocolV2::read_message_data()+0x1a7) [0x55e974018e67]
/a/sage-2019-02-08_15:26:54-rados-wip-sage-testing-2019-02-08-0710-distro-basic-smithi/3564801
osd.3 sees
2019-02-08 20:36:09.212 7f972e8f1700 1 -- [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] --> [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] -- pg_query(2.b epoch 30) v4 -- 0x55e97e5edc20 con 0x55e97e701400 2019-02-08 20:36:09.212 7f972a8e9700 1 -- [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] --> [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] -- pg_query(2.3 epoch 30) v4 -- 0x55e97e752000 con 0x55e97e701400 2019-02-08 20:36:09.221 7f972b8eb700 1 -- [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] --> [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] -- pg_notify((query:30 sent:30 2.1( v 19'34 (0'0,19'34] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/27))=([17,29] intervals=([27,28] acting 2,3)) epoch 30) v6 -- 0x55e97e5ecb40 con 0x55e97e701400 2019-02-08 20:36:09.226 7f974fe13700 1 -- [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] <== osd.2 v2:172.21.15.29:6820/114039 1 ==== pg_notify((query:30 sent:30 2.3( empty local-lis/les=17/18 n=0 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 30) v6 ==== 1023+0+0 (868179085 0 0) 0x55e97e7532c0 con 0x55e97e701400 2019-02-08 20:36:09.227 7f974fe13700 1 -- [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] <== osd.2 v2:172.21.15.29:6820/114039 2 ==== pg_query(2.1 epoch 30) v4 ==== 162+0+0 (1286079452 0 0) 0x55e97e7532c0 con 0x55e97e701400 2019-02-08 20:36:09.227 7f972f8f3700 1 -- [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] --> [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] -- pg_notify((query:30 sent:30 2.1( v 19'34 (0'0,19'34] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/27))=([17,29] intervals=([27,28] acting 2,3)) epoch 30) v6 -- 0x55e97e753680 con 0x55e97e701400 2019-02-08 20:36:09.227 7f974fe13700 1 -- [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] <== osd.2 v2:172.21.15.29:6820/114039 3 ==== pg_notify((query:30 sent:30 2.b( v 19'26 (0'0,19'26] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 30) v6 ==== 1023+0+0 (1773979790 0 0) 0x55e97e7532c0 con 0x55e97e701400
then crash.
interestingly, on osd.2, we have some weirdness too:
2019-02-08 20:36:09.200 7f7db54c2700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_notify((query:30 sent:30 2.3( empty local-lis/les=17/18 n=0 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 30) v6 -- 0x55cfa041f680 con 0x55cfa0762000 2019-02-08 20:36:09.200 7f7dba4cc700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_query(2.1 epoch 30) v4 -- 0x55cfa058c000 con 0x55cfa0762000 2019-02-08 20:36:09.200 7f7db94ca700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_notify((query:30 sent:30 2.b( v 19'26 (0'0,19'26] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 30) v6 -- 0x55cfa058d2c0 con 0x55cfa0762000 2019-02-08 20:36:09.228 7f7dda1eb700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] <== osd.3 v2:172.21.15.29:6828/114228 1 ==== pg_query(2.b epoch 30) v4 ==== 162+0+0 (4288771879 0 0) 0x55cfa058c960 con 0x55cfa0762000 2019-02-08 20:36:09.228 7f7db54c2700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_notify((query:30 sent:30 2.b( v 19'26 (0'0,19'26] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 30) v6 -- 0x55cfa0723a40 con 0x55cfa0762000 2019-02-08 20:36:09.232 7f7dda1eb700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] <== osd.3 v2:172.21.15.29:6828/114228 1 ==== pg_query(2.b epoch 30) v4 ==== 162+0+0 (4288771879 0 0) 0x55cfa0723a40 con 0x55cfa0762000 2019-02-08 20:36:09.232 7f7db94ca700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_notify((query:30 sent:30 2.b( v 19'26 (0'0,19'26] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 30) v6 -- 0x55cfa058c1e0 con 0x55cfa0762000 2019-02-08 20:36:09.232 7f7dda1eb700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] <== osd.3 v2:172.21.15.29:6828/114228 2 ==== pg_query(2.3 epoch 30) v4 ==== 162+0+0 (3368727633 0 0) 0x55cfa0723a40 con 0x55cfa0762000 2019-02-08 20:36:09.232 7f7db94ca700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_notify((query:30 sent:30 2.3( empty local-lis/les=17/18 n=0 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 30) v6 -- 0x55cfa058cb40 con 0x55cfa0762000 2019-02-08 20:36:09.232 7f7dda1eb700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] <== osd.3 v2:172.21.15.29:6828/114228 3 ==== pg_notify((query:30 sent:30 2.1( v 19'34 (0'0,19'34] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/27))=([17,29] intervals=([27,28] acting 2,3)) epoch 30) v6 ==== 1023+0+0 (4208337317 0 0) 0x55cfa0723a40 con 0x55cfa0762000 2019-02-08 20:36:09.233 7f7dda1eb700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] <== osd.3 v2:172.21.15.29:6828/114228 4 ==== pg_notify((query:30 sent:30 2.1( v 19'34 (0'0,19'34] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/27))=([17,29] intervals=([27,28] acting 2,3)) epoch 30) v6 ==== 1023+0+0 (4208337317 0 0) 0x55cfa0723a40 con 0x55cfa0762000 2019-02-08 20:36:10.205 7f7db54c2700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_notify((query:31 sent:31 2.3( empty local-lis/les=17/18 n=0 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 31) v6 -- 0x55cfa0723c20 con 0x55cfa0762000 2019-02-08 20:36:10.205 7f7db94ca700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_notify((query:31 sent:31 2.b( v 19'26 (0'0,19'26] local-lis/les=17/18 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/30))=([17,29] intervals=([17,25] acting 2,3)) epoch 31) v6 -- 0x55cfa058c960 con 0x55cfa0762000 2019-02-08 20:36:10.206 7f7db64c4700 1 -- [v2:172.21.15.29:6820/114039,v1:172.21.15.29:6821/114039] --> [v2:172.21.15.29:6828/114228,v1:172.21.15.29:6829/114228] -- pg_info((query:31 sent:31 2.1( v 19'34 (0'0,19'34] local-lis/les=30/31 n=1 ec=17/17 lis/c 17/17 les/c/f 18/18/0 30/30/27))=([17,29] intervals=([27,28] acting 2,3)) epoch 31) v5 -- 0x55cfa058c3c0 con 0x55cfa0762000
...where the first message is delivered twice!
Actions