Project

General

Profile

Actions

Bug #19878

closed

msg/simple: reports lossy con on osd->osd connection

Added by Sage Weil about 7 years ago. Updated almost 7 years ago.

Status:
Can't reproduce
Priority:
Immediate
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-05-06 06:43:21.889123 7fb9953cf700  1 -- 172.21.15.24:6806/6238 <== osd.4 172.21.15.24:6810/1092 2 ==== pg_query(0.6 epoch 31) v4 ==== 158+0+0 (2521712471 0 0) 0x7fb9b72a0400 con 0x7fb9b692c700
2017-05-06 06:43:21.916744 7fb98c3bd700  1 -- 172.21.15.24:6806/6238 --> 172.21.15.24:6810/1092 -- pg_notify((query:29 sent:29 0.6( empty local-lis/les=5/6 n=0 ec=1 lis/c 5/5 les/c/f 6/6/0 20/20/5))=([5,19] intervals=([5,19] acting 0,4,5)) (query:29 sent:29 1.3( v 19'32 (0'0,19'32] local-lis/les=8/9 n=20 ec=8 lis/c 8/8 les/c/f 9/10/0 20/20/8))=([8,19] intervals=([8,19] acting 3,4,5)) epoch 31) v6 -- ?+0 0x7fb9b7480780 con 0x7fb9b692c700
2017-05-06 06:43:21.916760 7fb98c3bd700  0 -- 172.21.15.24:6806/6238 submit_message pg_notify((query:29 sent:29 0.6( empty local-lis/les=5/6 n=0 ec=1 lis/c 5/5 les/c/f 6/6/0 20/20/5))=([5,19] intervals=([5,19] acting 0,4,5)) (query:29 sent:29 1.3( v 19'32 (0'0,19'32] local-lis/les=8/9 n=20 ec=8 lis/c 8/8 les/c/f 9/10/0 20/20/8))=([8,19] intervals=([8,19] acting 3,4,5)) epoch 31) v6 remote, 172.21.15.24:6810/1092, failed lossy con, dropping message 0x7fb9b7480780
2017-05-06 06:43:21.925242 7fb98c3bd700  1 -- 172.21.15.24:6806/6238 --> 172.21.15.24:6810/1092 -- pg_notify((query:31 sent:31 1.3( v 19'32 (0'0,19'32] local-lis/les=8/9 n=20 ec=8 lis/c 8/8 les/c/f 9/10/0 31/31/8))=([8,30] intervals=([20,30] acting 3,4)) (query:31 sent:31 0.6( empty local-lis/les=5/6 n=0 ec=1 lis/c 5/5 les/c/f 6/6/0 31/31/5))=([5,30] intervals=([20,30] acting 0,4)) epoch 31) v6 -- ?+0 0x7fb9b6433de0 con 0x7fb9b692c700
2017-05-06 06:43:21.925263 7fb98c3bd700  0 -- 172.21.15.24:6806/6238 submit_message pg_notify((query:31 sent:31 1.3( v 19'32 (0'0,19'32] local-lis/les=8/9 n=20 ec=8 lis/c 8/8 les/c/f 9/10/0 31/31/8))=([8,30] intervals=([20,30] acting 3,4)) (query:31 sent:31 0.6( empty local-lis/les=5/6 n=0 ec=1 lis/c 5/5 les/c/f 6/6/0 31/31/5))=([5,30] intervals=([20,30] acting 0,4)) epoch 31) v6 remote, 172.21.15.24:6810/1092, failed lossy con, dropping message 0x7fb9b6433de0
2017-05-06 06:43:21.926811 7fb98c3bd700  1 -- 172.21.15.24:6806/6238 --> 172.21.15.24:6810/1092 -- pg_notify((query:31 sent:31 0.6( empty local-lis/les=5/6 n=0 ec=1 lis/c 20/20 les/c/f 21/21/0 31/31/5))=([5,30] intervals=([20,30] acting 0,4)) (query:31 sent:31 1.3( v 19'32 (0'0,19'32] local-lis/les=8/9 n=20 ec=8 lis/c 20/20 les/c/f 21/21/0 31/31/8))=([8,30] intervals=([20,30] acting 3,4)) epoch 31) v6 -- ?+0 0x7fb9b66e9c00 con 0x7fb9b692c700
2017-05-06 06:43:21.926830 7fb98c3bd700  0 -- 172.21.15.24:6806/6238 submit_message pg_notify((query:31 sent:31 0.6( empty local-lis/les=5/6 n=0 ec=1 lis/c 20/20 les/c/f 21/21/0 31/31/5))=([5,30] intervals=([20,30] acting 0,4)) (query:31 sent:31 1.3( v 19'32 (0'0,19'32] local-lis/les=8/9 n=20 ec=8 lis/c 20/20 les/c/f 21/21/0 31/31/8))=([8,30] intervals=([20,30] acting 3,4)) epoch 31) v6 remote, 172.21.15.24:6810/1092, failed lossy con, dropping message 0x7fb9b66e9c00
...

that is from the osd.5 log.

the other end (osd.4) injected a single failure:

2017-05-06 06:43:21.886236 7f8a2e7ec700  1 -- 172.21.15.24:6810/1092 --> 172.21.15.24:6806/6238 -- pg_query(1.3 epoch 31) v4 -- ?+0 0x7f8a5a5cc000 con 0x7f8a59ccbc00
2017-05-06 06:43:21.886905 7f8a2efed700  1 -- 172.21.15.24:6810/1092 --> 172.21.15.24:6806/6238 -- pg_query(0.6 epoch 31) v4 -- ?+0 0x7f8a59b7bc00 con 0x7f8a59ccbc00
2017-05-06 06:43:21.887498 7f8a2079f700  0 -- 172.21.15.24:6810/1092 >> 172.21.15.24:6806/6238 pipe(0x7f8a5b1e2000 sd=152 :35128 s=1 pgs=0 cs=0 l=0 c=0x7f8a59ccbc00).injecting socket failure
2017-05-06 06:43:21.887541 7f8a2079f700  0 -- 172.21.15.24:6810/1092 >> 172.21.15.24:6806/6238 pipe(0x7f8a5b1e2000 sd=152 :35128 s=1 pgs=0 cs=0 l=0 c=0x7f8a59ccbc00).fault
2017-05-06 06:43:21.888751 7f8a2079f700 10 osd.4 31  new session (outgoing) 0x7f8a59b79800 con=0x7f8a59ccbc00 addr=172.21.15.24:6806/6238
2017-05-06 06:43:22.957287 7f8a2e7ec700  1 -- 172.21.15.24:6810/1092 --> 172.21.15.24:6806/6238 -- pg_notify((query:32 sent:32 0.10( empty local-lis/les=20/21 n=0 ec=1 lis/c 20/20 les/c/f 21/21/0 31/32/32))=([20,31] intervals=([20,30] acting 0,4)) epoch 32) v6 -- ?+0 0x7f8a5a137c00 con 0x7f8a59ccbc00
2017-05-06 06:43:24.010697 7f8a2efed700  1 -- 172.21.15.24:6810/1092 --> 172.21.15.24:6806/6238 -- pg_notify((query:33 sent:33 0.10( empty local-lis/les=20/21 n=0 ec=1 lis/c 20/20 les/c/f 21/21/0 31/32/32))=([20,31] intervals=([20,30] acting 0,4)) epoch 33) v6 -- ?+0 0x7f8a5a216760 con 0x7f8a59ccbc00
2017-05-06 06:43:26.562970 7f8a2efed700 10 osd.4 34 send_incremental_map 33 -> 34 to 0x7f8a59ccbc00 172.21.15.24:6806/6238
2017-05-06 06:43:26.562976 7f8a2efed700  1 -- 172.21.15.24:6810/1092 --> 172.21.15.24:6806/6238 -- osd_map(34..34 src has 1..34) v3 -- ?+0 0x7f8a59c55840 con 0x7f8a59ccbc00
2017-05-06 06:43:26.562984 7f8a2efed700  1 -- 172.21.15.24:6810/1092 --> 172.21.15.24:6806/6238 -- pg_notify((query:34 sent:34 0.10( empty local-lis/les=20/21 n=0 ec=1 lis/c 20/20 les/c/f 21/21/0 31/32/32))=([20,31] intervals=([20,30] acting 0,4)) epoch 34) v6 -- ?+0 0x7f8a59ae9680 con 0x7f8a59ccbc00
2017-05-06 06:43:27.655483 7f8a2e7ec700 10 osd.4 35 send_incremental_map 34 -> 35 to 0x7f8a59ccbc00 172.21.15.24:6806/6238
2017-05-06 06:43:27.655561 7f8a2e7ec700  1 -- 172.21.15.24:6810/1092 --> 172.21.15.24:6806/6238 -- osd_map(35..35 src has 1..35) v3 -- ?+0 0x7f8a59ba6f40 con 0x7f8a59ccbc00
2017-05-06 06:43:27.655590 7f8a2e7ec700  1 -- 172.21.15.24:6810/1092 --> 172.21.15.24:6806/6238 -- pg_notify((query:35 sent:35 0.10( empty local-lis/les=20/21 n=0 ec=1 lis/c 20/20 les/c/f 21/21/0 31/32/32))=([20,31] intervals=([20,30] acting 0,4)) epoch 35) v6 -- ?+0 0x7f8a5a216b20 con 0x7f8a59ccbc00
...

/a/sage-2017-05-06_05:54:44-rados-wip-sage-testing2---basic-smithi/1108881

Actions

Also available in: Atom PDF