Project

General

Profile

Actions

Bug #18525

closed

simple messenger hung message delivery

Added by Samuel Just over 7 years ago. Updated about 3 years ago.

Status:
Won't Fix
Priority:
High
Assignee:
-
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

2017-01-13 07:57:31.873285 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.873288 7f3930323700 1 -- 172.21.15.6:6801/25316 >> 172.21.15.6:6811/26019 pipe(0x7f3966a8a800 sd=41 :6801 s=2 pgs=516 cs=954 l=0 c=0x7f3960605200).do_sendmsg error (32) Broken pipe
2017-01-13 07:57:31.873303 7f3930323700 1 -- 172.21.15.6:6801/25316 >> 172.21.15.6:6811/26019 pipe(0x7f3966a8a800 sd=41 :6801 s=2 pgs=516 cs=954 l=0 c=0x7f3960605200).writer error sending 0x7f3963fa8a00, (32) Broken pipe
2017-01-13 07:57:31.873319 7f3930323700 0 -- 172.21.15.6:6801/25316 >> 172.21.15.6:6811/26019 pipe(0x7f3966a8a800 sd=41 :6801 s=2 pgs=516 cs=954 l=0 c=0x7f3960605200).fault, initiating reconnect
2017-01-13 07:57:31.873421 7f1993d22700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20053 1.a 1:5afc5b82:::benchmark_data_smithi006_26794_object20052:head v 26'1223) v1 -- ?+4923 0x7f19bd501800 con 0x7f19bc686100
2017-01-13 07:57:31.873467 7f198fd1a700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20051 1.d 1:b45c2612:::benchmark_data_smithi006_26794_object20050:head v 26'1202) v1 -- ?+4923 0x7f19bca9ca00 con 0x7f19bc686100
2017-01-13 07:57:31.873543 7f199151d700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20054 1.5 1:a70912a0:::benchmark_data_smithi006_26794_object20053:head v 26'1283) v1 -- ?+4923 0x7f19bdd08600 con 0x7f19bc686100
2017-01-13 07:57:31.873829 7f1989600700 10 osd.2 26 new session 0x7f19c2e3e000 con=0x7f19c24bd000 addr=172.21.15.6:6801/25316
2017-01-13 07:57:31.873852 7f1989600700 0 -- 172.21.15.6:6811/26019 >> 172.21.15.6:6801/25316 pipe(0x7f19ba630000 sd=40 :6811 s=0 pgs=0 cs=0 l=0 c=0x7f19c24bd000).accept connect_seq 955 vs existing 955 state connecting
2017-01-13 07:57:31.874453 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20045 1.2 ondisk, result = 0) v1 -- ?+0 0x7f19c2e36a00 con 0x7f19bc686100
2017-01-13 07:57:31.874863 7f1991d1e700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20055 1.e 1:7974fbc7:::benchmark_data_smithi006_26794_object20054:head v 26'1220) v1 -- ?+4923 0x7f19bdcc9200 con 0x7f19bc686100
2017-01-13 07:57:31.874914 7f1990d1c700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20056 1.1 1:8d6b2be4:::benchmark_data_smithi006_26794_object20055:head v 26'1268) v1 -- ?+4923 0x7f19bded4600 con 0x7f19bc686100
2017-01-13 07:57:31.875743 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20052 1.0 ondisk, result = 0) v1 -- ?+0 0x7f19c2e36780 con 0x7f19bc686100
2017-01-13 07:57:31.876485 7f199251f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20057 1.d 1:ba44587f:::benchmark_data_smithi006_26794_object20056:head v 26'1203) v1 -- ?+4923 0x7f19be38d200 con 0x7f19bc686100
2017-01-13 07:57:31.876648 7f199151d700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20059 1.5 1:acdf531c:::benchmark_data_smithi006_26794_object20058:head v 26'1284) v1 -- ?+4923 0x7f19bd500000 con 0x7f19bc686100
2017-01-13 07:57:31.876681 7f198f519700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20060 1.e 1:72847a88:::benchmark_data_smithi006_26794_object20059:head v 26'1221) v1 -- ?+4923 0x7f19bcc83200 con 0x7f19bc686100
2017-01-13 07:57:31.877910 7f199251f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20061 1.8 1:1d7a3b5f:::benchmark_data_smithi006_26794_object20060:head v 26'1228) v1 -- ?+4923 0x7f19be078000 con 0x7f19bc686100
2017-01-13 07:57:31.878196 7f198fd1a700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20062 1.d 1:b9e4901e:::benchmark_data_smithi006_26794_object20061:head v 26'1204) v1 -- ?+4923 0x7f19bdfc7600 con 0x7f19bc686100
2017-01-13 07:57:31.934168 7f198e517700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- pg_update_log_missing(1.8 epoch 26 rep_tid 12529 entries 26'1229 (15'1) l_delete 1:19febd45:::new_8:head by unknown.0.0:0 2017-01-13 07:57:31.933494 0) v1 -- ?+0 0x7f19bdab1d40 con 0x7f19bc686100
2017-01-13 08:01:41.564544 7f1989903700 0 -- 172.21.15.6:6811/26019 >> 172.21.15.6:6801/25316 pipe(0x7f19ba5ed400 sd=39 :52546 s=1 pgs=757 cs=955 l=0 c=0x7f19bc686100).fault
2017-01-13 08:01:41.738454 7f199a52f700 1 -- 172.21.15.6:6811/26019 mark_down 172.21.15.6:6801/25316 -- 0x7f19ba5ed400

Actions #1

Updated by Samuel Just over 7 years ago

samuelj@teuthology:/a/samuelj-2017-01-13_06:30:16-rados-wip-sam-testing---basic-smithi/714124

Actions #2

Updated by Samuel Just over 7 years ago

2017-01-13 07:57:31.867423 7f3940848700 1 -- 172.21.15.6:6801/25316 --> 172.21.15.6:6811/26019 -- osd_repop_reply(client.4212.0:20038 1.8 ondisk, result = 0) v1 -- ?+0 0x7f3968f03680 con 0x7f3960605200
2017-01-13 07:57:31.867834 7f392f818700 1 -- 172.21.15.6:6801/25316 <== osd.2 172.21.15.6:6811/26019 20297 ==== osd_repop_reply(client.4212.0:20028 1.0) v1 ==== 83+0+0 (2475162598 0 0) 0x7f3968f02780 con 0x7f3960605200
2017-01-13 07:57:31.867849 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.868229 7f392f818700 1 -- 172.21.15.6:6801/25316 <== osd.2 172.21.15.6:6811/26019 20298 ==== osd_repop(client.4212.0:20046 1.a) v1 ==== 998+0+4923 (337853331 0 1929454765) 0x7f396423b800 con 0x7f3960605200
2017-01-13 07:57:31.868247 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.869295 7f3940848700 1 -- 172.21.15.6:6801/25316 --> 172.21.15.6:6811/26019 -- osd_repop_reply(client.4212.0:20039 1.3 ondisk, result = 0) v1 -- ?+0 0x7f3968f03900 con 0x7f3960605200
2017-01-13 07:57:31.869834 7f393a03b700 1 -- 172.21.15.6:6801/25316 --> 172.21.15.6:6811/26019 -- osd_repop(client.4212.0:20052 1.0 1:06cb7c34:::benchmark_data_smithi006_26794_object20051:head v 26'1273) v1 -- ?+4923 0x7f396423b200 con 0x7f3960605200

<-- last message received -- > 2017-01-13 07:57:31.870860 7f3940848700 1 -- 172.21.15.6:6801/25316 --> 172.21.15.6:6811/26019 -- osd_repop_reply(client.4212.0:20046 1.a ondisk, result = 0) v1 -- ?+0 0x7f3968f03b80 con 0x7f3960605200

2017-01-13 07:57:31.871042 7f392f818700 1 -- 172.21.15.6:6801/25316 <== osd.2 172.21.15.6:6811/26019 20299 ==== osd_repop(client.4212.0:20048 1.3) v1 ==== 998+0+4923 (1748955297 0 3122996100) 0x7f3963f83200 con 0x7f3960605200
2017-01-13 07:57:31.871061 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.872231 7f392f818700 1 -- 172.21.15.6:6801/25316 <== osd.2 172.21.15.6:6811/26019 20300 ==== osd_repop_reply(client.4212.0:20030 1.2) v1 ==== 83+0+0 (3115902518 0 0) 0x7f3968f02780 con 0x7f3960605200
2017-01-13 07:57:31.872245 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.872321 7f392f818700 1 -- 172.21.15.6:6801/25316 <== osd.2 172.21.15.6:6811/26019 20301 ==== osd_repop_reply(client.4212.0:20033 1.2) v1 ==== 83+0+0 (3586235134 0 0) 0x7f3968f02000 con 0x7f3960605200
2017-01-13 07:57:31.872335 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.872393 7f392f818700 1 -- 172.21.15.6:6801/25316 <== osd.2 172.21.15.6:6811/26019 20302 ==== osd_repop_reply(client.4212.0:20031 1.4) v1 ==== 83+0+0 (3649600484 0 0) 0x7f3968f02500 con 0x7f3960605200
2017-01-13 07:57:31.872405 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.872528 7f392fa1a700 10 osd.0 26 new session 0x7f3968583520 con=0x7f3968f22000 addr=172.21.15.6:6811/26019
2017-01-13 07:57:31.872906 7f392f818700 1 -- 172.21.15.6:6801/25316 <== osd.2 172.21.15.6:6811/26019 20303 ==== osd_repop_reply(client.4212.0:20040 1.2) v1 ==== 83+0+0 (260871935 0 0) 0x7f3968f02500 con 0x7f3960605200
2017-01-13 07:57:31.872920 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.872919 7f3940848700 1 -- 172.21.15.6:6801/25316 --> 172.21.15.6:6811/26019 -- osd_repop_reply(client.4212.0:20048 1.3 ondisk, result = 0) v1 -- ?+0 0x7f3968f03e00 con 0x7f3960605200
2017-01-13 07:57:31.873202 7f3936834700 1 -- 172.21.15.6:6801/25316 --> 172.21.15.6:6811/26019 -- osd_repop(client.4212.0:20058 1.c 1:3c164cb6:::benchmark_data_smithi006_26794_object20057:head v 26'1224) v1 -- ?+4923 0x7f3963fa8a00 con 0x7f3960605200

<-- last message received --> 2017-01-13 07:57:31.873270 7f392f818700 1 -- 172.21.15.6:6801/25316 <== osd.2 172.21.15.6:6811/26019 20304 ==== osd_repop_reply(client.4212.0:20041 1.2) v1 ==== 83+0+0 (2072148477 0 0) 0x7f3968f02500 con 0x7f3960605200

2017-01-13 07:57:31.873285 7f392f818700 20 osd.0 26 should_share_map osd.2 172.21.15.6:6811/26019 26
2017-01-13 07:57:31.873288 7f3930323700 1 -- 172.21.15.6:6801/25316 >> 172.21.15.6:6811/26019 pipe(0x7f3966a8a800 sd=41 :6801 s=2 pgs=516 cs=954 l=0 c=0x7f3960605200).do_sendmsg error (32) Broken pipe
2017-01-13 07:57:31.873303 7f3930323700 1 -- 172.21.15.6:6801/25316 >> 172.21.15.6:6811/26019 pipe(0x7f3966a8a800 sd=41 :6801 s=2 pgs=516 cs=954 l=0 c=0x7f3960605200).writer error sending 0x7f3963fa8a00, (32) Broken pipe
2017-01-13 07:57:31.873319 7f3930323700 0 -- 172.21.15.6:6801/25316 >> 172.21.15.6:6811/26019 pipe(0x7f3966a8a800 sd=41 :6801 s=2 pgs=516 cs=954 l=0 c=0x7f3960605200).fault, initiating reconnect

> ceph-osd.2.log.filtered <
<-- last message received --> 2017-01-13 07:57:31.868701 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20041 1.2 ondisk, result = 0) v1 -- ?+0 0x7f19c2cb9200 con 0x7f19bc686100

2017-01-13 07:57:31.868751 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20042 1.2 ondisk, result = 0) v1 -- ?+0 0x7f19c2d72a00 con 0x7f19bc686100
2017-01-13 07:57:31.868985 7f198fd1a700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20049 1.8 1:1b017329:::benchmark_data_smithi006_26794_object20048:head v 26'1227) v1 -- ?+4923 0x7f19bc8a4000 con 0x7f19bc686100
2017-01-13 07:57:31.869470 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20044 1.4 ondisk, result = 0) v1 -- ?+0 0x7f19bfec1980 con 0x7f19bc686100
2017-01-13 07:57:31.869897 7f1989c06700 1 -- 172.21.15.6:6811/26019 <== osd.0 172.21.15.6:6801/25316 20146 ==== osd_repop(client.4212.0:20045 1.2) v1 ==== 998+0+4923 (2500886376 0 2164161025) 0x7f19bcad8a00 con 0x7f19bc686100
2017-01-13 07:57:31.869914 7f1989c06700 20 osd.2 26 should_share_map osd.0 172.21.15.6:6801/25316 26
2017-01-13 07:57:31.869952 7f1989c06700 1 -- 172.21.15.6:6811/26019 <== osd.0 172.21.15.6:6801/25316 20147 ==== osd_repop_reply(client.4212.0:20036 1.f) v1 ==== 83+0+0 (2357915620 0 0) 0x7f19c2de8080 con 0x7f19bc686100
2017-01-13 07:57:31.869964 7f1989c06700 20 osd.2 26 should_share_map osd.0 172.21.15.6:6801/25316 26
2017-01-13 07:57:31.869990 7f1989c06700 1 -- 172.21.15.6:6811/26019 <== osd.0 172.21.15.6:6801/25316 20148 ==== osd_repop_reply(client.4212.0:20037 1.1) v1 ==== 83+0+0 (3540462945 0 0) 0x7f19c1f7b680 con 0x7f19bc686100
2017-01-13 07:57:31.870000 7f1989c06700 20 osd.2 26 should_share_map osd.0 172.21.15.6:6801/25316 26
2017-01-13 07:57:31.870275 7f1989c06700 1 -- 172.21.15.6:6811/26019 <== osd.0 172.21.15.6:6801/25316 20149 ==== osd_repop_reply(client.4212.0:20038 1.8) v1 ==== 83+0+0 (3910556885 0 0) 0x7f19c2e36000 con 0x7f19bc686100
2017-01-13 07:57:31.870287 7f1989c06700 20 osd.2 26 should_share_map osd.0 172.21.15.6:6801/25316 26
2017-01-13 07:57:31.870632 7f1989c06700 1 -- 172.21.15.6:6811/26019 <== osd.0 172.21.15.6:6801/25316 20150 ==== osd_repop_reply(client.4212.0:20039 1.3) v1 ==== 83+0+0 (731612832 0 0) 0x7f19c2c93e00 con 0x7f19bc686100
2017-01-13 07:57:31.870647 7f1989c06700 20 osd.2 26 should_share_map osd.0 172.21.15.6:6801/25316 26
2017-01-13 07:57:31.870899 7f1989c06700 1 -- 172.21.15.6:6811/26019 <== osd.0 172.21.15.6:6801/25316 20151 ==== osd_repop(client.4212.0:20052 1.0) v1 ==== 998+0+4923 (2529046440 0 3311209141) 0x7f19bdd0b000 con 0x7f19bc686100
2017-01-13 07:57:31.870907 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20043 1.2 ondisk, result = 0) v1 -- ?+0 0x7f19c2d77400 con 0x7f19bc686100
2017-01-13 07:57:31.870944 7f1989c06700 20 osd.2 26 should_share_map osd.0 172.21.15.6:6801/25316 26
2017-01-13 07:57:31.871096 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20047 1.6 ondisk, result = 0) v1 -- ?+0 0x7f19c2d72780 con 0x7f19bc686100
2017-01-13 07:57:31.871734 7f199251f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20050 1.d 1:bafc8def:::benchmark_data_smithi006_26794_object20049:head v 26'1201) v1 -- ?+4923 0x7f19bc8a9800 con 0x7f19bc686100

<-- last message received --> 2017-01-13 07:57:31.872095 7f1989c06700 1 -- 172.21.15.6:6811/26019 <== osd.0 172.21.15.6:6801/25316 20152 ==== osd_repop_reply(client.4212.0:20046 1.a) v1 ==== 83+0+0 (129685172 0 0) 0x7f19c2e36000 con 0x7f19bc686100

2017-01-13 07:57:31.872111 7f1989c06700 20 osd.2 26 should_share_map osd.0 172.21.15.6:6801/25316 26

<-- fault --> 2017-01-13 07:57:31.872127 7f1989c06700 0 -- 172.21.15.6:6811/26019 >> 172.21.15.6:6801/25316 pipe(0x7f19ba5ed400 sd=39 :52544 s=2 pgs=757 cs=954 l=0 c=0x7f19bc686100).injecting socket failure
<-- reconnect --> 2017-01-13 07:57:31.872157 7f1989c06700 0 -- 172.21.15.6:6811/26019 >> 172.21.15.6:6801/25316 pipe(0x7f19ba5ed400 sd=39 :52544 s=2 pgs=757 cs=954 l=0 c=0x7f19bc686100).fault, initiating reconnect

2017-01-13 07:57:31.873421 7f1993d22700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20053 1.a 1:5afc5b82:::benchmark_data_smithi006_26794_object20052:head v 26'1223) v1 -- ?+4923 0x7f19bd501800 con 0x7f19bc686100
2017-01-13 07:57:31.873467 7f198fd1a700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20051 1.d 1:b45c2612:::benchmark_data_smithi006_26794_object20050:head v 26'1202) v1 -- ?+4923 0x7f19bca9ca00 con 0x7f19bc686100
2017-01-13 07:57:31.873543 7f199151d700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20054 1.5 1:a70912a0:::benchmark_data_smithi006_26794_object20053:head v 26'1283) v1 -- ?+4923 0x7f19bdd08600 con 0x7f19bc686100
2017-01-13 07:57:31.873829 7f1989600700 10 osd.2 26 new session 0x7f19c2e3e000 con=0x7f19c24bd000 addr=172.21.15.6:6801/25316

<-- race? --> 2017-01-13 07:57:31.873852 7f1989600700 0 -- 172.21.15.6:6811/26019 >> 172.21.15.6:6801/25316 pipe(0x7f19ba630000 sd=40 :6811 s=0 pgs=0 cs=0 l=0 c=0x7f19c24bd000).accept connect_seq 955 vs existing 955 state connecting

2017-01-13 07:57:31.874453 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20045 1.2 ondisk, result = 0) v1 -- ?+0 0x7f19c2e36a00 con 0x7f19bc686100
2017-01-13 07:57:31.874863 7f1991d1e700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20055 1.e 1:7974fbc7:::benchmark_data_smithi006_26794_object20054:head v 26'1220) v1 -- ?+4923 0x7f19bdcc9200 con 0x7f19bc686100
2017-01-13 07:57:31.874914 7f1990d1c700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20056 1.1 1:8d6b2be4:::benchmark_data_smithi006_26794_object20055:head v 26'1268) v1 -- ?+4923 0x7f19bded4600 con 0x7f19bc686100
2017-01-13 07:57:31.875743 7f199a52f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop_reply(client.4212.0:20052 1.0 ondisk, result = 0) v1 -- ?+0 0x7f19c2e36780 con 0x7f19bc686100
2017-01-13 07:57:31.876485 7f199251f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20057 1.d 1:ba44587f:::benchmark_data_smithi006_26794_object20056:head v 26'1203) v1 -- ?+4923 0x7f19be38d200 con 0x7f19bc686100
2017-01-13 07:57:31.876648 7f199151d700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20059 1.5 1:acdf531c:::benchmark_data_smithi006_26794_object20058:head v 26'1284) v1 -- ?+4923 0x7f19bd500000 con 0x7f19bc686100
2017-01-13 07:57:31.876681 7f198f519700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20060 1.e 1:72847a88:::benchmark_data_smithi006_26794_object20059:head v 26'1221) v1 -- ?+4923 0x7f19bcc83200 con 0x7f19bc686100
2017-01-13 07:57:31.877910 7f199251f700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20061 1.8 1:1d7a3b5f:::benchmark_data_smithi006_26794_object20060:head v 26'1228) v1 -- ?+4923 0x7f19be078000 con 0x7f19bc686100
2017-01-13 07:57:31.878196 7f198fd1a700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- osd_repop(client.4212.0:20062 1.d 1:b9e4901e:::benchmark_data_smithi006_26794_object20061:head v 26'1204) v1 -- ?+4923 0x7f19bdfc7600 con 0x7f19bc686100
2017-01-13 07:57:31.934168 7f198e517700 1 -- 172.21.15.6:6811/26019 --> 172.21.15.6:6801/25316 -- pg_update_log_missing(1.8 epoch 26 rep_tid 12529 entries 26'1229 (15'1) l_delete 1:19febd45:::new_8:head by unknown.0.0:0 2017-01-13 07:57:31.933494 0) v1 -- ?+0 0x7f19bdab1d40 con 0x7f19bc686100

<-- run over after this -->

2017-01-13 08:01:41.564544 7f1989903700 0 -- 172.21.15.6:6811/26019 >> 172.21.15.6:6801/25316 pipe(0x7f19ba5ed400 sd=39 :52546 s=1 pgs=757 cs=955 l=0 c=0x7f19bc686100).fault
2017-01-13 08:01:41.738454 7f199a52f700 1 -- 172.21.15.6:6811/26019 mark_down 172.21.15.6:6801/25316 -- 0x7f19ba5ed400

One side definitely detected a race, but I don't see any evidence of either side retrying.

Actions #3

Updated by Samuel Just over 7 years ago

  • Priority changed from Immediate to Urgent
Actions #4

Updated by Greg Farnum over 7 years ago

  • Status changed from New to Need More Info
  • Priority changed from Urgent to High

The connect_seq being identical is expected behavior IIRC — each side increments by one, because it's the next connection after a socket failure.

I looked at the raw logs a bit and don't see anything obvious — just that osd.0 claims to be reconnecting, but then nothing happens, and osd.2 sees it in accept().

I also looked at the patches from Jewel to Kraken and don't see anything which I think could have caused this. Unless it turns up again I think it's not new and we'd need more logging or a core dump or something to work it out.

Actions #5

Updated by Sage Weil about 3 years ago

  • Status changed from Need More Info to Won't Fix
Actions

Also available in: Atom PDF