Support #37300

osd handle_connect_msg accept replacing existing

Added by Jiaying Ren over 2 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version: -
% Done: 0%


Description

env: OpenStack + Ceph v10.2.9 (async messenger enabled)

issue:

The log file of osd.179 is flooded with the following message:

2018-11-16 14:59:58.449543 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.170:0/354816366 conn(0x557457c11800 sd=53632 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:00:04.440369 7f6add9d7700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.142:0/1479828853 conn(0x557458ee8800 sd=53638 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:00:10.656384 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.173:0/127357824 conn(0x55744ff37800 sd=52112 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:00:24.228010 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.168:0/986497073 conn(0x557457c70000 sd=53635 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:00:26.936781 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.163:0/3287209790 conn(0x557446cd7800 sd=2110 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:00:30.261136 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.179:0/3566666434 conn(0x557449968000 sd=2058 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:00:38.776642 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.174:0/4077945972 conn(0x55745eea7800 sd=2265 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:01:38.320772 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.132:0/2498729021 conn(0x557448fcb800 sd=53499 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:01:48.430206 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.184:0/157573435 conn(0x55744c81b000 sd=538 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:02:08.955684 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.165:0/840468506 conn(0x557460246800 sd=2543 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:02:54.961321 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.103:0/1108341066 conn(0x55744f3aa800 sd=53634 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:02:59.793247 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.178:0/2066944419 conn(0x55745794c800 sd=1130 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:03:26.236023 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.130:0/4106283891 conn(0x55744a5c2000 sd=51822 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:03:35.891133 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.158:0/847689455 conn(0x557469954000 sd=52794 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:05:00.807296 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.187:0/4074531340 conn(0x557461c67000 sd=53485 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:05:16.628683 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.127:0/2687272820 conn(0x5574624a2800 sd=1889 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:05:20.207862 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.154:0/3609325474 conn(0x557456129800 sd=824 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:05:26.874485 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.188:0/2709870980 conn(0x55745d9a0800 sd=2999 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:06:02.539731 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.155:0/1913539433 conn(0x557457692800 sd=52422 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:06:53.750921 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.180:0/199822062 conn(0x557457de4000 sd=3250 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:07:03.086726 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.121:0/3084646980 conn(0x55745bc57000 sd=1897 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:07:11.897467 7f6ac382b700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.156:0/3726521212 conn(0x557461230800 sd=52122 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:07:29.500696 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.101:0/4026043526 conn(0x557446a6b800 sd=51914 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:07:34.859488 7f6add9d7700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.167:0/1375327626 conn(0x557465cd9800 sd=2471 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:08:10.092340 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.160:0/252480539 conn(0x5574647d5000 sd=2475 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:08:30.934497 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.140:0/1493706979 conn(0x557457c5d800 sd=2714 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:09:32.761921 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.133:0/3289261769 conn(0x557462fde000 sd=53617 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:10:09.981282 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.130:0/1537691294 conn(0x55744e2ae000 sd=3572 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:10:33.367172 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.138:0/3538892117 conn(0x55745e93d000 sd=52115 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:10:39.073323 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.126:0/1551145417 conn(0x557455504000 sd=53555 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:10:41.654218 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.159:0/1801633322 conn(0x5574469f4800 sd=879 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:10:41.731488 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.135:0/1091767424 conn(0x55745b78b800 sd=1546 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:10:42.806070 7f6add9d7700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.162:0/2437625566 conn(0x557458cdb800 sd=307 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:10:53.808076 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.157:0/2720599930 conn(0x557448a38800 sd=52413 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:11:09.459859 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.183:0/692055394 conn(0x55744a5cb000 sd=52347 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:11:23.551825 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.184:0/490344573 conn(0x557458a2a800 sd=1398 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:11:30.674680 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.163:0/3982739524 conn(0x55745bc61000 sd=823 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:11:39.869622 7f6add9d7700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.132:0/1216332332 conn(0x55745f82f000 sd=3045 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:11:48.208395 7f6ade1d8700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.164:0/3456226754 conn(0x55747a4c2800 sd=52549 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:12:00.824901 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.144:0/3773456578 conn(0x55745c8ca000 sd=52629 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:12:09.807702 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.153:0/3924385954 conn(0x557458b56800 sd=3145 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:12:11.297344 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.136:0/1483491980 conn(0x55746904f000 sd=792 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:12:50.578033 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.110:0/2005830698 conn(0x557465cd8000 sd=53151 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:13:18.062979 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.161:0/3311940461 conn(0x55746904c000 sd=2091 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:13:21.654159 7f6add9d7700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.142:0/1873731462 conn(0x5574607d6000 sd=52405 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:13:34.907419 7f6add9d7700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.161:0/3043413561 conn(0x557465ceb800 sd=2437 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:13:46.756211 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.164:0/2664598064 conn(0x55744ce7b800 sd=2631 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:19.609246 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.183:0/2603540436 conn(0x557462c75000 sd=53422 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:30.624402 7f6add9d7700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.172:0/1546508245 conn(0x55745fc89000 sd=1453 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:33.123765 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.132:0/1595911357 conn(0x55745e25b000 sd=1450 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:36.198086 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.172:0/2659306931 conn(0x5574620a6800 sd=1454 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:37.284095 7f6ac382b700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.104:0/2387322250 conn(0x557446cd6000 sd=3442 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:46.645628 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.174:0/1828668731 conn(0x5574511e5800 sd=1243 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:46.938979 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.151:0/1047517806 conn(0x557462f03800 sd=2060 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:52.084556 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.183:0/3405075819 conn(0x55744db84800 sd=1819 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:54.320343 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.118:0/2836108883 conn(0x5574542be800 sd=2501 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:55.404994 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.163:0/1885068222 conn(0x557463d18800 sd=1013 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:55.724284 7f6ade1d8700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.150:0/2417198406 conn(0x55745fb8d000 sd=1449 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:14:58.497838 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.170:0/354816366 conn(0x55745652d000 sd=1978 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:01.192109 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.155:0/3950081354 conn(0x557459f14000 sd=53632 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:04.419147 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.110:0/2895458130 conn(0x55744a3aa000 sd=1487 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:04.541308 7f6ac382b700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.142:0/1479828853 conn(0x557446994800 sd=53592 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:05.410230 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.117:0/593571005 conn(0x557462b1c000 sd=53636 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:07.041679 7f6ac382b700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.152:0/1343439055 conn(0x55744bdce000 sd=2071 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:08.610216 7f6ade1d8700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.165:0/3951059420 conn(0x55745fb5a800 sd=966 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:10.757638 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.173:0/127357824 conn(0x55745487e800 sd=1992 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:24.327818 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.168:0/986497073 conn(0x55745fb7f000 sd=52112 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:27.038379 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.163:0/3287209790 conn(0x557455453800 sd=53635 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:30.361168 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.179:0/3566666434 conn(0x557462fdf800 sd=2110 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:38.777044 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.174:0/4077945972 conn(0x55744bd69800 sd=2058 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:42.700535 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.119:0/2536291490 conn(0x5574554f1800 sd=2265 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:52.179968 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.170:0/3414972543 conn(0x557462716800 sd=2531 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:15:58.160205 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.167:0/3554311482 conn(0x55744ff37800 sd=1387 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:16:02.548863 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.183:0/2799338045 conn(0x55745938d800 sd=2520 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:16:07.380868 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.138:0/3609165562 conn(0x55745126c000 sd=3052 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:16:11.606618 7f6add9d7700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.146:0/2820273089 conn(0x5574500d3800 sd=2048 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:16:38.367945 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.132:0/2498729021 conn(0x5574469bb800 sd=2492 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:16:48.530583 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.184:0/157573435 conn(0x55745bd0e800 sd=53499 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:17:08.986744 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.165:0/840468506 conn(0x557458efd800 sd=1881 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:17:34.473633 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.143:0/1996870271 conn(0x557462fe8800 sd=2543 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:17:41.125657 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.146:0/39661395 conn(0x55744de91000 sd=1564 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:17:54.963319 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.103:0/1108341066 conn(0x5574646e6000 sd=1698 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:17:59.894771 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.178:0/2066944419 conn(0x557462fbd000 sd=53634 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:18:07.329548 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.133:0/2310059898 conn(0x55744d55d000 sd=1415 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:18:07.898392 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.133:0/2248300035 conn(0x55745477b000 sd=712 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:18:26.339628 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.130:0/4106283891 conn(0x55744af0f800 sd=1984 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:18:35.893409 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.158:0/847689455 conn(0x55745bd73000 sd=51822 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:19:08.739670 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.189:0/1436282329 conn(0x557454f77800 sd=52794 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:20:00.907539 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.187:0/4074531340 conn(0x5574566f5000 sd=3470 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:20:16.731165 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.127:0/2687272820 conn(0x557462aea800 sd=53485 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:20:18.814517 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.140:0/972657170 conn(0x55745f5f9000 sd=1889 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:20:20.306754 7f6ade1d8700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.154:0/3609325474 conn(0x55744ba80800 sd=173 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:20:20.984518 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.159:0/3227037867 conn(0x557456129800 sd=824 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:20:26.975282 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.188:0/2709870980 conn(0x55744f3aa800 sd=1891 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:20:46.456642 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.130:0/1244016501 conn(0x557464448800 sd=2999 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:20:47.415251 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.185:0/3937316415 conn(0x55744d55e800 sd=2006 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:21:02.540394 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.155:0/1913539433 conn(0x55745abeb000 sd=1972 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:21:07.363109 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.106:0/4017633752 conn(0x557462367000 sd=52422 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:21:10.629840 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.159:0/1699129749 conn(0x55745c239800 sd=1834 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)
2018-11-16 15:21:27.784546 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.110:0/418630820 conn(0x557452b3c000 sd=2460 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)

This triggered our alerting (which counts occurrences of this log message), but the
ceph -s output is HEALTH_OK.
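
A minimal sketch of how such log lines could be tallied per client, for anyone wanting to see which peers reconnect and how often. The regular expression, the function name count_replacements and the SAMPLE list are illustrative assumptions, not part of Ceph; the two sample lines are copied from the log above and show the same client instance (10.134.140.170:0/354816366) at 14:59:58 and again at 15:14:58.

```python
import re
from collections import Counter

# Two lines copied from the osd.179 log above (same client instance, 15 minutes apart).
SAMPLE = [
    "2018-11-16 14:59:58.449543 7f6ad404c700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.170:0/354816366 conn(0x557457c11800 sd=53632 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)",
    "2018-11-16 15:14:58.497838 7f6add1d6700  0 -- 10.134.130.13:6840/12698 >> 10.134.140.170:0/354816366 conn(0x55745652d000 sd=1978 :6840 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy) channel (new one lossy=1)",
]

# Hypothetical pattern: captures the peer "<ip>:<port>/<nonce>" that follows ">>"
# on lines containing the "accept replacing existing" message.
PEER_RE = re.compile(
    r">> (?P<ip>\d+(?:\.\d+){3}):\d+/(?P<nonce>\d+) .*accept replacing existing"
)


def count_replacements(lines):
    """Count matching log lines per peer host and per client instance (ip/nonce)."""
    by_host = Counter()
    by_client = Counter()
    for line in lines:
        match = PEER_RE.search(line)
        if match is None:
            continue  # not an "accept replacing existing" line
        by_host[match["ip"]] += 1
        by_client[f'{match["ip"]}/{match["nonce"]}'] += 1
    return by_host, by_client


by_host, by_client = count_replacements(SAMPLE)
```

To run it against a real log, pass an open file object instead of SAMPLE; by_client.most_common(10) then lists the client instances that reconnect most often.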

An interesting finding: if we reboot the VM and run dd to issue I/O
to the rbd image, it works at first, but after several runs
the VM eventually hangs (we didn't get any log from the librbd side).

After restarting osd.179, it no longer printed this log message,
and the VMs recovered.

History

#1 Updated by Jiaying Ren over 2 years ago

My bad, this issue shouldn't be filed under rgw. Could it be msgr or async-msgr related?

#2 Updated by Sage Weil over 2 years ago

  • Project changed from rgw to RADOS

#3 Updated by Sage Weil over 2 years ago

  • Status changed from New to Won't Fix

This looks like a bug in asyncmessenger that wasn't fixed in jewel. We aren't backporting fixes to async to jewel... please upgrade to luminous or switch to simplemessenger.

#4 Updated by Greg Farnum over 2 years ago

  • Tracker changed from Bug to Support
  • Status changed from Won't Fix to Closed

#5 Updated by Jiaying Ren over 2 years ago

Thanks! We verified this by switching between the simple and async messengers in load tests lasting several days, and confirmed that the issue and the miscellaneous strange logs are caused by the async messenger.

#6 Updated by Greg Farnum over 2 years ago

  • Project changed from RADOS to Messengers

#7 Updated by yite gu almost 2 years ago

Sage Weil wrote:

This looks like a bug in asyncmessenger that wasn't fixed in jewel. We aren't backporting fixes to async to jewel... please upgrade to luminous or switch to simplemessenger.

Hi, do you have a manual workaround for this problem?

#8 Updated by Darren Wen over 1 year ago

Unfortunately, I have a similar problem and need help.
1. ceph version 10.2.5, ms_type=simple.
2. The following log messages continued for nearly two days, and then the osd went down.

2019-10-17 20:08:01.162022 7fadc3b5d700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.84:0/3505143052 pipe(0x7fb379a7e000 sd=10992 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb353ebe780).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:14.716318 7fadc395b700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.69:0/1294123119 pipe(0x7fb384684800 sd=10642 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb35521ff80).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:15.747573 7fadc3557700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.71:0/4050657060 pipe(0x7fb3865e8800 sd=10994 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb346056700).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:15.890249 7fada5f83700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.29:0/2365991687 pipe(0x7fb385a04000 sd=11112 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb358cd7600).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:36.062014 7fae1b80e700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.34:0/662409768 pipe(0x7fb38523a800 sd=10059 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb349dbd600).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:39.507773 7fada5b7f700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.65:0/1296294713 pipe(0x7fb37ca16000 sd=11114 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb362708000).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:50.241548 7fade0660700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.66:0/2788383120 pipe(0x7fb37a787400 sd=10647 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb37df46c00).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:52.012244 7fada577b700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.27:0/2780090797 pipe(0x7fb37a788800 sd=11116 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb3584e1500).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:56.221314 7fada5579700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.31:0/1854864375 pipe(0x7fb38662c000 sd=11117 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb34ed92400).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-17 20:08:58.006022 7fae467ee700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.72:0/336164766 pipe(0x7fb38662e800 sd=11001 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb3407e4280).accept replacing existing (lossy) channel (new one lossy=1)

3. osd assert info:
2019-10-18 09:35:19.206665 7fab097b4700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.46:0/325169422 pipe(0x7fb39183c800 sd=16160 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb352f4aa00).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-18 09:35:32.538303 7fac0bb7a700  0 -- 172.168.24.42:6822/726639 >> 172.168.24.83:0/2751373157 pipe(0x7fb38f97f400 sd=15975 :6822 s=0 pgs=0 cs=0 l=1 c=0x7fb3497ce100).accept replacing existing (lossy) channel (new one lossy=1)
2019-10-18 09:35:38.537926 7fb31e8f3700 -1 common/Thread.cc: In function 'void Thread::create(const char*, size_t)' thread 7fb31e8f3700 time 2019-10-18 09:35:38.518253
common/Thread.cc: 160: FAILED assert(ret == 0)

 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7fb332f638a5]
 2: (Thread::create(char const*, unsigned long)+0xaf) [0x7fb332f4642f]
 3: (SimpleMessenger::add_accept_pipe(int)+0x6f) [0x7fb332f3b42f]
 4: (Accepter::entry()+0x375) [0x7fb333004bc5]
 5: (()+0x7dc5) [0x7fb330e91dc5]
 6: (clone()+0x6d) [0x7fb32f51c73d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

4. During this period, the ceph -s output was HEALTH_OK, but business VMs were badly affected. From the log analysis, heartbeats between osds were fine, but connections between the osd and clients were not. The OSD finally asserted because pthread_create failed.
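
The backtrace above shows Thread::create failing inside SimpleMessenger::add_accept_pipe, i.e. while starting a thread for a newly accepted connection. One way to see whether a process is approaching a thread limit on Linux is to compare its thread count with the system-wide limit. The following is only a sketch under that assumption: the helper names are hypothetical, and pthread_create can also fail for other reasons (per-user process limits, PID exhaustion, memory), which this does not check.

```python
def parse_thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid):
    """Number of threads currently in process `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_thread_count(f.read())


def system_thread_limit():
    """System-wide thread limit from /proc/sys/kernel/threads-max (Linux only)."""
    with open("/proc/sys/kernel/threads-max") as f:
        return int(f.read())
```

Calling thread_count(pid) with the PID of the affected ceph-osd process while the log is flooding, and comparing it with system_thread_limit(), would show whether threads are accumulating along with the replaced connections.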

#9 Updated by Darren Wen over 1 year ago

Additional info:
1. The cluster is large, with hundreds of osds, but only one osd shows this behavior.
2. No abnormalities were found in the logs of the cluster nodes. However, the public network card on the failed osd's node is dropping packets. If it is a network problem, why is there no problem with the other osds on this node?

#10 Updated by Lei Liu over 1 year ago

Darren Wen wrote:

Additional info:
1. The cluster is large, with hundreds of osds, but only one osd shows this behavior.
2. No abnormalities were found in the logs of the cluster nodes. However, the public network card on the failed osd's node is dropping packets. If it is a network problem, why is there no problem with the other osds on this node?

Mainly, check whether the NIC's dropped packet count is increasing. The storage cluster in our production environment has encountered similar problems; you should also check the status of the switch optical module.
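
As a rough illustration of the check suggested here, the kernel's per-interface drop counters can be sampled twice and compared. This is a sketch assuming a Linux host, where the counters are exposed under /sys/class/net/<iface>/statistics/; the helper names and the interface name eth0 in the usage note are placeholders.

```python
def read_counter(iface, name):
    """Read one integer counter from /sys/class/net/<iface>/statistics/ (Linux only)."""
    with open(f"/sys/class/net/{iface}/statistics/{name}") as f:
        return int(f.read())


def read_drop_counters(iface):
    """Snapshot of the receive and transmit drop counters for `iface`."""
    return {name: read_counter(iface, name) for name in ("rx_dropped", "tx_dropped")}


def drop_delta(before, after):
    """Per-counter increase between two snapshots."""
    return {name: after[name] - before[name] for name in before}
```

For example, take before = read_drop_counters("eth0"), wait a minute, take a second snapshot and pass both to drop_delta; a non-zero result means drops are still increasing.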

#11 Updated by Darren Wen over 1 year ago

Lei Liu wrote:

Darren Wen wrote:

Additional info:
1. The cluster is large, with hundreds of osds, but only one osd shows this behavior.
2. No abnormalities were found in the logs of the cluster nodes. However, the public network card on the failed osd's node is dropping packets. If it is a network problem, why is there no problem with the other osds on this node?

Mainly, check whether the NIC's dropped packet count is increasing. The storage cluster in our production environment has encountered similar problems; you should also check the status of the switch optical module.

Yes, I remember that the dropped packet count on the problem node's NIC was increasing. But there is a question: why is only one osd on this node in this state, and none of the other osds on that node? That's a little confusing...
