Bug #23329
open
async messenger lost its session during IO performance testing and does not recover until restart
Description
The server is listening on port 6830.
The client reports no reply from the server.
According to ss -np, the TCP connection no longer exists.
Why can't the async messenger recover the connection without a restart?
ceph version 12.2.2
CentOS 7.3
default ceph.conf
ms_type = async+posix
2018-03-13 14:58:14.647389 7f9c9c55c700 1 -- 10.0.30.26:6848/1004050 --> 10.0.30.27:0/4036 -- osd_ping(ping_reply e339 stamp 2018-03-13 14:58:14.642824) v4 -- 0x7f9cbdc68000 con 0
2018-03-13 14:58:14.647435 7f9c9cd5d700 1 -- 10.0.30.26:6846/1004050 <== osd.44 10.0.30.27:0/4036 26192 ==== osd_ping(ping e339 stamp 2018-03-13 14:58:14.642824) v4 ==== 2004+0+0 (230542212 0 0) 0x7f9cca254200 con 0x7f9cba551800
2018-03-13 14:58:14.647473 7f9c9cd5d700 1 -- 10.0.30.26:6846/1004050 --> 10.0.30.27:0/4036 -- osd_ping(ping_reply e339 stamp 2018-03-13 14:58:14.642824) v4 -- 0x7f9cd11f2c00 con 0
2018-03-13 14:58:14.713552 7f9c9c55c700 1 -- 10.0.30.26:6846/1004050 <== osd.19 10.0.30.25:0/3012 26102 ==== osd_ping(ping e339 stamp 2018-03-13 14:58:14.707686) v4 ==== 2004+0+0 (2179933530 0 0) 0x7f9cd7427e00 con 0x7f9cbb2e4000
2018-03-13 14:58:14.713469 7f9c9cd5d700 1 -- 10.0.30.26:6848/1004050 <== osd.19 10.0.30.25:0/3012 26102 ==== osd_ping(ping e339 stamp 2018-03-13 14:58:14.707686) v4 ==== 2004+0+0 (2179933530 0 0) 0x7f9cbdc6ba00 con 0x7f9cba458000
2018-03-13 14:58:14.713627 7f9c9c55c700 1 -- 10.0.30.26:6846/1004050 --> 10.0.30.25:0/3012 -- osd_ping(ping_reply e339 stamp 2018-03-13 14:58:14.707686) v4 -- 0x7f9cbdc68000 con 0
2018-03-13 14:58:14.713786 7f9c9cd5d700 1 -- 10.0.30.26:6848/1004050 --> 10.0.30.25:0/3012 -- osd_ping(ping_reply e339 stamp 2018-03-13 14:58:14.707686) v4 -- 0x7f9cd11f2c00 con 0
2018-03-13 14:58:14.714629 7f9c9d55e700 1 -- 10.0.30.26:6846/1004050 <== osd.0 10.0.30.25:0/3016 26185 ==== osd_ping(ping e339 stamp 2018-03-13 14:58:14.708800) v4 ==== 2004+0+0 (3781320873 0 0) 0x7f9ce455da00 con 0x7f9cba561000
2018-03-13 14:58:14.714679 7f9c9d55e700 1 -- 10.0.30.26:6846/1004050 --> 10.0.30.25:0/3016 -- osd_ping(ping_reply e339 stamp 2018-03-13 14:58:14.708800) v4 -- 0x7f9cc09afc00 con 0
2018-03-13 14:58:14.714821 7f9c9cd5d700 1 -- 10.0.30.26:6848/1004050 <== osd.0 10.0.30.25:0/3016 26185 ==== osd_ping(ping e339 stamp 2018-03-13 14:58:14.708800) v4 ==== 2004+0+0 (3781320873 0 0) 0x7f9cca254200 con 0x7f9cba698800
2018-03-13 14:58:14.714863 7f9c9cd5d700 1 -- 10.0.30.26:6848/1004050 --> 10.0.30.25:0/3016 -- osd_ping(ping_reply e339 stamp 2018-03-13 14:58:14.708800) v4 -- 0x7f9cd11f2c00 con 0
2018-03-13 14:58:14.766279 7f9c9d55e700 1 -- 10.0.30.26:6846/1004050 <== osd.46 10.0.30.27:0/4040 26123 ==== osd_ping(ping e339 stamp 2018-03-13 14:58:14.761398) v4 ==== 2004+0+0 (3946795183 0 0) 0x7f9cc76eb400 con 0x7f9cba553000
2018-03-13 14:58:14.766317 7f9c9d55e700 1 -- 10.0.30.26:6846/1004050 --> 10.0.30.27:0/4040 -- osd_ping(ping_reply e339 stamp 2018-03-13 14:58:14.761398) v4 -- 0x7f9cc09afc00 con 0
2018-03-13 14:58:14.766445 7f9c9c55c700 1 -- 10.0.30.26:6848/1004050 <== osd.46 10.0.30.27:0/4040 26123 ==== osd_ping(ping e339 stamp 2018-03-13 14:58:14.761398) v4 ==== 2004+0+0 (3946795183 0 0) 0x7f9cd9eef600 con 0x7f9cba8e9800
2018-03-13 14:58:14.766496 7f9c9c55c700 1 -- 10.0.30.26:6848/1004050 --> 10.0.30.27:0/4040 -- osd_ping(ping_reply e339 stamp 2018-03-13 14:58:14.761398) v4 -- 0x7f9cbdc68000 con 0
netstat -tanp|grep 4026 |grep 6830
tcp 0 0 10.0.30.27:6830 0.0.0.0:* LISTEN 4026/ceph-osd
tcp 0 0 10.0.30.27:6830 10.0.30.25:55486 ESTABLISHED 4026/ceph-osd
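The manual netstat check above could be scripted so it can be repeated while the test runs. The following is only a minimal sketch, assuming the whitespace-separated column layout shown in the output above. The function names connections_on_port and has_established_peer are hypothetical helpers, not part of Ceph or any existing tool.

```python
def connections_on_port(netstat_output, port):
    """Return (state, local, peer) tuples for TCP rows whose local port matches."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local peer state [pid/program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, peer, state = fields[3], fields[4], fields[5]
        if local.rsplit(":", 1)[-1] == str(port):
            rows.append((state, local, peer))
    return rows


def has_established_peer(netstat_output, port, peer_ip):
    """True if peer_ip holds an ESTABLISHED connection to the given local port."""
    return any(
        state == "ESTABLISHED" and peer.rsplit(":", 1)[0] == peer_ip
        for state, _local, peer in connections_on_port(netstat_output, port)
    )


# Sample input copied from the netstat output in this report.
SAMPLE = """\
tcp 0 0 10.0.30.27:6830 0.0.0.0:* LISTEN 4026/ceph-osd
tcp 0 0 10.0.30.27:6830 10.0.30.25:55486 ESTABLISHED 4026/ceph-osd
"""

if __name__ == "__main__":
    for state, local, peer in connections_on_port(SAMPLE, 6830):
        print(state, local, peer)
    print(has_established_peer(SAMPLE, 6830, "10.0.30.25"))
```

With the output captured above, only 10.0.30.25 holds an established connection to port 6830. Calling has_established_peer with the address of the client that gets no reply would show whether its session is the missing one.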
Updated by Greg Farnum about 6 years ago
Can you explain the issue in more detail? Did the accepter thread break or something?
Updated by Greg Farnum about 5 years ago
- Project changed from RADOS to Messengers
- Category deleted (Administration/Usability)