Bug #48895

closed

ceph: krbd client con gets stuck in CON_SOCK_STATE_CONNECTING state

Added by fan chen over 3 years ago. Updated over 2 years ago.

Status:
Duplicate
Priority:
Normal
Assignee:
Category:
libceph
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

Hi,

I've been working with the krbd client and hit an issue that appears now and then. We have been investigating it for a very long time without getting anywhere, so we would appreciate some help. The kernel client version is 4.18 and the Ceph version is Luminous (12.2.12).

To briefly describe the situation: in our Kubernetes environment, krbd devices are used as pod volumes. When pods restart, the krbd device must be remapped to the new pod. Once in a while we found that unmapping the krbd device failed with "Device busy", and this busy state could persist for one or two days without ever recovering on its own, which is not acceptable for us.

Checking the krbd device status, we saw many blocked processes, as well as many in-flight I/Os on that krbd device in /sys/kernel/debug/ceph/.../osdc:

REQUESTS 18 homeless 0
11276122 osd6 7.2264aa9b 7.9b [6,35,29]/17 [6,35,29]/17 e5824 rbd_data.2f8496b8b4567.0000000000000000 0x400024 1 set-alloc-hint,write
11276123 osd6 7.114cb213 7.13 [6,11,29]/17 [6,11,29]/17 e5824 rbd_data.2f8496b8b4567.0000000000000001 0x400024 1 set-alloc-hint,write
11276127 osd6 7.8459213 7.13 [6,11,29]/17 [6,11,29]/17 e5824 rbd_data.2f8496b8b4567.000000000000a822 0x400024 1 set-alloc-hint,write
11276234 osd6 7.ea3de642 7.42 [6,23,29]/17 [6,23,29]/17 e5824 rbd_header.2f8496b8b4567 0x400024 1 watch-watch
11277114 osd6 7.fe0fb09b 7.9b [6,35,29]/17 [6,35,29]/17 e5824 rbd_data.13536b8b4567.00000000000004a0 0x400024 1 set-alloc-hint,write
[...]

Surveying the syslog for osd.6, we could also find some previous messages like:

Aug 20 20:42:10 node-2 kernel: libceph: osd6 up
Aug 20 20:44:36 node-2 kernel: libceph: osd6 down
Aug 20 20:45:12 node-2 kernel: libceph: osd6 up
Aug 20 20:48:01 node-2 kernel: libceph: osd6 down
Aug 20 20:48:05 node-2 kernel: libceph: osd6 up
Aug 20 20:49:55 node-2 kernel: libceph: osd6 192.168.30.6:6804 socket closed (con state OPEN)
Aug 20 20:55:23 node-2 kernel: libceph: osd6 192.168.30.6:6804 socket closed (con state CONNECTING)
Aug 20 21:01:26 node-2 kernel: libceph: osd6 192.168.30.6:6804 socket closed (con state OPEN)
Aug 20 21:01:44 node-2 kernel: libceph: osd6 down
Aug 20 21:17:51 node-2 kernel: libceph: osd6 up
Aug 20 21:20:03 node-2 kernel: libceph: osd6 192.168.30.6:6804 socket closed (con state CONNECTING)
Aug 20 21:22:16 node-2 kernel: libceph: osd6 192.168.30.6:6804 socket closed (con state CONNECTING)

In addition, we can work around this issue by restarting osd.6.

Luckily, we had also dumped a kcore, and found that the krbd client connection was stuck in the CON_SOCK_STATE_CONNECTING state:

crash> tree -t rbtree -o ceph_osd.o_node -N 0xffff8683cbe88818 -s ceph_osd.o_osd,o_con

ffff87830f7d9800
o_osd = 6
o_con = {
private = 0xffff87830f7d9800,
ops = 0xffff000009e717a8 <osd_con_ops>,
msgr = 0xffff8483c2e34080,
sock_state = {
counter = 3
},
sock = 0xffff8383a8eec340,
peer_addr = {
type = 0,
nonce = 15937,
in_addr = {
ss_family = 2,
__data = "\032\224\300\250\036\006\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
}
},
peer_addr_for_me = {
type = 0,
nonce = 0,
in_addr = {
ss_family = 0,
__data = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
}
},
flags = 2,
state = 3,

8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8<

crash> tree -t rbtree -o ceph_osd.o_node -N 0xffff8683cbe88818 -s ceph_osd.o_osd,o_con.in_banner
ffff8683cbe88800
o_osd = 8
o_con.in_banner = "ceph v027\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
ffff87830f7d9800
o_osd = 6
o_con.in_banner = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
ffff8783d47be000
o_osd = 4
o_con.in_banner = "ceph v027\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
ffff8783d943d800
o_osd = 7
o_con.in_banner = "ceph v027\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
ffff87830471f000
o_osd = 13
o_con.in_banner = "ceph v027\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
ffff8783d9289800
o_osd = 11
o_con.in_banner = "ceph v027\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
ffff878314a6b000
o_osd = 12
o_con.in_banner = "ceph v027\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
ffff878314a62800
o_osd = 14
o_con.in_banner = "ceph v027\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",

We could confirm that con->state was stuck in the CON_STATE_CONNECTING state. The flowchart below, reconstructed from the Ceph source code, shows the connection negotiation (server side on the left, client side on the right):

Server side                              Client side

send_server_banner                       send_client_banner
        |                                        |
        v                                        v
wait_client_banner                       wait_server_banner
        |                                        |
        v                                        v
wait_connect_message                     handle_server_banner_and_identify
        |                                        |
        v                                        v
wait_connect_message_auth                send_connect_message
        |                                        |
        v                                        v
handle_connect_message_2                 wait_connect_reply
        |                                        |
        v                                        v
replace / send_connect_message_reply     wait_connect_reply_auth
        |                                        |
        v                                        v
open                                     handle_connect_reply_2 -> open

(retry edges back to wait_connect_message and send_connect_message omitted)

From this we suspect that the Ceph krbd client never received the "banner" message from the OSD server, so the client connection to osd.6 was never able to finish connecting. But from the TCP socket state we could see that the socket connection itself was OK.

Maybe this issue is related to the network subsystem, but I am not familiar with it. I hope someone can give some advice on this issue. Thanks in advance.
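
To make the failure mode concrete: if the OSD silently lost its end of the TCP connection (a half-open connection), nothing on the wire ever forces an error on the client side, so a client that only waits for incoming bytes waits forever. The sketch below is only a userspace illustration of that idea, not libceph code (the in-kernel messenger is event-driven rather than a blocking read, but the effect is the same); the constant name is made up here, and the banner string "ceph v027" is the one visible in the crash dump above.

```
/* Illustration only (not libceph code): reading the 9-byte "ceph v027"
 * banner from a half-open TCP connection.  The local socket is
 * ESTABLISHED, but the peer has no matching state, so no data, FIN or
 * RST ever arrives and the read blocks forever unless a timeout or
 * keepalive mechanism intervenes. */
#include <stddef.h>
#include <unistd.h>

#define CEPH_BANNER_LEN 9       /* strlen("ceph v027") */

static int read_banner(int fd, char *buf)
{
        size_t got = 0;

        while (got < CEPH_BANNER_LEN) {
                /* On a half-open connection this read() never returns. */
                ssize_t n = read(fd, buf + got, CEPH_BANNER_LEN - got);

                if (n <= 0)
                        return -1;      /* error or EOF */
                got += (size_t)n;
        }
        return 0;
}
```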

Related issues: 1 (1 open, 0 closed)

Is duplicate of Linux kernel client - Bug #7139: dead tcp connections during connection negotiation may lead to hangs (In Progress, Ilya Dryomov)

Actions #2

Updated by fan chen over 3 years ago

I imagine we may need a heartbeat/keepalive mechanism while the Ceph connection is being established, just like the OSD keepalive after the connection is established. What do you think?

Thanks,
Chen
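
For reference, the TCP-level analogue of such a mechanism is the standard SO_KEEPALIVE / TCP_KEEP* socket options, which make the kernel probe an idle connection and drop it when the peer stops answering. The sketch below only illustrates the kind of mechanism being suggested; it is not what libceph does, and the values are arbitrary examples:

```
/* Sketch only: enable TCP keepalive with fairly aggressive settings so a
 * dead or half-open peer is detected within minutes instead of never. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int enable_tcp_keepalive(int fd)
{
        int on = 1;
        int idle = 60;          /* seconds of idle before the first probe */
        int interval = 10;      /* seconds between probes */
        int probes = 6;         /* unanswered probes before the connection is dropped */

        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)))
                return -1;
        return 0;
}
```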

Actions #3

Updated by Ilya Dryomov about 3 years ago

  • Category changed from rbd to libceph
  • Assignee set to Ilya Dryomov

But from the TCP socket state we could see that the socket connection itself was OK.

Hi Fan,

Did you check both sides of the connection, i.e. both the kernel client socket and the corresponding ceph-osd socket? I'm wondering if the connection was half-open in the RFC 793 sense. There is an old ticket [1] for that which I had a patch for (setting up a simple timeout mechanism, IIRC), but it was never merged.

[1] https://tracker.ceph.com/issues/7139
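
In userspace terms, the "simple timeout mechanism" mentioned above could look roughly like the sketch below: bound the wait for the peer's banner and close/retry the connection when the deadline expires. This is only an illustration of the concept, not the actual kernel patch (the in-kernel messenger would presumably use a delayed work item rather than poll()):

```
/* Sketch only: wait for the server banner with a deadline.  Returns 0 when
 * data is ready to read, 1 when the deadline expired (the caller should
 * close the socket and retry the connection), and -1 on error. */
#include <poll.h>

static int wait_for_banner_timeout(int fd, int timeout_ms)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ret = poll(&pfd, 1, timeout_ms);

        if (ret < 0)
                return -1;      /* poll() failed */
        if (ret == 0)
                return 1;       /* timed out: the banner never arrived */
        return 0;               /* banner bytes are available */
}
```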

Actions #4

Updated by fan chen about 3 years ago

Ilya Dryomov wrote:

But from the TCP socket state we could see that the socket connection itself was OK.

Hi Fan,

Did you check both sides of the connection, i.e. both the kernel client socket and the corresponding ceph-osd socket? I'm wondering if the connection was half-open in the RFC 793 sense. There is an old ticket [1] for that which I had a patch for (setting up a simple timeout mechanism, IIRC), but it was never merged.

[1] https://tracker.ceph.com/issues/7139

Unluckily, I didn't check that; it seems the situation is not easily reproduced. Next time it reproduces, I will check both sides of the connection and report back here.
Do you have any suggestions for reproducing the situation? We can put some effort into that.

Actions #5

Updated by Ilya Dryomov about 3 years ago

No. If it reproduces, grab a full crashdump, preferably including userspace state.

While tracking down the exact reason the socket got busted would be nice, it won't change the fact that we don't have any keepalive mechanism in place for the early stages of session negotiation. I'll mark this as related to #7139 and prioritize it.

Actions #6

Updated by Ilya Dryomov about 3 years ago

  • Related to Bug #7139: dead tcp connections during connection negotiation may lead to hangs added
Actions #7

Updated by fan chen about 3 years ago

Today it reproduced; it was indeed a half-open connection state.

The rbd client got stuck on node-3, and the OSD it was connected to was osd.11 on node-1.

[root@node-3 ~]# cat /sys/kernel/debug/ceph/853a3b62-bef9-4c57-84ae-f84e9db8ef89.client4367/osdc
REQUESTS 36 homeless 0
3020552 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020553 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020554 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020555 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020556 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020557 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020558 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020559 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020560 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020561 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020562 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020563 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020564 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020565 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020566 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020567 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020568 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020569 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020570 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020571 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020572 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020573 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020574 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020575 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020576 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020577 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020578 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020579 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020580 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020581 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020582 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020583 osd11 10.1988709f 10.9f [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12296b8b4567.0000000000000024 0x40002c 3 set-alloc-hint,write
3020584 osd11 10.d2088312 10.12 [11,3,7]/11 [11,3,7]/11 e139 rbd_data.12296b8b4567.0000000000000477 0x40002c 3 set-alloc-hint,write
3020653 osd11 10.37443fa 10.fa [11,3,7]/11 [11,3,7]/11 e139 rbd_data.12296b8b4567.0000000000000001 0x40002c 3 set-alloc-hint,write
3021076 osd11 10.5cb5723d 10.3d [11,3,7]/11 [11,3,7]/11 e139 rbd_data.12726b8b4567.000000000000a02a 0x400024 1 set-alloc-hint,write
3021357 osd11 10.b4b42743 10.43 [11,7,3]/11 [11,7,3]/11 e139 rbd_data.12726b8b4567.000000000000a025 0x400024 1 set-alloc-hint,write
LINGER REQUESTS
18446462598732840969 osd0 3.53a9cced 3.2d [0,4,8]/0 [0,4,8]/0 e139 rbd_header.10a66b8b4567 0x20 1 WC/0
18446462598732840974 osd1 3.ae9906af 3.2f [1,10,5]/1 [1,10,5]/1 e139 rbd_header.10ef6b8b4567 0x20 0 WC/0
18446462598732840975 osd3 10.b256f18e 10.8e [3,7,11]/3 [3,7,11]/3 e139 rbd_header.12726b8b4567 0x20 1 WC/0
18446462598732840976 osd3 10.6a8e0e39 10.39 [3,7,11]/3 [3,7,11]/3 e139 rbd_header.12296b8b4567 0x20 1 WC/0
18446462598732840962 osd4 3.fd0ff947 3.7 [4,1,9]/4 [4,1,9]/4 e139 rbd_header.11376b8b4567 0x20 10 WC/0
18446462598732840964 osd4 3.5f481787 3.7 [4,1,9]/4 [4,1,9]/4 e139 rbd_header.10fb6b8b4567 0x20 10 WC/0
18446462598732840971 osd8 3.cec10625 3.25 [8,2,5]/8 [8,2,5]/8 e139 rbd_header.13a06b8b4567 0x20 0 WC/0
BACKOFFS

[root@node-3 ~]# cat /sys/kernel/debug/ceph/853a3b62-bef9-4c57-84ae-f84e9db8ef89.client4367/osdmap
epoch 139 barrier 0 flags 0x40189000
pool 1 '.rgw.root' type 1 size 3 min_size 2 pg_num 8 pg_num_mask 7 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 2 'volumes' type 1 size 3 min_size 2 pg_num 256 pg_num_mask 255 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 3 'rbd' type 1 size 3 min_size 1 pg_num 64 pg_num_mask 63 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 4 'metrics' type 1 size 3 min_size 2 pg_num 8 pg_num_mask 7 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 5 'backups' type 1 size 3 min_size 2 pg_num 8 pg_num_mask 7 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 6 'compute' type 1 size 3 min_size 2 pg_num 64 pg_num_mask 63 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 7 'default.rgw.control' type 1 size 3 min_size 2 pg_num 8 pg_num_mask 7 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 8 'default.rgw.meta' type 1 size 3 min_size 2 pg_num 8 pg_num_mask 7 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 9 'default.rgw.log' type 1 size 3 min_size 2 pg_num 8 pg_num_mask 7 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 10 'ssdpool' type 1 size 3 min_size 1 pg_num 256 pg_num_mask 255 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 11 'default.rgw.buckets.index' type 1 size 3 min_size 2 pg_num 8 pg_num_mask 7 flags 0x1 lfor 0 read_tier -1 write_tier -1
pool 12 'default.rgw.buckets.data' type 1 size 3 min_size 2 pg_num 8 pg_num_mask 7 flags 0x1 lfor 0 read_tier -1 write_tier -1
osd0 192.168.30.3:6802 100% (exists, up) 100%
osd1 192.168.30.3:6804 100% (exists, up) 100%
osd2 192.168.30.3:6800 100% (exists, up) 100%
osd3 192.168.30.3:6806 100% (exists, up) 100%
osd4 192.168.30.4:6800 100% (exists, up) 100%
osd5 192.168.30.4:6804 100% (exists, up) 100%
osd6 192.168.30.4:6802 100% (exists, up) 100%
osd7 192.168.30.4:6806 100% (exists, up) 100%
osd8 192.168.30.2:6803 100% (exists, up) 100%
osd9 192.168.30.2:6801 100% (exists, up) 100%
osd10 192.168.30.2:6805 100% (exists, up) 100%
osd11 192.168.30.2:6807 100% (exists, up) 100%

Then we found that the connection states on node-1 and node-3 did not match:

[root@node-3 ~]# dmesg -T | grep libceph | grep osd11
[Wed Mar 17 20:22:34 2021] libceph: osd11 weight 0x10000 (in)
[Wed Mar 17 20:22:34 2021] libceph: osd11 up
[Thu Mar 18 15:07:31 2021] libceph: osd11 192.168.30.2:6807 socket closed (con state OPEN)
[Thu Mar 18 15:09:13 2021] libceph: osd11 192.168.30.2:6807 socket closed (con state OPEN)

[root@node-3 ~]# netstat -nap | grep 192.168.30.2:6807 | grep ESTAB | grep "ESTABLISHED -"
tcp 0 0 192.168.30.4:51798 192.168.30.2:6807 ESTABLISHED -

[root@node-1 ~]# netstat -anp | grep 192.168.30.2:6807
tcp 0 0 192.168.30.2:6807 0.0.0.0:* LISTEN 34880/ceph-osd
tcp 0 0 192.168.30.2:6807 192.168.30.4:47636 ESTABLISHED 34880/ceph-osd
tcp 0 0 192.168.30.2:6807 10.232.1.57:58764 ESTABLISHED 34880/ceph-osd
tcp 0 0 192.168.30.2:6807 192.168.30.3:34068 ESTABLISHED 34880/ceph-osd
tcp 0 9 192.168.30.2:47522 192.168.30.2:6807 ESTABLISHED -
tcp 0 0 192.168.30.2:6807 192.168.30.3:47690 ESTABLISHED 34880/ceph-osd
tcp 0 0 192.168.30.2:6807 192.168.30.3:57292 ESTABLISHED 34880/ceph-osd
tcp 0 0 192.168.30.2:6807 192.168.30.2:47522 ESTABLISHED 34880/ceph-osd
tcp 0 0 192.168.30.2:6807 192.168.30.3:51334 ESTABLISHED 34880/ceph-osd

We could not find the remote end of the "192.168.30.4:51798" connection on node-1.

Actions #8

Updated by Ilya Dryomov about 3 years ago

Thanks for confirming! I'll try to push a patch for kernel 5.12. The backport to stable kernels won't be straightforward because the messenger has seen some major changes in 5.11, but it's definitely in the cards.

Actions #9

Updated by Ilya Dryomov about 3 years ago

  • Related to deleted (Bug #7139: dead tcp connections during connection negotiation may lead to hangs)
Actions #10

Updated by Ilya Dryomov about 3 years ago

  • Is duplicate of Bug #7139: dead tcp connections during connection negotiation may lead to hangs added
Actions #11

Updated by Ilya Dryomov about 3 years ago

  • Status changed from New to Duplicate
Actions #12

Updated by fan chen about 3 years ago

Ilya Dryomov wrote:

Thanks for confirming! I'll try to push a patch for kernel 5.12. The backport to stable kernels won't be straightforward because the messenger has seen some major changes in 5.11, but it's definitely in the cards.

Got it, thanks.

Actions #13

Updated by fan chen about 3 years ago

fan chen wrote:

Ilya Dryomov wrote:

Thanks for confirming! I'll try to push a patch for kernel 5.12. The backport to stable kernels won't be straightforward because the messenger has seen some major changes in 5.11, but it's definitely in the cards.

Got it, thanks.

Hi Ilya, when you send out the patch, please give me the link. Thanks.

Actions #14

Updated by fan chen almost 3 years ago

Hi Ilya,
the v5.12.9 kernel has already been released. Were those patches pushed?

Actions #15

Updated by fan chen almost 3 years ago

Hi Ilya, the v5.12.9 kernel has already been released. Were those patches pushed?

Actions #16

Updated by Ilya Dryomov almost 3 years ago

Hi Fan,

Sorry, not yet. I hit a locking issue which I didn't have time to investigate and, rather than risking the stability of the messenger, put the patch on hold.

I'll try to finish it in the next couple of weeks.

Actions #17

Updated by fan chen almost 3 years ago

Ilya Dryomov wrote:

Hi Fan,

Sorry, not yet. I hit a locking issue which I didn't have time to investigate and, rather than risking the stability of the messenger, put the patch on hold.

I'll try to finish it in the next couple of weeks.

Got it, I will wait for you :).

Actions #18

Updated by fan chen over 2 years ago

Hi Ilya, do you have any update about this?
