Bug #48785
kceph: osd socket connection was closed
Description
I ran this against a vstart ceph cluster built from the latest upstream code, mounted the kclient, and ran the following test script several times:
# for i in {1..10000}; do echo $i > file$i.txt; done; for i in {1..10000}; do rm -f file$i.txt; done;
Eventually it gets stuck, and I can see the following dmesg logs:
<4>[ 618.390897] libceph: osd0 (1)10.72.47.117:6801 socket closed (con state OPEN)
<4>[ 618.394087] libceph: osd0 (1)10.72.47.117:6801 socket error on write
<6>[ 618.445075] libceph: osd0 down
<4>[ 637.991918] libceph: osd2 (1)10.72.47.117:6809 socket closed (con state OPEN)
<4>[ 637.992565] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 638.251396] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 638.755376] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 639.923390] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 641.972401] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 645.939306] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 654.067008] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 668.914496] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 684.273877] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 700.145453] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 714.992791] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 730.352353] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 746.223853] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 762.095293] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 776.942800] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 792.302200] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 808.173644] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 824.045071] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 839.404610] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 855.276046] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 871.147474] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 885.995108] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 901.354415] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<6>[ 916.711242] libceph: osd1 weight 0x0 (out)
<4>[ 917.225958] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 933.097298] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 947.944823] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 963.304275] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 979.176261] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 995.047232] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1010.406943] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1026.278149] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1042.149890] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1058.021045] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1073.380703] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1089.252613] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1105.123450] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1119.970876] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1135.330522] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1151.201866] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1167.073204] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1181.920785] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1197.280300] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1213.151667] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1229.023141] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1244.382513] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1260.253995] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1276.125480] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1290.972967] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1306.332381] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1322.203871] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1338.075323] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1352.923316] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1368.282278] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1384.153756] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1400.025264] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1415.384779] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1431.256134] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1447.127539] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1461.975087] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1477.334520] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<4>[ 1493.205982] libceph: osd2 (1)10.72.47.117:6809 socket error on write
<6>[ 1507.428136] libceph: osd2 down
Updated by Xiubo Li over 3 years ago
The kceph was also using the latest upstream code.
Updated by Ilya Dryomov about 3 years ago
Hi Xiubo,
Was messenger failure injection enabled on the server side?
Do you have the corresponding "debug ms" logs from the OSDs?
Updated by Xiubo Li about 3 years ago
Ilya Dryomov wrote:
Hi Xiubo,
Was messenger failure injection enabled on the server side?
I didn't change anything; I just set up the cluster using:
# MDS=3 MON=3 OSD=3 MGR=1 ../src/vstart.sh -n -X -G --msgr1
The messenger failure injection settings were left at their defaults.
Do you have the corresponding "debug ms" logs from the OSDs?
I didn't save the logs, since it was very easy to reproduce. I am not sure whether it was caused by running out of disk space; the disk is 300G. My cephfs tests, which run in a loop, have run out of disk space many times.
data:
  pools:   3 pools, 192 pgs
  objects: 22 objects, 11 KiB
  usage:   3.0 GiB used, 300 GiB / 303 GiB avail
  pgs:     192 active+clean
Updated by Xiubo Li over 2 years ago
- Status changed from New to Won't Fix
I reproduced this locally many times and finally found that every occurrence was caused by running out of disk space. It's not a bug, so I'm closing it.
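Since the root cause turned out to be a full disk, a free-space check before starting the reproducer can rule that out early. The following is only a rough sketch and not part of the original report: the helper names `free_kib` and `enough_space`, the `MIN_FREE_KIB` variable, and the 10 GiB threshold are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in KiB, of the filesystem
# that holds the path given as $1.
free_kib() {
    # -P selects the POSIX output format (one line per filesystem),
    # -k reports sizes in 1 KiB blocks; field 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumed threshold: treat less than 10 GiB free as "too little".
MIN_FREE_KIB=$((10 * 1024 * 1024))

# Hypothetical helper: succeed only if the filesystem holding $1
# has at least MIN_FREE_KIB KiB available.
enough_space() {
    [ "$(free_kib "$1")" -ge "$MIN_FREE_KIB" ]
}
```

For example, running `enough_space <dir> || echo "low on disk space"` before the test loop, with `<dir>` pointing at the disk that backs the vstart OSD data, would flag this condition before the client starts logging socket errors. Note that the check belongs on the host running the OSDs, not on the kclient mount.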